Reviewer #2 (Public Review):
In their study, Chen et al. consider a set of 415 genetically diverse, outbred mice. This population is assembled from eight distinct cohorts, each entering the study at a separate chronological age ranging from three to twenty-four months. By employing a commercially available automated phenotyping system, the authors collected high-dimensional phenotyping data that quantifies both behavior and physiologic properties like oxygen consumption. Animals were placed in the phenotyping system for week-long measurement intervals, alternating with three-week intervals in more standard cages. In this way, the authors cleverly overcome challenges in longitudinal measurement by stitching together eight overlapping longitudinal time series into a single forty-week characterization of the entire murine lifespan.
The authors found that many of their measurements covary at short timescales according to an individual's behavioral state (sleeping, eating, running, etc.). To control for this effect, the authors developed a hidden Markov model that allowed them to automatically identify an animal's behavioral state, thus segmenting longitudinal measurements into distinct behavioral stages. This allowed the authors to more accurately study the long-term effects of aging by removing the confounding effects of short-term behavioral changes.
The authors find that circadian rhythms changed with chronological age and that energy expenditure while resting declined. In fact, eighty percent of all metrics correlated significantly with chronological age.
The authors genotyped each mouse using an array of SNP probes, allowing them to identify genotype-phenotype correlations. The authors observed a low heritability on average among all traits (median correlation = 0.22), but found that these heritable factors tended to affect multiple phenotypes simultaneously. Notably, the heritability of body mass was relatively high, in agreement with previous studies.
Irrespective of genetics, 250 features clustered into 20 groups based on covariation over time. The authors identified a general increase in the covariation of traits between and within these clusters as animals aged. The authors refer to these increases in covariation as "decreases in resilience".
Finally, the authors developed a model of aging that integrates phenotypic data and lifespan data. This model appears to draw implicitly from concepts developed by OO Aalen and James Vaupel under the name of "frailty" models, positing that each individual exhibits a characteristic rate of aging that contributes to differences in lifespan among peers. The authors fit their model using a maximum likelihood approach, implemented using gradient boosted decision trees, that allows them to estimate the relative rate of each individual's aging using longitudinal phenotypic data and compare this to inter-individual differences in lifespan. The authors' model produces rather unimpressive predictions of chronological age, with correlations ranging from 0.5 to 0.75 depending on model tuning. The model has more difficulty predicting an individual's remaining lifespan, only correlating between 0.25 and 0.425 depending on model tuning.
*Strengths*
The main strength of this manuscript is its thoughtful study design, which combines high-dimensional phenotyping, genotypic data, and a large population size. An impressive effort went into collecting these measurements and the result seems likely to be useful for many future analyses. An additional strength of this manuscript is the HMM. By subdividing time-series measurements into distinct short-term behavioral periods, long-term trends in behavior and physiology can be identified without the confounding influence of short-term behavioral states. Finally, the authors' "CASPAR" model seems like a thoughtful attempt to relate longitudinal phenotypic aging to lifespan, even if its performance is not yet so impressive.
*Weaknesses*
The manuscript is substantially weakened by a lack of clarity on several important conceptual points. First, the authors appear to assume that any change that occurs at month-long timescales must be "aging". The authors choose to discard the first day of measurements in a cage to account for behavioral adaptation, demonstrating their concern for distinguishing behavioral adaptations from aging phenomena. However, the authors' efforts to do this seem rather cursory, as mice surely learn and adapt over timescales longer than twenty-four hours. The reader is left wondering to what extent this study measures the phenotypic consequences of aging and to what extent it measures long-term adaptation of individuals to a four-week rotation schedule in and out of different cages.
As a second conceptual issue, the authors adopt a rather shallow and limited practical definition of the term "resilience". Conceptually, they define resilience as "the ability of a system to maintain function in the face of change", which seems reasonable and corresponds with the general thinking about resilience. However, in practice, the authors define resilience as the inverse of correlation among traits: an animal is more "resilient" when its different phenotypic traits are less correlated. This practical definition lends itself well to measurement using the data in this study, but leads to an incongruity between conceptual and practical definitions of "resilience". Correlation of traits is not uniquely determined by an organism's resilience; there could be any number of reasons for traits to increase in covariance beyond a failure of resilience. Any change in the physiologic relationship between two traits will alter the causal structure of the traits' interactions and therefore alter the traits' covariance. Are the authors arguing that any change in physiology must inherently involve changes in resilience? A more convincing practical definition of resilience would involve a more direct test of the conceptual definition given by the authors, "the ability of a system to maintain function in the face of change". For example, the authors might have provided some sort of physiologic challenge and measured animals' response to it: a physical stress test, a test of thermoregulation in response to changes in temperature, or the speed of adaptation to a novel environment. Given the data collected, the authors can measure many interesting aspects of aging, but they do not seem adequately justified in calling one of these aspects "resilience".
The manuscript also raises technical concerns. First, it is unclear whether all analyses in the manuscript are performed using features normalized to body mass or whether only analyses in certain sections of the manuscript are performed using features normalized to body mass. The details here are crucial because improper normalization would undermine the main conclusions of the manuscript. Normalization of multiple features to any shared reference has the potential to introduce a correlation between normalized features and the shared normalization factor. In fact, many approaches for normalization to body mass will always introduce a correlation between normalized features and body mass, with the only exception being if the un-normalized features and body mass are perfectly correlated. If the authors normalize traits before performing their various correlation analyses, such normalization could introduce artefactual correlations between traits. Any normalized quantity will correlate with body mass and all traits correlated with body mass will in consequence correlate with each other. In summary, the authors must explain their normalization procedure in more detail to identify or exclude any improper normalization that could confound their analyses. Analyses at risk of being confounded include the heritability analysis, the network analysis of phenotypes during aging, and the CASPAR analyses.
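To make the normalization concern concrete, the following is a minimal simulation sketch, not a reconstruction of the authors' pipeline. It assumes the simplest possible scheme, dividing each raw trait by body mass, which the manuscript may or may not use. The trait names, means, and standard deviations are invented for illustration. Two raw traits and body mass are drawn independently, so any correlation that appears after normalization is produced by the shared divisor alone.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=0):
    """Draw independent traits, ratio-normalize them, and report correlations."""
    rng = random.Random(seed)
    # Three mutually independent quantities; all parameters are hypothetical.
    mass = [rng.gauss(30.0, 4.0) for _ in range(n)]
    trait_a = [rng.gauss(100.0, 10.0) for _ in range(n)]
    trait_b = [rng.gauss(50.0, 5.0) for _ in range(n)]
    # Assumed normalization: simple division by body mass.
    norm_a = [a / m for a, m in zip(trait_a, mass)]
    norm_b = [b / m for b, m in zip(trait_b, mass)]
    return {
        "raw_a_vs_raw_b": pearson(trait_a, trait_b),
        "norm_a_vs_mass": pearson(norm_a, mass),
        "norm_a_vs_norm_b": pearson(norm_a, norm_b),
    }


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name}: {r:+.2f}")
```

In this toy setting the raw traits are essentially uncorrelated, while each normalized trait correlates negatively with body mass and the two normalized traits correlate positively with each other. Other normalization schemes, such as regressing out body mass, can behave differently, which is why the exact procedure needs to be stated.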
In the methods section, the "CASPAR" model is described clearly. However, the intuitive description provided in the main text invokes the concept of an "unavoidable tension" between chronological age and inter-individual heterogeneity in the aging rate. The reviewer finds this latter description unhelpful and potentially misleading. The sigma parameter can in some sense be considered a hyperparameter, because tuning it alters the model's behavior and performance. However, the sigma parameter is, more importantly, a potentially measurable property of the system being studied. Individuals within the population exhibited some amount of variability in their individual aging rates, which if measured would determine the value of an empirically grounded sigma parameter. Unfortunately, the authors are currently unable to estimate this sigma empirically and so they can only speculate about its true value. The authors are correct that different assumptions regarding variability in individual aging rates will produce different model behavior and differential performance in predicting chronological age and aging-rate heterogeneity. However, the authors err in implying that any "tension" exists in some grander, theoretic sense. Put more simply, the authors cannot currently measure an important parameter of their model. Readers would benefit from a clearer description of this parameter and the challenges in statistical inference it highlights.
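The role of sigma can be illustrated with a toy frailty-style simulation. This is a sketch under stated assumptions and is not the CASPAR model: it assumes that each animal's "biological age" is its chronological age multiplied by an individual rate, and that rates are log-normally distributed with mean 1 and log-scale standard deviation sigma. The function names and the sigma values tried are hypothetical; the three to twenty-four month age range follows the cohort entry ages described above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def age_correlation(sigma, n=20000, seed=0):
    """Correlation between chronological and simulated biological age.

    Assumed toy model: biological age = rate * chronological age,
    with rate ~ LogNormal(mean 1, log-scale sd sigma).
    """
    rng = random.Random(seed)
    chron, bio = [], []
    for _ in range(n):
        t = rng.uniform(3.0, 24.0)  # chronological age in months
        # The -sigma^2 / 2 offset keeps the mean rate equal to 1.
        rate = math.exp(rng.gauss(-0.5 * sigma ** 2, sigma))
        chron.append(t)
        bio.append(rate * t)
    return pearson(chron, bio)


if __name__ == "__main__":
    for sigma in (0.0, 0.1, 0.3, 0.6):
        print(f"sigma={sigma:.1f}  corr={age_correlation(sigma):.3f}")
```

Under these assumptions, the correlation between biological and chronological age falls as sigma grows. The trade-off is therefore a direct consequence of whatever heterogeneity the population actually has: if sigma could be measured, the appropriate operating point would be fixed by the data.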
Though impressive, this study's data has two limitations that the authors already acknowledge: 1) incomplete lifespan data, which is not available for all animals, and 2) a limited population size. Despite such limitations, the current data represents an impressive effort that will likely support many additional analyses.
Finally, the authors seem to neglect substantial prior experimental characterizations of phenotypic aging and methodological work in studying multi-dimensional phenotyping of aging. For example, a similar characterization has already been performed in nematodes (Martineau et al., PLoS Computational Biology 2020), and related analytic methods that show similar performance have already been developed (Zhang et al., Cell Systems 2016). If the authors wish to draw conclusions that generalize beyond their particular mouse model, they cannot focus myopically on only mouse experiments.
In summary, the manuscript describes a solid and commendable effort that has produced a valuable data set. However, in contextualizing and analyzing this data, the authors fall noticeably short of their self-proclaimed "sophistication and rigor".