  1. Nov 2025
    1. Reviewer #2 (Public review):

      Summary:

      The authors tested tactile acuity on the breast of females using several tasks.

      Results:

      Tactile acuity, assessed by just-noticeable differences in judging whether a touch was above or below a comparison stimulus, was lower on both the lateral and medial breast than on the hand and back. Acuity also scaled inversely with breast size, echoing earlier findings that larger hands exhibit lower acuity, presumably because a similar number of tactile receptors must be distributed over larger or smaller body surfaces. Observing this principle in the breast as on the hand strengthens the view that fixed innervation is a general organizing principle of the tactile system. Both methodology and analysis appear sound.
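      The fixed-innervation reasoning can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that a fixed number of receptors tiles the skin on a square grid and that the JND is proportional to receptor spacing. The function names and numbers are hypothetical and are not taken from the study.

```python
import math


def receptor_spacing(area_cm2: float, n_receptors: int) -> float:
    """Mean spacing (cm) if n_receptors tile area_cm2 on a square grid."""
    return math.sqrt(area_cm2 / n_receptors)


def predicted_jnd_ratio(area_large: float, area_small: float) -> float:
    """Predicted JND ratio between a larger and a smaller version of the same
    body part under fixed innervation (assumption): spacing, and by assumption
    the JND, grows with the square root of the area, i.e. linearly with skin length."""
    return math.sqrt(area_large / area_small)


# Hypothetical example: the same receptor count spread over twice the area.
print(receptor_spacing(400.0, 100))
print(predicted_jnd_ratio(800.0, 400.0))
```

      Under these assumptions, doubling the skin area predicts a JND about 1.41 times larger, not twice as large.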

      Most participants were unable to localize touch to a specific quadrant of the nipple, suggesting it is perceived as a single tactile unit. However, the study does not address whether touches to the nipple and areola are confused; conceptualizing the nipple as a perceptual (landmark) unit would suggest that such confusion should not take place. Aside from this limitation, the methodology and analysis appear sound.

      Absolute touch localization, assessed by asking participants to indicate locations on a 3D rendering of their own torso, revealed a bias toward the nipple. The authors interpret this as evidence that the nipple serves as a landmark attracting perceived touch. However, as noted during the review process, alternative explanations cannot be fully ruled out: because the stimulus array was centered on the nipple, the observed bias may stem from the stimulus distribution rather than from the nipple's landmark status. Aside from this caveat, the methodology and analysis appear sound.

      Overall assessment:

      The study offers a welcome exception to the prevailing bias in tactile research that limits investigation to the hand and arm. Its support for the fixed innervation hypothesis and its suggestion that the nipple may serve as a landmark (though this requires further scrutiny) illustrate the value of extending research to other body regions. By employing multiple tasks, the authors address several key aspects of tactile perception and create links to earlier findings.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The statistically adequate way of testing the biases is a hierarchical regression model (LMM) with the distance of the physical location from the nipple as the predictor and the distance of the reported location from the nipple as the dependent variable. Both variables can be unsigned or, for greater power, signed, for example by coding the lateral breast as negative and the medial breast as positive. A bias toward the nipple will show as regression coefficients smaller than 1.

      Thank you for this suggestion. We have replaced the relevant ANOVA analyses with LMM analyses. Specifically, we fit separate LMMs for the breast and the back to show the different effects of distance, and then a combined LMM to test the interaction. Finally, we use an LMM to assess the differences in precision and bias between the back and the breast. The new analysis confirms our earlier statements and does not change the results or interpretation of the data.
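      The regression test proposed by Reviewer #1 can be illustrated with simulated data. The sketch below is a simplified two-stage approximation (an ordinary least-squares slope per participant, then the mean across participants) rather than a full LMM, which would estimate participant-level intercepts and slopes jointly (e.g. with `mixedlm` in statsmodels). The signed coding follows the reviewer's example (lateral negative, medial positive); the distance range, noise level, and true compression factor of 0.7 are invented for illustration.

```python
import random
from typing import List, Tuple


def ols_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_participant(rng: random.Random, true_slope: float,
                         n_trials: int = 60, noise_sd: float = 1.0) -> Tuple[List[float], List[float]]:
    # Hypothetical signed distance from the nipple (cm): negative = lateral, positive = medial.
    physical = [rng.uniform(-10.0, 10.0) for _ in range(n_trials)]
    # Reports compressed toward the nipple by true_slope, plus localization noise.
    reported = [true_slope * d + rng.gauss(0.0, noise_sd) for d in physical]
    return physical, reported


rng = random.Random(1)
participants = [simulate_participant(rng, true_slope=0.7 + rng.gauss(0.0, 0.05)) for _ in range(10)]
per_participant = [ols_slope(x, y) for x, y in participants]
group_slope = sum(per_participant) / len(per_participant)
# A group slope below 1 indicates that reports are compressed toward the nipple.
print(f"group slope = {group_slope:.2f}")
```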

      Moreover, any bias towards the nipple could simply be another instance of regression to the mean of the stimulus distribution, given that the tested locations were centered on the nipple. This confound can only be experimentally solved by shifting the distribution of the tested locations. Finally, given that participants indicated the locations on a 3D model of the body part, further experimentation would be required to determine whether there is a perceptual bias towards the nipple or whether the authors merely find a response bias.

      A localization bias toward the nipple in this context does not show that the nipple is the anchor of the breast's tactile coordinate system. The result might simply be an instance of regression to the mean of the stimulus distribution (also known as experimental prior). To convincingly show localization biases towards the nipple, the tested locations should be centered at another location on the breast.

      Another problem is the visual salience of the nipple, even though Blender models were uniformly grey. With this type of direct localization, it is very difficult to distinguish perceptual from response biases even if the regression to the mean problem is solved. There are two solutions to this problem: 1) Varying the uncertainty of the tactile spatial information, for example, by using a pen that exerts lighter pressure. A perceptual bias should be stronger for more uncertain sensory information; a response bias should be the same across conditions. 2) Measure bias with a 2IFC procedure by taking advantage of the fact that sensory information is noisier if the test is presented before the standard.

      We believe that the fact that we explicitly tested two locations with equally distributed test locations, both of which had landmarks, makes this unlikely. Indeed, testing on the back is exactly what the reviewer suggests. It would also be impossible to center the tested locations “on another location on the breast”, as we are sampling across the whole breast. Moreover, because markers persisted on the model within each block, participants were generating additional landmarks on each trial. Thus, if there were any regression to the mean, it would be observed for both locations. Nevertheless, we recognize that this test cannot distinguish between a sensory bias towards the nipple and a consistent response bias that is always in the direction of the nipple, though the extent to which these differ is difficult to disentangle. That said, restricting testing to half of the breast, such that the distribution of points was asymmetrical, would have allowed us to test the hypothesis put forward by the reviewer. We recognize that this is a limitation of the data and have downplayed statements and added caveats accordingly.

      We have changed the appropriate heading and text in the discussion to downplay the finding:

      “Reports are biased towards the nipple”

      “suggesting that the nipple plays a pivotal role in the mental representation of the breast.”
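      The alternatives raised by Reviewer #1 (regression toward the mean of the stimulus distribution versus a response bias) make different predictions, which a toy model can illustrate. The sketch assumes a Gaussian prior matching the stimulus distribution and Gaussian sensory noise, so the perceived location is a precision-weighted average; the response bias is modeled, hypothetically, as a fixed fractional pull toward the nipple. All positions (cm on one axis, nipple at 0) and parameter values are illustrative.

```python
NIPPLE = 0.0  # nipple position on a hypothetical 1-D axis (cm)


def perceptual_estimate(x: float, prior_mean: float, prior_sd: float, sensory_sd: float) -> float:
    """Posterior mean for a Gaussian prior (the learned stimulus distribution)
    combined with a Gaussian likelihood centered on the true location x.
    The estimate is pulled toward prior_mean, more strongly when sensory_sd is large."""
    w = prior_sd ** 2 / (prior_sd ** 2 + sensory_sd ** 2)
    return w * x + (1.0 - w) * prior_mean


def response_biased_report(x: float, landmark: float, k: float) -> float:
    """Fixed response bias: the report moves a constant fraction k toward the
    landmark, independent of sensory noise."""
    return x + k * (landmark - x)


stimulus = 1.0  # 1 cm from the nipple

# Stimulus set centered on the nipple: prior-driven and landmark-driven biases coincide.
centered = perceptual_estimate(stimulus, prior_mean=NIPPLE, prior_sd=5.0, sensory_sd=2.0)
# Stimulus set centered 3 cm away: a prior-driven bias now points away from the nipple.
shifted = perceptual_estimate(stimulus, prior_mean=3.0, prior_sd=5.0, sensory_sd=2.0)
# Noisier input (e.g. lighter pressure): a perceptual bias grows ...
noisy = perceptual_estimate(stimulus, prior_mean=NIPPLE, prior_sd=5.0, sensory_sd=4.0)
# ... whereas a response bias does not depend on sensory noise.
response_only = response_biased_report(stimulus, NIPPLE, k=0.2)

print(f"centered={centered:.3f} shifted={shifted:.3f} noisy={noisy:.3f} response={response_only:.3f}")
```

      In this toy model, shifting the stimulus distribution separates a prior-driven bias from a landmark-driven one, and varying sensory noise separates a perceptual bias from a response bias, which mirrors the two remedies the reviewer proposed.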

      It might be harder to learn the range of locations on the back, given that stimulation is not restricted to an anatomically defined region, as is the case for the breast.

      We apologize for any confusion, but the point distribution is identical between tasks, as described in the Methods.

      The stability of the JND differences between body parts across subjects is already captured in the analysis of the JNDs; the ANOVA and the post-hoc testing would not be significant if the order were not relatively stable across participants. Thus, it is unclear why this is being evaluated again with reduced power due to improper statistics.

      We apologize for any confusion here. Only one ANOVA with post-hoc testing was performed on the data. The second parenthetical describing the test was perhaps redundant and confusing, so we have removed it.

      “([missing figure reference] A, B, 1-way ANOVA with Tukey’s HSD post-hoc t-test: p = 0.0284)”

      The null hypothesis of an ANOVA is that all of the mean values are equal; a significant result indicates only that at least one of them differs from the others. Adding participants as a factor therefore does not provide evidence for similarity.

      We agree with this statement and have removed the appropriate text.

      The pairwise correlations between body parts seem to be exploratory in nature. Like all exploratory analyses, the question arises of how much potential extra insights outweigh the risk of false positives. It would be hard to generate data with significant differences between several conditions and not find any correlations between pairs of conditions. Thus, the a priori chance of finding a significant correlation is much higher than what a correction accounts for.

      We broadly agree with this statement. However, we believe that the analyses were important to determine whether participants were systematically more or less acute across body parts. Moreover, both the fact that we did not observe any other significant relationships and the fact that we applied a post-hoc correction imply that no false positives were observed. Indeed, for the one relationship that was observed, we would need an assumed FDR more than 10 times higher than that required by the existing post-hoc correction, implying a true relationship.
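      To illustrate how a false-discovery-rate correction treats a family of pairwise correlations, the sketch below implements the Benjamini-Hochberg (BH) procedure. The correction actually used in the manuscript is not specified here, so BH serves only as a representative example, and the six p-values (one per pair of four body sites) are hypothetical. A BH-adjusted p-value is the smallest FDR level at which that test would be declared significant.

```python
from typing import List


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that adjusted values are monotone in the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bh_reject(p_values: List[float], q: float = 0.05) -> List[bool]:
    """True where the null hypothesis is rejected at FDR level q."""
    return [p_adj <= q for p_adj in bh_adjust(p_values)]


# Hypothetical p-values for six pairwise correlations between body sites.
pairwise_p = [0.001, 0.2, 0.3, 0.04, 0.5, 0.6]
print(bh_adjust(pairwise_p))
print(bh_reject(pairwise_p))
```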

      If the JND at mid breast (measured with locations centered at the nipple) is roughly the same size as the nipple, it is not surprising that participants have difficulty with the categorical localization task on the nipple but perform better than chance on the significantly larger areola.

      We agree that it is not surprising given the previously shown data; however, the initial finding is surprising to many, and this experiment serves to reinforce it.

      Neither signed nor absolute localization error can be compared to the results of the previous experiments. The JND should be roughly proportional to the standard deviation of the errors.

      We apologize for any confusion; however, we are not comparing the values, merely observing that the results are consistent.
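      The link between the JND and the spread of localization errors raised in this exchange can be sketched for an idealized observer. Assuming each touch location is encoded with independent Gaussian noise of standard deviation sigma, and that the JND is defined at 75% correct in a comparison of two touches (both assumptions, not details from the study), the JND scales linearly with sigma, i.e. with the square root of the error variance.

```python
import math
from statistics import NormalDist


def jnd_from_noise_sd(sigma: float, criterion: float = 0.75, two_stimuli: bool = True) -> float:
    """Stimulus separation at which an ideal observer reaches `criterion`
    proportion correct, given Gaussian location noise with SD `sigma`.
    Comparing two independently noisy estimates doubles the variance of
    their difference, so its SD is sigma * sqrt(2)."""
    z = NormalDist().inv_cdf(criterion)
    sd_difference = sigma * math.sqrt(2.0) if two_stimuli else sigma
    return z * sd_difference


# Doubling the noise SD (quadrupling the variance) only doubles the JND.
print(jnd_from_noise_sd(1.0), jnd_from_noise_sd(2.0))
```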

      Reviewer #2 (Public review):

      I had a hard time understanding some parts of the report. What is meant by "broadly no relationship" in line 137?

      We have removed the qualifier to simplify the text.

      It is suggested that spatial expansion (which is correlated with body part size) is related between medial breast and hand - is this to say that women with large hands have large medial breast size? Nipple size was measured, but hand size was not measured, is this correct?

      Correct. We have added text to state this.

      It is furthermore unclear how the authors differentiate medial breast and NAC. The sentence in lines 140-141 seems to imply the two terms are considered the same, as a conclusion about NAC is drawn from a result about the medial breast. This requires clarification.

      Thank you for catching this; we have corrected it in the text.

      Finally, given that the authors suspect that overall localization ability (or attention) may be overshadowed by a size effect, would an analysis that integrates both, e.g. a regression with multiple predictors, not be adequate?

      If the reviewer means that some participants would be consistently more “acute” than others across body parts, then we would expect the correlations in SF1 to be stronger. Consequently, we see no reason to add “overall tactile acuity” as a predictor.
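      A regression with two predictors of the kind Reviewer #2 suggests could look like the sketch below, which regresses breast JND simultaneously on a size measure and on an overall-acuity measure (here, hypothetically, hand JND). Variable names and values are invented. With only two predictors the least-squares coefficients have a closed form, so no libraries are needed.

```python
from typing import List, Tuple


def _centered(v: List[float]) -> List[float]:
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def ols_two_predictors(x1: List[float], x2: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    c1, c2, cy = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(c1, c1), _dot(c2, c2), _dot(c1, c2)
    s1y, s2y = _dot(c1, cy), _dot(c2, cy)
    det = s11 * s22 - s12 ** 2  # zero if the predictors are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = sum(y) / len(y) - b1 * sum(x1) / len(x1) - b2 * sum(x2) / len(x2)
    return b0, b1, b2


# Hypothetical per-participant data.
breast_size_delta = [0.0, 1.0, 2.0, 3.0, 4.0]  # size measure
hand_jnd = [1.0, 0.0, 2.0, 1.0, 3.0]           # proxy for overall acuity
breast_jnd = [1 + 2 * s + 3 * h for s, h in zip(breast_size_delta, hand_jnd)]
print(ols_two_predictors(breast_size_delta, hand_jnd, breast_jnd))
```

      Each coefficient estimates the contribution of one predictor while holding the other constant, which is what separates a size effect from a general acuity effect.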

      In the paragraph about testing quadrants of the nipple, it is stated that only 3 of 10 participants barely outperformed chance, with p < 0.01. It is unclear how a significant t-test is an indication of "barely above chance".

      We have adjusted the text to clarify our meaning.

      “On the nipple, however, participants were consistently worse at locating stimuli than on the breast (paired t-test, t = 3.42, p < 0.01): only 3 of the 10 participants outperformed chance, though the group as a whole did ([missing figure reference] B, 36% ± 13%; Z = 5.5, p < 0.01).”
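      Whether an individual participant beats the 25% chance level of a four-quadrant task can be checked with an exact binomial test. This also makes Reviewer #2's point concrete: a p-value reflects the number of trials as well as the size of the advantage over chance. The sketch assumes independent trials with a fixed hit rate; the trial counts are hypothetical, and the group-level Z statistic quoted above may come from a different test.

```python
from math import comb


def binomial_p_at_least(k: int, n: int, p_chance: float = 0.25) -> float:
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p_chance)."""
    return sum(comb(n, i) * p_chance ** i * (1 - p_chance) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts: the same 35% hit rate (10 points above chance) at two trial counts.
for correct, trials in [(14, 40), (70, 200)]:
    print(trials, correct / trials, binomial_p_at_least(correct, trials))
```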

      The final part of the paragraph on nipple quadrants (starting line 176) explains that there was a trend (4 of 10 participants) for lower tactile acuity being related to the inability to differentiate quadrants. It seems to me that such a result would not be expected: The stated hypothesis is that all participants have the same number of tactile sensors in their nipple and areola, independent of NAC size. In this section, participants determine the quadrant of a single touch. Theoretically, all participants should be equally able to perform this task, because they all have the same number of receptors in each quadrant of nipple and areola. Thus, the result in Figure 2C is curious.

      We agree that this result seemingly contradicts observations from the previous experiment; however, we believe that it relates to the distinction between relative discriminations and absolute localizations. In the first experiment, the presentation of two sequential points provides an implicit reference, whereas in the quadrant task there is no reference. In light of the results of the third experiment, biases towards the nipple would effectively reduce participants' ability to identify the quadrant. This result may therefore imply that the degree of bias is greater in women with greater expansion. We have added text to the discussion to lay this out.

      “This negative trend implicitly contradicts the previous result, from which one might expect equal performance regardless of size, as the stimulus locations were scaled to the size of the nipple and areola. However, in the absence of a reference point, systematic biases are more likely to occur, and the trend may thus reflect a relationship between localization bias and breast size.”

      This section reports an ANOVA (lines 193/194) with a factor "participant". This does not appear sensible. Please clarify. The factor distance is also unclear; is this a categorical or a continuous variable? Line 400 implies a 6-level factor, but neither the ANOVAs nor their factors are described in the Methods (nor are any of the other statistical approaches).

      We believe this comment has been addressed above with our replacement of the ANOVA with an LMM. We have also added descriptions of the analysis throughout the methods.

      The analysis on imprecision using mean pairwise error (line 199) is unclear: does pairwise refer to x/y or to touch vs. center of the nipple?

      We have clarified this to now read:

      “To measure the imprecision, we computed the mean pairwise distance between each of the reported locations for a given stimulus location and the mean reported location.”
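      The imprecision measure described in the revised text, and a corresponding bias measure, can be written out as follows. This sketch assumes each report is an (x, y) coordinate in the plane used for error computation (depth dropped); the bias definition (distance from the mean report to the true location) is an illustrative assumption rather than a quotation from the Methods.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_location(points: List[Point]) -> Point:
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def imprecision(reports: List[Point]) -> float:
    """Mean distance between each reported location and the mean reported location."""
    cx, cy = mean_location(reports)
    return sum(math.hypot(x - cx, y - cy) for x, y in reports) / len(reports)


def bias(reports: List[Point], stimulus: Point) -> float:
    """Distance between the mean reported location and the true stimulus location."""
    cx, cy = mean_location(reports)
    return math.hypot(cx - stimulus[0], cy - stimulus[1])


reports = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]  # hypothetical reports (cm)
print(imprecision(reports), bias(reports, stimulus=(3.0, 4.0)))
```

      Separating the two measures this way means a set of reports can be tightly clustered (low imprecision) yet systematically displaced (high bias), or the reverse.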

      p8, upper text, what is meant by "relative over-representation of the depth axis"? Does this refer to the breast having depth but the equivalent area on the back not having depth? What are the horizontal planes (probably meant to be singular?) - do you simply mean that depth was ignored for the calculation of errors? This seems to be implied in Figure 3AB.

      This is indeed what we meant. We have attempted to clarify this in the text.

      “Importantly, given the relative over-representation of the depth axis for the breast, we only considered angles in the horizontal planes such that the shape of the breast did not influence the results.” Became:

      “Importantly, because the back is a relatively flat surface in comparison to the breast, errors were only computed in the horizontal plane and depth was excluded when computing the angular error.”
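      Computing an angular error with depth excluded can be sketched as below. Here the error direction is expressed relative to the direction from the stimulus to the nipple, so 0 degrees means an error pointing straight at the nipple. Both that reference direction and the assumption that depth is the third coordinate are illustrative choices, not details taken from the Methods.

```python
import math
from typing import Sequence


def angular_error_deg(stimulus: Sequence[float], report: Sequence[float],
                      nipple: Sequence[float]) -> float:
    """Signed angle (degrees, in [-180, 180]) between the localization error
    vector and the direction from the stimulus to the nipple.

    Only the first two coordinates are used; the third (assumed here to be
    depth) is ignored, so the shape of the breast does not enter."""
    error_angle = math.atan2(report[1] - stimulus[1], report[0] - stimulus[0])
    nipple_angle = math.atan2(nipple[1] - stimulus[1], nipple[0] - stimulus[0])
    diff = error_angle - nipple_angle
    # Wrap the difference back into [-pi, pi] before converting to degrees.
    return math.degrees(math.atan2(math.sin(diff), math.cos(diff)))


# Report displaced 1 cm straight toward the nipple, with a large depth difference.
print(angular_error_deg((4.0, 0.0, 2.0), (3.0, 0.0, 5.0), (0.0, 0.0, 0.0)))
```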

      Lines 232-241, I cannot follow the conclusions drawn here. First, it is not clear to a reader what the aim of the presented analyses is: what are you looking for when you analyze the vectors? Second, "vector strength" should be briefly explained in the main text. Third, it is not clear how the final conclusion is drawn. If there is a bias of all locations towards the nipple, then a point closer to the nipple cannot exhibit a large bias, because the nipple is close-by. Therefore, one would expect that points close to the nipple exhibit smaller errors, but this would not imply higher acuity - just less space for localizing anything. The higher acuity conclusion is at odds with the remaining results, isn't it: acuity is low on the outer breast, but even lower at the NAC, so why would it be high in between the two?

      Thank you for pointing out the circular logic. We have replaced this sentence with a more accurate statement.

      “Given these findings, we conclude that the breast has lower tactile acuity than the hand and is instead comparable to the back. Moreover, localization of tactile events on both the back and the breast is inaccurate, but localizations on the breast are consistently biased towards the nipple.”
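      Reviewer #2 asked for "vector strength" to be explained. Assuming the term carries its usual circular-statistics meaning (the mean resultant length of unit vectors pointing in each error direction), it can be computed as below. When error angles are measured relative to the direction of the nipple, a high vector strength together with a mean direction near 0 degrees indicates errors that consistently point toward the nipple. The example angles are hypothetical.

```python
import math
from typing import List, Tuple


def vector_strength(angles_deg: List[float]) -> Tuple[float, float]:
    """Return (mean resultant length, mean direction in degrees).

    The length is 1 when all angles are identical and near 0 when they are
    spread uniformly around the circle."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.hypot(c, s), math.degrees(math.atan2(s, c))


print(vector_strength([-20.0, 5.0, 10.0, 15.0]))   # clustered near 0 degrees
print(vector_strength([0.0, 90.0, 180.0, 270.0]))  # uniformly spread
```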

      The discussion makes some concrete suggestions for sensors in implants (line 283). It is not clear how the stated numbers were computed. Also, why should the four nipple quadrants receive individual sensors if the result here was that participants cannot distinguish these quadrants?

      Thank you for catching this; it should have been 4 sensors for the NAC, not just the nipple. We have fixed this in the text.

      I would find it interesting to know whether participants with a small breast measurement delta had breast acuity comparable to that of the back. Alternatively, it would be interesting to know whether breast and back acuity are comparable in men. Such a result would imply that the torso has uniform acuity overall, but any spatial extension of the breast is unaccounted for. The lowest single participant data points in Figure 1B appear similar, which might support this idea.

      We agree that this is an interesting question and, as you point out, the data do indicate that in cases of minimal expansion, acuity may be constant across the torso. However, in the comparison of the JNDs, post-hoc testing revealed no significant difference between the back and either breast region. Consequently, subsampling the group would yield the same outcome. We have added a sentence to the discussion stating this.

      “Consequently, the acuity of the breast is likely determined initially by torso acuity and then any expansion.”

    1. eLife Assessment

      This valuable study reports that EEG recordings of the earliest stage of information processing in the human visual cortex can be used to predict subsequent choice responses. The findings provide novel, solid evidence for integrative processing in low-level sensory cortices, though the exact nature of the neural signals measured here requires some clarification. While some conceptual issues need to be addressed, the paper is likely to be of interest to neuroscientists interested in the contribution of early sensory signals to decision-making.

    2. Reviewer #1 (Public review):

      General assessment of the work:

      In this manuscript, Mohr and Kelly show that the C1 component of the human VEP is correlated with binary choices in a contrast discrimination task, even when the stimulus is kept constant and confounding variables are considered in the analysis. They interpret this as evidence for the role V1 plays during perceptual decision formation. Choice-related signals in single sensory cells are enlightening because they speak to the spatial (and temporal) scale of the brain computations underlying perceptual decision-making. However, similar signals in aggregate measures of neural activity offer a less direct window and thus less insight into these computations. For example, although I am not a VEP specialist, it seems doubtful that the measurements are exclusively picking up (an unbiased selection of) V1 spikes. Moreover, although this is not widely known, there is in fact a long history to this line of work. In 1972, Campbell and Kulikowski ("The Visual Evoked Potential as a function of contrast of a grating pattern", Journal of Physiology) already showed a similar effect in a contrast detection task (this finding inspired the original Choice Probability analyses in the monkey physiology studies conducted in the early 1990s). Finally, it is not clear to me that there is an interesting alternative hypothesis that is somehow ruled out by these results. Should we really consider that simple visual signals such as spatial contrast are *not* mediated by V1? This seems to fly in the face of well-established anatomy and function of visual circuits. Or should we be open to the idea that VEP measurements are almost completely divorced from task-relevant neural signals? Why would this be an interesting technique then?

      In sum, while this work reports results in line with several single-cell and VEP studies and perhaps is technically superior in its domain, I find it hard to see how these findings would meaningfully impact our thinking about the neural and computational basis of spatial contrast discrimination.

      Summary of substantive concerns:

      (1) The study of choice probability in V1 cells is more extensive than portrayed in the paper's introduction. In recent years, choice-related activity in V1 has also been studied by Nienborg & Cumming (2014), Goris et al. (2017), Jasper et al. (2019), Lange et al. (2023), and Boundy-Singer et al. (2025). These studies paint a complex picture (a mixture of positive, absent, and negative results), but should be mentioned in the paper's introduction.

      (2) The very first study to conduct an analysis of stimulus-conditioned neural activity during a perceptual decision-making task was, in fact, a VEP study: Campbell and Kulikowski (1972). This study never gained the fame it perhaps deserves. But it would be appropriate to weave it into the introduction and motivation of this paper.

      (3) What are interesting alternative hypotheses to be considered here? I don't understand the (somewhat implicit) suggestion here that contrast representations late in the system can somehow be divorced from early representations. If they were, they would not be correlated with stimulus contrast.

      (4) I find the arguments about the timing of the VEP signals somewhat complex and not very compelling, to be honest. It might help if you added a simulation of a process model that illustrated the temporal flow of the neural computations involved in the task. When are sensory signals manifested in V1 activity informing the decision-making process, in your view? And how is your measure of neural activity related to this latent variable? Can you show in a simulation that the combination of this process and linking hypothesis gives rise to inverted U-shaped relationships, as is the case for your data?
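      The kind of process model Reviewer #1 requests can be illustrated with a deliberately minimal simulation. The sketch is hypothetical and is not the authors' model: on each trial the stimulus is fixed, the decision variable sums ten independent noisy sensory samples, and the EEG measure is assumed to read out only the earliest sample plus measurement noise. Choice probability is computed as the ROC area between the measure's distributions on trials ending in each choice. The sketch shows how an early, noisy signal can carry choice-related variance; it does not attempt to reproduce inverted U-shaped relationships.

```python
import random
from typing import List


def choice_probability(x_choice1: List[float], x_choice2: List[float]) -> float:
    """ROC area: P(a value from choice-1 trials exceeds one from choice-2 trials),
    counting ties as one half. Computed from the Mann-Whitney U statistic."""
    labelled = sorted([(v, 1) for v in x_choice1] + [(v, 0) for v in x_choice2])
    rank_sum = 0.0
    i = 0
    while i < len(labelled):
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based ranks, averaged over ties
        rank_sum += average_rank * sum(lab for _, lab in labelled[i:j + 1])
        i = j + 1
    n1, n2 = len(x_choice1), len(x_choice2)
    u = rank_sum - n1 * (n1 + 1) / 2
    return u / (n1 * n2)


def simulate_cp(n_trials: int = 4000, n_samples: int = 10,
                measurement_sd: float = 1.0, seed: int = 0) -> float:
    rng = random.Random(seed)
    early_when_a: List[float] = []
    early_when_b: List[float] = []
    for _ in range(n_trials):
        # Identical stimulus on every trial: only internal noise varies.
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        decision_variable = sum(samples)
        # Hypothetical EEG read-out: earliest sample plus measurement noise.
        early_signal = samples[0] + rng.gauss(0.0, measurement_sd)
        if decision_variable > 0:
            early_when_a.append(early_signal)
        else:
            early_when_b.append(early_signal)
    return choice_probability(early_when_a, early_when_b)


print(simulate_cp())  # above 0.5 because the early sample feeds the decision
```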

    3. Reviewer #2 (Public review):

      Summary:

      Mohr and Kelly report a high-density EEG study in healthy human volunteers in which they test whether correlations between neural activity in the primary visual cortex and choice behavior can be measured non-invasively. Participants performed a contrast discrimination task on large arrays of Gabor gratings presented in the upper left and lower right quadrants of the visual field. The results indicate that single-trial amplitudes of C1, the earliest cortical component of the visual evoked potential in humans, predict forced-choice behavior over and beyond other behavioral and electrophysiological choice-related signals. These results constitute an important advance for our understanding of the nature and flexibility of early visual processing.

      Strengths:

      (1) The findings suggest a previously unsuspected role for aggregate early visual cortex activity in shaping behavioral choices.

      (2) The authors extend well-established methods for assessing covariation between neural signals and behavioral output to non-invasive EEG recordings.

      (3) The effects of initial afferent information in the primary visual cortex on choice behavior are carefully assessed by accounting for a wide range of potential behavioral and electrophysiological confounds.

      (4) Caveats and limitations are transparently addressed and discussed.

      Weaknesses:

      (1) It is not clear whether integration of contrast information across relatively large arrays is a good test case for decision-related information in C1. The authors raise this issue in the Discussion, and I agree that it is all the more striking that they do find C1 choice probability. Nevertheless, I think the choice of task and stimuli should be explained in more detail.

      (2) In a similar vein, while C1 has canonical topographical properties at the grand-average level, these may differ substantially depending on individual anatomy (which the authors did not assess). This means that task-relevant information will be represented to different degrees in individuals' single-trial data. My guess is that this confound was mitigated precisely by choosing relatively extended stimulus arrays. But given the authors' impressive track record on C1 mapping and modeling, I was surprised that the underlying rationale is only roughly outlined. For example, given the topographies shown and the electrode selection procedure employed, I assume that the differences between upper and lower targets are mainly driven by stimulus arms on the main diagonal. Did the authors run pilot experiments with more restricted stimulus arrays? I do not mean to imply that such additional information needs to be detailed in the main article, but it would be worth mentioning.

      (3) Also, the stimulus arrangement disregards known differences in conduction velocity between the upper and lower visual fields. While no such differences are evident from the maximal-electrode averages shown in Figure 1B, it is difficult to assess this issue without single-stimulus VEPs and/or a dedicated latency analysis. The authors touch upon this issue when discussing potential pre-C1 signals emanating from the magnocellular pathway.

      (4) I suspect that most of these issues are at least partly related to a lack of clarity regarding levels of description: the authors often refer to 'information' contained in C1 or, apparently interchangeably, to 'visual representations' before, during, or following C1. However, if I understand correctly, the signal predicting (or predicted by) behavioral choice is much cruder than what an RSA-primed readership may expect, and also cruder than the other choice-predictive signals entered as control variables: namely, a univariate difference score on single-trial data integrated over a 10 ms window determined on the basis of grand-averaged data. I think it is worth clarifying and emphasizing the nature of this signal as the difference of aggregate contrast responses that *can* only be read out at higher levels of the visual system due to the limited extent of horizontal connectivity in V1. I do not think that this diminishes the importance of the findings - if anything, it makes them more remarkable.

      (5) Arguably even more remarkable is the finding that C1 amplitudes themselves appear to be influenced by choice history. The authors address this issue in the Discussion; however, I'm afraid I could not follow their argument regarding preparatory (and differential?) weighting of read-outs across the visual hierarchy. I believe this point is worth developing further, as it bears on the issue of whether C1 modulations are present and ecologically relevant when looking (before and) beyond stimulus-locked averages.
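      To make the nature of the signal discussed in point (4) concrete, the sketch below extracts a single-trial amplitude as the mean over a 10 ms window centered on the peak of the grand-average waveform. It assumes 1 kHz sampling, a single pre-selected electrode, and a 50 to 100 ms search range for the peak. These settings, and the synthetic triangular waveforms in the example, are illustrative assumptions; how the authors combine responses into their difference score is not reproduced here.

```python
from typing import List, Sequence, Tuple


def grand_average(trials: Sequence[Sequence[float]]) -> List[float]:
    n = len(trials)
    return [sum(trial[i] for trial in trials) / n for i in range(len(trials[0]))]


def peak_latency(average: Sequence[float], times_ms: Sequence[float],
                 search_ms: Tuple[float, float] = (50.0, 100.0)) -> float:
    """Latency of the largest absolute deflection of the grand average within the
    search range (absolute value, since C1 polarity depends on visual-field location)."""
    candidates = [(abs(v), t) for v, t in zip(average, times_ms)
                  if search_ms[0] <= t <= search_ms[1]]
    return max(candidates)[1]


def single_trial_amplitudes(trials: Sequence[Sequence[float]], times_ms: Sequence[float],
                            half_width_ms: float = 5.0) -> List[float]:
    """Mean of each trial within +/- half_width_ms of the grand-average peak."""
    peak = peak_latency(grand_average(trials), times_ms)
    window = [i for i, t in enumerate(times_ms) if abs(t - peak) <= half_width_ms]
    return [sum(trial[i] for i in window) / len(window) for trial in trials]


times = list(range(150))  # 0-149 ms at a hypothetical 1 kHz sampling rate


def triangular_trial(amplitude: float) -> List[float]:
    # Synthetic response peaking at 70 ms.
    return [amplitude * max(0.0, 1.0 - abs(t - 70) / 10.0) for t in times]


trials = [triangular_trial(a) for a in (-1.0, -2.0, -3.0)]
print(single_trial_amplitudes(trials, times))
```

      Whatever the exact combination rule, each trial contributes a single scalar averaged over a short window fixed by the grand average, which is the univariate character the reviewer emphasizes.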

    1. eLife Assessment

      This study shows that excitatory cholecystokinin (CCK)-expressing neurons in hippocampal area CA3 influence hippocampal-dependent memory using multiple methods to manipulate excitatory CCK-expressing CA3 neurons selectively. The work is valuable because most past studies of CCK-expressing neurons have focused on those neurons that co-express CCK and GABA. Currently, the strength of evidence is incomplete; however, if additional evidence were to be presented that the methods were selective, the evaluation would potentially be higher.

    2. Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods, including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior).

      There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus, this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      (1b) For the experiments that use a virus with the CCK-IRES-Cre mouse, there is no information on, or characterization of, how well the virus targets excitatory CCK-expressing neurons. (Additionally, it has been reported that protein expression driven by the CaMKIIa promoter in viral vectors can be seen in both pyramidal and inhibitory cells.)

      (2) The methods and figure legends are extremely sparse, leading to many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data. Additionally, further quantification would be useful; e.g., in some places, only % values are noted, or only images are presented.

      (3) It is unclear whether the reduced CCK expression is merely correlated with, or directly causes, the impairments in hippocampal function. Does the CCK-shRNA have any detrimental effects besides reducing CCK expression (e.g., is it also affecting some other essential, but not CCK-related, aspect of the neuron itself)? Is there any histological comparison between the CCK-shRNA and the scrambled shRNA?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, the authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, the authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that the CA3-CCK CAMKIIergic neurons are involved in spatial memory tasks; these are interesting findings, which suggest that these neurons could be important targets for interventions in memory-related diseases.

      (3) This manuscript also measured the endogenous CCK from the CA3-CCK CAMKIIergic neurons; this means that CCK can be released under certain conditions.

      Weaknesses:

      (1) The authors do not identify which CCK receptors modulate these processes.

      (2) The authors do not test CCK gene knockout mice or CCK receptor knockout mice in these neural processes.

      (3) The authors do not test the source of CCK release during the behavioral tasks.

    4. Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunochemistry, bulk calcium recording, a neural sensor, hippocampus-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (1) The authors provided the distribution profiles of excitatory cholecystokinin-positive neurons in the dorsal hippocampus using transgenic mice (Ai14::CCK Cre mice), immunochemistry, and retrograde AAV.

      (2) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (3) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (4) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (1) What is the causal relationship between navigation learning and CCK secretion?

      (2) What is the effect of overexpressing the CCK gene on hippocampal functions?

      (3) What are the functional differences between the excitatory and inhibitory CCK neurons in the hippocampus?

      (4) During high-frequency electrical stimulation, does the released CCK originate from local CA3 neurons or from the entorhinal cortex (EC)?

    1. eLife Assessment

      This study integrates large-scale behavioral, genetic, and molecular analyses in animal models to investigate the response to morphine. Using high-quality, time-series Quantitative Trait Loci (QTL) mapping, the work provides compelling evidence for novel, time-dependent genetic interactions (epistasis). A fundamental result of this rigorous analysis is the discovery of a novel Oprm1-Fgf12-MAPK signaling pathway, which offers new insights into the mechanisms of opioid sensitivity.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Lemen et al. represents a comprehensive and unique analysis of gene networks in rat models of opioid use disorder, using multiple strains and both sexes. It provides a time-series analysis of Quantitative Trait Loci (QTLs) in response to morphine exposure.

      Strengths:

      A key finding is the identification of a previously unknown morphine-sensitive pathway involving Oprm1 and Fgf12, which activates a cascade through MAPK kinases in D1 medium spiny neurons (MSNs). Strengths include the large-scale, multi-strain, sex-inclusive design; the time-series QTL mapping, which provides dynamic insights; and the discovery of a novel and relevant Oprm1-Fgf12-MAPK signaling pathway in D1 MSNs.

      Weaknesses:

      (1) The proposed involvement of Nav1.2 (SCN2A) as a downstream target of the Oprm1-Fgf12 pathway requires further analysis/evidence. Is Nav1.2 (SCN2A) expressed in D1 neurons?

      The authors mentioned that SCN8A (Nav1.6) was tested as a candidate mediator of Oprm1-Fgf12 loci and variation in locomotor activity. However, the proposed model supports SCN2A as a target rather than SCN8A. This is somewhat unexpected since SCN8A is highly abundant in MSN.

      Can the authors provide expression data for SCN2A, Oprm1, and Fgf12 in D1 vs. D2 MSNs?

      (2) The authors should consider adding a reference to FGF12 in Schizophrenia (PMC8027596) in the Introduction.

      (3) There is recent evidence supporting the druggability of other intracellular FGFs, such as FGF14 (PMC11696184) and FGF13 (PMC12259270), through their interactions with Nav channels. What are the implications of these findings for drug discovery in the context of the present study? Could FGF12 be considered a potential druggable therapeutic target for opioid use disorder (OUD)?

    3. Reviewer #2 (Public review):

      Summary:

      This highly novel and significant manuscript re-analyzes behavioral QTL data derived from morphine locomotor activity in the BXD recombinant inbred panel. The combination of interacting behavioral-pharmacology (morphine and naltrexone) time course data, high-resolution mouse genetic analyses, genetic analysis of gene expression (eQTLs), cross-species analysis with human gene expression and genetic data, and molecular modeling approaches with Bayesian network analysis produces new information on loci modulating morphine locomotor activity.

      Furthermore, the identification of time-wise epistatic interactions between the Oprm1 and Fgf12 loci is highly novel and points to methodological approaches for identifying other epistatic interactions using animal model genetic studies.

      Strengths:

      (1) Use of state-of-the-art genetic tools for mapping behavioral phenotypes in mouse models.

      (2) Adequately powered analysis incorporating both sexes and time course analyses.

      (3) Detection of time- and sex-dependent interactions of two QTL loci modulating morphine locomotor activity.

      (4) Identification of putative candidate genes by combined expression and behavioral genetic analyses.

      (5) Use of Bayesian analysis to model causal interactions between multiple genes and behavioral time points.

      Weaknesses:

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors.

      (2) There are multiple examples of overstating the possible significance of results that should be corrected or at least directly pointed out as weaknesses in the Discussion. These include:

      a) The assumption that the Oprm1 gene is the causal candidate gene for the major morphine locomotor Chr10 QTL at the early time epochs. Oprm1 is 400,000 bp away from the support interval of the Mor10a QTL locus, and there is no mention of whether the Oprm1 mRNA eQTL overlaps with Mor10a.

      b) Although the Bayesian analysis of possible complex interactions between Oprm1, Fgf12, other interacting genes, and behaviors is very innovative and produces testable hypotheses, a more straightforward mediation analysis of causal relationships between genotype, gene expression, and phenotype would have added strength to the arguments for the causal role of these individual genes.

      c) The GWAS data analysis for Oprm1 and Fgf12 is incomplete: it does not report actual significance levels for Oprm1, and it may overstate the nominally significant findings for Fgf12.

      Appraisal:

      The authors largely succeeded in reaching their goals, with novel findings and methodology.

      Significance of Findings:

      This study will likely spur future direct experimental studies to test hypotheses generated by this complex analysis. Additionally, the broad methodological approach incorporating time course genetic analyses may encourage other studies to identify epistatic interactions in mouse genetic studies.

    4. Reviewer #3 (Public review):

      Summary:

      This is a clearly written paper that describes the reanalysis of data from a BXD study of the locomotor response to morphine and naloxone. The authors detect significant loci and an epistatic interaction between two of those loci. Single-cell data from outbred rats is used to investigate the interaction. The authors also use network methods and incorporate human data into their analysis.

      Strengths:

      One major strength of this work is the use of granular time-series data, enabling the identification of time-point-specific QTL. This allowed for the identification of an additional, distinct QTL (the Fgf12 locus) in this work compared to previously published analysis of these data, as well as the identification of an epistatic effect between Oprm1 (driving early stages of locomotor activation) and Fgf12 (driving later stages).

      Weaknesses:

      (1) What criteria were used to determine whether the epistatic interaction was significant? How many possible interactions were explored?

      (2) Results are presented for males and females separately, but the decision to examine the two sexes separately was never explained or justified. Since it is not standard to perform GWAS broken down by sex, some initial explanation of this decision is needed. The Discussion could also address what (if anything) was learned from the sex-specific analysis. In the end, was it useful?

      (3) The confidence intervals for the results were not well described, although I do see them in one of the tables. The authors used a 1.5 support interval, but didn't offer any justification for this decision. Is that a 95% confidence interval? If not, should more consideration have been given to genes outside that interval? For some of the QTLs that are not the focus of this paper, the confidence intervals were very large (>10 Mb). Is that typical for BXDs?
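      For readers less familiar with the convention, a 1.5-LOD support interval is usually taken to be the contiguous region around the QTL peak where the LOD score stays within 1.5 units of its maximum; it is sometimes used as an approximate 95% interval, although actual coverage depends on cross design and marker density. The sketch below is illustrative only: it assumes a LOD profile sampled at ordered marker positions (in Mb), and the function name `lod_support_interval` and the toy values are hypothetical, not taken from the manuscript.

```python
def lod_support_interval(positions_mb, lod, drop=1.5):
    """Return (start, end) positions bounding the contiguous region around
    the LOD peak in which LOD >= peak LOD - drop (hypothetical helper)."""
    if not lod or len(positions_mb) != len(lod):
        raise ValueError("positions and LOD values must be non-empty and of equal length")
    # Index of the (first) maximum LOD value.
    peak = max(range(len(lod)), key=lambda i: lod[i])
    threshold = lod[peak] - drop
    # Walk left from the peak while LOD stays above the threshold.
    left = peak
    while left > 0 and lod[left - 1] >= threshold:
        left -= 1
    # Walk right from the peak while LOD stays above the threshold.
    right = peak
    while right < len(lod) - 1 and lod[right + 1] >= threshold:
        right += 1
    return float(positions_mb[left]), float(positions_mb[right])


# Toy LOD profile (made-up numbers, not from the study).
pos = [10, 12, 14, 16, 18, 20, 22]
lod = [1.0, 2.5, 4.0, 5.2, 4.1, 3.5, 1.2]
print(lod_support_interval(pos, lod))  # (14.0, 18.0)
```

      Widening `drop` (for example to 2.0) gives a more conservative interval, which is one way to decide whether genes just outside the reported interval deserve consideration.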

    1. eLife Assessment

      This important study presents JABS, an open-source platform that integrates hardware and user-friendly software for standardized mouse behavioral phenotyping. The work has practical implications for improving reproducibility and accessibility in behavioral neuroscience, especially for linking behavior to genetics across diverse mouse strains. The strength of evidence is convincing, with comprehensive validation of the platform's components and enthusiastic reviewer support.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript provides an open-source tool, including hardware, software, and a dataset, to facilitate and standardize behavioral classification in laboratory mice. The hardware for behavioral phenotyping was extensively tested for safety. The software is GUI-based, facilitating its use by investigators who do not have a programming background. The behavioral classification tool is highly accurate, and the authors deposited a large dataset of annotations and pose tracking for many strains of mice. This tool has great potential for behavioral scientists who use mice across many fields; however, many missing details currently limit the impact of this tool and publication.

      Strengths:

      Software-hardware integration for facilitating cross-lab adaptation of the tool and minimizing the need to annotate new data for behavioral classification.

      Data from many strains of mice was included in the classification and genetic analyses in this manuscript.

      A large annotated dataset was deposited for use by the community.

      The GUI-based software tool lowers barriers to use for users with limited coding experience.

      Weaknesses:

      The GUI requires pose tracking for classification, but the software provided in JABS does not do pose tracking, so users must do pose tracking using a separate tool. The pose tracking quality directly impacts the classification quality, given that it is used for the feature calculation.

      Comments on revisions:

      The authors addressed all my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents the JAX Animal Behavior System (JABS), an integrated mouse phenotyping platform that includes modules for data acquisition, behavior annotation, and behavior classifier training and sharing. The manuscript provides details and validation for each module, demonstrating JABS as a useful open-source behavior analysis tool that removes barriers to adopting these analysis techniques by the community. In particular, with the JABS-AI module users can download and deploy previously trained classifiers on their own data, or annotate their own data and train their own classifiers. The JABS-AI module also allows users to deploy their classifiers on the JAX strain survey dataset and receive an automated behavior and genetic report.

      Strengths:

      (1) The JABS platform addresses the critical issue of reproducibility in mouse behavior studies by providing an end-to-end system from rig setup to downstream behavioral and genetic analyses. Each step has clear guidelines, and the GUIs are an excellent way to encourage best practices for data storage, annotation, and model training. Such a platform is especially helpful for labs without prior experience in this type of analysis.

      (2) A notable strength of the JABS platform is its reuse of large amounts of previously collected data at JAX Labs, condensing this into pretrained pose estimation models and behavioral classifiers. JABS-AI also provides access to the strain survey dataset through automated classifier analyses, allowing large-scale genetic screening based on simple behavioral classifiers. This has the potential to accelerate research for many labs by identifying particular strains of interest.

      (3) The ethograph analysis will be a useful way to compare annotators/classifiers beyond the JABS platform.

      Weaknesses:

      (1) The manuscript contains many assertions that lack references in both the Introduction and Discussion. For example, in the Discussion, the assertion "published research demonstrates that keypoint detection models maintain robust performance despite the presence of headstages and recording equipment" lacks reference.

      (2) The provided GUIs lower the barrier to entry for labs that are just starting to collect and analyze mouse open field behavior data. However, users must run pose estimation themselves outside of the provided GUIs, which introduces a key bottleneck in the processing pipeline, especially for users without strong programming skills. The authors have provided pretrained pose estimation models and an example pipeline, which is certainly to be commended, but I believe the impact of these tools could be greatly magnified by an additional pose estimation GUI (just for running inference, not for labeling/training).

      (3) While the manuscript does a good job of laying out best practices, there is an opportunity to further improve reproducibility for users of the platform. The software seems likely to perform well with perfect setups that adhere to the JABS criteria, but it is very likely there will be users with suboptimal setups - poorly constructed rigs, insufficient camera quality, etc. It is important, in these cases, to give users feedback at each stage of the pipeline so they can understand if they have succeeded or not. Quality control (QC) metrics should be computed for raw video data (is the video too dark/bright? are there the expected number of frames? etc.), pose estimation outputs (do the tracked points maintain a reasonable skeleton structure; do they actually move around the arena?), and classifier outputs (what is the incidence rate of 1-3 frame behaviors? a high value could indicate issues). In cases where QC metrics are difficult to define (they are basically always difficult to define), diagnostic figures showing snippets of raw data or simple summary statistics (heatmaps of mouse location in the open field) could be utilized to allow users to catch glaring errors before proceeding to the next stage of the pipeline, or to remove data from their analyses if they observe critical issues.

      Comments on revisions:

      I thank the authors for taking the time to address my comments. They have provided a lot of important context in their responses. My only remaining recommendation is to incorporate more of this text into the manuscript itself, as this context will also be interesting/important for readers (and potential users) to consider. Specifically:

      the quality control/user feedback features that have already been implemented (these are extremely important, and unfortunately, not standard practice in many labs)

      top-down vs bottom-up imaging trade-offs (you make very good points!)

      video compression, spatial and temporal resolution trade-offs

      more detail on why the authors chose pose-based rather than pixel-based classifiers

      I believe the proposed system can be extremely useful for behavioral neuroscientists, especially since the top-down freely moving mouse paradigm is one of the most ubiquitous in the field. Many labs have reinvented the wheel here, and as a field it makes sense to coalesce around a set of pipelines and best practices to accelerate the science we all want to do. I make the above recommendation with this in mind: bringing together (properly referenced) observations and experiences of the authors themselves, as well as others in the field, provides a valuable resource for the community. Obviously, the main thrust of the manuscript should be about the tools themselves; it should not turn into a review paper, so I'm just suggesting some additional sentences/references sprinkled throughout as motivation for why the authors made the choices that they did.

      Intro typo: "one link in the chainDIY rigs"

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors only report the quality of the classification considering the number of videos used for training, but not considering the number of mice represented or the mouse strain. Therefore, it is unclear if the classification model works equally well in data from all the mouse strains tested, and how many mice are represented in the classifier dataset and validation.

      We agree that strain-level performance is critical for assessing generalizability. In the revision we now report per-strain accuracy and F1 for the grooming classifier, which was trained on videos spanning 60 genetically diverse strains (n = 1100 videos) and evaluated on the test set videos spanning 51 genetically diverse strains (n=153 videos). Performance is uniform across most strains (median F1 = 0.94, IQR = 0.899–0.956), with only modest declines in albino lines that lack contrast under infrared illumination; this limitation and potential remedies are discussed in the text. The new per-strain metrics are presented in the Supplementary figure (corresponding to Figure 4).
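      One way to obtain a per-strain summary of this kind is to compute F1 separately for each strain's test-set frames and then take the median and interquartile range across strains. The sketch below assumes binary per-frame labels pooled by strain; the input structure and the function names are hypothetical, and the manuscript may instead average over videos or bouts.

```python
from statistics import quantiles


def f1_score(y_true, y_pred):
    """Frame-level F1 = 2TP / (2TP + FP + FN); returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_strain_f1_summary(frames_by_strain):
    """frames_by_strain: hypothetical dict mapping strain -> (y_true, y_pred).
    Needs at least two strains. Returns per-strain scores, median, (Q1, Q3)."""
    scores = {s: f1_score(t, p) for s, (t, p) in frames_by_strain.items()}
    # Inclusive method matches linear interpolation between order statistics.
    q1, med, q3 = quantiles(sorted(scores.values()), n=4, method="inclusive")
    return scores, med, (q1, q3)


# Toy example with three made-up strains.
data = {
    "A": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "B": ([1, 1, 1, 0], [1, 1, 1, 0]),
    "C": ([1, 0], [1, 1]),
}
scores, med, (q1, q3) = per_strain_f1_summary(data)
print(round(med, 3))  # 0.667
```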

      (2) The GUI requires pose tracking for classification, but the software provided in JABS does not do pose tracking, so users must do pose tracking using a separate tool. Currently, there is no guidance on the pose tracking recommendations and requirements for usage in JABS. The pose tracking quality directly impacts the classification quality, given that it is used for the feature calculation; therefore, this aspect of the data processing should be more carefully considered and described.

      We have added a section to the Methods describing how to use the pose estimation models used in JABS. The reviewer is correct that pose tracking quality will impact classification quality. We recommend that classifiers only be reused on pose files generated by the same pose models used for the behavior classifier training dataset. We hope that the combination of sharing classifier training data and making a more unified framework for developing and comparing classifiers will bring us closer to foundational behavior classification models that work in many environments. We would also like to emphasize that using a pose model other than ours will likely hinder reuse of our shared large datasets in JABS-AI (JABS1200, JABS600, JABS-BxD).

      (3) Many statistical and methodological details are not described in the manuscript, limiting the interpretability of the data presented in Figures 4,7-8. There is no clear methods section describing many of the methods used and equations for the metrics used. As an example, there are no details of the CNN used to benchmark the JABS classifier in Figure 4, and no details of the methods used for the metrics reported in Figure 8.

      We thank the reviewer for bringing this to our attention. We have added a methods section to the manuscript to address this concern. Specifically, we now provide: (1) improved citation visibility of the source of CNN experiments such that the reader can locate the architecture information, (2) mathematical formulations for all performance metrics (precision, recall, F1, …) with explicit equations;  (3) detailed statistical procedures including permutation testing methods, power analysis and multiple testing corrections used throughout Figures 7-8. These additions facilitate reproducibility and proper interpretation of all quantitative results presented in the manuscript.
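      For reference, the standard definitions of these metrics in terms of true positives (TP), false positives (FP), and false negatives (FN) are given below. This is the conventional form; the manuscript's exact formulation (for example, frame-level versus bout-level counting) may differ.

```latex
\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
\]
```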

      Reviewer #2 (Public review):

      (1) The manuscript as written lacks much-needed context in multiple areas: what are the commercially available solutions, and how do they compare to JABS (at least in terms of features offered, not necessarily performance)? What are other open-source options?

      JABS adds to a list of commercial and open source animal tracking platforms. There are several reviews and resources that cover these technologies. JABS covers hardware, behavior prediction, a shared resource for classifiers, and genetic association studies. We’re not aware of another system that encompasses all these components. Commercial packages such as EthoVision XT and HomeCage Scan give users a ready-made camera-plus-software solution that automatically tracks each mouse and reports simple measures such as distance travelled or time spent in preset zones, but they do not provide open hardware designs, editable behavior classifiers, or any genetics workflow. At the open-source end, the >100 projects catalogued on OpenBehavior and summarised in recent reviews (Luxem et al., 2023; Işık & Ünal 2023) usually cover only one link in the chain—DIY rigs, pose-tracking libraries (e.g., DeepLabCut, SLEAP) or supervised and unsupervised behaviour-classifier pipelines (e.g., SimBA, MARS, JAABA, B-SOiD, DeepEthogram). JABS provides an open source ecosystem that integrates all four: (i) top-down arena hardware with parts list and assembly guide; (ii) an active-learning GUI that produces shareable classifiers; (iii) a public web service that enables sharing of the trained classifier and applies any uploaded classifier to a large and diverse strain survey; and (iv) built-in heritability, genetic-correlation and GWAS reporting. We have added a concise paragraph in the Discussion that cites these resources and makes this end-to-end distinction explicit.

      (2) How does the supervised behavioral classification approach relate to the burgeoning field of unsupervised behavioral clustering (e.g., Keypoint-MoSeq, VAME, B-SOiD)? 

      The reviewer raises an important point about the rapidly evolving landscape of automated behavioral analysis, where supervised and unsupervised approaches offer complementary strengths for different experimental contexts. Unsupervised methods such as Keypoint-MoSeq, VAME, and B-SOiD prioritize motif discovery from unlabeled data but may align less precisely with expert annotations, as evidenced by lower F1 scores in comparative evaluations. Supervised approaches such as ours, by contrast, deliver frame-accurate, behavior-specific scores that align directly with experimental hypotheses. Ultimately, a pragmatic hybrid strategy, starting with unsupervised pilots to identify motifs and transitioning to supervised fine-tuning with minimal labels, can reduce annotation burden and enhance both discovery and precision in ethological studies. This discussion has been added to the Discussion section of the manuscript.

      (3) What kind of studies will this combination of open field + pose estimation + supervised classifier be suitable for? What kind of studies is it unsuited for? These are all relevant questions that potential users of this platform will be interested in.

      This approach is suitable for a wide array of neuroscience, genetics, pharmacology, preclinical, and ethology studies. We have published in the domains of action detection for complex behaviors such as grooming, gait and posture, frailty, nociception, and sleep. We feel these tools are indispensable for modern behavior analysis. 

      (4) Throughout the manuscript, I often find it unclear what is supported by the software/GUI and what is not. For example, does the GUI support uploading videos and running pose estimation, or does this need to be done separately? How many of the analyses in Figures 4-6 are accessible within the GUI?

      We have now clarified these. The JABS framework comprises two distinct GUI applications with complementary functionalities. The JABS-AL (active learning) desktop application handles video upload, behavioral annotation, classifier training, and inference -- it does not perform pose estimation, which must be completed separately using our pose tracking pipeline (https://github.com/KumarLabJax/mouse-tracking-runtime). If a user does not want to use our pose tracking pipeline, we have provided conversions through SLEAP to convert to our JABS pose format.  The web-based GUI enables classifier sharing and cloud-based inference on our curated datasets (JABS600, JABS1200) and downstream behavioral statistics and genetic analyses (Figures 4-6). The JABS-AL application also supports CLI (command line interface) operation for batch processing.  We have clarified these distinctions and provided a comprehensive workflow diagram in the revised Methods section.

      (5) While the manuscript does a good job of laying out best practices, there is an opportunity to further improve reproducibility for users of the platform. The software seems likely to perform well with perfect setups that adhere to the JABS criteria, but it is very likely that there will be users with suboptimal setups - poorly constructed rigs, insufficient camera quality, etc. It is important, in these cases, to give users feedback at each stage of the pipeline so they can understand if they have succeeded or not. Quality control (QC) metrics should be computed for raw video data (is the video too dark/bright? are there the expected number of frames? etc.), pose estimation outputs (do the tracked points maintain a reasonable skeleton structure; do they actually move around the arena?), and classifier outputs (what is the incidence rate of 1-3 frame behaviors? a high value could indicate issues). In cases where QC metrics are difficult to define (they are basically always difficult to define), diagnostic figures showing snippets of raw data or simple summary statistics (heatmaps of mouse location in the open field) could be utilized to allow users to catch glaring errors before proceeding to the next stage of the pipeline, or to remove data from their analyses if they observe critical issues.

      These are excellent suggestions that align with our vision for improving user experience and data quality assessment. We recognize the critical importance of providing users with comprehensive feedback at each stage of the pipeline to ensure optimal performance across diverse experimental setups. Currently, we provide end-users with tools and recommendations to inspect their own data quality. In our released datasets (Strain Survey OFA and BXD OFA), we provide video-level quality summaries for coverage of our pose estimation models. 

      For behavior classification quality control, we employ two primary strategies to ensure proper operation: (a) outlier manual validation and (b) leveraging known characteristics about behaviors. For each behavior that we predict on datasets, we manually inspect the highest and lowest expressions of this behavior to ensure that the new dataset we applied it to maintains sufficient similarity. For specific behavior classifiers, we utilize known behavioral characteristics to identify potentially compromised predictions. As the reviewer suggested, high incidence rates of 1-3 frame bouts for behaviors that typically last multiple seconds would indicate performance issues.
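      The short-bout check described here is straightforward to automate. The sketch below assumes a binary per-frame prediction vector for one video and computes the fraction of predicted bouts lasting at most three frames; the function names are hypothetical, and any threshold for flagging a video would need to be chosen per behavior.

```python
from itertools import groupby


def bout_lengths(pred):
    """Lengths (in frames) of contiguous runs of positive predictions."""
    return [sum(1 for _ in run) for value, run in groupby(pred) if value]


def short_bout_fraction(pred, max_len=3):
    """Fraction of predicted bouts lasting <= max_len frames (0.0 if no bouts)."""
    lengths = bout_lengths(pred)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


# Toy per-frame prediction for one video (made-up values).
pred = [0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(bout_lengths(pred))  # [2, 5, 1]
```

      A high `short_bout_fraction` for a behavior that normally lasts several seconds would be a cue to inspect that video manually, in line with the outlier validation described above.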

      We currently maintain in-house post-processing scripts that handle quality control according to our specific use cases. Future releases of JABS will incorporate generalized versions of these scripts, integrating comprehensive QC capabilities directly into the platform. This will provide users with automated feedback on video quality, pose estimation accuracy, and classifier performance, along with diagnostic visualizations such as movement heatmaps and behavioral summary statistics.

      Reviewer #1 (Recommendations for the authors):

      (1) A weakness of this tool is that it requires pose tracking, but the manuscript does not detail how pose tracking should be done and whether users should expect that the data deposited will help their pose tracking models. There is no specification on how to generate pose tracking that will be compatible with JABS. The classification quality is directly linked to the quality of the pose tracking. The authors should provide more details of the requirements of the pose tracking (skeleton used) and what pose tracking tools are compatible with JABS. In the user website link, I found no such information. Ideally, JABS would be integrated with the pose tracking tool into a single pipeline. If that is not possible, then the utility of this tool relies on more clarity on which pose tracking tools are compatible with JABS.

      The JABS ecosystem was deliberately designed with modularity in mind, separating the pose estimation pipeline from the active learning and classification app (JABS-AL) to offer greater flexibility and scalability for users working across diverse experimental setups. Our pose estimation pipeline is documented in detail within the new Methods subsection, outlining the steps to obtain JABS-compatible keypoints with our recommended runtime (https://github.com/KumarLabJax/mouse-tracking-runtime) and frozen inference models (https://github.com/KumarLabJax/deep-hrnet-mouse). This pipeline is an independent component within the broader JABS workflow, generating skeletonized keypoint data that are then fed into the JABS-AL application for behavior annotation and classifier training.

      By maintaining this separation, users have the option to use their preferred pose tracking tools, such as SLEAP, while ensuring compatibility through provided conversion utilities to the JABS skeleton format. These details, including usage instructions and compatibility guidance, are now thoroughly explained in the newly added pose estimation subsection of our Methods section. This modular design approach ensures that users benefit from best-in-class tracking while retaining the full power and reproducibility of our active learning pipeline.

      (2) The authors should justify why JAABA was chosen to benchmark their classifier. This tool was published in 2013, and there have been other classification tools (e.g., SIMBA) published since then.  

      We appreciate the reviewer’s suggestion regarding SIMBA. However, our comparisons to JAABA and a CNN are based on results from prior work (Geuther, Brian Q., et al. "Action detection using a neural network elucidates the genetics of mouse grooming behavior." Elife 10 (2021): e63207.), where both were used to benchmark performance on our publicly released dataset. In this study, we introduce JABS as a new approach and compare it against those established baselines. While SIMBA may indeed offer competitive performance, we believe the responsibility to demonstrate this lies with SIMBA’s authors, especially given the availability of our dataset for benchmarking.

      (3) I had a lot of trouble understanding the elements of the data calculated in JABS vs outside of JABS. This should be clarified in the manuscript.

      (a) For example, it was not intuitive that pose tracking was required and had to be done separately from the JABS pipeline. The diagrams and figures should more clearly indicate that.

      (b) In section 2.5, are any of those metrics calculated by JABS? Another software tool, GEMMA, is mentioned, but no citation is provided for it. This created ambiguity regarding whether this analysis is separate from JABS or integrated into the pipeline.

      We acknowledge the confusion regarding the delineation between JABS components and external tools, and we have comprehensively addressed this throughout the manuscript. The JABS ecosystem consists of three integrated modules: JABS-DA (data acquisition), JABS-AL (active learning for behavior annotation and classifier training), and JABS-AI (analysis and integration via web application). Pose estimation, while developed by our laboratory, operates as a preprocessing pipeline that generates the keypoint coordinates required for subsequent JABS classifier training and annotation workflows. We have now added a dedicated Methods subsection that explicitly maps each analytical step to its corresponding software component, clearly distinguishing between core JABS modules and external tools (such as GEMMA for genetic analysis). Additionally, we have provided proper citations and code repositories for all external pipelines to ensure complete transparency regarding the computational workflow and enable full reproducibility of our analyses.

      (4) There needs to be clearer explanations of all metrics, methods, and transformations of the data reported.

      (a) There is very little information about the architecture of the classification model that JABS uses.

      (b) There are no details on the CNN used for comparing and benchmarking the classifier in JABS.

      (c) Unclear how the z-scoring of the behavioral data in Figure 7 was implemented.

      (d) There is currently no information on how the metrics in Figure 8 are calculated.

      We have added a comprehensive Methods section that not only addresses the specific concerns raised above but provides complete methodological transparency throughout our study. This expanded section includes detailed descriptions of all computational architectures (including the JABS classifier and grooming benchmark models and metrics), statistical procedures and data transformations (including the z-scoring methodology for Figure 7), downstream genetic analysis (including all measures presented in Figure 8), and preprocessing pipelines. 
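      For readers unfamiliar with the transformation, a generic z-score standardizes each phenotype to mean 0 and standard deviation 1 across the units being compared. In the sketch below those units are hypothetical strain means. This is a minimal sketch under that assumption. Whether the manuscript standardizes across strains, animals, or another grouping is specified in its Methods, not here, and the strain names and values are invented.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and (population) SD 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant phenotype: nothing to scale
    return [(v - mu) / sd for v in values]


# Hypothetical per-strain means for one phenotype (arbitrary units).
strain_means = {"StrainA": 120.0, "StrainB": 150.0, "StrainC": 90.0}
z = dict(zip(strain_means, zscore(list(strain_means.values()))))
print(z)
```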

      (5) The authors talk about their datasets having visual diversity, but without seeing examples, it is hard to know what they mean by this visual diversity. Ideally, the manuscript would have a supplementary figure with a representation of the variety of setups and visual diversity represented in the datasets used to train the model. This is important so that readers can quickly assess from reading the manuscript if the pre-trained classifier models could be used with the experimental data they have collected.

      The visual diversity of our training datasets has been comprehensively documented in our previous tracking work (https://www.nature.com/articles/s42003-019-0362-1), which systematically demonstrates tracking performance across mice with diverse coat colors (black, agouti, albino, gray, brown, nude, piebald), body sizes including obese mice, and challenging recording conditions with dynamic lighting and complex environments. Notably, Figure 3B in that publication specifically illustrates the robustness across coat colors and body shapes that characterize the visual diversity in our current classifier training data. To address the reviewer's concern and enable readers to quickly assess the applicability of our pre-trained models to their experimental data, we have now added this reference to the manuscript to ground our claims of visual diversity in published evidence.

      (6) Many acronyms used in the figures are not defined in the figure legends, which makes the figures hard to follow. The figure legends for Figures 1, 2, 7, and 9 did not have sufficient information for me to comprehend the figures shown.

      We have fixed this in the manuscript. 

      (7) In the introduction, the authors talk about compression artifacts that can be introduced in camera software defaults. This is very vague without specific examples.

      This is a complex topic involving a trade-off between the size and quality of video data, and a full treatment is beyond the scope of this paper. We have carefully optimized the compression settings and provide users with a balanced default. A more detailed blog post on compression artifacts can be found on our lab’s webpage (https://www.kumarlab.org/2018/11/06/brians-video-compression-tests/). We have also added a comment to the main manuscript about how keyframes can shift temporal features.

      (8) More visuals of the inside of the apparatus should be included as supplementary figures. For example, to see the IR LEDs surrounding the camera.

      We have shared data from JABS as part of several papers, including the tracking paper (Geuther et al. 2019) and papers on grooming, gait and posture, and mouse mass. We have also released entire datasets as part of this paper (JABS1800, JABS-BXD). We also have a step-by-step assembly guide that shows the location of the lights, cameras, and other parts (see Methods, the JABS workflow guide, and this PowerPoint file in the GitHub repository: https://github.com/KumarLabJax/JABS-datapipeline/blob/main/Multi-day%20setup%20PowerPoint%20V3.pptx).

      (9) Figure 2 suggests that you could run multiple data acquisition systems simultaneously. Does each require a separate computer? If so, are the data not synchronized across all boxes?

      Each JABS-DA unit has its own edge device (Nvidia Jetson). Each system (which we define as multiple JABS-DA arenas associated with one lab or group) can have multiple recording devices (arenas). The system requires only one control portal (a Raspberry Pi computer) and can handle as many recording devices as needed (an Nvidia computer with a camera for each JABS-DA arena). To collect data, one additional computer is needed to access the web control portal and initiate a recording session. Since this is a web portal, users can use any computer or tablet. The recording devices are not strictly synchronized but can be controlled in a unified manner.

      (10) The list of parts on GitHub seems incomplete; many part names are not there.

      We thank the referee for bringing this to our attention. We have updated the GitHub repository (and its README), which now links to the design files.

      (11) The authors should consider adding guidance on how tethers and headstages are expected to impact the use of JABS, as many labs would be doing behavioral experiments combined with brain measurements.

      While our pose estimation model was not specifically trained on tethered animals, published research demonstrates that keypoint detection models maintain robust performance despite the presence of headstages and recording equipment. Once accurate pose coordinates are extracted, the downstream behavior classification pipeline operates independently of the pose estimation method and would remain fully functional. We recommend users validate pose estimation accuracy in their specific experimental setup, as the behavior classification component itself is agnostic to the source of pose coordinates.
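      One simple way to carry out that validation is to hand-label a few frames from the tethered setup and compare them with the model output. Useful measures include per-keypoint Euclidean error and the percentage of correct keypoints (PCK) within a pixel threshold. The sketch below assumes (x, y) pixel coordinates and treats NaN as a missed keypoint. The helper names and the example threshold are hypothetical. An acceptable threshold depends on video resolution and on which features the behavior classifier relies on.

```python
import math


def _is_missing(point):
    return math.isnan(point[0]) or math.isnan(point[1])


def keypoint_errors(predicted, ground_truth):
    """Euclidean error in pixels for each predicted keypoint; missing ones skipped."""
    return [math.hypot(p[0] - g[0], p[1] - g[1])
            for p, g in zip(predicted, ground_truth) if not _is_missing(p)]


def pck(predicted, ground_truth, threshold_px):
    """Fraction of labeled keypoints predicted within threshold_px pixels.
    Missing predictions count as misses."""
    hits = sum(1 for p, g in zip(predicted, ground_truth)
               if not _is_missing(p)
               and math.hypot(p[0] - g[0], p[1] - g[1]) <= threshold_px)
    return hits / len(ground_truth)


# Hypothetical hand labels vs. model output for three keypoints in one frame.
truth = [(10.0, 13.0), (20.0, 20.0), (5.0, 5.0)]
pred = [(10.0, 10.0), (20.0, 24.0), (math.nan, math.nan)]
print(keypoint_errors(pred, truth))  # [3.0, 4.0]
print(pck(pred, truth, threshold_px=3.5))
```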

      Reviewer #2 (Recommendations for the authors):

      (1) "Using software-defaults will introduce compression artifacts into the video and will affect algorithm performance." Can this be quantified? I imagine most of the performance hit comes from a decrease in pose estimation quality. How does a decrease in pose estimation quality translate to action segmentation? Providing guidelines to potential users (e.g., showing plots of video compression vs classifier performance) would provide valuable information for anyone looking to use this system (and could save many labs countless hours replicating this experiment themselves). A relevant reference for the effect of compression on pose estimation is Mathis, Warren 2018 (bioRxiv): On the inference speed and video-compression robustness of DeepLabCut.

      Since our behavior classification approach depends on features derived from keypoints, changes in keypoint accuracy will affect behavior segmentation accuracy. We agree that it is important to understand this further, particularly in light of the shared bioRxiv paper investigating the effect of compression on pose estimation accuracy. The effect of compression on keypoint and behavior classification accuracy is difficult to evaluate concisely, given the number of potential variables to inspect. Variables that should be investigated include: discrete cosine transform quality (Mathis, Warren experiment), frame size (Mathis, Warren experiment), keyframe interval (new, unique to video data), inter-frame settings (new, unique to video data), the behavior of interest, pose models trained with compression augmentation (https://arxiv.org/pdf/1506.08316?), and the type of CNN used (under active development). The simplest recommendation we can make at this time is that compression will affect behavior predictions and that users should be cautious about applying our shared classifiers to compressed video data. We are committed to sharing these results as we run those experiments. In related work (a paper accepted at the CV4Animals conference, https://www.cv4animals.com/, which can be downloaded at https://drive.google.com/file/d/1UNQIgCUOqXQh3vcJbM4QuQrq02HudBLD/view), we have already begun to inspect how changing some of these factors affects behavior segmentation performance. In that work, we investigate the robustness of behavior classification across multiple behaviors using different keypoint subsets, and we find that classifiers are relatively stable across keypoint subsets. We are actively working on a follow-up effort to investigate the effect of keypoint noise, CNN model architecture, and the other factors listed above on behavior segmentation tasks.
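      A first step toward the keypoint-noise experiments mentioned above can be sketched without retraining any classifier. The idea is to jitter two keypoints with Gaussian noise and measure how far a derived feature drifts from its true value. Here the feature is a nose-to-tail-base distance. The pose, the noise model (isotropic Gaussian, in pixels), and the helper names are assumptions for illustration. This isolates only the keypoint-to-feature step, not the full effect on behavior segmentation.

```python
import math
import random


def jitter(point, sigma, rng):
    """Add isotropic Gaussian noise (standard deviation sigma, pixels)."""
    return (point[0] + rng.gauss(0.0, sigma), point[1] + rng.gauss(0.0, sigma))


def distance_feature_error(sigma, n_trials=2000, seed=0):
    """Mean absolute error of a nose-to-tail-base distance feature when
    both keypoints are jittered with noise of standard deviation sigma."""
    rng = random.Random(seed)
    nose, tail_base = (100.0, 100.0), (160.0, 100.0)  # hypothetical pose
    true_distance = math.dist(nose, tail_base)
    total = 0.0
    for _ in range(n_trials):
        noisy_distance = math.dist(jitter(nose, sigma, rng), jitter(tail_base, sigma, rng))
        total += abs(noisy_distance - true_distance)
    return total / n_trials


for sigma in (0.0, 1.0, 2.0, 4.0):
    print(f"sigma = {sigma:.0f} px -> mean feature error = {distance_feature_error(sigma):.2f} px")
```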

      (2) The analysis of inter-annotator variability is very interesting. I'm curious how these differences compare to two other types of variability:

      (a) intra-annotator variability; I think this is actually hard to quantify with the presented annotation workflow. If a given annotator re-annotated a set of videos, but using different sparse subsets of the data, it is not possible to disentangle annotator variability versus the effect of training models on different subsets of data. This can only be rigorously quantified if all frames are labeled in each video.

      We propose an alternative approach to behavior classifier development in the text associated with Figure 3C. We do not advocate for high inter-annotator agreement, since individual behavior experts have differing labeling styles (an intuitive understanding of the behavior). Rather, we allow multiple classifiers for the same behavior and let the end user prioritize classifiers based on the heritability of the behavior as measured by each classifier.
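      High agreement is not the goal of this workflow, but frame-level disagreement between two annotators can still be summarized. One option is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a standard implementation applied to hypothetical per-frame labels. It is not necessarily the agreement measure reported in the manuscript.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of frame labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((a.count(k) / n) * (b.count(k) / n) for k in labels)
    if expected == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-frame labels (1 = behavior, 0 = not behavior).
annotator_1 = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
annotator_2 = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

      Here the annotators agree on 8 of 10 frames, but chance agreement is already 0.52, so kappa is about 0.58. The gap shows why a chance-corrected measure is more informative than raw agreement for sparse behaviors.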

      (b) In lieu of this, I'd be curious to see the variability in model outputs trained on data from a single annotator, but using different random seeds or train/val splits of the data. This analysis would provide useful null distributions for each annotator and allow for more rigorous statistical arguments about inter-annotator variability. 

      JABS allows the user to choose among multiple classifier types (random forest, XGBoost). We do not expect the user to carry out hyperparameter tuning or other forms of optimization. We find that the major increase in performance comes from optimizing the window size used for feature computation and the number of cross-validation folds. However, future versions of JABS-AL could enable a complete hyperparameter scan across seeds and data splits to obtain a null distribution for each annotator.
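      The seed-and-split scan described above could look roughly like the sketch below. For one annotator's labels, it repeats a stratified train/validation split under different random seeds, refits, and collects the validation F1 scores as a null distribution. To stay self-contained, it uses a trivial nearest-centroid classifier on a single synthetic feature. This stands in for the random forest or XGBoost models. All function names and data are hypothetical.

```python
import random
from statistics import mean, stdev


def f1_score(y_true, y_pred):
    """F1 for binary frame labels (1 = behavior present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def stratified_split(labels, val_fraction, rng):
    """Split indices into train/val, keeping every class in both parts."""
    train, val = [], []
    for label in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(members)
        n_val = max(1, int(len(members) * val_fraction))
        val.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, val


def fit_nearest_centroid(xs, ys):
    """Stand-in classifier: predict the class whose mean feature is closer."""
    c0 = mean(x for x, y in zip(xs, ys) if y == 0)
    c1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return lambda x: 1 if abs(x - c1) < abs(x - c0) else 0


def seed_split_null(xs, ys, seeds, val_fraction=0.3):
    """Validation F1 for one annotator's labels under many seeds/splits."""
    scores = []
    for seed in seeds:
        rng = random.Random(seed)
        train, val = stratified_split(ys, val_fraction, rng)
        model = fit_nearest_centroid([xs[i] for i in train], [ys[i] for i in train])
        scores.append(f1_score([ys[i] for i in val], [model(xs[i]) for i in val]))
    return scores


# Hypothetical single-annotator data: one window feature per frame.
gen = random.Random(42)
xs = [gen.gauss(0.0, 1.0) for _ in range(100)] + [gen.gauss(1.5, 1.0) for _ in range(100)]
ys = [0] * 100 + [1] * 100

null_scores = seed_split_null(xs, ys, seeds=range(20))
print(f"F1 across seeds: mean={mean(null_scores):.3f}, sd={stdev(null_scores):.3f}")
```

      With a real model, the seed would also be passed to the classifier itself, for example through the `random_state` parameter of scikit-learn's `RandomForestClassifier`. Both split randomness and model randomness would then contribute to the distribution. Inter-annotator differences that fall well outside this spread would be harder to attribute to training noise alone.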

      (c) I appreciate the open-sourcing of the video/pose datasets. The authors might also consider publicly releasing their pose estimation and classifier training datasets (i.e., data plus annotations) for use by method developers.

      We thank the referee for acknowledging our commitment to open data sharing practices. Building upon our previously released strain survey dataset, we have now also made our complete classifier training resources publicly available, including the experimental videos, extracted pose coordinates, and behavioral annotations. The repository link has been added to the manuscript to ensure full reproducibility and facilitate community adoption of our methods.  

      (3) More thorough discussion on the limitations of the top-down vs. bottom-up camera viewpoint; are there particular scientific questions that are much better suited to bottom-up videos (e.g., questions about paw tremors)?

      Top-down, bottom-up, and multi-view imaging each have pros and cons. Generally speaking, multi-view imaging provides the most accurate pose models but requires more resources for both hardware setup and data processing. Top-down imaging offers flexibility in materials, since the floor does not need to be transparent. Lighting and potential reflections are also harder to manage from the bottom-up perspective. Since the paws are not occluded from below, bottom-up models should achieve better paw keypoint precision, allowing them to capture more subtle behaviors. However, the appearance of the arena floor changes over time as the mice defecate and urinate, and care must be taken to clean the arena between recordings to keep it transparent. This has little effect on top-down imaging but will occlude or distort the view from the bottom-up perspective. Additionally, the inclusion of bedding for longer recordings, which is required by IACUC, essentially renders bottom-up imaging useless because the bedding completely obscures the mouse. Overall, while bottom-up imaging may provide a precision benefit that greatly helps in capturing subtle motion, top-down imaging is more robust for obtaining consistent video across large experiments over longer periods of time.

      (4) More thorough discussion on what kind of experiments would warrant higher spatial or temporal resolution (e.g., investigating slight tremors in a mouse model of neurodegenerative disease might require this greater resolution).

      This is an important topic that deserves its own perspective guide. We try to capture some of this in the paper on specifications, but we only scratch the surface. Overall, there are trade-offs between frame rate, resolution, color versus monochrome, and compression. Labs have collected data at hundreds of frames per second to capture the kinetics of reflexive pain behavior (Abdus-Saboor lab) or whisking behavior. Labs have also collected data at as low as 2.5 frames per second for activity or centroid tracking (see Kumar et al., PNAS). The data collection specifications largely depend on the behaviors being captured. Our rule of thumb is the Nyquist criterion, which states that the sampling rate must be at least twice the frequency of the event. For example, certain grooming syntaxes occur at 7 Hz, so we need at least 14 FPS to capture them. JABS collects data at 30 FPS, which is a good compromise between data load and behavior rate. We use 800x800 pixel resolution, which is a good compromise between capturing animal body parts and limiting data size. Thank you for the feedback that the field needs guidance on this topic. We will work on guidance documents on video acquisition parameters for capturing animal behavior data and share them with the community as a separate publication.
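      The rule of thumb and the resolution trade-off reduce to simple arithmetic. The sketch below computes three quantities: the minimum frame rate implied by the Nyquist criterion, the highest event frequency a given frame rate can resolve, and the uncompressed data rate for a given resolution. The assumption of 8-bit single-channel (monochrome) pixels is ours, and compression reduces the stored size substantially.

```python
def min_frame_rate(event_hz):
    """Nyquist criterion: sample at least twice the event frequency."""
    return 2.0 * event_hz


def nyquist_limit_hz(fps):
    """Highest event frequency a given frame rate can represent."""
    return fps / 2.0


def raw_rate_mb_per_s(width, height, fps, bytes_per_pixel=1):
    """Uncompressed video data rate in MB/s (1 MB = 1e6 bytes).
    bytes_per_pixel=1 assumes 8-bit monochrome; use 3 for 8-bit RGB."""
    return width * height * bytes_per_pixel * fps / 1e6


print(min_frame_rate(7.0))               # 14.0 FPS for a 7 Hz grooming rhythm
print(nyquist_limit_hz(30))              # 15.0 Hz resolvable at 30 FPS
print(raw_rate_mb_per_s(800, 800, 30))   # 19.2 MB/s before compression
```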

      (5) References 

      (a) Should add the following ref when JAABA/MARS are referenced: Goodwin et al. 2024, Nat Neuro (SimBA)

      (b) Could also add Bohnslav et al. 2021, eLife (DeepEthogram).

      (c) The SuperAnimal DLC paper (Ye et al. 2024, Nature Comms) is relevant to the introduction/discussion as well.

      We thank the referee for the suggestions. We have added these references.  

      (6) Section 2.2:

      While I appreciate the thoroughness with which the authors investigated environmental differences in the JABS arena vs standard wean cage, this section is quite long and eventually distracted me from the overall flow of the exposition; might be worth considering putting some of the more technical details in the methods/appendix.

      These are important data for adopters of JABS to gain IACUC approval in their home institution. These committees require evidence that any new animal housing environment has been shown to be safe for the animals. In the development of JABS, we spent a significant amount of time addressing the JAX veterinary and IACUC concerns. Therefore, we propose that these data deserve to be in the main text. 

      (7) Section 2.3.1:

      (a) Should again add the DeepEthogram reference here

      (b) Should reference some pose estimation papers: DeepLabCut, SLEAP, Lightning Pose. 

      We thank the referee for the suggestions. We have added these references.  

      (c) "Pose based approach offers the flexibility to use the identified poses for training classifiers for multiple behaviors" - I'm not sure I understand why this wouldn't be possible with the pixel-based approach. Is the concern about the speed of model training? If so, please make this clearer.

      The advantage lies not just in training speed, but in the transferability and generalization of the learned representations. Pose-based approaches create structured, low-dimensional latent embeddings that capture behaviorally relevant features, which can be readily repurposed across different behavioral classification tasks. Pixel-based methods, by contrast, require retraining the entire feature extraction pipeline for each new behavior. Recent work demonstrates that pose-based models achieve greater data efficiency than pixel-based transfer learning approaches when fine-tuned for new tasks [1], and that latent behavioral representations can be partitioned into interpretable subspaces that generalize across different experimental contexts [2]. While pixel-based approaches can achieve higher accuracy on specific tasks, they suffer from the "curse of dimensionality" (thousands of pixels vs. 12 keypoints, or 24 coordinates, per frame) and lack the semantic structure that makes pose-based features inherently reusable for downstream behavioral analysis.

      (1) Ye, Shaokai, et al. "SuperAnimal pretrained pose estimation models for behavioral analysis." Nature communications 15.1 (2024): 5165.

      (2) Whiteway, Matthew R., et al. "Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders." PLoS computational biology 17.9 (2021): e1009439.  
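      To make the dimensionality and reuse argument concrete, assume a 12-keypoint pose. That is 24 numbers per frame, and a simple feature set such as all pairwise keypoint distances (66 values) can be computed once and fed to any number of behavior classifiers. The sketch below is illustrative only. The frame coordinates are invented, and the per-behavior rules are placeholders for trained classifiers, not JABS features or models.

```python
import math
from itertools import combinations


def pairwise_distance_features(keypoints):
    """All pairwise Euclidean distances between keypoints in one frame.
    For 12 keypoints this yields 12 * 11 / 2 = 66 values."""
    return [math.dist(p, q) for p, q in combinations(keypoints, 2)]


# Hypothetical frame: 12 (x, y) keypoints, i.e. 24 numbers.
frame = [(float(i), float(2 * i)) for i in range(12)]
features = pairwise_distance_features(frame)  # computed once per frame

# Placeholder decision rules standing in for separately trained classifiers;
# each one consumes the same shared feature vector.
behavior_rules = {
    "behavior_a": lambda f: f[0] < 5.0,
    "behavior_b": lambda f: f[-1] > 100.0,
}
predictions = {name: rule(features) for name, rule in behavior_rules.items()}

print(len(frame) * 2, "pose numbers vs", 800 * 800, "pixels in an 800x800 frame")
print(len(features), "shared features ->", predictions)
```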

      (d) The pose estimation portion of the pipeline needs more detail. Do users use a pretrained network, or do they need to label their own frames and train their own pose estimator? If the former, does that pre-trained network ship with the software? Is it easy to run inference on new videos from a GUI or scripts? How accurate is it in compliant setups built outside of JAX? How long does it take to process videos?

      We have added guidance on pose estimation to the manuscript (section “2.3.1 Behavior annotation and classifier training” and the Methods section titled “Pose tracking pipeline”).

      (e) The final paragraph describing how to arrive at an optimal classifier is a bit confusing - is this the process that is facilitated by the app, or is this merely a recommendation for best practices? If this is the process the app requires, is it indeed true that multiple annotators are required? While obviously good practice, I imagine there will be many labs that just want a single person to annotate, at least in the beginning prototyping stages. Will the app allow training a model with just a single annotator?

      We have clarified this in the text. 

      (8) Section 2.5:

      (a) This section contained a lot of technical details that I found confusing/opaque, and didn't add much to my overall understanding of the system; sec 2.6 did a good job of clarifying why 2.5 is important. It might be worth motivating 2.5 by including the content of 2.6 first, and moving some of the details of 2.5 to the method/appendix.

      We moved some of the technical details in Section 2.5 to the Methods section titled “Genetic analysis”. Furthermore, we have added a few statements to motivate the need for genetic analysis and to explain how the web app, introduced in Section 2.6, can facilitate it.

      (9) Minor corrections:

      (a) Bottom of first page, "always been behavior quantification task" missing "a".

      (b) "Type" column in Table S2 is undocumented and unused (i.e., all values are the same); consider removing.

      (c) Figure 4B, x-axis: add units.

      (d) Page 8/9: all panel references to Figure S1 are off by one

      We have fixed them in the updated manuscript.

    1. eLife Assessment

      In this important study, the authors conducted extensive atomistic and coarse-grained simulations as well as a lattice Monte Carlo analysis to probe the driving force and functional impact of supercomplex formation in the inner mitochondrial membrane. The study highlighted the major contribution from membrane mechanics to the supercomplex formation and revealed interesting differences in structural and dynamical features of the protein components upon complex formation. Upon revision, the analysis is considered solid, although the magnitude of the estimated membrane deformation energies seems somewhat large. Overall, the study is thorough, creative and the impact on the field of bioenergetics is expected to be significant.

    2. Reviewer #1 (Public review):

      This paper by Poverlein et al. reports substantial membrane deformation around the oxidative phosphorylation supercomplex, proposing that this deformation is a key part of supercomplex formation. I found the paper interesting and well-written.

      * Analysis of the bilayer curvature is challenging on the fine length scales they have used and produces unexpectedly large energies (Table 1). Additionally, the authors use the mean curvature (Eq. S5) as input to the Helfrich Hamiltonian (Eq. S7; uncited, but it seems clear that this is the Helfrich model). If an errant factor of one half has been included in the curvature, this would quarter the curvature energy compared to the real energy, because the curvature enters squared. The bending modulus used (ca. 5 kcal/mol) is small compared with typically observed biological bending moduli. This suggests that the curvature energies are even higher than the already high values reported. Some of this may be due to the spontaneous curvature of the lipids and perhaps the effect of the protein modifying the properties of nearby lipids.

      * It is unclear whether CDL supports SC formation mainly by stabilizing the membrane deformation or by acting as an electrostatic glue. While this is a weakness for a definitive quantification of the effect of CDL on SC formation, the study presents an interesting observation of CDL redistribution, which could be an interesting topic for future work.

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results). The energies of the membrane deformations are quite large. This might reflect the roles of specific lipids stabilizing those deformations, or the inherent difficulty in characterizing nanometer-scale curvature.

    3. Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for the SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful, and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. In the revision, the authors further clarified and quantified their analysis of membrane responses, leading to further insights into membrane contributions. They have also toned down the decomposition of membrane contributions into enthalpic and entropic contributions, which is difficult to do. Overall, the study is rather thorough, highly creative and the impact on the field is expected to be significant.

      Weaknesses:

      Upon revision, I believe the weakness identified in previous work has been largely alleviated.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This paper by Poverlein et al. reports substantial membrane deformation around the oxidative phosphorylation supercomplex, proposing that this deformation is a key part of supercomplex formation. I found the paper interesting and well-written.

      We thank the Reviewer for finding our work interesting. 

      Analysis of the bilayer curvature is challenging on the fine length scales they have used and produces unexpectedly large energies (Table 1). Additionally, the authors use the mean curvature (Eq. S5) as input to the Helfrich Hamiltonian (Eq. S7; uncited, but it seems clear that this is the Helfrich model). If an errant factor of one half has been included in the curvature, this would quarter the curvature energy compared to the real energy, because the curvature enters squared.

      We thank the Reviewer for raising this important issue. We have now clarified in the SI and main manuscript that we employ the Helfrich model. In our initial implementation, we indeed used the mean curvature H in place of the total curvature 2H, thereby missing a factor of 2. As the Reviewer correctly noted, this resulted in curvature deformation energies that were underestimated by a factor of ~4. We have corrected for this effect in the revised analysis and updated Table 1 accordingly. Importantly, however, this correction does not alter the general conclusion of our work that supercomplex formation relieves membrane strain and stabilizes the system. We have added a paragraph discussing the magnitude of the observed bending effects and comparing them with previous estimates in the literature:
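      The origin of the factor of four can be made explicit. The Helfrich bending energy is written below with zero spontaneous curvature and without the Gaussian-curvature term; these are simplifying assumptions, not necessarily the exact form of Eq. S7. Here c<sub>1</sub> and c<sub>2</sub> are the principal curvatures, H = (c<sub>1</sub> + c<sub>2</sub>)/2 is the mean curvature, κ is the bending modulus, and E' is the energy obtained when H is used in place of c<sub>1</sub> + c<sub>2</sub>:

```latex
\begin{aligned}
E_{\mathrm{bend}} &= \frac{\kappa}{2}\int_A (c_1 + c_2)^2\,\mathrm{d}A
                   = \frac{\kappa}{2}\int_A (2H)^2\,\mathrm{d}A
                   = 2\kappa\int_A H^2\,\mathrm{d}A, \\
E'_{\mathrm{bend}} &= \frac{\kappa}{2}\int_A H^2\,\mathrm{d}A
                    = \tfrac{1}{4}\,E_{\mathrm{bend}}.
\end{aligned}
```

      Because the curvature enters squared, substituting H where c<sub>1</sub> + c<sub>2</sub> belongs underestimates the energy by exactly a factor of four.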

      SI: 

      “The local mean curvature of the membrane midplane was computed using the Helfrich model (4,5) …”

      (4) W. Helfrich, Elastic properties of lipid bilayers: theory and possible experiments. Zeitschrift für Naturforschung 28c, 693-703 (1973).

      (5) F. Campelo et al., Helfrich model of membrane bending: From Gibbs theory of liquid interfaces to membranes as thick anisotropic elastic layers. Advances in Colloid and Interface Science 208, 25-33 (2014).

      Main Text: 

      “which measures the energetic cost of deforming the membrane from a flat geometry (ΔG<sub>curv</sub>) based on the Helfrich model (45, 46). …

      Our analysis suggests that both contributions are substantially reduced upon formation of the SC, with the curvature penalty decreasing by 79.2 ± 5.2 kcal mol<sup>-1</sup> (for a membrane area of ca. 1000 nm<sup>2</sup>) and the thickness penalty by 2.8 ± 2.0 kcal mol<sup>-1</sup> (Table 1).”

      “We note that the magnitude of the estimated bending energies (~10² kcal mol<sup>-1</sup>) (Table 1), while seemingly high at first glance, falls within the range expected for large-scale membrane deformations induced by large multi-domain proteins. For example, the Piezo mechanosensitive channel performs roughly 150 k<sub>B</sub>T (≈ 90 kcal mol<sup>-1</sup>) of work to bend the bilayer into its dome-like shape (65). Comparable energies have also been estimated for the nucleation of small membrane pores (66), while vesicle formation typically requires bending energies on the order of 300 kcal mol<sup>-1</sup>, largely independent of vesicle size (67). When normalized by the affected membrane area (~1000 nm<sup>2</sup>), these values correspond to an energy density of approximately 0.1 kcal mol<sup>-1</sup> nm<sup>-2</sup>, which places our estimates within a biophysically reasonable regime. Notably, cryo-EM structures of several supercomplexes show that such assemblies can impose significant curvature on the surrounding bilayer (36, 50, 68), supporting the notion that respiratory chain organization is closely coupled to local membrane deformation. Nevertheless, we expect that the absolute deformation energies may be overestimated, as the continuum Helfrich model neglects molecular-level effects such as lipid tilt and local rearrangements, which can partially relax curvature stresses and reduce the effective bending penalty near protein–membrane interfaces (69, 70).”
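      The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below converts k<sub>B</sub>T to kcal mol<sup>-1</sup>, assuming T = 298.15 K (at 310 K the values are about 4% higher). It then converts the Piezo estimate and evaluates the textbook result that bending a sphere costs 8πκ regardless of radius, using the 20 k<sub>B</sub>T modulus quoted in the SI. Finally, it normalizes ~10² kcal mol<sup>-1</sup> by ~1000 nm<sup>2</sup>. The temperature and variable names are our assumptions.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol^-1 K^-1


def kbt_kcal_per_mol(temperature_k):
    """Thermal energy k_B*T per mole (i.e. R*T), in kcal/mol."""
    return R_KCAL * temperature_k


kbt = kbt_kcal_per_mol(298.15)   # assumed temperature: 298.15 K
piezo_work = 150 * kbt           # ~150 k_B T of bending work
kappa = 20 * kbt                 # bending modulus of 20 k_B T
vesicle = 8 * math.pi * kappa    # sphere: (kappa/2) * (2/R)^2 * 4*pi*R^2, R cancels
density = 1e2 / 1000.0           # ~10^2 kcal/mol spread over ~1000 nm^2

print(f"k_B T   = {kbt:.3f} kcal/mol")
print(f"Piezo   = {piezo_work:.1f} kcal/mol")
print(f"vesicle = {vesicle:.0f} kcal/mol")
print(f"density = {density:.2f} kcal/mol/nm^2")
```

      Under these assumptions the Piezo figure comes out near 89 kcal mol<sup>-1</sup> and the sphere near 298 kcal mol<sup>-1</sup>. Both are in line with the ≈ 90 and ~300 kcal mol<sup>-1</sup> values cited.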

      The bending modulus used (ca. 5 kcal/mol) is small compared with typically observed biological bending moduli. This suggests that the curvature energies are even higher than the already high values reported. Some of this may be due to the spontaneous curvature of the lipids and perhaps the effect of the protein modifying the properties of nearby lipids.

      The SI initially included an incorrect value for the bending modulus (20 kJ mol<sup>-1</sup> instead of 20k<sub>B</sub>T), which has now been corrected. The revised value is consistent with experimentally reported bending moduli from X-ray scattering measurements, although there remains substantial uncertainty in the precise values across different experimental and computational studies.

      “The bending deformation energy was computed from the mean curvature field H(x,y), assuming a constant bilayer bending modulus κ (taken as 20 k<sub>B</sub>T = 11.85 kcal mol<sup>-1</sup> (6)):”

      (6) S. Brown et al., Comparative analysis of bending moduli in one-component membranes via coarse-grained molecular dynamics simulations. Biophysical Journal 124, 1–13 (2025).
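      The sketch below shows how the modulus and the curvature definition enter the energy. It evaluates a discrete Helfrich energy on a height map h(x, y) in the small-slope (Monge) approximation, where the total curvature 2H is approximated by the Laplacian of h. It assumes zero spontaneous curvature and no Gaussian-curvature term. It also converts 20 k<sub>B</sub>T (assuming T = 298.15 K) and 20 kJ mol<sup>-1</sup> to kcal mol<sup>-1</sup>. The latter conversion shows that the previously listed value corresponds to the ca. 5 kcal mol<sup>-1</sup> modulus the Reviewer noticed. The grid, helper names, and paraboloid test surface are assumptions for illustration, not the authors' analysis code.

```python
R_KCAL = 1.987204e-3     # molar gas constant, kcal mol^-1 K^-1
KJ_TO_KCAL = 1 / 4.184   # thermochemical calorie


def kappa_kcal(n_kbt, temperature_k=298.15):
    """Bending modulus given as a multiple of k_B T, in kcal/mol."""
    return n_kbt * R_KCAL * temperature_k


def helfrich_energy(h, spacing_nm, kappa):
    """Discrete Helfrich bending energy, (kappa/2) * sum (2H)^2 dA, for a
    height map h[i][j] (nm) on a square grid with spacing spacing_nm (nm).
    Small-slope approximation: 2H ~ Laplacian of h (5-point stencil).
    Zero spontaneous curvature; Gaussian term omitted; interior points only.
    Returns energy in the units of kappa."""
    rows, cols = len(h), len(h[0])
    cell_area = spacing_nm ** 2
    energy = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            laplacian = (h[i + 1][j] + h[i - 1][j] + h[i][j + 1] + h[i][j - 1]
                         - 4.0 * h[i][j]) / spacing_nm ** 2
            energy += 0.5 * kappa * laplacian ** 2 * cell_area
    return energy


kappa = kappa_kcal(20)            # 20 k_B T
kappa_kj_slip = 20 * KJ_TO_KCAL   # 20 kJ/mol, the previously listed value
print(f"20 k_B T  = {kappa:.2f} kcal/mol")
print(f"20 kJ/mol = {kappa_kj_slip:.2f} kcal/mol")

# Hypothetical paraboloid h = a*(x^2 + y^2) on a 5 x 5 grid, 1 nm spacing.
a = 0.01
h = [[a * (i * i + j * j) for j in range(5)] for i in range(5)]
energy = helfrich_energy(h, 1.0, kappa)
print(f"energy = {energy:.4f} kcal/mol (expected 72*kappa*a^2 = {72 * kappa * a * a:.4f})")
```

      For the paraboloid h = a(x² + y²), the discrete Laplacian is exactly 4a. The energy over the 3 × 3 interior points is therefore 9 · (κ/2)(4a)² = 72κa², which makes the function easy to check. Replacing the Laplacian with H = (Laplacian)/2 inside the square would quarter the result.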

      It is unclear whether CDL supports SC formation mainly by stabilizing the membrane deformation or by acting as an electrostatic glue. While this is a weakness for a definitive quantification of the effect of CDL on SC formation, the study presents an interesting observation of CDL redistribution, which could be an interesting topic for future work.

      We agree with the Reviewer that future studies will be important to investigate the relationship between CDL-induced stabilization of the membrane and its electrostatic effects.

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results). The energies of the membrane deformations are quite large. This might reflect the roles of specific lipids stabilizing those deformations, or the inherent difficulty in characterizing nanometer-scale curvature.

      We thank the Reviewer for appreciating our work and for the help in further improving our findings.

      Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for the SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      We thank Reviewer 3 for appreciating the importance of our study. 

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful, and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. In the revision, the authors further clarified and quantified their analysis of membrane responses, leading to further insights into membrane contributions. They have also toned down the decomposition of membrane contributions into enthalpic and entropic contributions, which is difficult to do. Overall, the study is rather thorough, highly creative and the impact on the field is expected to be significant.

      Weaknesses:

      Upon revision, I believe the weaknesses identified in the previous version have been largely alleviated.

      We thank the Reviewer for their previous remarks, which allowed us to significantly improve our manuscript.

    1. eLife Assessment

      This study provides important findings on the understanding of circannual timing in mammals, for which iodothyronine deiodinases (DIOs) have been suggested to be of critical importance, yet functional genetic evidence has been missing. The authors convincingly implicate Dio3, the major inactivator of the biologically active thyroid hormone T3, in circannual timing in Djungarian hamsters, using a combination of correlative and gene knockout experiments; thus, this study provides key insights into the evolution and function of animal annual timing mechanisms.

    2. Reviewer #1 (Public review):

      Circannual timing is a phylogenetically widespread phenomenon in long-lived organisms and is central to the seasonal regulation of reproduction, hibernation, migration, fur color changes, body weight, and fat deposition in response to photoperiodic changes. Photoperiodic control of thyroid hormone T3 levels in the hypothalamus dictates this timing. However, the mechanisms that regulate these changes are not fully understood. The study by Stewart et al. reports that hypothalamic iodothyronine deiodinase 3 (Dio3), the major inactivator of the biologically active thyroid hormone T3, plays a critical role in circannual timing in the Djungarian hamster. Overall, the study yields important results for the field and is well-conducted, with the exception of the CRISPR/Cas9 manipulation.

      Comments on revisions:

      The authors have satisfactorily addressed all my comments. I no longer have concerns about the CRISPR/Cas9 experiments, which have been conducted properly and are now reported appropriately.

    3. Reviewer #2 (Public review):

      Summary:

      Several animals and plants adjust their physiology and behavior to seasons. These changes are timed to precede the seasonal transitions, maximizing chances of survival and reproduction. The molecular mechanisms used for this process are still unclear. Studies in mammals and birds have shown that the expression of deiodinase type-1, 2, and 3 (Dio1, 2, 3) in the hypothalamus spikes right before the transition to winter phenotypes. Yet, whether this change is required or an unrelated product of the seasonal changes has not been shown, particularly because of the genetic intractability of the animal models used to study seasonality. Here, the authors show for the first time a direct link between Dio3 expression and the modulation of circannual rhythms.

      The work is concise and presents the data in a clear manner. The data are, for the most part, solid and support the authors' main claims. The use of CRISPR is a clear advancement in the field. This is, to my knowledge, the first study showing a clear (i.e., causal) role of Dio3 in the circannual rhythms in mammals. Having established a clear component of the circannual timing and a clean approach to address causality, this study could serve as a blueprint to decipher other components of the timing mechanism. It could also help to shed light on the elusive nature of the upstream regulators, in particular on how day length is integrated, perhaps within the components in the Pars tuberalis, and on the regulation of tanycytes.

      Comments on revisions:

      The authors have provided an improved version of the manuscript, particularly clarifying the methodology for their CRISPR manipulations. I am satisfied with their response and commend the authors for their work.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Circannual timing is a phylogenetically widespread phenomenon in long-lived organisms and is central to the seasonal regulation of reproduction, hibernation, migration, fur color changes, body weight, and fat deposition in response to photoperiodic changes. Photoperiodic control of thyroid hormone T3 levels in the hypothalamus dictates this timing. However, the mechanisms that regulate these changes are not fully understood. The study by Stewart et al. reports that hypothalamic iodothyronine deiodinase 3 (Dio3), the major inactivator of the biologically active thyroid hormone T3, plays a critical role in circannual timing in the Djungarian hamster. Overall, the study yields important results for the field and is well-conducted, with the exception of the CRISPR/Cas9 manipulation.

      We appreciate the positive and supportive comment from the Reviewer. We have clarified the oversight in the CRISPR/Cas9 data representation below. Our correction should alleviate any concerns raised.

      Figure 1 lays the foundation for examining circannual timing by establishing the timing of induction, maintenance, and recovery phases of the circannual timer upon exposure of hamsters to short photoperiod (SP) by monitoring morphological and physiological markers. Measures of pelage color, torpor, body mass, plasma glucose, etc, established that the initiation phase occurred by weeks 4-8 in SP, the maintenance by weeks 12-20, and the recovery after week 20, where all morphological and physiological changes started to reverse back to long photoperiod phenotypes.

      The statistical analyses look fine, and the results are unambiguous.

      We thank the Reviewer for recognizing our attempts to highlight the phenomenon of circannual interval timing.

      Their representation could, however, be improved. In Figures 1d and 1e, two different measures are plotted on each graph and differentiated by dots and upward or downward arrowheads. The plots are so small, though, that distinguishing between the direction of the arrows is difficult. Some color coding would make it more reader-friendly. The same comment applies to Figure S4. 

      We have increased the panel size for Figures 1d and 1e. We have also changed the colour of the graphs in Figures 1d and 1e to facilitate the differentiation of the two dependent variables. For the circos plots, we attempted different ways to represent the data but have opted to keep the figures in their current form. The overall aim is to provide a ‘gestalt’ view of the timing of changes in transcript expression while highlighting only a few key genes. The whole dataset is provided in the supplementary materials for Reviewer/Reader interrogation.

      The authors went on to profile the transcriptome of the mediobasal and dorsomedial hypothalamus, paraventricular nucleus, and pituitary gland (all known to be involved in seasonal timing) every 4 weeks over the different phases of the circannual interval timer. A number of transcripts displaying seasonal rhythms in expression levels in each of the investigated structures were identified, including transcripts whose expression peaks during each phase. This included two genes of particular interest due to their known modulation of expression in response to photoperiod, Dio3 and Sst, found among the transcripts upregulated during the induction and maintenance phases, respectively. The experiments are technically sound and properly analyzed, revealing interesting candidates. Again, my main issues lie with the representation in the figure. In particular, the authors should clarify what the heatmaps on the right of Figures 1f and 1g represent. I suspect they are simply heatmaps of averaged expression of all genes within a defined category, but a description is missing in the legend, as well as a scale for color coding near the figure.

      We have clarified the heatmap and density maps in the Figure legend. We apologise for the lack of information to describe the figure panels. (see lines 644-648)

      Figure 2 reveals that SP-programmed body mass loss is correlated with increased Dio3-dependent somatostatin (Sst) expression. First, to distinguish whether the body mass loss was controlled by rheostatic mechanisms and not just acute homeostatic changes in energy balance, hamsters fed ad lib or subjected to acute food restriction were tested in both LP and SP. Unlike for plasma insulin, food restriction had no additional effect on SP-driven epididymal fat mass loss (Figure S7). This clearly establishes a rheostatic control of body mass loss across weeks in SP conditions. Importantly, Sst expression in the mediobasal hypothalamus increased in both ad lib-fed and food-restricted SP hamsters, and this increase could be reduced by a single subcutaneous injection of active T3, clearly suggesting that the increase in Sst expression in SP is due to a decrease in active T3, likely via increased Dio3 expression in the hypothalamus. The results are unambiguous.

      We thank the Reviewer for the supportive and affirmative feedback.

      Figure 3 provides a functional test of Dio3's role in the circannual timer. Mediobasal hypothalamic injections of CRISPR-Cas9 lentiviral vectors expressing two guide RNAs targeting the hamster Dio3 led to a significant reduction in the interval between induction and recovery phases seen in SP as measured by body mass, and diminished the extent of pelage color change by weeks 15-20. In addition, hamsters that failed to respond to SP exposure by decreasing their body mass also had undetectable Dio3 expression in the mediobasal hypothalamus. Together, these data provide strong evidence that Dio3 functions in the circannual timer. I noted, however, a few problems in the way the CRISPR modification of Dio3 in the mediobasal hypothalamus was reported in Figure S8. One is in Figure S8b, where the PAM sites are reported to be 9 bp and 11 bp downstream of sgRNA1 and sgRNA2, respectively. Is this really the case? If so, I would have expected the experiment to fail to show any effect, as PAM sites need to immediately follow the target genomic sequence recognized by the sgRNA for Cas9 to induce a DNA double-strand break. It seems that each guide contains a 3' NGG sequence that is currently underlined as part of the sgRNAs in both Fig S8b and the methods section. If this is not a mistake in reporting the experimental design, I believe that the design is less than optimal and that the sgRNAs are likely to have low efficiency, if they are functional at all.

      We apologize for the oversight and indeed the reporting in Figure S8b was a mistake. The PAM site previously indicated was the ‘secondary PAM site’ (which as the Reviewer notes would likely have low efficiency). The PAM site is described within the gRNA in the figure. We use Adobe Illustrator to generate figures, and during the editing process, the layer for PAM text was accidentally moved ‘back’ to a lower level. The oversight was not rectified before submission. We apologise for this unreservedly. The PAM site text has been moved forward, to highlight the location of the primary site (ie immediately following gRNA) and labelled the gRNA and PAM site in the ‘Target region’. The secondary PAM site text was removed to eliminate any confusion.
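To make the adjacency requirement discussed above concrete, here is a minimal Python sketch, assuming SpCas9 with an NGG PAM and a guide and target written on the same strand. The function name `has_adjacent_ngg_pam` and both sequences are hypothetical illustrations, not the Dio3 guides used in the study; a real check would also scan the reverse complement.

```python
def has_adjacent_ngg_pam(target: str, protospacer: str) -> bool:
    """Return True if `protospacer` occurs in `target` (same strand, 5'->3')
    and is immediately followed by an NGG PAM, as required by SpCas9."""
    target, protospacer = target.upper(), protospacer.upper()
    start = target.find(protospacer)
    while start != -1:
        end = start + len(protospacer)
        pam = target[end:end + 3]
        # N can be any base; the 2nd and 3rd PAM positions must be G.
        if len(pam) == 3 and pam[1:] == "GG":
            return True
        start = target.find(protospacer, start + 1)
    return False


# Hypothetical 20-nt protospacer (not a Dio3 guide from the study).
guide = "GCTGACCTTGAAGCCATCGA"
pam_adjacent = "TTAC" + guide + "TGG" + "ACAT"       # NGG directly after the guide
pam_9bp_away = "TTAC" + guide + "ACTTCAGTC" + "AGG"  # NGG only 9 bp downstream

print(has_adjacent_ngg_pam(pam_adjacent, guide))  # True
print(has_adjacent_ngg_pam(pam_9bp_away, guide))  # False
```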

      The authors report efficiencies around 60% (line 325), but how these were obtained is not specified. 

      The efficiencies provided were based on bioinformatic predictions, not in vivo assays. To avoid confusion, we have removed the text. The gRNAs were clearly effective at inducing mutations based on the sequencing analyses.

      Another unclear point is the degree to which the mediobasal hypothalamus was actually mutated. Only one mutated (truncated) sequence in Figure S8c is reported, but I would have expected a range of mutations in different cells of the tissue of interest.

      The tissue punch would include multiple different cells (e.g., neuronal, glial, etc). We agree with the Reviewer that genomic samples from different cells would be included in the sequencing analyses. Given the large mutation in the target region, the gRNA was effective. We have only shown one representative sequence. If the Reviewer would like to see all mutations, we can easily show the other samples.

      Although the authors clearly find a phenotypic effect with their CRISPR manipulation, I suspect that they may have uncovered greater effects with better sgRNA design. These points need some clarification. I would also argue that repeating this experiment with properly designed sgRNAs would provide much stronger support for causally linking Dio3 in circannual timing.

      The gRNA was designed using the gold-standard tool CHOPCHOP (Labun et al., 2019). If the Reviewer’s concern regarding the design is due to the comment above about the PAM site, this issue has been clarified and there are no concerns about the gRNA design. The major challenge is that Dio3 is a single-exon gene with a very short sequence (approx. 412 bp), which leaves limited scope for generating gRNAs.

      A proposed schematic model for mechanisms of circannual interval timing is presented in Figure S9. I think this represents a nice summary of the findings put in a broader context and should be presented as a main figure in the manuscript itself rather than being relayed in supplementary materials.

      We agree with the Reviewer’s position and have moved the figure to the main manuscript. The figure is now Figure 4.

      Reviewer #2 (Public review):

      Several animals and plants adjust their physiology and behavior to seasons. These changes are timed to precede the seasonal transitions, maximizing chances of survival and reproduction. The molecular mechanisms used for this process are still unclear. Studies in mammals and birds have shown that the expression of deiodinase type-1, 2, and 3 (Dio1, 2, 3) in the hypothalamus spikes right before the transition to winter phenotypes. Yet, whether this change is required or an unrelated product of the seasonal changes has not been shown, particularly because of the genetic intractability of the animal models used to study seasonality. Here, the authors show for the first time a direct link between Dio3 expression and the modulation of circannual rhythms.

      We appreciate the clear synthesis and support for the manuscript.

      Strengths:

      The work is concise and presents the data in a clear manner. The data are, for the most part, solid and support the authors' main claims. The use of CRISPR is a clear advancement in the field. This is, to my knowledge, the first study showing a clear (i.e., causal) role of Dio3 in the circannual rhythms in mammals. Having established a clear component of the circannual timing and a clean approach to address causality, this study could serve as a blueprint to decipher other components of the timing mechanism. It could also help to shed light on the elusive nature of the upstream regulators, in particular on how day length is integrated, perhaps within the components in the Pars tuberalis, and on the regulation of tanycytes.

      We thank the Reviewer for this positive summary.

      Weaknesses:

      Due to the nature of the CRISPR manipulation, the low N number is a clear weakness. This is compensated by the fact that the phenotypes shown here are sufficiently strong. Also, this is the only causal evidence of Dio3's role; thus, additional evidence would have significantly strengthened the authors' claims. The use of the non-responsive population of hamsters also helps, but it falls within the realm of correlations.

      We would also like to remind the Reviewer that one CRISPR-Cas9 Dio3<sup>cc</sup>-treated hamster did not show any mutation in the genome. This hamster showed changes in body mass and pelage colour similar to controls. This animal provides another positive control.

      We also conducted a statistical power analysis to examine whether n = 3 is sufficient for the Dio3<sup>cc</sup> treatment group. Using the expected differences in means and standard deviations and an alpha of 0.05, we regularly observed power (1 − β) > 0.8 across the dependent variables.
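A power analysis of this kind can be sketched by simulation. This minimal Python illustration assumes normally distributed outcomes with equal SD in both groups, three animals per group, and a two-sided pooled (Student's) t-test at α = 0.05; the standardized effect sizes are hypothetical, since the expected means and SDs used by the authors are not given here.

```python
import math
import random
import statistics

N_PER_GROUP = 3
# Two-sided 5% critical value of Student's t with df = 2 * 3 - 2 = 4.
T_CRIT_DF4 = 2.776445


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulated_power(effect_size, n_sims=20000, seed=1):
    """Fraction of simulated experiments (n = 3 per group) rejecting H0 at alpha = 0.05.

    effect_size is the true difference in means divided by the common SD (hypothetical).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        treated = [rng.gauss(effect_size, 1.0) for _ in range(N_PER_GROUP)]
        control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(treated, control)) > T_CRIT_DF4:
            hits += 1
    return hits / n_sims


for d in (1.0, 2.0, 3.0, 4.0):
    print(f"effect size {d}: estimated power {simulated_power(d):.2f}")
```

Under these assumptions, estimated power exceeds 0.8 with n = 3 only when the expected difference spans several standard deviations, which is why the assumed means and SDs drive the conclusion.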

      Additionally, the consequences of the mutations generated by CRISPR are not detailed; it is not clear if the mutations affect the expression of Dio3 or generate a truncation or deletion, resulting in a shorter protein.

      We agree with the Reviewer that transcript and protein assays would strengthen the genome mutation data. Due to the small brain region under investigation, we are limited in the amount of biological material we can extract. Dio3 is an intronless and very short gene, approximately 412 base pairs in length. We opted to focus our resources on sequencing the gene, as confirmation of the genetic mutation is paramount. Given the large size of the mutations in the treated hamsters, we would not expect a transcript to be amplified or a functional protein to be translated.

      Reviewer #3 (Public review):

      The authors investigated SP-induced physiological and molecular changes in Djungarian hamsters and the endogenous recovery from it after circa half a year. The study aimed to elucidate the intrinsic mechanism and included nice experiments to distinguish between rheostatic effects on energy state and homeostatic cues driven by an interval timer. It also aimed to elucidate the role of Dio3 by introducing a targeted mutation in the MBH by ICV. The experiments and analyses are sound, and the amount of work is impressive. The impact of this study on the field of seasonal chronobiology is probably high.

      We thank the Reviewer for their positive comments and support for our work.

      Even though the general conclusions are well-founded, I have fundamental criticism concerning 3 points, which I recommend revising:

      (1) The authors talk about a circannual interval timer, but this is no circannual timer. This is a circasemiannual timer. It is important that the authors use precise wording throughout the manuscript.

      We agree with the Reviewer that the change in physiology and behaviour does not approximate a full year (i.e., annual) but only half of the year. We opted to use circannual timer because this term is established in the field (see doi: 10.1177/0748730404266626; doi: 10.1098/rstb.2007.2143). We cannot identify any publication that has used the term ‘semiannual timer’. We do not feel this manuscript is the appropriate place to introduce a new term to the field; we will endeavour to push the field to consider the use of ‘semiannual timer’. A Review or Opinion paper is the best place for this discussion. We hope the Reviewer will understand our position.

      (2) The authors put their results in the context of clocks. For example, line 180/181 seasonal clock. But they have described and investigated an interval timer. A clock must be able to complete a full cycle endogenously (and ideally repeatedly) and not only half of it. In contrast, a timer steers a duration. Thus, it is well possible that a circannual clock mechanism and this circa-semiannual timer of photoperiodic species are 2 completely different mechanisms. The argumentation should be changed accordingly.

      We agree with the Reviewer’s definitions of circannual ‘clock’ and ‘timer’. We were careful to distinguish between the two concepts early in the manuscript (lines 41-46). We have added italics to emphasize the different terms. The use of seasonal clock on line 180/181 was imprecise; we appreciate the Reviewer highlighting our oversight, and the text has been revised. We have also revised the Abstract accordingly.

      (3) The authors chose as animal model the Djungarian hamster, which is a predominantly photoperiodic species and not a circannual species. A photoperiodic species has no circannual clock. That is another reason why it is difficult to draw conclusions from the experiment for circannual clocks. However, the Djungarian hamster is kind of "indifferent" concerning its seasonal timing, since a small fraction of them are indeed able to cycle (Anchordoquy HC, Lynch GR (2000), Evidence of an annual rhythm in a small proportion of Siberian hamsters exposed to chronic short days. J Biol Rhythms 15:122-125.). Nevertheless, the proportion is too small to suggest that the findings in the current study might reflect part of the circannual timing. Therefore, the authors should make a clear distinction between timers and clocks, as well as between circa-annual and circa-semiannual durations/periods.

      This comment is not clear to us. The Reviewer states that the hamsters are not a circannual species, but then highlights one study that shows circannual rhythmicity. We agree that circannual rhythmicity in Djungarian hamsters depends on the physiological process under investigation (e.g. body mass versus reproduction) and that photoperiodic response systems either dampen or mask robust cycles. We have corrected the textual oversight highlighted above, and the manuscript is focused on interval timers. We have kept the term circannual over semicircannual due to its prior use in the scientific literature.

      Reviewing Editor Comments:

      The detailed suggestions of the reviewers are outlined below (or above in the case of Reviewer 1). In light of the criticism, we ask the authors to pay particular attention to the comments on the CRISPR/Cas9 experiment raised by Reviewers 1 and 2. As currently described, there are serious questions about the design of the sgRNAs, and critical methodological details are missing. Diligently addressing the latter may resolve the questions on the sgRNA design. Please also reconsider the wording along the lines suggested by Reviewer 3.

      We appreciate the Editor’s time and support for the manuscript. We have clarified and corrected our oversight regarding the PAM site. This correction confirms the strength of the CRISPR-Cas9 gRNA used in the study and should remove all concerns. We have also considered using semicircannual in the text. As there is existing scientific literature using circannual interval timer, and there is no publication to our knowledge using ‘semicircannual’, we have opted to keep the current approach and use circannual. We feel a subsequent Opinion paper is more suitable for introducing a new term.

      Reviewer #2 (Recommendations for the authors):

      First, I want to commend the authors for their work. It is a clear advancement for our field. Below are a couple of comments and suggestions I have:

      We thank the Reviewer for the positive comment and support. We have endeavoured to incorporate their suggested improvements into the manuscript.

      (1) Looking at the results of Figure 1A and Figure S8, the control in S8 showed a lower pelage color score as compared to the hamsters in 1A. Is this a byproduct of the ICV injection?

      The difference between Figures 1 and 3 is likely due to the smaller sample sizes. The controls in Figure 1 had a higher proportion of hamsters showing complete white fur (score = 3) at 16-18 weeks compared to the controls in Figure 3. It is possible, although unlikely, that the ICV injection would reduce the development of the winter phenotype. There was no substance in the ICV injection that would impact the prolactin signalling pathway. Our perspective is that the difference between the two figures is due to the different sampled populations. Overall, the timing of the change in pelage colour is the same between the figures, suggesting that the mechanisms of the interval timer were unaffected.

      (2) Is there a particular reason why the pelage color for the CRISPR mutants is relegated to the supplemental information? In my opinion, this is also important, even though the results might be difficult to explain. Additionally, did the authors check for food intake and adipose mass in these animals?

      We agree with the Reviewer that the pelage change is very interesting. We decided to have Figure 3 focus on body mass because of the robust nature of the data collected from the CRISPR-Cas9 study (Fig. 3b), in addition to the non-responsive hamsters (Fig. 3e). We disagree that the data patterns are hard to explain, as the pelage changes were similar to the photoperiod-induced change in body mass. No differences were observed for food intake or adipose tissue. We have added this information to the text (see lines 162-163).

      (3) I might have missed it, but did the authors check for the expression of Dio3 on the CRISPR mutants? Does the deletion cause reduced expression or any other mRNA effect, such as those resulting in the truncation of a protein?

      Due to the limited biological material extracted from the anatomical punches, we decided to focus on genomic mutations. Dio3 has a very short sequence length and the size of the mutations identified indicate that no RNA could be transcribed.

      (4) Could the authors clarify which reference genome or partial CDS (i.e., accession numbers) they used to align the gRNA? Did they use the SSSS strain or the Psun_Stras_1 isolate?

      The gRNAs were designed using the online tool CHOPCHOP with the Mus musculus Dio3 gene. The generated gRNAs were subsequently aligned via BLAST with the Phodopus sungorus Dio3 partial cds (GenBank: MF662622.1) to ensure alignment with the species. We are confident that the designed gRNAs align 100% in hamsters. Furthermore, we conducted a BLAST search to ensure there were no off-targets. The only gene identified in the BLAST search was the rodent (i.e. hamster, mouse) Dio3 sequence.

      (5) Figure 3b. I do agree with the authors in pointing out that the decrease in body mass is occurring earlier in Dio3wt hamsters; however, the shape of the body mass dynamic is also different. Do the authors have any comments on the possible role of Dio3 in the process of exit from overwintering?

      This is a very interesting question. We do not have the data to evaluate the role of Dio3 for overwintering. We argue that disruption in Dio3 reduced the circannual interval period. For this interpretation, yes, Dio3 is necessary for overwintering. However, we would need to show the sufficiency of Dio3 to induce the winter phenotype in hamsters housed in long photoperiod. At this time, we do not have the technical ability to conduct this experiment.

      (6) In Figure 3d, the Dio3wt group does not show any dispersion. Is this correct? If that's true, and no dispersion is observed, no normality can be assumed, and a t-test can't be performed (Line 692). The Mann-Whitney test might be better suited.

      We conducted a Welch’s t-test to compare the difference in body mass period. We used Welch’s test because the variances were not equal; the Mann-Whitney test is best suited to skewed distributions. To clarify the test used, we have added ‘Welch’s test’ to the Figure legend.
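For reference, Welch's statistic replaces the pooled variance with per-group variances and uses Welch-Satterthwaite degrees of freedom. The minimal Python sketch below uses hypothetical period values (not the study's data) to show what happens when one group has no dispersion: the statistic is still computable, and the degrees of freedom fall to n − 1 of the other group. Welch's correction addresses unequal variances; it does not by itself relax the normality assumption.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical body-mass periods in weeks; the second group has zero dispersion.
dio3_cc = [20.0, 22.0, 24.0]
dio3_wt = [30.0, 30.0, 30.0]
t, df = welch_t(dio3_cc, dio3_wt)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = -6.928, df = 2.0
```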

      (9) Figure 1 h. It might be convenient to add the words "Induction", "maintenance", and "recovery" over each respective line on the polar graph for easier reading.

      We have added the text as suggested by the Reviewer.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1: Please enlarge all panels at least to the size of Figure 2. In the print version, labels are barely readable.

      We have increased the panels in Figures 1 and 3 by 20% to accommodate the Reviewer’s suggestion.

      (2) Legend Figure 2: Add that the food restriction was 16h.

      We have added 16h to the text.

      (3) Figure 3b: enlarge font size. In the legend: Dio3cc hamsters delayed.... The delay might have been a week or so, but not more (and even that is unclear since the rise in body mass in that week seems to be rather a disturbance of the curve). Thus 'delay' might not be the most appropriate wording. Instead, the initial decline is slower, but both started at nearly the same week (=> no delay). Minimum body mass is reached at the identical week as in wt (=> no delay). Also, the increase started at the same week but was much faster in Dio3cc than in wt. Figure 3c: How can there be a period when there is no repeated cycle (rhythm)? This is rather a duration. Moreover, according to the displayed data, I am wondering which start point and which endpoint is used. The first and last values are the highest of the graph, but have they been the maximum? Especially for Dio3wt, it can be assumed that animals haven't reached the maximum at the end of the graph.

      We have increased the font size in Figure 3b. We have changed ‘delayed’ to ‘slower’ in the text. Period analyses, such as the Lomb-Scargle periodogram, measure the duration of a cycle (and of multiple cycles). The start and end points used in the analyses were the initial data collection date (week 0) and the final data collection date (week 32). The Lomb-Scargle analysis determines the duration of the period that occurs within these phases of the cycle. We believe the period analysis conducted by the Lomb-Scargle method is the most suitable for the scientific question.
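To illustrate what a Lomb-Scargle periodogram computes, here is a minimal sketch of the classical form in plain Python. It is not the authors' implementation, which is not specified here; the synthetic series (a sinusoid with a 10-week period sampled weekly for 40 weeks) and the trial-period grid are hypothetical. At each trial period the method fits a sinusoid using every sampled time point in the window, rather than measuring between two chosen extrema.

```python
import math
import statistics


def lomb_scargle(t, y, periods):
    """Classical Lomb-Scargle power (normalized by the data variance) at each trial period."""
    mean = statistics.fmean(y)
    var = statistics.pvariance(y)
    yc = [v - mean for v in y]
    powers = []
    for period in periods:
        w = 2 * math.pi / period  # angular frequency
        # The offset tau makes the cosine and sine terms orthogonal at these sample times.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        cc = sum(v * v for v in c)
        ss = sum(v * v for v in s)
        powers.append((yc_c ** 2 / cc + yc_s ** 2 / ss) / (2 * var))
    return powers


weeks = list(range(40))
signal = [math.sin(2 * math.pi * wk / 10) for wk in weeks]
trial_periods = [4 + 0.5 * k for k in range(33)]  # 4 to 20 weeks
powers = lomb_scargle(weeks, signal, trial_periods)
best_period = trial_periods[powers.index(max(powers))]
print(f"peak at {best_period} weeks")
```

Because the fit uses the whole sampled window, the estimated period depends on which weeks are included, not only on the first and last values.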

      (4) Figure S9: This is a very nice graph and summarises your main results. It should appear in the main manuscript and not in the supplements.

      We appreciate the positive comment and suggestion. We agree with the Reviewer and have moved the graph to the main manuscript. The revised manuscript indicates the graph as Figure 4.

    1. eLife Assessment

      The results in this study are useful because they begin to establish a causal link between physical activity and the cellular mechanisms of regeneration. The evidence presented is largely solid, supporting the conclusion that exercise-induced changes in the extracellular matrix disrupt regeneration; however, some claims are incomplete, requiring additional controls and a clearer distinction between the effects of mechanical loading and mechanical injury to the blastema. The work will be of interest to researchers in regenerative medicine.

    2. Reviewer #1 (Public review):

      Summary:

      The goal of the manuscript was to determine if strenuous exercise negatively impacted regeneration. Indeed, the major conclusion of the manuscript is that elevated exercise during the early stages of regeneration compromises the regenerative process. The authors further conclude that regeneration is disrupted due to defects in blastema formation, which is caused by impaired HA deposition and reduced active (nuclear) Yap.

      Strengths:

      (1) The paradigm of elevated exercise disrupting ECM and regeneration is significant, and provides an experimental model to better understand connections between the ECM and cell/tissue activities.

      (2) The conclusion that exercise intensity correlates with defects in regeneration is supported.

      (3) The demonstration of the requirement for HA is well supported via transcriptomics and multiple independent strategies to manipulate HA levels.

      (4) The demonstration that nuclear Yap depends on the amount of HA is well-supported.

      Weaknesses:

      (1) The authors conclude throughout the manuscript that "blastema formation" is disrupted, but they do not provide any insight into how blastema formation is disrupted (reduced de-differentiation? reduced cell migration? both?). While they show that there are fewer dividing cells, exercise is applied before the outgrowth phase, so the effect on dividing cells is likely secondary; this possibility is not considered (or not clearly explained).

      (2) The authors conclude that patterning is affected, but their analyses of patterns (bifurcations) are very limited. It is also not clear if patterning is believed to be affected by a common exercise-induced mechanism or a different exercise-induced mechanism (or by a secondary mechanism).

      (3) The significance of HA in regeneration has been shown before in zebrafish fins, as well as in a handful of other models of regeneration. Although this prior work is cited, explaining some of it in more detail would give the reader a better picture of how HA is believed to promote regeneration. It may also highlight some emerging questions about the role of HA in regeneration that would permit a richer story and specific future directions.

      (4) In general, parts of the text lack specificity/clarity, and in other cases, there seems to be contradictory information.

      (5) Overall, many of the conclusions were well supported by the data, and this study is likely to provide a foundation for future research on the role of the ECM in tissue repair and regeneration. The main limitations were in connecting the experimental details with the specific processes required for regeneration, and in clearly explaining the findings.

    3. Reviewer #2 (Public review):

      In this study, Lewis et al. established a forced swimming paradigm to investigate how mechanical loading influences caudal fin regeneration. They found that forced exercise impaired the normally robust regeneration process, particularly in the peripheral/lateral ray regions. Transcriptomic profiling of exercised fish further revealed that extracellular matrix (ECM) gene programs were affected, and the authors provided evidence that disruption of hyaluronic acid (HA) synthesis may underlie this impairment. While the question of how mechanical loading impacts tissue regeneration is rather intriguing and the study nicely demonstrates a role for HA in fin regeneration, I have some concerns regarding the specificity of forced exercise as a model for mechanical loading, and thus the causal link between mechanical loading and HA synthesis disruption.

      Major concerns:

      (1) Forced exercise as a model for mechanical loading.

      Is it possible that the forced exercise paradigm imposes greater shear stress on the peripheral/lateral ray regions, thereby disrupting the fragile wound epidermis at this early stage and consequently affecting the regeneration program and phenotypes? The wound epidermis appears visibly torn or disrupted (Figure 1A, right panel, 2 dpa image). Given the critical role of the wound epidermis in blastema establishment and fin regeneration (PMID: 11002347; PMID: 34038742; PMID: 26305099), could this be a simpler explanation to consider, instead of the proposed role of mechanical loading and cryptic mechanical sensors?

      (2) The general effect of HA on fin regeneration.

      While the authors convincingly show that exogenous HA can ameliorate fin regeneration defects caused by forced exercise (Figure S7), it would be important to include a control examining the effect of HA supplementation in non-exercised animals. Does HA act as a general enhancer of fin regeneration even in the absence of forced exercise? Additionally, please consider merging Figure S7 (HA supplement) with Figure 5 (HA depletion) to improve clarity for readers.

      (3) Proper annotation of the investigated ray regions.

      As the authors clearly demonstrate that peripheral and central rays respond differently to forced exercise, it is important to explicitly define the regions corresponding to these rays. Do the peripheral rays refer to the dorsal-most and ventral-most rays among the 18-20 rays across the amputation plane? Which rays are considered central? Please clarify.

    4. Reviewer #3 (Public review):

      Summary:

      In the submitted article by Lewis et al., the authors investigate how mechanical stimulation influences organ regeneration using the well-characterized zebrafish caudal fin regeneration model. Using a swim flume and a 30 min/day exercise regime, the authors found that exercise during the establishment of the blastema reduced regeneration and led to skeletal deformations. Transcriptional profiling of regenerated caudal fin tissue revealed reduced expression of extracellular matrix-associated genes, which were found to be expressed by blastemal fibroblast and osteoblast lineage cells.

      Downregulated genes included hyaluronic acid synthases 1 and 2; accordingly, hyaluronic acid levels were found to be reduced in regenerating fins exposed to exercise. The link between regeneration and HA was further confirmed through HA depletion and HA overexpression experiments, which showed a reduction in blastema size and partial rescue of blastema formation, respectively. The authors further show that HA levels, as well as the extent of mechanical loading, correlate with nuclear localization of the mechanotransducer Yap, and conclude that biomechanical forces play a significant role during regeneration by regulating HA levels in the ECM and, thereby, downstream Yap signaling.

      This work expands our understanding of the biochemical signaling connecting biomechanical forces with tissue regeneration. The conclusions are well supported by the data.

      Strengths:

      (1) Analysis is performed in multiple replicate experimental groups and shows the robust response to the experimental conditions.

      (2) The link of HA levels to blastema formation was confirmed through HA overexpression and two different HA depletion experiments.

      (3) The use of a previously established fin regeneration single cell dataset does elegantly show the correlation of changes in gene expression levels and specific tissue types, which was further confirmed by in vivo imaging of cell type-specific transgenic lines.

      Weaknesses:

      Tissue sections stained with hematoxylin and eosin would be helpful to show the changes in tissue architecture more clearly.

    5. Author response:

      Reviewer #1

      We agree that further clarification of how elevated exercise disrupts blastema formation would strengthen the manuscript. Our data suggest a major contribution from proliferation. Exercise reduced the fraction of proliferative cells at 3 dpa, consistent with disrupted HA production and downstream Yap signaling. This interpretation aligns with prior studies showing that proliferation contributes to blastema establishment and is not restricted to the outgrowth phase of fin regeneration (Poleo et al, 2001; Poss et al, 2002; Wang et al, 2019; Pfefferli et al, 2014; Hou et al, 2020). We will explore additional experiments to reinforce these insights into the cellular mechanisms underlying exercise-disrupted blastema formation.

      We acknowledge that our analysis of ray branching abnormalities is limited in the current manuscript. We focus our study on introducing the zebrafish swimming and regeneration model and then characterizing ECM and signaling changes accounting for disrupted blastema establishment. For completeness, we included the observation of skeletal patterning defects (branching delays and bone fusions) but without detailed analysis. We note that decreased expression of shha and Shh-pathway components following early exercise corresponds with the branching defects. However, we recognize exercise could have additional effects during the outgrowth phase when branching morphogenesis actively occurs. Therefore, we will expand our discussion to outline future research directions related to exercise impacts on regenerative skeletal patterning.

      We will expand the Introduction and/or Discussion sections to provide more context on known HA roles across regeneration contexts, including in zebrafish fins. Finally, we will improve the text’s clarity and specificity throughout the manuscript, including to resolve or explain any apparent contradictions.

      Reviewer #2

      We appreciate the Reviewer's concern regarding the specificity of forced exercise as a model for mechanical loading. Forced exercise has been widely used in vivo to induce mechanical loading without the requirement for specialized implants or animal restraint, including in mouse (Wallace et al, 2015; Bomer et al, 2016), rat (Honda et al, 2003; Boerckel et al, 2011; Boerckel et al, 2012), and, most relevant to our study, zebrafish models (Fiaz et al, 2012; Fiaz et al, 2014; Suniaga et al, 2018). However, we will expand our discussion of this approach and ensure precise language distinguishing exercise from mechanical loading.

      We acknowledge the possibility that early shear stress disrupts the wound epidermis, which we will elaborate on in a revised Discussion. However, exercise-induced disruptions to the fin epidermis of early regenerates (1–2 dpa; Figure 2) typically resolve within one day, whereas fibroblast lineage cells still fail to establish a robust blastema. Therefore, sustained effects of mechanical loading and/or mechanosensation are likely major contributors to the observed regeneration phenotypes.

      We will explore whether HA acts as a general enhancer of fin regeneration by comparing blastemal HA supplementation vs. controls in non-exercised regenerating animals, if technically feasible. We will merge Figure S7 (HA supplementation) with Figure 5 (HA depletion) for clarity, as suggested.

      We will include a schematic and clear definitions for 'peripheral' and 'central' rays in a revised manuscript.

      Reviewer #3

      We included Hoechst and eosin fluorescent staining in the manuscript to show changes in tissue architecture following swimming exercise (Supplemental Figure 4). We will extend this histological analysis to include hematoxylin and eosin staining to provide additional tissue visualization.

      References

      Poleo G, Brown CW, Laforest L, Akimenko MA. Cell proliferation and movement during early fin regeneration in zebrafish. Dev Dyn. 2001 Aug;221(4):380-90.

      Poss KD, Nechiporuk A, Hillam AM, Johnson SL, Keating MT. Mps1 defines a proximal blastemal proliferative compartment essential for zebrafish fin regeneration. Development. 2002 Nov;129(22):5141-9.

      Wang YT, Tseng TL, Kuo YC, Yu JK, Su YH, Poss KD, Chen CH. Genetic Reprogramming of Positional Memory in a Regenerating Appendage. Curr Biol. 2019 Dec 16;29(24):4193-4207.e4.

      Pfefferli C, Müller F, Jaźwińska A, Wicky C. Specific NuRD components are required for fin regeneration in zebrafish. BMC Biol. 2014 Apr 29;12:30.

      Hou Y, Lee HJ, Chen Y, Ge J, Osman FOI, McAdow AR, Mokalled MH, Johnson SL, Zhao G, Wang T. Cellular diversity of the regenerating caudal fin. Sci Adv. 2020 Aug 12;6(33):eaba2084.

      Wallace IJ, Judex S, Demes B. Effects of load-bearing exercise on skeletal structure and mechanics differ between outbred populations of mice. Bone. 2015 Mar;72:1-8.

      Bomer N, Cornelis FM, Ramos YF, den Hollander W, Storms L, van der Breggen R, Lakenberg N, Slagboom PE, Meulenbelt I, Lories RJ. The effect of forced exercise on knee joints in Dio2(-/-) mice: type II iodothyronine deiodinase-deficient mice are less prone to develop OA-like cartilage damage upon excessive mechanical stress. Ann Rheum Dis. 2016 Mar;75(3):571-7.

      Honda A, Sogo N, Nagasawa S, Shimizu T, Umemura Y. High-impact exercise strengthens bone in osteopenic ovariectomized rats with the same outcome as Sham rats. J Appl Physiol (1985). 2003 Sep;95(3):1032-7.

      Boerckel JD, Kolambkar YM, Stevens HY, Lin AS, Dupont KM, Guldberg RE. Effects of in vivo mechanical loading on large bone defect regeneration. J Orthop Res. 2012 Jul;30(7):1067-75.

      Boerckel JD, Uhrig BA, Willett NJ, Huebsch N, Guldberg RE. Mechanical regulation of vascular growth and tissue regeneration in vivo. Proc Natl Acad Sci U S A. 2011 Sep 13;108(37):E674-80.

      Fiaz AW, Léon-Kloosterziel KM, Gort G, Schulte-Merker S, van Leeuwen JL, Kranenbarg S. Swim-training changes the spatio-temporal dynamics of skeletogenesis in zebrafish larvae (Danio rerio). PLoS One. 2012;7(4):e34072.

      Fiaz AW, Léon-Kloosterziel KM, van Leeuwen JL, Kranenbarg S. Exploring the molecular link between swim-training and caudal fin development in zebrafish (Danio rerio) larvae. J Appl Ichthyol. 2014 Aug;30(4):753-61.

      Suniaga S, Rolvien T, Vom Scheidt A, Fiedler IAK, Bale HA, Huysseune A, Witten PE, Amling M, Busse B. Increased mechanical loading through controlled swimming exercise induces bone formation and mineralization in adult zebrafish. Sci Rep. 2018 Feb 26;8(1):3646.

    1. eLife Assessment

      This important work characterizes layers of neuropeptidergic modulation that collectively regulate the intake of sugar in a hunger state-dependent manner. Combinations of genetic, physiological, and behavioral approaches provide convincing evidence that neurons that release Hugin and Allatostatin A are in an active state in sated flies, leading to suppression of sugar feeding behavior by reducing the sensitivity of sugar-sensitive gustatory neurons that express Gr5a. The authors also demonstrate that neurons that release Neuromedin U, a vertebrate homolog of Hugin, share physiological properties with the fly Hugin neurons, revealing a similar function of evolutionarily conserved peptides across animal phyla.

    2. Reviewer #1 (Public review):

      In this manuscript, Qin and colleagues aim to delineate a neural mechanism by which internal satiety levels modulate the intake of sugar solution. They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release a neuropeptide Hugin (which is an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation does not require synaptic inputs, suggesting that Hugin-releasing neurons sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of Hugin's receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation (measured by proboscis extension reflex). They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity of rNST neurons downstream of the NMU-expressing neurons. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      Generally, their central conclusions are well-supported by multiple independent approaches. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers. It is easier said than done: the rigor of this study, which effectively combined pharmacological and genetic approaches to provide multiple lines of behavioral and physiological evidence, deserves recognition and praise.

      A perceived weakness is that the behavioral effects of the manipulations of Hugin and AstA systems are modest compared to a dramatic shift of sugar solution-induced PER (the behavioral proxy of sugar sensitivity) induced by hunger, as presented in Figure 1B and E. It is true that the mutation of tyrosine hydroxylase (TH), which synthesizes dopamine, does not completely abolish the hunger-induced PER change, but the remaining effect is small. Moreover, the behavioral effect of the silencing of the Hugin/AstA system (Figure Supplement 13B, C) is difficult to interpret, leaving a possibility that this system may not be necessary for shifting PER in starved flies. These suggest that the Hugin-AstA system accounts for only a minor part of the behavioral adaptation induced by the decreased sugar levels. Their aim to "dissect out a complete neural pathway that directly senses internal energy state and modulates food-related behavioral output in the fly brain" is likely only partially achieved. While this outcome is not a shortcoming of the study per se, the depth of discussion on the mechanism of interactions between the Hugin/AstA system and the other previously characterized molecular circuit mechanisms mediating hunger-induced behavioral modulation is insufficient for readers to appreciate the novelty of this study and future challenges in the field. In this context, the authors are encouraged to confront a limitation of the study due to the lack of subtype-level circuit characterization, despite their intriguing finding that only a subtype of Hugin- and AstA-releasing neurons are responsive to the elevated level of bath-applied glucose.

    3. Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest, and does not show a clear difference between fed and starved flies as might be expected if this mechanism acts as a sensor of internal energy state. This could suggest that glucose intake through Glut1 may only be part of the mechanism.

    4. Reviewer #3 (Public review):

      Summary:

      This study identifies a novel energy-sensing circuit in Drosophila and mice that directly regulates sweet taste perception. In flies, hugin+ neurons function as a glucose sensor, activated through Glut1 transport and ATP-sensitive potassium channels. Once activated, hugin neurons release hugin peptide, which stimulates downstream Allatostatin A (AstA)+ neurons via PK2-R1 receptors. AstA+ neurons then inhibit sweet-sensing Gr5a+ gustatory neurons through AstA peptide and its receptor AstA-R1, reducing sweet sensitivity after feeding. Disrupting this pathway enhances sweet taste and increases food intake, while activating the pathway suppresses feeding.

      Neuromedin U (NMU), the mammalian homolog of hugin, was shown to play an analogous role in mice. NMU knockout mice displayed heightened sweet preference, while NMU administration suppressed it. In addition, VMH NMU+ neurons directly sense glucose and project to rNST Calb2+ neurons, dampening sweet taste responses. The authors suggested a conserved hugin/NMU-AstA pathway that couples energy state to taste perception.

      Strengths:

      Interesting findings that extend from insects to mammals. Very comprehensive.

      Weaknesses:

      Coupling energy status to taste sensitivity is not a new story. Many pathways appear to be involved, and therefore, it raises a question as to how this hugin-AstA pathway is unique.

    5. Author response:

      Reviewer #1 (Public review):

      In this manuscript, Qin and colleagues aim to delineate a neural mechanism by which internal satiety levels modulate the intake of sugar solution. They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release a neuropeptide Hugin (which is an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation does not require synaptic inputs, suggesting that Hugin-releasing neurons sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of Hugin's receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation (measured by proboscis extension reflex). They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity of rNST neurons downstream of the NMU-expressing neurons. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      Generally, their central conclusions are well-supported by multiple independent approaches. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers. It is easier said than done: the rigor of this study, which effectively combined pharmacological and genetic approaches to provide multiple lines of behavioral and physiological evidence, deserves recognition and praise.

      A perceived weakness is that the behavioral effects of the manipulations of Hugin and AstA systems are modest compared to a dramatic shift of sugar solution-induced PER (the behavioral proxy of sugar sensitivity) induced by hunger, as presented in Figure 1B and E. It is true that the mutation of tyrosine hydroxylase (TH), which synthesizes dopamine, does not completely abolish the hunger-induced PER change, but the remaining effect is small. Moreover, the behavioral effect of the silencing of the Hugin/AstA system (Figure Supplement 13B, C) is difficult to interpret, leaving a possibility that this system may not be necessary for shifting PER in starved flies. These suggest that the Hugin-AstA system accounts for only a minor part of the behavioral adaptation induced by the decreased sugar levels. Their aim to "dissect out a complete neural pathway that directly senses internal energy state and modulates food-related behavioral output in the fly brain" is likely only partially achieved. While this outcome is not a shortcoming of the study per se, the depth of discussion on the mechanism of interactions between the Hugin/AstA system and the other previously characterized molecular circuit mechanisms mediating hunger-induced behavioral modulation is insufficient for readers to appreciate the novelty of this study and future challenges in the field.

      We thank the reviewer for the thoughtful comment. We agree that the behavioral effects of manipulating the Hugin–AstA system alone were considerably weaker than the pronounced PER shifts induced by starvation. We will revise our Discussion to address this by positioning our findings within the broader context of energy regulation.

      More specifically, we will discuss that feeding behavior is controlled by two distinct, yet synergistic, types of mechanisms:

      (1) Hunger-driven 'accelerators': as the reviewer notes, pathways involving dopamine and NPF are powerful drivers of sweet sensitivity. These systems are strongly activated by hunger to promote food-seeking and consumption.

      (2) Satiety-driven 'brakes': our study identifies the counterpart to these systems, namely a satiety-driven 'brake'. The Hugin–AstA pathway acts as a direct sensor of high internal energy (glucose), which is specifically engaged during satiety to actively suppress sweet sensation and prevent overconsumption.

      This framework explains the apparent discrepancy in effect size. The dramatic PER shift seen upon starvation is a combined result of engaging the 'accelerators' (hunger pathways like TH/NPF) while simultaneously releasing the 'brake' (our Hugin–AstA pathway being inactive).

      Our manipulations, which specifically target only the 'brake' system, are therefore expected to have a more modest effect than this combined physiological state. Thus, rather than being a "minor part," the Hugin–AstA pathway is a mechanistically defined, satiety-specific circuit that is essential for the precise "braking" required for energy homeostasis. We will update our Discussion to emphasize how these 'accelerator' and 'brake' circuits must work in concert to ensure precise energy regulation.

      In this context, the authors are encouraged to confront a limitation of the study due to the lack of subtype-level circuit characterization, despite their intriguing finding that only a subtype of Hugin- and AstA-releasing neurons are responsive to the elevated level of bath-applied glucose.

      We thank the reviewer for highlighting the critical issue of subtype-level specialization within the Hugin and AstA populations.

      We fully agree that the Hugin system is known for its functional heterogeneity (pleiotropy), with different Hugin neuron subclusters implicated in regulating a variety of behaviors, including feeding, aversion, and locomotion (we will cite relevant literature here). Our finding that only a specific subcluster of Hugin neurons is responsive to glucose elevation provides a crucial first step in functionally dissecting this complexity. 

      We will add a dedicated paragraph to elaborate on this functional partitioning. We propose that this subtype-level specialization allows the Hugin system to precisely link specific physiological states (like high circulating glucose) to appropriate behavioral outputs (like the suppression of sweet taste), demonstrating an elegant solution to coordinating multiple survival behaviors. Future work using high-resolution tools such as split-GAL4 and single-cell sequencing will be invaluable in fully mapping the specific functional roles corresponding to each Hugin and AstA subcluster.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest, and does not show a clear difference between fed and starved flies as might be expected if this mechanism acts as a sensor of internal energy state. This could suggest that glucose intake through Glut1 may only be part of the mechanism.

      We thank the reviewer for this insightful comment and agree that the modest behavioral effect of Glut1 knockdown is a critical finding that warrants further clarification. This observation strongly supports the idea that internal energy state is monitored by a sophisticated and robust network, not a single, fragile component. We believe the effect size is modest for two main reasons, which we will further address in revised Discussion.

      Firstly, the effect size is likely attenuated by technical and molecular redundancy. Specifically, the RNAi-mediated knockdown of Glut1 may be incomplete, leaving residual transporter function. Furthermore, Glut1 is likely only one part of the Hugin neuron's intrinsic sensing mechanism; other components, such as alternative glucose transporters or downstream KATP channel signaling, may provide molecular redundancy, meaning that the full energy-sensing function is not easily abolished by a single manipulation.

      Secondly, and more importantly, the final feeding decision is an integrated output of competing circuits. While hunger-sensing pathways like the dopamine and NPF circuits act as powerful "accelerators" to drive sweet consumption, the Hugin–AstA pathway serves as a satiety-specific "brake". The modest effect of partially inhibiting just one component of this 'brake' system is the hallmark of a precisely regulated, multi-layered homeostatic system. We will further clarify in the Discussion that the Hugin pathway represents one essential inhibitory circuit within this cooperative network that works together with the hunger-promoting systems to ensure precise control over energy intake.

      Reviewer #3 (Public review):

      Summary:

      This study identifies a novel energy-sensing circuit in Drosophila and mice that directly regulates sweet taste perception. In flies, hugin+ neurons function as a glucose sensor, activated through Glut1 transport and ATP-sensitive potassium channels. Once activated, hugin neurons release hugin peptide, which stimulates downstream Allatostatin A (AstA)+ neurons via PK2-R1 receptors. AstA+ neurons then inhibit sweet-sensing Gr5a+ gustatory neurons through AstA peptide and its receptor AstA-R1, reducing sweet sensitivity after feeding. Disrupting this pathway enhances sweet taste and increases food intake, while activating the pathway suppresses feeding.

      Neuromedin U (NMU), the mammalian homolog of hugin, was shown to play an analogous role in mice. NMU knockout mice displayed heightened sweet preference, while NMU administration suppressed it. In addition, VMH NMU+ neurons directly sense glucose and project to rNST Calb2+ neurons, dampening sweet taste responses. The authors suggested a conserved hugin/NMU-AstA pathway that couples energy state to taste perception.

      Strengths:

      Interesting findings that extend from insects to mammals. Very comprehensive.

      Weaknesses:

      Coupling energy status to taste sensitivity is not a new story. Many pathways appear to be involved, which raises the question of how this hugin-AstA pathway is unique.

      The reviewer is correct that several energy-sensing pathways are known. However, we now clarify that these previously established mechanisms, such as the dopaminergic and NPF pathways, primarily function as hunger-driven "accelerators." They are activated by low energy states to promote sweet sensitivity and drive consumption.

      The crucial, missing piece of the puzzle—which our study provides—is the satiety-specific "brake" mechanism. We identify the Hugin–AstA circuit as one of the “brakes”: a dedicated, central sensor that responds directly to high circulating glucose (satiety) to suppress sweet sensation and prevent overconsumption.

      Thus, our work is unique because it defines the essential counterpart to the hunger pathways. In the revised Discussion, we will further explain how these 'accelerator' (hunger) and 'brake' (satiety) systems work in concert to allow for the precise, bidirectional regulation of energy intake. Furthermore, by demonstrating that this Hugin/NMU 'brake' circuit is evolutionarily conserved in mice, our findings reveal a fundamental energy-sensing strategy and suggest that this pathway could represent a promising new therapeutic target for managing conditions of excessive food intake.

    1. eLife Assessment

      This important study extends the previous interesting work of this group to address the potentially different control of movement and posture. Through experiments in which stroke participants used a robotic manipulandum, the authors provide solid evidence supporting a lack of a relation between the resting force postural bias they measure (closely related to the flexor synergy in stroke) and kinematic deficits during movement. Based on these results, the authors propose a conceptual framework that differentially weights the two main descending pathways (corticospinal tract and reticulospinal tract) for neurologically intact and stroke patients.

    2. Reviewer #1 (Public review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data, but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. The study makes observations about the different expression of movement deficits during postural fixation and movement, and the different effects of force perturbations during these periods, consistent with their hypothesis that movement and postural control are separate motor functions. They speculate that the appearance of the stereotypic flexor synergies characteristic of stroke is the result of a breakdown of this normal separation between the two control modes.

      Comments on revisions:

      I had only two very trivial comments in the previous version. One was simply a figure that was mistakenly not updated, and the other was the use of the terms "proximal" and "distal" to describe the location of a target. Both have been corrected.

    3. Reviewer #2 (Public review):

      The reported findings by Hadjiosif and colleagues address an important question in sensorimotor neuroscience related to the idea that movement and postural control are regulated by unique circuits. To explain the reported compromised postural control for stroke patients, the authors propose a conceptual framework that differentially weights corticospinal tract and reticulospinal tract for neurologically intact and stroke patients. Based on the currently reported findings and experimental design, the interpretation of the authors provides support for this idea.

      The authors have done well to include a limitations paragraph in their discussion. While it is difficult to truly compare across many of the experimental conditions to draw any strong conclusions, the authors have included additional analyses and a limitations paragraph highlighting some weaknesses in the paper.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data, but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones. These limitations should be discussed.

      Thank you for summarizing our work. Note we have improved the logic in our abstract (…”providing an opportunity to ask whether control of these behaviors is independently affected in stroke”) based on your comments as outlined in our previous revision. We now discuss limitations and potential alternative mechanisms in greater detail, in a dedicated section (lines 846-895; see response to reviewer 2 for further details).

      Reviewer #2 (Public review):

      Summary:

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Thank you for the brief but comprehensive summary. We would like to clarify one point: we do not suggest that our findings are necessarily due to the neural circuitry associated with posture being more impacted than the neural circuitry associated with movement. Instead, we suggest that the neural circuitry for posture vs. movement control remains relatively separate in stroke, with impairments in posture control not substantially explaining impairments in movement control. Within this view, our conceptual model suggests that increased outflow through the (ipsilateral) RST, involved in posture, compensates for CST damage, at the expense of posture abnormalities spilling over into movement.

      Comments on revisions:

      The authors should be commended for being very responsive to comments and providing several further requested analyses, which have improved the paper. However, there are still some outstanding issues that make it difficult to fully support the provided interpretation.

      Thank you for appreciating our response to your earlier comments. We address the outstanding issues below.

      The authors say within the response, "We would also like to stress that these perturbations were not designed so that responses are directly compared to each other ***(though of course there is an *indirect* comparison in the sense that we show influence of biases in one type of perturbation but not the other)***." They then state in the first paragraph of the discussion that "Remarkably, these resting postural force biases did not seem to have a detectable effect upon any component of active reaching but only emerged during the control of holding still after the movement ended. The results suggest a dissociation between the control of movement and posture." The main issue here is relying on indirect comparisons (i.e., significant in one situation but not the other), instead of relying on direct comparisons. To use a well-known example, just because one group / condition might display a significant linear relationship (i.e., slope_1 > 0) and another group / condition does not (slope_2 = 0), does not necessarily mean that the two groups / conditions are statistically different from one another [see Figure 1 in Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8, e48175.].
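      The statistical point above can be made concrete: instead of testing each slope against zero separately, one tests the difference between the two slopes directly. The sketch below uses a permutation test, a generic technique chosen here only for illustration (it is not the analysis used by the reviewer or the authors), and the function names are hypothetical. It assumes the two sets of (x, y) observations are independent; for a within-subject design, such as holding vs. moving perturbations in the same participants, a paired resampling scheme would be needed instead.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def slope_difference_test(x1, y1, x2, y2, n_perm=2000, seed=0):
    """Two-sided permutation test of H0: slope_1 == slope_2.

    (x, y) pairs from both conditions are pooled and randomly reassigned
    to two groups of the original sizes; the p-value is the fraction of
    shuffles whose |slope difference| is at least the observed one.
    Assumes independent observations (not a paired design).
    """
    observed = ols_slope(x1, y1) - ols_slope(x2, y2)
    pairs = list(zip(x1, y1)) + list(zip(x2, y2))
    n1 = len(x1)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pairs)
        a, b = pairs[:n1], pairs[n1:]
        diff = (ols_slope([p[0] for p in a], [p[1] for p in a])
                - ols_slope([p[0] for p in b], [p[1] for p in b]))
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the permutation p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data: condition 1 has slope 2, condition 2 has slope 0.
x = [float(i) for i in range(30)]
steep = [2.0 * xi for xi in x]
flat = [0.0 for _ in x]
diff, p = slope_difference_test(x, steep, x, flat)
print(f"slope difference = {diff:.2f}, permutation p = {p:.4f}")
```

      The quantity tested is the slope difference itself, which is the direct comparison the cited guidance recommends; a non-significant slope in one condition alone would not license that conclusion.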

      We agree and are well aware of the limitation posed by an indirect comparison – hence the language we used to comment on the data (“did not seem”, “suggest”, etc.). To address this limitation, we performed a more direct comparison of how the two types of perturbations (moving vs. holding) interact with resting biases. For this comparison, we calculated a Response Asymmetry Index (RAI):

      Here, r<sub>A</sub> is the response in the direction where the resting bias is most aligned with the perturbation, and r<sub>O</sub> is the response in the direction where the resting bias is most opposed to the perturbation.

      We calculated RAIs for two response metrics used for both moving and holding perturbations: maximum deviation and time to stabilization/settling time. For these two response metrics, positive RAIs indicate an asymmetry in line with an effect of resting bias.

      The idea behind the RAI is that, while the magnitude of responses may well differ between the two types of perturbations, this will be accounted for by the ratio used to calculate the asymmetry. The same approach has been used to assess symmetry/laterality across a variety of different modalities, such as gait asymmetry (Robinson et al., 1987), the relative fMRI activity in the contralateral vs. ipsilateral sensorimotor cortex while performing a motor task (Cramer et al., 1997), or the relative strength of ipsilateral vs. contralateral responses to transcranial magnetic stimulation (McPherson et al., 2018). Notably, the normalization also addresses potential differences in overall stiffness between holding vs. moving perturbations, which would similarly affect aligned and opposing cases (see our response to your following point).
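      The RAI equation itself does not survive in this text. A common normalized form consistent with the description (a ratio that cancels overall response magnitude and is positive when responses with the bias aligned are larger) is (r_A - r_O) / (r_A + r_O); the sketch below assumes that form, which may differ from the authors' exact definition. It also uses a deliberately simple linear-spring toy model, a hypothetical illustration rather than a model of the arm, to show the normalization argument: a uniform change in stiffness scales aligned and opposed responses by the same factor, so it leaves the index unchanged.

```python
def response_asymmetry_index(r_aligned, r_opposed):
    """ASSUMED normalized asymmetry: (r_A - r_O) / (r_A + r_O).

    Positive values mean larger responses when the resting bias is aligned
    with the perturbation. Dividing by the sum cancels any common scale
    factor shared by the two responses.
    """
    return (r_aligned - r_opposed) / (r_aligned + r_opposed)


def spring_deviation(perturb_force, bias_force, stiffness):
    """Hypothetical toy model: deviation = (perturbation + bias) / stiffness."""
    return (perturb_force + bias_force) / stiffness


PERTURBATION_N = 10.0  # hypothetical perturbation force (N)
BIAS_N = 2.0           # hypothetical resting bias force (N)

for stiffness in (100.0, 400.0):  # hypothetical low vs. high stiffness (N/m)
    r_a = spring_deviation(PERTURBATION_N, +BIAS_N, stiffness)  # bias aligned
    r_o = spring_deviation(PERTURBATION_N, -BIAS_N, stiffness)  # bias opposed
    rai = response_asymmetry_index(r_a, r_o)
    print(f"k={stiffness:.0f} N/m: r_A={r_a:.3f} m, r_O={r_o:.3f} m, RAI={rai:.2f}")
```

      In this toy model the index reduces to bias / perturbation (0.2 here) at either stiffness, which is the sense in which an overall stiffness change, acting equally on both directions, cannot by itself create a direction-dependent asymmetry.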

      Figure 8 shows RAIs we obtained for holding (red) vs. moving/pulse (blue) perturbations. For the maximum deviation (left), there is more asymmetry in the holding case, though the p-value is marginal (p=0.088), likely due to the large variability in the pulse case (individual values shown as black dots). For time to stabilization/settling time (right), the difference is significant (p=0.0048). Together, these analyses indicate that resting biases interact substantially more with holding compared to movement control, in line with a relative independence between these two control modalities. We now include this panel as Figure 8, and describe it in Results (lines 587-611).

      Note that even a direct comparison does not prove that resting biases and active movement control are perfectly independent. We now discuss these issues in more depth, in the new Limitations section suggested by the Reviewer (lines 836-849).

      The authors have provided a reasonable rationale for why they chose certain perturbation waveforms for the different conditions. Yet it still holds that these different waveforms would likely yield very different muscular responses, making the results difficult to interpret, and this remains a limitation. From the paper, it is unknown how these different perturbations would differentially influence a variety of classic neuromuscular responses, including short-range stiffness and stretch reflexes, which would be at play here.

      Many of the results can be interpreted in light of classic neuromuscular physiology. In Experiment 1, differences in resting postural bias in supported versus unsupported conditions can readily be explained since there is greater muscle activity in the unsupported condition, which leads to greater muscle stiffness to resist mechanical perturbations (Rack, P. M., & Westbury, D. R. (1974). The short-range stiffness of active mammalian muscle and its effect on mechanical properties. The Journal of physiology, 240(2), 331-350.). Likewise, muscle stiffness would scale with changes in muscle contraction with synergies. Importantly for Experiment 2, muscle stiffness is reduced during movement (Rack and Westbury, 1974), which may explain why resting postural biases do not seem to be impacting movement. Likewise, muscle spindle activity is shown to scale with extrafusal muscle fiber activity and forces acting through the tendon (Blum, K. P., Campbell, K. S., Horslen, B. C., Nardelli, P., Housley, S. N., Cope, T. C., & Ting, L. H. (2020). Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. eLife, 9, e55177.). The concern here is that the authors have not sufficiently considered muscle neurophysiology, how that might relate to their findings, and how that might impact their interpretation. Given the differences in perturbations and muscle states at different phases, the concern is that it is not possible to disentangle whether the results are due to classic neurophysiology, the hypothesis they propose, or both. Can the authors please comment?

      It is possible that neuromuscular physiology may explain part of our results. However, this would not contradict our conceptual model.

      Regarding Experiment 1, it is possible that stiffness would scale with changes in background muscle contraction, as the reviewer suggests. Indeed, Bennett et al. (1992) used brief perturbations at the wrist to assess elbow stiffness, finding that, during movement, stiffness was increased in positions with a higher gravity load (and, in general, in positions where the net muscle torque was higher). However, during posture maintenance (as in our Experiment 1), they found that stiffness did not vary with (elbow) position or gravity load (two characteristics of our findings in Experiment 1):

      “The observed stiffness variation was not simply due to passive tissue or other joint angle dependent properties, as stiffnesses measured during posture were position invariant. Note that the minimum stiffness found in posture was higher than the peak stiffness measured during movement, and did not change much with the gravity load.” (illustrated in Fig. 5 of that paper)

      We thus find it very unlikely that stiffness explains the difference between the supported vs. unsupported conditions in Experiment 1.

      Even if stiffness modulation between the supported vs. unsupported conditions could explain our finding of stronger posture biases in the latter case, it would not be incompatible with our interpretation of increased RST drive: increased stiffness would potentially magnify the effects of the RST drive we propose to drive these resting biases. It is possible that the increase in resting biases under conditions of increased muscle contraction (lack of arm support) is mediated through an increase in muscle stiffness. In other words, the increase in resting biases may not directly reflect additional RST outflow per se, but the scaling, through stiffness, of the same magnitude of RST outflow. Understanding this interaction was beyond the scope of our experimental design; in line with this, we briefly comment on it in our Limitations section.

      Regarding Experiment 2, stiffness has indeed been shown to be lower during movement, and we now comment on the potential effect of this on our results in the “Limitations” section (lines 815-830, replicated below). Importantly, for the case of holding perturbations, the increased stiffness associated with holding would increase resistance to both extension- and flexion-inducing perturbations. Thus, higher stiffness would be unlikely to explain our finding whereby resting biases resist or aggravate the effects of holding perturbations depending on perturbation direction. In addition, the framework in Blum et al., which describes how interactions between alpha and gamma drive can explain muscle activity patterns, does not rule out central neural control of stiffness: “muscle spindles have a unique muscle-within-muscle design such that their firing depends critically on both peripheral and central factors” (emphasis ours). It may be, for example, that gamma motoneurons controlling muscle spindles and stiffness are modulated by input from the reticular formation, making this a mechanism in line with our conceptual model.

      “Moreover, it has been shown that joint stiffness is reduced during movement compared to holding control (Rack and Westbury, 1974; Bennett et al., 1992). Along similar lines, muscle spindle activity – which may modulate stiffness – scales with extrafusal muscle fiber activity (such as muscle exertion involved in holding) and forces acting through the tendon (Blum et al., 2020). Such observations could, in principle, explain why we were unable to detect a relationship between resting biases and active movement control but we readily found a relationship between resting biases and active holding control: reduced joint stiffness during movement could scale down the influence of resting abnormalities. There are two issues with this explanation, however. First, it is debatable whether this should be considered an alternative explanation per se: stiffness modulation could be, in total or in part, the manifestation of a central movement/posture CST/RST mechanism similar to the one we propose in our conceptual model. For example, Blum et al. (2020) argue that muscle spindle firing depends on both peripheral and central factors. Second, increased stiffness would not necessarily help detect differences in how active postural control responds to within-resting-posture vs. out-of-resting-posture perturbations. This is because an overall increase in stiffness would likely increase resistance to perturbations in any direction.”

      The authors should provide a limitations paragraph. They should address 1) how they used different perturbation force profiles, 2) the muscles were in different states which would change neuromuscular responses between trial phase / condition, 3) discuss a lack of direct statistical comparisons that support their hypothesis, and 4) provide a couple of paragraphs on classic neurophysiology, such as muscle stiffness and stretch reflexes, and how these various factors could influence the findings (i.e., whether they can disentangle whether the reported results are due to classic neurophysiology, the hypothesis they propose, or both).

      Thank you for your suggestion. We now discuss these points in a separate paragraph (lines 846-895), bringing together our previous discussion on stretch reflexes, our description of different perturbation types, and the additional issues raised by the reviewer above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have responded well to all my concerns, save two minor points.

      Figure 2 appears to be unchanged, although they describe appropriate changes in the response letter.

      Thank you for catching this error – we now include the updated figure (further updated to use the terms near/distant in place of proximal/distal).

      I still take issue with the use of proximal and distal to describe the locations of targets. Taking definitions somewhat randomly from the internet, "The terms proximal and distal are used in structures that are considered to have a beginning and an end," and "Proximal and distal are anatomical terms used to describe the position of a body part in relation to another part or its origin." In any case, the hand does not become proximal just because you bring it to your chest. Why not simply stick to the common and clearly defined terms "near" and "distant"?

      Point taken. We have updated the paper to use the terms near/distant.

      Additional changes/corrections not outlined above

      We now include a link to the data and code supporting our findings (https://osf.io/hufy8/). In addition, we made several minor edits throughout the text to improve readability, and corrected occasional mislabeling of CCW and CW pulse data. Note that this correction did not alter the (lack of) relationship between resting biases and responses to perturbations during active movement.

      Response letter references

      Bennett D, Hollerbach J, Xu Y, Hunter I (1992) Time-varying stiffness of human elbow joint during cyclic voluntary movement. Exp Brain Res 88:433–442.

      Blum KP, Campbell KS, Horslen BC, Nardelli P, Housley SN, Cope TC, Ting LH (2020) Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. Elife 9:e55177.

      Cramer SC, Nelles G, Benson RR, Kaplan JD, Parker RA, Kwong KK, Kennedy DN, Finklestein SP, Rosen BR (1997) A functional MRI study of subjects recovered from hemiparetic stroke. Stroke 28:2518–2527.

      McPherson JG, Chen A, Ellis MD, Yao J, Heckman C, Dewald JP (2018) Progressive recruitment of contralesional cortico-reticulospinal pathways drives motor impairment post stroke. J Physiol 596:1211–1225. https://doi.org/10.1113/JP274968

      Rack PM, Westbury D (1974) The short range stiffness of active mammalian muscle and its effect on mechanical properties. J Physiol 240:331–350.

      Robinson R, Herzog W, Nigg BM (1987) Use of force platform variables to quantify the effects of chiropractic manipulation on gait symmetry. J Manipulative Physiol Ther 10:172–176.

      Williams PE, Goldspink G (1973) The effect of immobilization on the longitudinal growth of striated muscle fibres. J Anat 116:45.

    1. eLife Assessment

      This study investigates how people adapt their speech when auditory feedback is altered. The analyses are rigorous and the work makes a valuable contribution by extending methods from limb motor control to speech. However, because the paradigm does not directly measure sensory error, the evidence for the proposed mechanism of sensorimotor learning is incomplete. The findings are best viewed as evidence for how prior motor adjustments influence subsequent behaviour, highlighting the need for future studies to more precisely separate sensory and motor contributions to adaptation.

    2. Reviewer #1 (Public review):

      Summary:

      In this submitted manuscript, Lu, Tang, and colleagues implement a novel serial perturbation paradigm during speech to isolate the effects of sensory and motor processes on compensation. They perform three main studies: in the first study, they validate their method by randomly perturbing pitch in a series of produced vowels. They demonstrate that the amount of compensation is driven (in part) by the previous trial's amount of motor compensation, as opposed to the sensory perturbation. In the second experiment, they found that this effect carries over to single-vowel words, but the effect was much weaker when different words were produced. Thirdly, the authors reproduce these findings in a more linguistically relevant way (during sentences) and show that the previously shown compensation effect only occurs within syntactic structures and not across them, suggesting an interplay between sensorimotor systems and linguistic structure processing.

      Strengths:

      Overall, this is a highly original study and strikes me as being potentially quite impactful. The authors have performed a large number of experiments to validate their findings that provide novel insights into the processes underlying compensation during speech production. These findings are also likely to produce new avenues for studying the neural mechanisms that support these processes.

      Weaknesses:

      While the authors go to great lengths to disassociate the serial effects of sensory and motor compensation, which is commendable, one weakness is that they are intrinsically linked (motor actions produce sensory consequences). Therefore, there is no obvious way to decouple them for the purposes of investigation. It would be beneficial to discuss future research that could further disentangle these factors.

    3. Reviewer #2 (Public review):

      This study aims to disentangle the contribution of sensory and motor processes (mapped onto the inverse and forward components of speech motor control models like DIVA) to production changes as a result of altered auditory feedback. After five experiments, the authors conclude that it is the motor compensation on the previous trial, and not the sensory error, that drives compensatory responses in subsequent trials.
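      The serial-dependence logic summarized above can be sketched as a lagged regression: compensation on trial n is regressed on the previous trial's perturbation (P) and compensation (C). This is a minimal illustration under stated assumptions, not the authors' model. The simulation is hypothetical, carry-over is driven by the previous compensation by construction, and, as Reviewer #2 points out, P is a physical manipulation used as a proxy for sensory error rather than a measured perceptual change. The function name, the gain values, and the ±100-cent shift levels are illustrative choices.

```python
import numpy as np


def lagged_regression(perturbation, compensation):
    """Regress compensation on trial n on the previous trial's perturbation
    and compensation, plus an intercept.

    Returns (intercept, beta_prev_perturbation, beta_prev_compensation).
    """
    P = np.asarray(perturbation, dtype=float)
    C = np.asarray(compensation, dtype=float)
    X = np.column_stack([np.ones(len(C) - 1), P[:-1], C[:-1]])
    y = C[1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical simulation: random pitch shifts (cents); each response opposes
# the current shift and carries over half of the previous response.
rng = np.random.default_rng(1)
n_trials = 2000
P = rng.choice([-100.0, 0.0, 100.0], size=n_trials)
C = np.zeros(n_trials)
for n in range(1, n_trials):
    C[n] = -0.2 * P[n] + 0.5 * C[n - 1] + rng.normal(0.0, 5.0)

intercept, b_prev_p, b_prev_c = lagged_regression(P, C)
print(f"beta(previous P) = {b_prev_p:.3f}, beta(previous C) = {b_prev_c:.3f}")
```

      With this many simulated trials, the coefficient on the previous perturbation comes out near 0 and the coefficient on the previous compensation near 0.5, matching the generating model. Because C on a given trial partly reflects P on that same trial, the two regressors are correlated, which is one reason the interpretation of such coefficients is debated in the review that follows.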

      Assessment:

      The goal of this paper is great, and the question is timely. Quite a bit of work has gone into the study, and the technical aspects are sound. That said, I just don't understand how the current design can accomplish what the authors have set as their goal. This may, of course, be a misunderstanding on my part, so I'll try to explain my confusion below. If it is indeed my mistake, then I encourage the authors to dedicate some space to unpacking the logic in the Introduction, which is currently barely over a page long. They should take some time to lay out the logic of the experimental design and the dependent and independent variables, and how this design disentangles sensory and motor influences. Then clearly discuss the opposing predictions supporting sensory-driven vs. motor-driven changes. Given that I currently don't understand the logic and, consequently, the claims, I will focus my review on major points for now.

      Main issues

      (1) Measuring sensory change. As acknowledged by the authors, making a motor correction as a function of altered auditory feedback is an interactive process between sensory and motor systems. However, one could still ask whether it is primarily a change to perception vs. a change to production that is driving the motor correction. But to do this, one has to have two sets of measurements: (a) perceptual change, and (b) motor change. As far as I understand, the study has the latter (i.e., C), but not the former. Instead, the magnitude of perceptual change is estimated through the proxy of the magnitude of perturbation (P), but the two are not the same; P is a physical manipulation; perceptual change is a psychological response to that physical manipulation. It is theoretically possible that a physical change does not cause a psychological change, or that the magnitude of the two does not match. So my first confusion centers on the absence of any measure of sensory change in this study.

      To give an explicit example of what I mean, consider a study like Murphy, Nozari, and Holt (2024; Psychonomic Bulletin & Review). This work is about changes to production as a function of exposure to other talkers' acoustic properties - rather than your own altered feedback - but the idea is that the same sensory-motor loop is involved in both. When changing the acoustic properties of the input, the authors obtain two separate measures: (a) how listeners' perception changes as a function of this physical change in the acoustics of the auditory signal, and (b) how their production changes. This allows the authors to identify motor changes above and beyond perceptual changes. Perhaps making a direct comparison with this study would help the reader understand the parallels better.

      (2) A more fundamental issue for me is a theoretical one: Isn't a compensatory motor change ALWAYS a consequence of a perceptual change? I think it makes sense to ask, "Does a motor compensation hinge on a previous motor action or is sensory change enough to drive motor compensation?" This question has been asked for changed acoustics for self-produced speech (e.g., Hantzsch, Parrell, & Niziolek, 2022) and other-produced speech (Murphy, Holt, & Nozari, 2025), and in both cases, the answer has been that sensory changes alone are, in fact, sufficient to drive motor changes. A similar finding has been reported for the role of the cerebellum in limb movements (Tseng et al., 2007), with a similar answer (note that in that study, the authors explicitly talk about "the addition" of motor corrections to sensory error, not one vs. the other as two independent factors). So I don't understand a sentence like "We found that motor compensation, rather than sensory errors, predicted the compensatory responses in the subsequent trials", which views motor compensations and sensory errors as orthogonal variables affecting future motor adjustments.

      In other words, there is a certain degree of seriality to the compensation process, with sensory changes preceding motor corrections. If the authors disagree with this, they should explain how an alternative is possible. If they mean something else, a comparison with the above studies and explaining the differences in positions would greatly help.

      (3) Clash with previous findings. I used the examples in point 2 to bring up a theoretical issue, but those examples are also important in that all three of them reach a conclusion compatible with one another and different from the current study. The authors do discuss Tseng et al.'s findings, which oppose their own, but dismiss the opposition based on limb vs. articulator differences. I don't find the authors' reasoning theoretically convincing here, but more importantly, the current claims also oppose findings from speech motor studies (see citations in point 2), to which the authors' arguments simply don't apply. Strangely, Hantzsch et al.'s study has been cited a few times, but never in its most important capacity, which is to show that speech motor adaptation can take place after a single exposure to auditory error. Murphy et al. report a similar finding in the context of exposure to other talkers' speech.

      If the authors can convincingly justify their theoretical position in 2, the next step would be to present a thorough comparison with the results of the three studies above. If indeed there is no discrepancy, this comparison would help clarify it.

      References

      Hantzsch, L., Parrell, B., & Niziolek, C. A. (2022). A single exposure to altered auditory feedback causes observable sensorimotor adaptation in speech. eLife, 11, e73694.

      Murphy, T. K., Nozari, N., & Holt, L. L. (2024). Transfer of statistical learning from passive speech perception to speech production. Psychonomic Bulletin & Review, 31(3), 1193-1205.

      Murphy, T. K., Holt, L. L., & Nozari, N. (2025). Exposure to an Accent Transfers to Speech Production in a Single Shot. Preprint available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5196109.

      Tseng, Y. W., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Sensory prediction errors drive cerebellum-dependent adaptation of reaching. Journal of Neurophysiology, 98(1), 54-62.

    1. eLife Assessment

      This study extends prior work on head bristle mechanosensation by delivering a synaptic-resolution map of second-order partners that preserves somatotopy and highlights a cholinergic pathway linking sensory input to grooming circuits, providing a valuable resource for the field. The reconstructions and quantitative connectivity analyses provide solid support for the main anatomical claims, while causal sufficiency for the behavioral sequence remains inferential and could be strengthened by a simple rank-order test relating wiring to the known grooming hierarchy.

    2. Reviewer #1 (Public review):

      Summary:

      Calle-Schuler et al. reconstruct all the pre- and postsynaptic neurons of the bristle mechanosensory neurons (BMNs) on the adult fly head to understand how neural circuits determine the sequential motor patterns during fly grooming. They find that most presynaptic neurons, interneurons, and excitatory postsynaptic neurons are also somatotopically organized, such that each neuron is more strongly connected to BMNs from nearby bristles on the head and less connected to BMNs from distant bristles. These include the direct BMN-BMN circuits, excitatory interneurons, and the inhibitory networks. They also find that the entire hemilineage 23b forms excitatory postsynaptic circuits with BMNs, highlighting how these circuits, and hence their function, could be developmentally determined.

      Strengths:

      This is a complete map of all the neurons that make five or more pre- or postsynaptic connections with the fly head BMNs. Using this map, the authors identify various trends, such as ascending neurons providing most of the GABAergic inhibitory input, which could provide the presynaptic inhibition essential for the parallel model of sequential grooming generation. Moreover, they show that the entire cholinergic hemilineage 23b is postsynaptic to BMNs.

      Weaknesses:

      Although somatotopic organization is an elegant mechanism for generating sequential motor patterns during grooming, none of the analyses in the paper directly demonstrate that this somatotopic connectivity is sufficient to generate hierarchical suppression and reconstruct the grooming sequence. If somatotopic organization is sufficient, then hierarchical clustering should recover the grooming sequence. The detailed connectome enables the authors to test whether some networks are more crucial for the grooming sequence than others: to what extent can each network individually (e.g., ascending neuron-BMN alone) or in combination (BMN-BMN, ascending-BMN, BMN-descending, etc.) recover the sequence observed during grooming? If all the pre- and postsynaptic neurons taken together cannot explain the sequence, then the sequence is probably determined by individual synaptic strengths or by other key downstream neurons.

    3. Reviewer #2 (Public review):

      Summary:

      Schuler et al. present an extensive analysis of the synaptic connectivity of mechanosensory head bristles in the brain of Drosophila melanogaster. Based on the previously described set of bristle afferent neurons (BMNs) located on the head, the study aims to provide a complete, quantitative assessment of all synaptic partners in the ventral brain. Activation of head bristles induces grooming behavior, which is hierarchically organized and hypothesized to be grounded in a parallel cellular architecture in the central brain. The authors found evidence that, at the synaptic level, neurons downstream of the BMN afferents, namely the postsynaptic LB23 interneurons and recurrent GABAergic neurons (involved in sensory gain control), are organized in parallel, following the somatotopic organization described for the BMN afferents. This study, therefore, represents an important step towards a better understanding of the cellular circuits that govern the hierarchical order of sequentially organized grooming behavior in Drosophila melanogaster.

      The study is well done, and the images are well designed and extensive in number, but the account is challenging to read and digest for readers outside the Drosophila/connectome community. It is amazing what can be done with the connectome nowadays using the up-to-date FAFB dataset and the analytical and visual tools (as in FlyWire), in combination with known anatomy, physiology, and behavior in D. melanogaster. I suggest that the authors provide more detail on hemilineages, their relationship to the FAFB connectome, the predicted neurotransmitter identity, and the statistical CatMAID tools used in some of the figures.

      A graphical summary at the end of the study would be very useful to highlight the key findings, focusing on the neuron populations identified in this study and their position in the hypothesized parallel central circuitry of BMNs.

    4. Reviewer #3 (Public review):

      Summary:

      The authors set out to extend their previous mapping of Drosophila head mechanosensory neurons (Eichler et al., 2024) by reconstructing their full second-order connectome. Their aim is to reveal how bristle mechanosensory neurons (BMNs) interface with excitatory and inhibitory partners to generate location-specific grooming movements, and to identify the circuit motifs and developmental lineages that support this transformation.

      Strengths:

      The strengths of this work are clear. The authors present a comprehensive synaptic-resolution connectome for BMNs, identifying nearly all of their pre- and postsynaptic partners. This dataset reveals important circuit motifs:

      (1) BMNs provide feedforward excitation to descending neurons, feedforward inhibition to interneurons, and are themselves strongly regulated by GABAergic presynaptic inhibition.

      (2) These motifs together support the idea that BMN activity is locally gated and hierarchically suppressed, fitting well with known behavioural sequences of grooming.

      (3) The study also shows that connectivity preserves somatotopy, such that BMNs from neighbouring bristle populations converge onto shared partners, while distant BMNs remain segregated.

      (4) A developmental analysis reveals both primary and secondary partners, suggesting a layered scaffold plus adult-specific elaborations.

      (5) Finally, the identification of hemilineage 23b (LB23) as a core postsynaptic pathway - incorporating previously described antennal grooming neurons (aBN2) - provides a striking link between developmental lineage, anatomical connectivity, and behavioral output.

      (6) Together, the dataset represents a valuable resource for the neuroscience community and a foundation for future functional studies.

      Weaknesses:

      There are also some weaknesses, most of which only limit clarity.

      (1) The writing is dense, with results often presented in a cryptic fashion and the functional implications deferred to the discussion. As a result, the significance of circuit motifs such as BMN→motor or reciprocal inhibitory loops is sometimes buried, rather than highlighted when first described.

      (2) Some assumptions require more explanation for non-specialist readers - for example, how bristle identity is inferred in EM in the absence of cuticular structures, or what is meant by "ascending" and "descending" in a dataset that does not include the ventral nerve cord. While some of this comes from the earlier paper, it would help readers of this one if these points were explained here.

      (3) Visualization choices also sometimes obscure key conclusions: network graphs can be visually appealing but do not clearly convey somatotopy or BMN-type differences; heatmaps or region-level matrices would make the parallel, block-like organization of the circuit more evident.

      (4) The data might also speak to roles beyond grooming (e.g., mechanosensory modulation of posture or feeding), and a brief acknowledgement of this would broaden the impact.

      (5) The restriction to one hemisphere should be explicitly acknowledged as a limitation when framing this as a 'comprehensive' connectome.

      Overall, the authors achieve their main goal: they convincingly show that BMNs connect into parallel, somatotopically organized pathways, with LB23 providing a key lineage-based link from sensory input to grooming output. The dataset is carefully analyzed, and while the presentation could be streamlined, the connectome will be a valuable resource for researchers studying sensory processing, motor control, and the logic of circuit organization.

    1. eLife Assessment

      This timely and fundamental study introduces a human iPSC-based co-culture system that models Kupffer cell-hepatocyte interactions and aims to recapitulate liver-specific immune-parenchymal dynamics. Direct contact between iMacs and iHeps promotes mutual tissue-specific maturation, with iHeps downregulating fetal genes while iMacs acquire a Kupffer cell-like profile. This convincing in vitro model holds significant promise and is a leap forward; further experimental work will enhance its translational impact.

    2. Reviewer #1 (Public review):

      The manuscript presents a compelling new in vitro system based on isogenic co-cultures of human iPSC-derived hepatocytes and macrophages, enabling the modelling of hepatic immune responses with unprecedented physiological relevance. The authors show that co-culture leads to enhanced maturation of hepatocytes and tissue-resident macrophage identity, which cannot be achieved through conditioned media alone. Using this system, they functionally validate immune-driven hepatotoxic responses to a panel of drugs and compare the system's predictive power to that of monocyte-derived macrophages. The results underscore the necessity of macrophage-hepatocyte crosstalk for accurate modelling of liver inflammation and drug toxicity in vitro.

      The manuscript is clearly written and addresses a key limitation in liver organoid systems: the lack of immune complexity and tissue-specific macrophage imprinting. Nevertheless, several conclusions would benefit from a more careful interpretation of the data, and some important controls or explanations are missing, particularly in the flow cytometry gating strategies, stress marker validation, and cluster interpretations.

      Strengths:

      (1) Novelty and Relevance: The study presents a highly innovative co-culture system based on isogenic human iPSCs, addressing an unmet need in modelling immune-mediated hepatotoxicity.

      (2) Mechanistic Insight: The reciprocal reprogramming between iHeps and iMacs, including induction of KC-specific pathways and hepatocyte maturation markers, is convincingly demonstrated.

      (3) Functional Readouts: The application of the model to detect IL-6 responses to hepatotoxic compounds enhances its translational relevance.

      Weaknesses:

      (1) Several key claims, particularly those derived from PCA plots and DEG analyses, are overinterpreted and require more conservative language or further validation.

      (2) The purity of sorted hepatocytes and macrophages is not convincingly demonstrated; contamination across gates may confound transcriptomic readouts.

      (3) Stress response genes and ER stress/apoptosis signatures are not properly assessed, despite being potentially activated in the system.

      (4) Some figure panels and legends lack statistical annotations, and microscopy validation of morphological changes is missing.

      (5) The co-culture model with monocyte-derived macrophages is not fully characterised, making comparisons less informative.

    3. Reviewer #2 (Public review):

      Summary:

      This study builds on work by Glass and Guilliams showing that mouse Kupffer cells depend on the surrounding cells, including endothelium, hepatocytes, and stellate cells, for their identity. Herein, the authors extend the work to human systems. It nicely highlights why taking monocyte-derived macrophages and pretending they are Kupffer cells is simply misleading.

      Strengths:

      Many, including human cells, difficult culture assays, and important new data.

      Weaknesses:

      This reviewer identified minor queries only, rather than 'weaknesses' as such.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors establish a human in vitro liver model by co-culturing induced hepatocyte-like cells (iHEPs) with induced macrophages (iMACs). Through flow cytometry-based sorting of cell populations at days 3 and 7 of co-culture, followed by bulk RNA sequencing, they demonstrate that bidirectional interactions between these two cell types drive functional maturation. Specifically, the presence of iMACs accelerates the hepatic maturation program of iHEPs, while contact-dependent cues from iHEPs enhance the acquisition of Kupffer cell identity in iMACs, indicating that direct cell-cell interactions are critical for establishing tissue-resident macrophage characteristics.

      Functionally, the authors show that iMAC-derived Kupffer-like cells respond to pathological stimuli by producing interleukin-6 (IL-6), a hallmark cytokine of hepatic immune activation. When exposed to a panel of clinically relevant hepatotoxic drugs, the co-culture system exhibited concentration-dependent modulation of IL-6 secretion consistent with reported drug-induced liver injury (DILI) phenotypes. Notably, this response was absent when hepatocytes were co-cultured with monocyte-derived macrophages from peripheral blood, underscoring the liver-specific phenotype and functional relevance of the iMAC-derived Kupffer-like cells. Collectively, the study proposes this co-culture platform as a more physiologically relevant model for interrogating macrophage-hepatocyte crosstalk and assessing immune-mediated hepatotoxicity in vitro.

      Strengths:

      A major strength of this study lies in its systematic dissection of cell-cell interactions within the co-culture system. By isolating each cell type following co-culture and performing comprehensive transcriptomic analyses, the authors provide direct evidence of bidirectional crosstalk between iMACs and iHEPs. The comparison with single-culture controls is particularly valuable, as it clearly demonstrates how co-culture enhances functional maturation and lineage-specific gene expression in both cell types. This approach allows for a more mechanistic understanding of how hepatocyte-macrophage interactions contribute to the acquisition of tissue-specific phenotypes.

      Weaknesses:

      (1) Overreliance on bulk RNA-seq data:

      The primary evidence supporting cell maturation is derived from bulk RNA sequencing, which has inherent limitations in resolving heterogeneous cellular states and functional maturation. The conclusions regarding hepatocyte maturation are based largely on increased expression of a subset of CYP genes and decreased AFP levels - markers that, while suggestive, are insufficient on their own to substantiate functional maturation. Additional phenotypic or functional assays (e.g., metabolic activity, protein-level validation) would significantly strengthen these claims.

      (2) Insufficient characterization of input cell populations:

      The manuscript lacks adequate validation of the cellular identities prior to co-culture. Although the authors reference previously published protocols for generating iHEPs and iMACs, it remains unclear whether the cells used in this study faithfully retain expected lineage characteristics. For example, hepatocyte preparations should be characterized by flow cytometry for ALB and AFP expression, while iMACs should be assessed for canonical macrophage markers such as CD45, CD11b, and CD14 before co-culture. Without these baseline data, it is difficult to interpret the magnitude or significance of any co-culture-induced changes.

      (3) Quantitative assessment of IL-6 production is insufficient:

      The analysis of drug-induced IL-6 responses is based primarily on relative changes compared to control conditions. However, percentage changes alone are inadequate to capture the biological relevance of these responses. Absolute cytokine production levels - particularly in response to LPS stimulation - should be reported and directly compared to PBMC-derived macrophages to determine whether iMAC-derived Kupffer-like cells exhibit enhanced cytokine output. Moreover, the Methods section should clearly describe how ELISA results were normalized or corrected to account for potential differences in cell number, viability, or culture conditions.

      (4) Unclear mechanistic interpretation of IL-6 modulation:

      The observed changes in IL-6 production upon drug treatment cannot be interpreted solely as evidence of Kupffer cell-specific functionality. For instance, IL-6 suppression by NSAIDs such as diclofenac is well known to result from altered prostaglandin synthesis due to COX inhibition, while leflunomide's effects are linked to metabolite-induced modulation of immune cell proliferation and broader cytokine networks. These mechanisms are distinct from Kupffer cell identity and may not directly reflect liver-specific macrophage function. Consequently, changes in IL-6 secretion alone - particularly without additional mechanistic evidence or analysis of other cytokines - are insufficient to conclude that co-culture with hepatocytes drives the acquisition of bona fide Kupffer cell maturity.

    5. Author response:

      Reviewer #1:

      In line with the reviewer’s suggestions, we will revise the text to use more conservative language regarding the claims of maturation within the co-culture system, and emphasize that the conclusion is based on limited transcriptomic evidence. We acknowledge that the bulk RNA sequencing results might be affected by contamination across the sorting gates, but would like to point out that the CD45+ CD14+ population is clearly separated, and any resulting contamination would likely be small. We will address this caveat explicitly in a new limitations section, as also suggested by Reviewer #3. We will also follow the reviewer’s suggestion to examine the stress response genes further to better characterize the system. We apologise if any statistical annotations were omitted and will take care to include them in the updated version.

      Reviewer #3:

      We acknowledge the reviewer’s concern that the study relies primarily on bulk RNA sequencing data, which might not fully capture the complex metabolic and functional shifts, especially in a cell type like the hepatocyte, and we will address these concerns in a new limitations section in the revised manuscript. We also apologise if it was unclear in the manuscript that the iHeps and iMacs were characterised prior to co-culture. For example, the iMacs are routinely assessed for CD45, CD14 and CD163 before the start of any experiment, and the iHeps are likewise tested by qPCR, which also served as the baseline for the fold expression changes in Fig 3. The primary aim of the IL-6 assays is to demonstrate that the hepatocyte co-culture systems behave differently depending on the source of the macrophages, and that primary macrophages might not be suitable for studying drug responses in vitro. We will clarify in the revised manuscript that the overall effect might not be directly related to specific Kupffer cell identity.

    1. eLife Assessment

      This study constitutes a fundamental advance for the uveal melanoma research field, identifying a mechanism that might be exploited to target this deadly cancer and, more generally, to target transcriptional dependency in cancers. This work substantially advances our understanding of pharmacological inhibition of SWI/SNF as a therapeutic approach for cancer. The study is well written and provides compelling evidence, including comprehensive datasets, compound screens, gene expression analysis, epigenetics, as well as animal studies.

    2. Reviewer #1 (Public review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well written and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

    3. Reviewer #2 (Public review):

      Summary:

      The authors generate an optimized small-molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, which makes UM the focus of the paper. This inhibition is correlated with loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue the growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10, without discernible toxicity in the mice. Collectively, the data suggest a novel treatment for uveal melanoma.

      Strengths:

      There are many strengths of the study, including the strong challenge of the on-target effect, the assays used and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and have pronounced effects on uveal melanoma cell proliferation. They induce apoptosis and suppress tumor growth, with no toxicity in vivo. The report provides biological significance by demonstrating that the drugs alter chromatin accessibility at lineage-specific gene enhancer regions and decrease expression of lineage-specific genes, including SOX10 and SOX10 target genes.

      Strengths:

      The study provides compelling evidence for the therapeutic use of these compounds and does a thorough job of elucidating the mechanisms by which the drugs work. The study will likely have a high impact on the chromatin remodeling and cancer fields. The datasets will be highly useful to these communities.

      [Editors' note: The authors have addressed all of the outstanding issues.]

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well written and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist. 

      Strengths: 

      This is a comprehensive and well-written study. 

      Weaknesses: 

      There are minimal weaknesses. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors generate an optimized small-molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, which makes UM the focus of the paper. This inhibition is correlated with loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue the growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10, without discernible toxicity in the mice. Collectively, the data suggest a novel treatment for uveal melanoma. 

      Strengths: 

      There are many strengths of the study, including the strong challenge of the on-target effect, the assays used and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement. 

      Weaknesses: 

      The authors have addressed weaknesses in the revised version. 

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and have pronounced effects on uveal melanoma cell proliferation. They induce apoptosis and suppress tumor growth, with no toxicity in vivo. The report provides biological significance by demonstrating that the drugs alter chromatin accessibility at lineage-specific gene enhancer regions and decrease expression of lineage-specific genes, including SOX10 and SOX10 target genes. 

      Strengths: 

      The study provides compelling evidence for the therapeutic use of these compounds and does a thorough job of elucidating the mechanisms by which the drugs work. The study will likely have a high impact on the chromatin remodeling and cancer fields. The datasets will be highly useful to these communities. 

      Weaknesses: 

      The authors have addressed all my concerns. 

      Recommendations for the authors: 

      We would, however, like to draw the authors' attention to two comments by the referees. 

      Referee 1 comments: While BAP1-mutant UM cell lines were included for some of the experiments, it seems that the in vivo data mentioned in the response to the reviewer's comment are missing. The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 are from 92.1 cells. If these data are available, then the manuscript would benefit from their addition. 

      We thank the reviewer for bringing this to our attention. As the reviewer mentioned, we show the 92-1 CDX model in our manuscript. Additionally, strong tumor growth inhibition was observed in the MP-46 CDX model treated with our BAF ATPase inhibitor; these data can be found in Vaswani et al., 2025 (PMID: 39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      Referee 3 comments: 

      Supplementary Figure 2C 

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation. 

      Could the authors please comment on these issues before a final version is posted online? 

      We thank the reviewer for bringing this to our attention. The T910M mutation is heterozygous, with a variant allele frequency of 0.5. We have updated the figure legend accordingly to reflect the genotypes of the mutations highlighted in the table.

      Reviewer #1 (Recommendations for the authors): 

      The authors have addressed most of the questions in their review. 

      While BAP1-mutant UM cell lines were included for some of the experiments, it seems that the in vivo data mentioned in the response to the reviewer's comment are missing. The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 are from 92.1 cells. If these data are available, then the manuscript would benefit from their addition. 

      Reviewer #3 (Recommendations for the authors): 

      Supplementary Figure 2C 

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation.

    1. eLife Assessment

      This manuscript provides a single-cell transcriptomic atlas for AML (222 samples comprising 748,679 cells) integrating data from multiple studies. The authors use this dataset to investigate t(8;21) AML, and they reconstruct the Gene Regulatory Network and enhancer Gene Regulatory Network, which allowed identification of interesting targets. This aggregation is important and can help infer differences in genetic regulatory modules based on the age of disease onset. Their compelling effort may help explain age-related variations in prognosis and disease development in a subtype-specific manner.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors performed an integration of 48 scRNA-seq public datasets and created a single-cell transcriptomic atlas for AML (222 samples comprising 748,679 cells). This is important since most AML scRNA-seq studies suffer from small sample size coupled with high heterogeneity. They used this atlas to further dissect AML with t(8;21) (AML-ETO/RUNX1-RUNX1T1), which is one of the most frequent AML subtypes in young people. In particular, they were able to predict Gene Regulatory Networks in this AML subtype using pySCENIC, which identified the paediatric regulon defined by a distinct group of hematopoietic transcription factors (TFs) and the adult regulon for t(8;21). They further validated this in bulk RNA-seq with the AUCell algorithm and attributed the prenatal signature to five key TFs (KDM5A, REST, BCLAF1, YY1, and RAD21) and the postnatal signature to nine TFs (ENO1, TFDP1, MYBL2, KLF1, TAGLN2, KLF2, IRF7, SPI1, and YBX1). They also used SCENIC+ to identify enhancer-driven regulons (eRegulons), forming an eGRN, and found that prenatal origin shows a specific HSC eRegulon profile, while postnatal origin shows a GMP profile. They also performed an in silico perturbation and found the AP-1 complex (JUN, ATF4, FOSL2), P300, and BCLAF1 to be important TFs for inducing differentiation. Overall, I found this study very important in creating a comprehensive resource for AML research.

      Strengths:

      • The generation of an AML atlas integrating multiple datasets with almost 750K cells will further support the community working on AML

      • Characterisation of t(8;21) AML proposes new interesting leads.

      • The t(8;21) TFs/regulons identified from any single dataset are incomplete, and the authors now show that it is the increase in the number of cells that allowed identification of novel ones.

      Comments on revisions:

      In the revised version of the manuscript, the authors addressed all my comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors integrated 48 public scRNA-seq datasets to create a single-cell transcriptomic atlas for AML (222 samples comprising 748,679 cells). This is important because most AML scRNA-seq studies suffer from small sample sizes coupled with high heterogeneity. They used this atlas to further dissect AML with t(8;21) (AML1-ETO/RUNX1-RUNX1T1), one of the most frequent AML subtypes in young people. In particular, they predicted gene regulatory networks in this AML subtype using pySCENIC, identifying a paediatric regulon defined by a distinct group of hematopoietic transcription factors (TFs) and an adult regulon for t(8;21). They further validated this in bulk RNA-seq with the AUCell algorithm and refined the prenatal signature to 5 key TFs (KDM5A, REST, BCLAF1, YY1, and RAD21) and the postnatal signature to 9 TFs (ENO1, TFDP1, MYBL2, KLF1, TAGLN2, KLF2, IRF7, SPI1, and YBX1). They also used SCENIC+ to identify enhancer-driven regulons (eRegulons), forming an eGRN, and found that prenatal origin shows a specific HSC eRegulon profile, while postnatal origin shows a GMP profile. Finally, an in silico perturbation identified the AP-1 complex (JUN, ATF4, FOSL2), P300, and BCLAF1 as important TFs for inducing differentiation. Overall, I found this study very important in creating a comprehensive resource for AML research. 

      Strengths: 

      (1) The generation of an AML atlas integrating multiple datasets with almost 750K cells will further support the community working on AML. 

      (2) Characterisation of t(8;21) AML proposes new interesting leads. 

      We thank the reviewer for a succinct summary of our work and highlighting its strengths.

      Weaknesses: 

      Were these t(8;21) TFs/regulons identified from any of the single datasets? For example, if the authors apply pySCENIC to any dataset, would they find the same TFs, or is it the increase in the number of cells that allows identification of these? 

      We ran pySCENIC on individual datasets and compared the regulon-defining TFs identified with those from the combined AML scAtlas analysis. Some TFs were common, but these varied between individual studies. The union of all TFs identified is a very large set, comprising around a third of all known TFs. AML scAtlas provides a more refined repertoire of TFs, perhaps because the underlying network inference is more robust with a higher number of cells. The findings of these investigations are included in Supplementary Figure 4D-E; we hope they are useful for other users of pySCENIC.
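      The comparison described above (per-dataset pySCENIC runs versus the combined atlas) reduces to simple set operations on the regulon-defining TFs. The sketch below is illustrative only: the TF sets are hypothetical toy inputs, not results from the study, and `compare_tf_sets` is a made-up helper rather than part of pySCENIC or the authors' code.

```python
def compare_tf_sets(per_dataset, atlas):
    """Compare regulon-defining TFs from per-dataset runs with an atlas-level run.

    per_dataset: dict mapping a dataset name to the set of TFs found in it.
    atlas: set of TFs found in the combined (atlas) analysis.
    Returns (union of all per-dataset TFs, TFs shared by every dataset,
             fraction of atlas TFs recovered by each dataset).
    """
    sets = [set(tfs) for tfs in per_dataset.values()]
    union = set().union(*sets)
    shared_by_all = set.intersection(*sets)
    recovered = {name: len(set(tfs) & atlas) / len(atlas)
                 for name, tfs in per_dataset.items()}
    return union, shared_by_all, recovered


# Hypothetical toy input, for illustration only.
per_dataset = {
    "study_1": {"SPI1", "RUNX1", "JUN", "GATA2"},
    "study_2": {"SPI1", "MYB"},
    "study_3": {"SPI1", "RUNX1", "CEBPA", "IRF8"},
}
atlas = {"SPI1", "RUNX1", "KLF2", "YY1"}

union, shared, recovered = compare_tf_sets(per_dataset, atlas)
print(len(union), sorted(shared), recovered)
```

      In this toy input the union grows with each study while only one TF is shared by all three, a miniature version of the pattern the response describes.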

      Reviewer #2 (Public review): 

      Summary: 

      The authors assemble 222 publicly available bone marrow single-cell RNA sequencing samples from healthy donors and primary AML, including pediatric, adolescent, and adult patients at diagnosis. Focusing on one specific subtype, t(8;21), which, despite affecting all age classes, is associated with better prognosis and drug response in younger patients, the authors investigate whether this difference is also reflected in the transcriptomic signal. Specifically, they hypothesize that pediatric patients and part of the young population acquire leukemic mutations in utero, which leads to a different leukemogenic transformation and ultimately to differently regulated leukemic stem cells compared with the adult counterpart. The analysis in this work relies heavily on regulatory network inference and clustering (via SCENIC tools), which identifies regulatory modules believed to distinguish pre-natal from post-natal leukemic transformation. Bulk RNA-seq and scATAC-seq datasets displaying the same signatures are subsequently used to extend the pool of putative signature-specific TFs and enhancer elements. Through gene set enrichment, ontology, and perturbation simulation, the authors aim to interpret the regulatory signatures and translate them into potential onset-specific therapeutic targets. The putative pre-natal signature is associated with increased chemosensitivity, RNA splicing, histone modification, the stemness marker SMARCA2, and is potentially maintained by EP300 and BCLAF1. 

      Strengths: 

      The main strength of this work is the compilation of a pediatric AML atlas using the efficient Cellxgene interface. Also, the idea of identifying markers for different disease onsets, interpreting them from a developmental angle, and connecting this to the different therapy and relapse observations, is interesting. The results obtained, the set of putative up-regulated TFs, are biologically coherent with the mechanisms and the conclusions drawn. I also appreciate that the analysis code was made available and is well documented. 

      We thank the reviewer for evaluating our work and highlighting its key features, including the creation of the AML atlas and the downstream analysis and interpretation for the t(8;21) subtype.

      Weaknesses:

      There were fundamental flaws in how methods and samples were applied and in how results were presented, as well as a general lack of critical examination of both the results and the appropriateness of the methods for the data at hand. In particular: 

      (1) Cell type annotation: 

      (a) The 2-phase cell type annotation process employed for the scRNA-seq sample collection raised concerns. Initially annotated cells are re-labeled after a second round with the same cell types from the initial label pool (Figure 1E). The automatic annotation tools were used without specifying the database and tissue atlases used as a reference, and no information was shown regarding the consensus across these tools. 

      Cell type annotations are heavily influenced by the reference profiles used and vary significantly between tools. To address this, we used multiple cell type annotation tools which predominantly encompassed healthy peripheral blood cell types and/or healthy bone marrow populations. This determined the primary cluster cell types assigned. 

      Existing tools and resources are not leukemia-specific; thus, to identify AML-associated HSPC subpopulations, we created a custom SingleR reference using a CD34-enriched AML single-cell dataset. This was not suitable for the annotation of the full AML scAtlas, as it is derived from CD34-sorted cells and is therefore biased towards these populations. 

      We have made this much clearer in the revised manuscript by splitting Figure 1 into two separate figures (now Figure 1 and Figure 2), reflecting the two different analyses performed. The methods have also been updated with more detail on the cell type annotations, and we have included the automated annotation outputs as a supplementary table, as this may be useful for others in the single-cell community. 

      (b) Expression of the CD34 marker is the only reported selection method for HSPCs, which is not in line with common practice. CD34 alone is acceptable only as a surface marker, while robust annotation of HSPCs should be based on the expression of gene sets. 

      Most of the cells used in the HSPC analysis were in fact annotated as HSPCs with some exceptions. In line with this feedback, we have re-worked this analysis and simply taken HSPC annotated clusters forward for the subsequent analysis, yielding the same findings. 

      (c) During several analyses, the cell types used were either not well defined or contradictory, such as in Figure 2D, where it is not clear whether pySCENIC and AUC scores were computed on HSPCs alone or merged with CMPs. In other cases, different cell type populations are compared and used interchangeably: comparing the HSPC-derived regulons with bulk RNA samples (probably not enriched for CD34+ cells) could be an issue if there are no valid assumptions about the cell composition of the bulk samples. 

      We apologize for the lack of clarity regarding which cell types were used; the text has been updated to clarify that all myeloid progenitor cells were included in the pySCENIC analysis. 

      The bulk RNA-seq samples were used only to test the enrichment of our AML scAtlas-derived regulons in an unbiased and large-scale way. CD34-enriched samples would have been preferable, but these were not available to us. 

      We agree that more effort could be made to ensure that the single-cell, myeloid progenitor-derived regulons are comparable to the bulk RNA sequencing data. In the original bulk RNA-seq validation analysis, we used all bulk RNA sequencing timepoints (diagnostic, on-treatment, relapse) and included both bone marrow and peripheral blood. Upon reflection, and to better harmonize the bulk RNA-seq selection strategy with that of AML scAtlas, we revised our approach to include only diagnostic bone marrow samples. Since the leukemic blast count in pediatric AML is typically high at diagnosis, we expect these samples to predominantly contain leukemic blasts. 

      (2) Method selection: 

      (a) The authors should explain why they use pySCENIC and not any other approach. They should briefly explain in the main text how pySCENIC works and what it outputs. In addition, they should explain the AUCell algorithm and motivate its usage. 

      pySCENIC is a state-of-the-art method for network inference from scRNA-seq data and is widely used within the single-cell community (over 5000 citations for both versions of the SCENIC pipeline). The pipeline has been benchmarked as one of the top performers for GRN analysis (Nguyen et al., 2021, Briefings in Bioinformatics). AUCell is a module within the pySCENIC pipeline that summarizes the activity of a set of genes (a regulon) in each cell as a single number, which helps compare and visualize different regulons. We have modified the manuscript (Results section 2, paragraph 2) to better explain this method and provided some rationale and accompanying citations to justify its use for this analysis. We thank the reviewer for highlighting this and hope our updates add some clarity.

      (b) The obtained GRN signatures were not critically challenged on an external dataset. Therefore, the evidence that supports these signatures to be reliable and significant to the investigated setting is weak. 

      These signatures were inferred using the most suitable AML single-cell RNA datasets currently available. To validate our findings, we used two independent datasets (the TARGET AML bulk RNA sequencing cohort, and the Lambo et al. scRNA-seq dataset). To clarify this workflow in the manuscript, we have added a panel to Figure 3 outlining the analytical process. To our knowledge, there are no other better-suited datasets for validation. Experimental validations on patient samples, while valuable, are beyond the scope of this study.

      (3) There are some issues with the analysis & visualization of the data. 

      Based on this feedback, we have improved several aspects of the analysis, changed some visualizations, and improved figure resolution throughout the manuscript. 

      (4) Discussion: 

      (a) What exactly is the 'regulon signature' that the authors infer? How can it be useful for insights into disease mechanisms? 

      The 'regulon signature' here refers to a gene regulatory program (multiple gene modules, each defined by a transcription factor and its targets) that is specific to different age groups. Further investigation of this program can help explain why patients of different ages experience different clinical courses. We have amended the text to explain this.  

      (b) The authors write 'Together this indicates that EP300 inhibition may be particularly effective in t(8;21) AML, and that BCLAF1 may present a new therapeutic target for t(8;21) AML, particularly in children with inferred pre-natal origin of the driver translocation.' I am missing a critical discussion of what is needed to further test the two targets. Put differently: Would the authors take the risk of a clinical study given the evidence from their analysis? 

      Indeed, many extensive studies would be required before these findings are clinically translatable. We have included a discussion paragraph (discussion paragraph 7) detailing what further work is required in terms of experimental validation and potential subsequent clinical study.
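      The AUCell scoring mentioned in the response to point (2)(a) can be illustrated with a simplified sketch. For one cell, genes are ranked by expression, a recovery curve counts how many regulon genes appear among the top-ranked genes, and the area under that curve is normalized to lie between 0 and 1. This is a simplification written for clarity: the function name, the normalization and the toy data are illustrative assumptions, and the actual pySCENIC implementation may differ in details such as tie handling and normalization.

```python
import numpy as np


def aucell_like_score(expression, gene_names, regulon, top_fraction=0.05):
    """Simplified AUCell-style activity score of one regulon in one cell.

    expression: 1-D array of expression values for a single cell.
    gene_names: gene names in the same order as `expression`.
    regulon: set of target genes defining the regulon (hypothetical input).
    top_fraction: fraction of top-ranked genes used for the recovery curve.
    """
    expression = np.asarray(expression, dtype=float)
    names = np.asarray(gene_names)
    # Only regulon genes that are actually measured can ever be recovered.
    present = int(np.isin(names, list(regulon)).sum())
    if present == 0:
        return 0.0
    # Rank genes from highest to lowest expression (ties keep input order).
    order = np.argsort(-expression, kind="stable")
    n_top = max(1, int(np.ceil(top_fraction * expression.size)))
    hits = np.isin(names[order[:n_top]], list(regulon))
    # Recovery curve: running count of regulon genes found at each rank.
    area = np.cumsum(hits).sum()
    # Best case: every measured regulon gene sits at the very top of the ranking.
    k = min(present, n_top)
    best_area = np.minimum(np.arange(1, n_top + 1), k).sum()
    return float(area / best_area)


genes = ["A", "B", "C", "D", "E", "F"]
cell = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
print(aucell_like_score(cell, genes, {"A", "B"}, top_fraction=0.5))
print(aucell_like_score(cell, genes, {"C", "D"}, top_fraction=0.5))
```

      A score near 1 means the regulon's targets dominate the top of that cell's expression ranking; computing one such number per cell and regulon is what makes regulon activity easy to compare and plot across groups.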

      Reviewer #1 (Recommendations for the authors): 

      In addition to the point raised above, Cytoscape files for the GRNs and eGRNs inferred would be useful to have. 

      We have now provided Cytoscape/eGRN tables in supplementary materials.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figures 1F and 1G: You show the summed-up frequencies for all patients, right? It would be very interesting to see this per patient, or add error bars, since the shown frequencies might be driven by single patients with many cells. 

      While this type of plot could be informative, the large number of samples in the AML scAtlas rendered the output difficult to interpret. As a result, we decided not to include it in the manuscript.

      (2) An issue of selection bias has to be raised when only the two samples expressing the expected signatures are selected from the external scRNA dataset. Similarly, in the DepMap analysis, the age and nature of the other cell lines sensitive to EP300 and BCLAF1 should be reported. 

      Since the purpose of this analysis was to build on previously defined signatures, we selected the two samples for which we had preliminary hypotheses. It would indeed be interesting to explore those not matching these signatures; however, sample numbers are very small, so without preliminary findings, robust interpretation and validation would be difficult. An expanded validation would be more appropriate once more data become available in the future. 

      We agree that investigating the age and nature of other BCLAF1/EP300 sensitive cell lines is a very valuable direction. Our analysis suggests that our BCLAF1 findings may also be applicable to other in-utero origin cancers, and we have now summarized these observations in Supplementary Figure 7H. 

      (3) Is there statistical evidence for your claim that "This shows that higher-risk subtypes have a higher proportion of LSCs compared to favorable risk disease."? At least intermediate and adverse look similar to me. How does this look if you show single patients?  

      We are grateful to the reviewer for noticing this oversight and have now included an appropriate statistical test in the revised manuscript. As before, while showing single patients may be useful, the large number of patients makes such plots difficult to interpret. For this reason, we have chosen not to include them.

      (4) Specify the statistical test you used to 'identify significantly differentially expressed TFs' (line 192). 

      The methods used for differential expression analysis are now clearly stated in the text as well as in the methods section. We hope this addition improves clarity for the reader.

      (5) Figure 2B: You show the summed up frequencies for all patients, right? It would be intriguing to see this figure per patient, since the shown frequencies might be driven by single patients with many cells. 

      Yes, the plot includes all patients. Showing individual patients on a single plot is not easily interpretable. 

      (6) Does the y axis in 2D show single cells rather than samples? Please specify. 

      We thank the reviewer for bringing this to our attention and have now updated Figure 3D accordingly. 

      (7) Figure 3A: I don't get why the chosen clusters are designated as post- and prenatal, given the occurrence of samples in them. 

      This figure serves to validate the previously defined regulon signatures, so the cluster designations are based on this. We have amended the text to elaborate on this point, which will hopefully provide greater clarity.

      (8) Figure 3E: What is shown on the y axis? Did you correct your p-values for multiple testing? 

      We apologize for this oversight and have now added a y axis label. P values were not corrected for multiple testing, as only a few pairwise t-tests were performed.

      (9) Robustness: You find some gene sets up- and down-regulated. How would that change if you used, e.g., a bootstrapped number of samples or a different analysis approach? 

      To address this, we implemented both edgeR and DESeq2 for DE testing. Our findings (Supplementary Figure 5B) show that 98% of edgeR genes are also detected by DESeq2. Given this substantial overlap, which indicates robust findings, we opted to use the smaller edgeR gene list for our analysis. We thank the reviewer for this helpful suggestion, which has strengthened our analysis.

      (10) Multiomics analysis:

      (a) Why only work on 'representative samples'? The idea of an integrated atlas is to identify robust patterns across patients, no? I'd love to see which regulons are robust, i.e., shared between patients.

      As discussed in point 2, there are very few samples available for the multiomics analysis. Therefore, we chose to focus on those samples which we had a working hypothesis for, as a validation for our other analyses. 

      (b) I don't agree that finding 'the key molecular processes, such as RNA splicing, histone modification, and TF binding' expressed 'further supports the stemness signature in presumed prenatal origin t(8;21) AML'.

      Following the improvements made on the bulk RNA-Seq analysis in response to the previous reviewer comments, we ended up with a smaller gene set. Consequently, the ontology results have changed. The updated results are now more specific and indicate that developmental processes are upregulated in presumed prenatal origin t(8;21) AML. 

      (c) Please clarify if the multiome data is part of the atlas.

      The multiome data is not a part of AML scAtlas, as it was published at a later date. We used this dataset solely for validation purposes and have updated the figures and text to clearly indicate that it is used as a validation dataset.  

      (d) Please describe the used data with respect to the number of patients, cells, age, etc.

      We clarified this point in the text and have also included supplementary tables detailing all samples used in the atlas and validation datasets. 

      (e) The four figures in Figure 4E look identical to me. What is the take-home message here? Do all perturbations have the same impact on driving differentiation? Please elaborate.

      The perturbation figure is intended to illustrate that other genes can behave similarly to members of the AP-1 complex (JUN and ATF4 here) following perturbation. Since the AP-1 complex is well known to be important in t(8;21) AML, we hypothesize that these other genes are also important. We apologize for the previous lack of interpretation here and have amended the text to clarify this point. 

      (11) Abstract: Please detail: how many of the 159 AML patients are t(8;21)? 

      We have now amended the abstract to include this. 

      (12) Figures: Increase font size where possible, e.g., age in 1B or risk group in 1G is super small and hard to read. 

      Extra attention has been given to improving the figure readability and resolution throughout the whole manuscript.  

      (13) Color codes in Figures 2B and 2C are all over the place and misleading: Sort 2C along age, indicate what is adult and adolescent, sort the x axis in 2B along age. 

      We have changed this figure accordingly.  

      (14) I suggest not coloring dendrograms, in my opinion this is highly irritating. 

      The dendrogram colors correspond to clusters that are referenced in the text; this coloring provides informative context and aids interpretation, making it a useful addition to the figure.

      (15) The resolution in Figure 4B is bad, I can't read the labels. 

      This visualization has been revised to make the presentation of these data clearer.  

      (16) In addition to selecting bulk RNA samples matching the two regulon signatures, some effort should have been put into investigating the samples not aligned with those, or assessing how unique these GRN signatures are to the specific cell type and disease of interest, excluding the influence of cell type composition and random noise. The presence of the late-onset signatures in an external pre-natal cohort should also be excluded in a more statistically rigorous manner. 

      Our use of the bulk RNA-Seq data is solely intended for the validation of predefined regulon signatures, for which we already have a working hypothesis.  While we agree that further investigation of the samples that do not align with these signatures could yield interesting insights, we believe that such an analysis would extend beyond the scope of the current manuscript.

      (17) The specific bulk RNA samples used should be specified, along with the tissue of origin. The same goes for the Lambo dataset. 

      We have clarified this point in the text and provided a supplementary table detailing all samples used for validation, alongside the sample list from AML scAtlas.

      (18) In Supplementary Figure 5B, the axes should be defined. 

      We have updated this figure to include axis legends.

      (19) Supplementary Figure 4A: There is a mistake in the sex assignment for sample AML14D. Since chrY genes are expressed and XIST expression is mostly zero, this sample is likely male.

      We thank the reviewer for pointing out this error, which has now been corrected.  

      (20) Wording suggestions: 

      (a) Line 54: not compelling phrasing. 

      (b) Line 83: "allows to decipher". 

      (c) Line 88: repetition from line 85. 

      (d) Line 90: the expression "clean GRN" is not clear. 

      These wording suggestions have all been incorporated in the revised manuscript.

      (21) Supplementary Figure 3D is not interpretable, I suggest a different visualization. 

      We agree that the original figure was not the most informative and have replaced it with UMAPs displaying LSC6 and LSC17 scores.
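      On the multiple-testing question in point (8), a step-down Holm-Bonferroni adjustment is inexpensive even when only a few pairwise tests are performed. The sketch below is a generic illustration: `holm_adjust` is a hypothetical helper and the p-values are made up, not taken from the study. In practice, `statsmodels.stats.multitest.multipletests(pvals, method="holm")` provides the same adjustment.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjustment.

    Returns adjusted p-values in the same order as the input.
    """
    m = len(pvalues)
    # Visit p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity so adjusted values never decrease along the order.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


# Illustrative raw p-values from three hypothetical pairwise t-tests.
print(holm_adjust([0.01, 0.04, 0.03]))
```

      With three tests, the smallest raw p-value is tripled and the others are scaled less aggressively than under plain Bonferroni, so reporting adjusted values costs little even for a handful of comparisons.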

    1. eLife Assessment

      The use of DNA tethers is an important advance for studying how motor proteins respond to load. The authors use a convincing methodology to investigate the detachment and reattachment kinetics of kinesin-1, 2, and 3 motors against loads oriented parallel to the microtubule. As the manuscript stands, the conclusions drawn from the experiments, as well as the overall interpretation of the results, are incompletely supported by the presented data, and the novelty over previous reports appears less clear.

    2. Reviewer #1 (Public review):

      Summary:

      Noell et al. have presented a careful study of the dissociation kinetics of kinesin-1, -2, and -3 classes of motors moving in vitro on a microtubule. These motors move against the opposing force from a ~1 micron DNA strand (DNA tensiometer) that is tethered to the microtubule and also bound to the motor via specific linkages (Figure 1A). The authors compare the time for which motors remain attached to the microtubule when they are tethered to the DNA versus when they are not. If the former is longer, the interpretation is that the force on the motor from the stretched DNA (presumed to act solely along the length of the microtubule) reduces the motor's detachment rate from the microtubule. Thus, the specific motor exhibits "catch-bond"-like behaviour.

      Strengths:

      The motivation is good - to understand how kinesin competes against dynein through the possible activation of a catch bond. Experiments are well done, and there is an effort to model the results theoretically.

      Weaknesses:

      The motivation of these studies is to understand how kinesin-1, -2, and -3 motors would behave when they are pitted in a tug of war against dynein motors as they transport cargo in a bidirectional manner on microtubules. Earlier work on dynein and kinesin motors using optical tweezers has suggested that dynein shows a catch-bond phenomenon, whereas such signatures were not seen for kinesin. Based on their data with the DNA tensiometer, the authors would like to claim that (i) kinesin-1 and kinesin-2 also show catch-bonding and (ii) the earlier results using optical traps suffer from vertical forces, which complicate the catch-bond interpretation.

      While the motivation of this work is reasonable, and the experiments are careful, I find significant issues that the authors have not addressed:

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      (3) I appreciate the concerns about the vertical force from the optical trap. But this leads to the following questions, which have not been addressed at all in this paper:

      (i) Why is the vertical force only a problem for kinesins, and not for the dynein studies?

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even bend transiently in a way that pulls the kinesin motor UPWARDS (a vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?
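      To make point (1) concrete: the predicted curve in Figure 1B is based on a worm-like chain model, and a widely used closed form is the Marko-Siggia interpolation formula, which may or may not be the exact form the authors used. The sketch below assumes a persistence length of 50 nm, a thermal energy of 4.1 pN*nm and a contour length of exactly 1000 nm for the ~1 micron tether. These are typical illustrative values, not parameters taken from the manuscript, and the function names are hypothetical.

```python
KBT_PN_NM = 4.1         # thermal energy near room temperature, pN*nm (assumed)
PERSISTENCE_NM = 50.0   # typical dsDNA persistence length, nm (assumed)
CONTOUR_NM = 1000.0     # ~1 micron tether treated as exactly 1000 nm (assumed)


def wlc_force(extension_nm, contour_nm=CONTOUR_NM,
              persistence_nm=PERSISTENCE_NM, kbt=KBT_PN_NM):
    """Marko-Siggia worm-like chain interpolation: force (pN) at a given extension."""
    if not 0.0 <= extension_nm < contour_nm:
        raise ValueError("extension must lie in [0, contour length)")
    r = extension_nm / contour_nm
    return (kbt / persistence_nm) * (1.0 / (4.0 * (1.0 - r) ** 2) - 0.25 + r)


def extension_at_force(force_pn, contour_nm=CONTOUR_NM,
                       persistence_nm=PERSISTENCE_NM, kbt=KBT_PN_NM, tol=1e-9):
    """Invert wlc_force by bisection; force increases monotonically with extension."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_nm, persistence_nm, kbt) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(wlc_force(900.0), 2))         # force at 90% of contour length
print(round(extension_at_force(5.0), 1))  # extension needed for a 5 pN load
```

      Under these assumptions the predicted force stays below about 2.1 pN until the tether reaches 90% of its contour length and then rises steeply, so modest errors in the assumed contour or persistence length noticeably shift the load predicted at a given extension.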

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

    3. Reviewer #2 (Public review):

      Summary:

      To investigate the detachment and reattachment kinetics of kinesin-1, 2, and 3 motors against loads oriented parallel to the microtubule, the authors used a DNA tensiometer approach comprising a DNA entropic spring attached to the microtubule on one end and a motor on the other. They found that for kinesin-1 and kinesin-2, the dissociation rates at stall were smaller than the detachment rates during unloaded runs. With regard to the complex reattachment kinetics found in the experiments, the authors argue that these findings were consistent with a weakly-bound 'slip' state preceding motor dissociation from the microtubule. The behavior of kinesin-3 was different and (by the definition of the authors) only showed prolonged "detachment" rates when disregarding some of the slip events. The authors performed stochastic simulations that recapitulate the load-dependent detachment and reattachment kinetics for all three motors. They argue that the presented results provide insight into how kinesin-1, -2, and -3 families transport cargo in complex cellular geometries and compete against dynein during bidirectional transport.

      Strengths:

      The present study is timely, as significant concerns have been raised previously about studying motor kinetics in optical (single-bead) traps where significant vertical forces are present. Moreover, the obtained data are of high quality, and the experimental procedures are clearly described.

      Weaknesses:

      However, in the present version of the manuscript, the conclusions drawn from the experiments, the overall interpretation of the results, and the novelty over previous reports appear less clear.

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not consistently mean a catch bond in the classical sense (i.e., a protein-protein interaction whose dissociation rate decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociates from a tubulin protein), there is a slip state during which the reattachment rate is higher than for a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the terms used (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interactions is not readily applicable to motors with slippage.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      (4) In many places, it appears too simple that, for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo), the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

    4. Reviewer #3 (Public review):

      Summary:

      Several recent findings indicate that forces perpendicular to the microtubule accelerate kinesin unbinding, where perpendicular and axial forces were analyzed using the geometry in a single-bead optical trapping assay (Khataee and Howard, 2019), comparison between single-bead and dumbbell assay measurements (Pyrpassopoulos et al., 2020), and comparison of single-bead optical trap measurements with and without a DNA tether (Hensley and Yildiz, 2025).

      Here, the authors devise an assay to exert forces along the microtubule axis by tethering kinesin to the microtubule via a dsDNA tether. They compared the behavior of kinesin-1, -2, and -3 when pulling against the DNA tether. In line with previous optical trapping measurements, kinesin unbinding is less sensitive to forces when the forces are aligned with the microtubule axis. Surprisingly, the authors find that both kinesin-1 and -2 detach from the microtubule more slowly when stalled against the DNA tether than in unloaded conditions, indicating that these motors act as catch bonds in response to axial loads. Axial loads accelerate kinesin-3 detachment. However, kinesin-3 reattaches quickly to maintain forces. For all three kinesins, the authors observe weakly attached states where the motor briefly slips along the microtubule before continuing a processive run.

      Strengths:

      These observations suggest that the conventional view that kinesins act as slip bonds under load, as concluded from single-bead optical trapping measurements where perpendicular loads are present due to the force being exerted on the centroid of a large (relative to the kinesin) bead, needs to be reconsidered. Understanding the effect of force on the association kinetics of kinesin has important implications for intracellular transport, where the force-dependent detachment governs how kinesins interact with other kinesins and opposing dynein motors (Muller et al., 2008; Kunwar et al., 2011; Ohashi et al., 2018; Gicking et al., 2022) on vesicular cargoes.

      Weaknesses:

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high stiffness (low compliance) once the tether approaches its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

    5. Author response:

      Reviewer 1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
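      To make the worm-like-chain argument concrete, here is a minimal Python sketch of the Marko-Siggia interpolation formula for dsDNA force versus extension, plus a bisection inverse. It assumes kT = 4.1 pN·nm and a hypothetical contour length of 1020 nm; the contour length is not stated in this section and was chosen only because it places ~6 pN near 960 nm extension, the value quoted in the response to Reviewer #3. The 50 nm persistence length is the consensus value cited above. This is an illustration of the model, not the authors' analysis code.

```python
import math

KT_PN_NM = 4.1         # thermal energy near room temperature, pN*nm (assumed)
PERSISTENCE_NM = 50.0  # consensus dsDNA persistence length (from the text)
CONTOUR_NM = 1020.0    # hypothetical contour length, not stated in this section

def wlc_force(extension_nm, contour_nm=CONTOUR_NM, persistence_nm=PERSISTENCE_NM, kT=KT_PN_NM):
    """Marko-Siggia interpolation: force (pN) needed to hold a WLC at a given extension."""
    if not 0.0 <= extension_nm < contour_nm:
        raise ValueError("extension must lie in [0, contour length)")
    r = extension_nm / contour_nm  # relative extension
    return (kT / persistence_nm) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)

def wlc_extension(force_pn, contour_nm=CONTOUR_NM, persistence_nm=PERSISTENCE_NM, kT=KT_PN_NM):
    """Invert the monotonic force-extension curve by bisection."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_nm, persistence_nm, kT) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for f in (1.0, 3.0, 6.0):
    print(f"{f:.0f} pN -> {wlc_extension(f):.0f} nm")
```

      Under these assumptions the tether must be stretched to more than 90% of its contour length before the force approaches kinesin's stall force, which is why repeated stalls near a common extension are informative even if the exact curve is uncertain.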

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this one. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not superstall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We will soften this term in our revision to “nearly parallel to the microtubule”. In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.
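      The small-angle reasoning behind the <1% estimate can be sketched as follows. The sketch assumes a straight tether between the avidin anchor and the motor, so the ratio of vertical to axial force equals the ratio of height offset to axial separation. The 5 nm height offset is a hypothetical placeholder, not the value used in the Supplementary Methods; the 960 nm separation and 6 pN tension are the stall values quoted elsewhere in this response.

```python
import math

def force_components(axial_nm, vertical_nm, tension_pn):
    """Split tether tension into components along and perpendicular to the microtubule.
    Assumes a straight tether (thermal bending ignored)."""
    length = math.hypot(axial_nm, vertical_nm)
    f_axial = tension_pn * axial_nm / length
    f_vertical = tension_pn * vertical_nm / length
    return f_axial, f_vertical

# Hypothetical geometry: 960 nm axial separation at stall, 5 nm height offset (assumed).
f_par, f_vert = force_components(960.0, 5.0, 6.0)
print(f"axial {f_par:.3f} pN, vertical {f_vert:.3f} pN ({100 * f_vert / f_par:.2f}% of axial)")
```

      Because the axial separation at stall is hundreds of nanometres while plausible height offsets are a few nanometres, the vertical fraction stays well below 1% for any offset of that order.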

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We appreciate the reviewer’s point and we will work to streamline and define terms in our revision.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution. Our preliminary analysis suggests that roughly 20% of events are missed due to this 60 nm cutoff. This will change our unloaded duration numbers slightly, but this will not alter our conclusions.
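      For an exponential distribution of slip distances, the fraction missed below a detection cutoff follows directly from the mean slip distance, and that mean can be estimated from the detected slips alone because the excess over the cutoff is exponential with the same mean. The Python sketch below illustrates this on synthetic data; the 270 nm mean slip distance is a hypothetical value chosen because it yields roughly 20% missed events with the 60 nm cutoff, and is not the measured kinesin-3 value.

```python
import math
import random

CUTOFF_NM = 60.0  # detection threshold stated in the text

def missed_fraction(mean_slip_nm, cutoff_nm=CUTOFF_NM):
    """Fraction of exponentially distributed slips shorter than the cutoff."""
    return 1.0 - math.exp(-cutoff_nm / mean_slip_nm)

def mean_from_detected(detected_nm, cutoff_nm=CUTOFF_NM):
    """MLE of the exponential mean using only slips above the cutoff.
    By memorylessness, (d - cutoff) for d > cutoff has the same mean as d."""
    excess = [d - cutoff_nm for d in detected_nm if d > cutoff_nm]
    return sum(excess) / len(excess)

# Synthetic illustration (hypothetical mean, not the paper's data).
rng = random.Random(1)
slips = [rng.expovariate(1.0 / 270.0) for _ in range(20_000)]
detected = [d for d in slips if d > CUTOFF_NM]
lam_hat = mean_from_detected(detected)
print(f"estimated mean slip {lam_hat:.0f} nm, fraction missed {missed_fraction(lam_hat):.1%}")
```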

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration then it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that any stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point.

      In addition, what we do in Fig. 3 is to separate out the ramps from the stalls and, using a statistical model, we compute a separate duration parameter (which is the inverse of the off-rate) for the ramp and the stall. What we find is that the relationship between ramp, stall, and unloaded durations is different for the three motors, which is interesting in itself.
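      The history-independence argument in this response can be checked numerically. The Python sketch below draws synthetic bound durations, restarts the clock at several arbitrary times, and compares the mean residual duration. For exponential (constant-rate) durations the residual mean does not depend on where the clock starts; for a non-exponential distribution with a rising hazard (a gamma distribution, used here purely as a contrast) it does. The 3.0 s mean and the gamma parameters are hypothetical.

```python
import random
import statistics

def residual_durations(durations_s, t0_s):
    """Restart the clock at t0: keep events still bound at t0 and measure time from t0."""
    return [d - t0_s for d in durations_s if d > t0_s]

rng = random.Random(0)
TAU_S = 3.0  # hypothetical mean bound duration, s
exponential = [rng.expovariate(1.0 / TAU_S) for _ in range(50_000)]
# Contrast case: gamma with shape 2, scale 1.5 also has mean 3.0 s but is not memoryless.
gamma = [rng.gammavariate(2.0, 1.5) for _ in range(50_000)]

for t0 in (0.0, 1.0, 2.0):
    exp_mean = statistics.mean(residual_durations(exponential, t0))
    gam_mean = statistics.mean(residual_durations(gamma, t0))
    print(f"clock starts at {t0:.0f} s: exponential {exp_mean:.2f} s, gamma {gam_mean:.2f} s")
```

      The contrast case shows why the assumption matters: if detachment were strongly history dependent, durations measured from the onset of stall would not be directly comparable to unloaded run durations.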

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We agree that force after slippage is much smaller than at stall, and we plan to clarify that section of text. However, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al (Toleikis et al., 2020) and Sudhakar et al (Sudhakar et al., 2021). In particular, Sudhakar used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al. who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after slip to be a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. We plan to clarify these points in the text of the revision.

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      We will address these points in our revision.

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      We will address this point in our revision.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although these non-parametric methods make no assumptions about the distribution, they require the user to know exactly where the start time is.
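      The way a constant-rate model handles right censoring can be illustrated with the simplest frequentist analogue: the maximum-likelihood mean duration is the total observed time divided by the number of uncensored detachments. This is a swapped-in textbook estimator, not the authors' Bayesian Markov model, and the synthetic data (2.0 s true mean, censoring times uniform between 0.5 and 4 s) are hypothetical. Under the same constant-rate assumption, starting the clock at the first detectable point of a ramp does not bias the estimate.

```python
import math
import random

def censored_exponential_tau(times_s, detached):
    """MLE of the mean duration tau (= 1/off-rate) for a constant detachment rate
    with right censoring. detached[i] is True if interval i ended in detachment,
    False if it was censored (e.g. a ramp that ended because the motor reached stall).
    Returns (tau_hat, number_of_detachments)."""
    n_events = sum(1 for flag in detached if flag)
    if n_events == 0:
        raise ValueError("need at least one uncensored detachment")
    # Censored intervals still contribute their exposure time to the total.
    return sum(times_s) / n_events, n_events

def approx_ci95(tau_hat, n_events):
    """Large-sample 95% interval, using Var(log tau_hat) ~ 1/n_events."""
    half_width = 1.96 / math.sqrt(n_events)
    return tau_hat * math.exp(-half_width), tau_hat * math.exp(half_width)

# Synthetic illustration (hypothetical numbers, not the paper's data).
rng = random.Random(2)
TRUE_TAU_S = 2.0
times, flags = [], []
for _ in range(20_000):
    t_detach = rng.expovariate(1.0 / TRUE_TAU_S)
    t_censor = rng.uniform(0.5, 4.0)  # e.g. time at which a ramp reaches stall
    times.append(min(t_detach, t_censor))
    flags.append(t_detach <= t_censor)

tau_hat, n_ev = censored_exponential_tau(times, flags)
naive = sum(t for t, f in zip(times, flags) if f) / n_ev  # ignores censored ramps
lo, hi = approx_ci95(tau_hat, n_ev)
print(f"censored MLE {tau_hat:.2f} s (95% CI {lo:.2f}-{hi:.2f} s), naive mean {naive:.2f} s")
```

      The naive mean of uncensored durations is biased low because long intervals are preferentially cut off; counting censored exposure time removes that bias when the rate is constant.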

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, we note two points. First, the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s don’t affect the fit very much). Second, when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed very well with the single-exponential fits (Table S2). For instance, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections due to finite microtubule lengths.
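      For comparison with the Kaplan-Meier correction the reviewer suggests, a minimal product-limit estimator is sketched below in Python. Runs that end at the microtubule tip are marked as right-censored: they remain in the risk set up to their end time but do not count as detachments. As the response notes, this estimator needs a well-defined start time for each interval. The five durations are made-up toy values, not data from the study.

```python
def kaplan_meier(times_s, ended_by_detachment):
    """Product-limit estimate of the probability that a motor is still bound.
    ended_by_detachment[i] is False when run i was right-censored, for example
    because the motor reached the end of the microtubule.
    Returns (time, survival) pairs at each time where a detachment occurred."""
    data = sorted(zip(times_s, ended_by_detachment))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        detachments = 0
        leaving = 0
        # Group all runs that end at the same time t.
        while i < len(data) and data[i][0] == t:
            detachments += int(data[i][1])
            leaving += 1
            i += 1
        if detachments:
            survival *= 1.0 - detachments / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # detached and censored runs both leave the risk set
    return curve

# Toy example with made-up durations (s); the run at 4 s hit the microtubule end.
example = kaplan_meier([1.0, 2.0, 2.0, 3.0, 4.0], [True, True, False, True, False])
print(example)
```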

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In Figure S6A kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. There are also two faint horizontal red traces, which are labeled DNA diffusing by (smearing over a large area during a single frame). Panel S6B shows run durations of motors with DNA attached. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      Our interpretation of the unloaded kinesin-DNA result is that the much slower diffusion constant of the DNA relative to the motor alone enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration. In contrast, such detachment events for motors alone normally result in the motor diffusing away from the microtubule, terminating the run. This argument has been used to reconcile the longer single-motor run lengths in the gliding assay versus the bead assay (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, then it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event. We will address this point in the revision.

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We will change this text to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Essentially, yes, it is overdefined, but that is by design and it is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
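      The exponential (Bell-type) load dependence referred to here, k(F) = k0·exp(F·d/kT), can be sketched in a few lines of Python. The sign of the distance parameter d sets whether a single rate behaves as a slip bond (d > 0, so k(F) ≥ k0, matching the slip-bond prediction for ramps), an ideal bond (d = 0), or a catch bond (d < 0). This is a single-rate caricature, not the authors' fitted model with separate k<sub>S-W</sub> and k<sub>slip</sub>; kT = 4.1 pN·nm is assumed, and the example only asks what d a single rate would need to turn the 1.0 s unloaded kinesin-1 duration into the 3.0 s stall duration at ~6 pN.

```python
import math

KT_PN_NM = 4.1  # thermal energy near room temperature, pN*nm (assumed)

def bell_rate(k0_per_s, force_pn, distance_nm, kT=KT_PN_NM):
    """Exponential load dependence: k(F) = k0 * exp(F * d / kT)."""
    return k0_per_s * math.exp(force_pn * distance_nm / kT)

def bond_type(distance_nm):
    """Qualitative behaviour of a single load-dependent rate."""
    if distance_nm > 0:
        return "slip"   # load speeds up the transition
    if distance_nm < 0:
        return "catch"  # load slows the transition
    return "ideal"      # load-independent

def distance_for_ratio(duration_ratio, force_pn, kT=KT_PN_NM):
    """Distance d that lengthens the mean duration (1/k) by duration_ratio at force_pn."""
    return -kT * math.log(duration_ratio) / force_pn

# Illustration only: 1.0 s unloaded -> 3.0 s at ~6 pN (kinesin-1 values quoted above).
d = distance_for_ratio(3.0 / 1.0, 6.0)
print(f"d = {d:.2f} nm ({bond_type(d)}), k(6 pN) = {bell_rate(1.0, 6.0, d):.3f} /s")
```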

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the revision we will note in the Discussion that this behavior was indicative of an ideal bond (not a catch-bond), and we will also add a sentence in the Introduction highlighting this work.

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high stiffness (low compliance) once the tether approaches its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. To address it, we calculated the predicted stiffness of the dsDNA by taking the slope of the theoretical force-extension curve in Fig. 1B. Below 650 nm extension, the stiffness is <0.001 pN/nm; it reaches 0.01 pN/nm at 855 nm, and at 960 nm, where the force is 6 pN, the stiffness is roughly 0.2 pN/nm. That value is higher than the quoted 0.05 pN/nm trap stiffness, but for reference, at this stiffness an 8 nm step leads to a 1.6 pN jump in force, which is reasonable. Importantly, the stiffness of kinesin motors has been estimated to be in the range of 0.3 pN/nm (Coppin et al., 1996; Coppin et al., 1997). Granted, this stiffness is also nonlinear, but what this means is that even at stall, our dsDNA tether has a compliance similar to that of the motor pulling on it. We will address this point in our revision.
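      The stiffness values above can be reproduced, under stated assumptions, from the analytic derivative of the Marko-Siggia worm-like-chain formula. The Python sketch assumes kT = 4.1 pN·nm, the 50 nm persistence length, and a hypothetical contour length of 1020 nm (not stated in this section, chosen because it places ~6 pN near 960 nm). The step-force estimate uses the same slope-times-step approximation as the text.

```python
KT_PN_NM = 4.1         # thermal energy near room temperature, pN*nm (assumed)
PERSISTENCE_NM = 50.0  # dsDNA persistence length (from the text)
CONTOUR_NM = 1020.0    # hypothetical contour length

def wlc_force(x_nm, L=CONTOUR_NM, P=PERSISTENCE_NM, kT=KT_PN_NM):
    """Marko-Siggia force (pN) at extension x_nm."""
    r = x_nm / L
    return (kT / P) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)

def wlc_stiffness(x_nm, L=CONTOUR_NM, P=PERSISTENCE_NM, kT=KT_PN_NM):
    """Analytic slope dF/dx of the Marko-Siggia formula, pN/nm."""
    r = x_nm / L
    return (kT / (P * L)) * (0.5 / (1.0 - r) ** 3 + 1.0)

for x in (650.0, 855.0, 960.0):
    print(f"{x:.0f} nm: F = {wlc_force(x):.2f} pN, stiffness = {wlc_stiffness(x):.4f} pN/nm")

STEP_NM = 8.0
linear_jump = wlc_stiffness(960.0) * STEP_NM               # slope x step, as in the text
exact_jump = wlc_force(960.0 + STEP_NM) - wlc_force(960.0)  # difference along the curve
print(f"8 nm step at stall: linear estimate {linear_jump:.2f} pN, along curve {exact_jump:.2f} pN")
```

      Because the curve steepens sharply near the contour length, the force difference read directly off the curve for a full 8 nm step is somewhat larger (closer to 2 pN under these assumptions) than the slope-times-step estimate.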

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. We plan to address this in our revision. Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      Due to the compliance of the dsDNA, the 95% limits for the initial attachment position are +/- 290 nm (Fig. S2). Thus, some apparent ‘jumps’ from the detached state are expected. We will take a closer look at why there are jumps for kinesin-2 that aren’t apparent for kinesin-1 or -3.

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.
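      For readers less familiar with censoring, one standard correction is sketched here as a generic example; it is not necessarily the analysis the authors used. Runs that end because the motor reaches the microtubule end are treated as right-censored. For an exponential duration distribution, the maximum-likelihood mean is then the total observed time divided by the number of runs that actually ended by detachment. The durations and flags are hypothetical.

```python
from typing import Sequence


def censored_exponential_mean(durations: Sequence[float],
                              ended_by_detachment: Sequence[bool]) -> float:
    """Maximum-likelihood mean of an exponential distribution with right-censoring.

    durations: observed run durations (s).
    ended_by_detachment: True if the run ended by motor detachment (an observed
        event), False if it was cut short (e.g., the motor reached the MT end).
    """
    if len(durations) != len(ended_by_detachment):
        raise ValueError("durations and flags must have the same length")
    n_events = sum(1 for ended in ended_by_detachment if ended)
    if n_events == 0:
        raise ValueError("at least one uncensored run is required")
    # Censored runs still contribute their full observed time to the exposure.
    return sum(durations) / n_events


if __name__ == "__main__":
    # Hypothetical run durations in seconds; the last two hit the microtubule end.
    runs = [0.8, 1.5, 2.1, 0.4, 3.0, 3.0]
    detached = [True, True, True, True, False, False]
    naive = sum(runs) / len(runs)
    corrected = censored_exponential_mean(runs, detached)
    print(f"naive mean {naive:.2f} s, censoring-corrected mean {corrected:.2f} s")
```

      In this hypothetical example, averaging all runs as if they were complete gives 1.8 s, whereas the censoring-corrected estimate is 2.7 s; this underestimate is the bias the reviewer describes for unloaded runs on finite microtubules.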

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they differ only in the sequence (slip to detached) and in the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states are the distributions of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single exponential with a time constant equivalent to t₁, which would be a poor fit to the experimental distributions in Fig. 4D-F.
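      The argument can be illustrated with a short sketch. Dwell times in a single state with one exit rate have a survival function exp(-t/τ), so the instantaneous decay rate -d ln S/dt is constant. For a sum of three exponentials, that rate falls over time, which appears as curvature on a log-survival plot. The amplitudes and time constants below are hypothetical placeholders, not the fitted values from Fig. 4D-F.

```python
import math
from typing import Sequence


def multi_exp_survival(t: float, amplitudes: Sequence[float], taus: Sequence[float]) -> float:
    """Fraction of recovery times longer than t for a sum of exponential components.

    amplitudes should sum to 1; taus are the component time constants (s).
    """
    return sum(a * math.exp(-t / tau) for a, tau in zip(amplitudes, taus))


def local_decay_rate(t: float, amplitudes: Sequence[float], taus: Sequence[float],
                     dt: float = 1e-4) -> float:
    """Hazard -d ln S / dt at time t, estimated by a central finite difference."""
    s_plus = multi_exp_survival(t + dt, amplitudes, taus)
    s_minus = multi_exp_survival(t - dt, amplitudes, taus)
    return -(math.log(s_plus) - math.log(s_minus)) / (2 * dt)


if __name__ == "__main__":
    single = ([1.0], [0.05])                       # hypothetical single state, tau = 50 ms
    triple = ([0.6, 0.3, 0.1], [0.01, 0.1, 1.0])   # hypothetical three-component mixture
    for t in (0.005, 0.05, 0.5):
        print(f"t = {t:5.3f} s  single: {local_decay_rate(t, *single):6.1f} /s  "
              f"triple: {local_decay_rate(t, *triple):6.1f} /s")
```

      With these placeholder values, the single-state rate stays at 20 s⁻¹, while the mixture's rate drops from roughly 50 s⁻¹ at 5 ms to roughly 1.3 s⁻¹ at 0.5 s. A model with only one detached state cannot produce that change, which is why removing the slip state would fit the Recovery-time data poorly.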

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References:

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Coppin, C.M., J.T. Finer, J.A. Spudich, and R.D. Vale. 1996. Detection of sub-8-nm movements of kinesin by high-resolution optical-trap microscopy. Proc Natl Acad Sci U S A. 93:1913-1917.

      Coppin, C.M., D.W. Pierce, L. Hsu, and R.D. Vale. 1997. The load dependence of kinesin's mechanical cycle. Proc Natl Acad Sci U S A. 94:8539-8544.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nat Commun. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds number. Am J Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schäffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nat Commun. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schäffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Lüdecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17:e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. eLife Assessment

      This study provides valuable insight into how the medial and lateral entorhinal cortices interact through distinct excitatory and inhibitory pathways. Using anatomical tracing, optogenetics, and electrophysiology, the authors show that glutamatergic medial entorhinal neurons provide broad excitatory input to the lateral entorhinal cortex, while long-range SST+ interneurons deliver selective inhibition to layer I. These findings reveal a novel layer- and cell-type-specific organization of medial-to-lateral entorhinal connectivity with implications for spatial and episodic memory. The work is solid, but validation of injection specificity and viral spread is needed to fully confirm the anatomical interpretations; with these clarifications, this will be a significant contribution to understanding entorhinal-hippocampal circuit organization.

    2. Reviewer #1 (Public review):

      The study addresses the organisation of synaptic connections from the medial to the lateral entorhinal cortex. Classic anatomical work has suggested these connections exist, but very little is known about their identity or functional impact. The manuscript argues that these projections are mediated by glutamatergic neurons, providing excitatory input from MEC to all layers of LEC, and by SST+ve interneurons sending inhibitory projections to L1 of LEC. This appears to be the most likely interpretation of the data, although in my opinion, more could be done to rule out the possible impact of the spread of the virus/tracer from the injection site.

      While this concern might seem overly picky, the importance of this level of detail is nicely shown by the authors' previous work clarifying connectivity from postrhinal to entorhinal cortices through careful analysis of similar types of data (Doan et al. 2019). If additional analyses/data can address the concern here, then I think this will be an important set of fundamental results that will influence thinking about circuit mechanisms for spatial cognition and episodic memory. In particular, it will nicely add to an emerging view that MEC and LEC can interact directly, showing that the organisation of these interactions is asymmetric and identifying a potentially interesting long-range inhibitory pathway.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Nilssen et al. presents a comprehensive study of the circuitry linking the medial and lateral entorhinal cortices (MEC and LEC). Using a combination of anatomical tracing, optogenetics, and in vitro electrophysiology, the authors convincingly demonstrate that the MEC sends both glutamatergic and long-range inhibitory SST+ GABAergic projections to the LEC, with distinct laminar and cell-type-specific targeting. Notably, they reveal that SST+ inhibitory projections selectively suppress the activity of layer IIa neurons, whereas excitatory inputs preferentially engage neurons in layers IIb and III, thereby differentially modulating hippocampal-projecting populations.

      Strengths:

      The experiments are carefully executed, the results are compelling, and the conclusions are well supported by the data. This work will be of broad interest to researchers studying memory circuits, cortical inhibition, and the organization of long-range connectivity.

      Weaknesses:

      Although the in vivo relevance of these connections remains to be determined, this is an important and timely contribution to our understanding of entorhinal-hippocampal interactions.

    4. Author response:

      Reviewer #1:

      The issue of validating injection sites and viral spread is an important one, and we are fully aware of the risks associated with an incomplete assessment. Note that in the section of the supplementary material on ‘Brain area identification’, we write the following: ‘In all neuroanatomical tracing experiments, correct placement of tracer injections into the four different areas (MEC, PER, PIR and LEC) was carefully evaluated based on known cytoarchitectonic features (see below). Electrophysiological experiments were initiated after our neuroanatomical experiments had verified the correct surgery coordinates for interrogating pathways to LEC from MEC, PIR, PER and cLEC. In patch-clamp experiments, viral injections were considered to hit the intended target area whenever the axonal innervation patterns in LEC were consistent with the patterns obtained in our neuroanatomical tracing experiments. To ensure that our injections were placed in MEC, without unintended spread to LEC, we examined the innervation patterns in DG.

      In agreement with the current understanding of entorhinal innervation of DG in rodents (Steward, 1976; van Groen et al., 2003), injections targeting MEC or LEC resulted in axonal labelling in the middle one-third or outer one-third of the molecular layer of DG, respectively. Cases where the injection had clearly spread to LEC, evident from the laminar distribution of labelling in DG and labelled cell bodies in LEC, were excluded from analysis.’

      In our view, this provides sufficient assurance that we did not mistakenly include intrinsic LEC projections in our dataset. In the Results section, we also addressed this issue by stating that: ‘We carefully checked all sections at and close to the levels we used for our experiments and did not observe any virally labelled neurons in LEC.’ In electrophysiological experiments, whole-brain material is not normally secured to exclude viral spread; however, for each animal we recorded from multiple adjacent thick slices, and in none did we find indications that LEC had been included. Finally, we included an analysis of SST projections originating from LEC (suppl Figure 1). As can be seen from panel C, the local SST axonal pattern in LEC is markedly different from that seen following an injection in MEC. We aim to provide additional supplementary detail on this and include it in the text of the revised version.

      Reviewer #2:

      The remark that the in vivo relevance of these connections remains to be determined is absolutely correct; in the discussion we only speculated on this, since we currently do not have functional data of sufficient quality to address it. However, in an earlier version of the paper, still accessible on bioRxiv (https://biorxiv.org/cgi/content/short/2022.11.29.518323v1), we did include data on changes in expression of the immediate early gene cFos in LEC layer IIa cells upon manipulation of the SST projections from MEC within the context of conspecific memory. These data showed a non-significant trend, but we have neither the time nor the financial means to extend that dataset. Therefore, we cannot revise the paper in this respect.

    1. eLife Assessment

      This valuable clinical trial compares the impact of dolutegravir intensification on longitudinal measures of total HIV DNA and on day 84 measures of intact HIV DNA. The trial was well designed, the paper is easy to read, and it provides hypothesis-generating evidence that treatment intensification might decrease intact HIV DNA levels in some people after 3 months. The findings are solid; significant limitations are that study endpoints and hypotheses were not precisely defined prior to the trial, and that the effect size is limited and inconsistent across trial participants.

    2. Reviewer #1 (Public review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth.

    3. Reviewer #2 (Public review):

      Summary:

      This is an intensification study in which a double dose of a second-generation integrase inhibitor, on a background of nucleoside analog inhibitors of HIV reverse transcriptase, was given to 20 individuals randomized with controls; the intervention had an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubled INSTI dose is already used, with good tolerability, in some individuals with INSTI resistance.

      This is a very well-documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

    4. Reviewer #3 (Public review):

      The introduction does a very good job of discussing whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug-drug interactions, and possibly limited penetration of antivirals into sanctuary sites of replication. As the authors point out, proving that it does not occur is likely not possible, and proving that it does occur likely depends heavily on the population studied and the design of the intervention. Whether this replication, in the absence of evolution toward resistance, has clinical consequences is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events, resulting in differences in virus release from specific cell types (those responsible for "second phase" decay), by blocking integration in cells that completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI- or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures.

      Comments on the revised version from the editor:

      I appreciate that the authors thoroughly addressed the reviewers' concerns in the response letter. Most importantly, they acknowledge that "The absence of a pre-specified statistical endpoint or sample size calculation reflects the exploratory nature of the trial." This is vital because the transient impact on total HIV DNA in the intensified versus standard dose arm raises questions about any sustained or meaningful anti-reservoir effect and was also not hypothesized a priori. The authors' explanation that HIV DNA may have rebounded due to clonal expansion is interesting but was not assessed directly in the trial.

      The greater decrease in intact HIV DNA between days 0 and 84 in the intensified arm is notable but is somewhat limited by the small sample size, small effect size, and lack of data between these two timepoints.

      Unfortunately, the hypothesis-generating nature of the conclusions, which is outlined nicely in the authors' response letter, is only acknowledged in the discussion of the revised paper. The abstract and results are only marginally different from the original version and still read as definitive when the evidence is only hypothesis-generating. For these reasons, the level of evidence remains incomplete, as before.

    5. Author response:

      Reviewer #1 (Public review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which helped us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we implemented in the revised version. We acknowledge the reviewer’s concern regarding potential overinterpretation of certain findings, and in the revised version we took particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A), however this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84? If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We clarified this in the revised manuscript (lines 361-364). As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at days 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial in which ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall, cumulative effect of intensification throughout the entire trial period than in differences between groups at individual time points. We clarified in the revised manuscript that this was a proof-of-concept phase 2 study, designed to reveal biological effects of ART intensification rather than confirm efficacy in a powered comparison. The absence of a prespecified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.
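      To make concrete what a prospective sample size calculation would involve, the sketch below applies the standard normal-approximation formula for comparing means between two equal-sized arms, n per arm = 2 (z_(1-α/2) + z_(1-β))² / d², where d is the standardized effect size (difference in means divided by the common SD). The effect sizes are hypothetical and are not estimates from this trial; a calculation based on the t distribution would give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2,
    where d is the standardized effect size (mean difference / common SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.8, 1.0, 1.5):  # hypothetical standardized effect sizes
        print(f"d = {d}: {n_per_arm(d)} per arm")
```

      With α = 0.05 and 80% power, this approximation asks for about 16 participants per arm to detect a one-SD difference and about 63 per arm for a half-SD difference, which shows why small or heterogeneous reservoir changes are hard to confirm in a trial of this size.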

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:CD8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      We improved the Methods section to clarify how safety and tolerability were assessed during the study (lines 389-396). Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk; therefore, we do not consider it to represent a safety concern.

      We would like to clarify that CD4⁺ T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4⁺ T cells. Moreover, although the dynamics of total HIV DNA is indeed similar to that of CD4/CD8 ratio (both declined transiently and then returned to baseline by day 84), the dynamics of unspliced RNA and unspliced RNA/total DNA ratio are clearly different, as these markers demonstrated a sustained decrease that was maintained throughout the trial period, even when the CD4/CD8 ratio already returned to baseline. Also, we observed a significant decrease in intact HIV DNA at day 84 compared to day 0. These effects cannot be easily explained by a transient decline in CD4+ cells.
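      A small arithmetic example, using hypothetical counts rather than trial data, shows how the ratio can move more than either count: modest changes in opposite directions, each of which might be too small to reach significance on its own, compound in the CD4:CD8 ratio.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Hypothetical counts in cells/uL, not data from the trial.
    cd4_0, cd8_0 = 700.0, 700.0
    cd4_1, cd8_1 = 651.0, 749.0  # CD4 falls 7%, CD8 rises 7%
    print(f"CD4 change: {percent_change(cd4_0, cd4_1):+.1f}%")
    print(f"CD8 change: {percent_change(cd8_0, cd8_1):+.1f}%")
    print(f"CD4:CD8 ratio change: {percent_change(cd4_0 / cd8_0, cd4_1 / cd8_1):+.1f}%")
```

      Here 7% shifts in each count produce a ratio decrease of about 13%, so a significant ratio change without significant changes in the absolute counts is arithmetically plausible.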

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study might raise several possible interpretations, and that their complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir.

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern likely reflects the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harbouring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification, however at the subsequent time points this effect was masked by proliferation (clonal expansion) of infected cells with defective proviruses. Recent studies suggest that intact and defective proviruses are subjected to different selection pressures by the immune system on ART (PMID: 38337034) and their decay on therapy is different (intact proviruses are cleared much more rapidly than defectives). In addition, defective proviruses can be preferentially expanded as they can reprogram the host cell proliferation machinery (https://doi.org/10.1101/2025.09.22.676989). This explains why in our study the intact proviruses decreased, but the total proviruses did not change, between days 0 and 84, in the intensification group. Interestingly, in the control group, we observed a significant increase in total DNA at day 84 compared to day 0, with no difference for the intact DNA, which is also in line with the clonal expansion of defective proviruses.
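      The proposed explanation can be checked with simple bookkeeping. Because defective proviruses vastly outnumber intact ones, a substantial relative drop in intact proviruses can be offset by a small relative expansion of defective ones, leaving total HIV DNA essentially flat. All numbers below are hypothetical and chosen only to illustrate the arithmetic.

```python
def total_dna(intact: float, defective: float) -> float:
    """Total HIV DNA as the sum of intact and defective proviruses (same units)."""
    return intact + defective


if __name__ == "__main__":
    # Hypothetical baseline: 5% intact, 95% defective (copies per 10^6 cells).
    intact_0, defective_0 = 50.0, 950.0
    # Hypothetical day-84 values: intact falls by 40%, defective grows by 2%
    # (e.g., through clonal expansion of cells carrying defective proviruses).
    intact_84, defective_84 = intact_0 * 0.60, defective_0 * 1.02
    t0, t84 = total_dna(intact_0, defective_0), total_dna(intact_84, defective_84)
    print(f"intact: {intact_0:.0f} -> {intact_84:.0f} ({100 * (intact_84 / intact_0 - 1):+.0f}%)")
    print(f"total:  {t0:.0f} -> {t84:.0f} ({100 * (t84 / t0 - 1):+.1f}%)")
```

      In this toy case, a 40% fall in intact proviruses coincides with a change in total DNA of about -0.1%, which would be undetectable given typical assay variability.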

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). This result directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA consists of over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes. In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across independent measures of the reservoir (intact DNA, unspliced RNA) suggests an effect indicative of ongoing replication before intensification.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA shearing index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we adopted a more measured tone in the Discussion (lines 340-346), stating that these observations are exploratory and hypothesis-generating and require confirmation in larger, adequately powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a meaningful biological effect that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: the percentage of TIGIT+ CD4 cells appears to have declined by only 1-2%, and at day 84 the confidence interval appears to widen substantially, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be in CD38+HLA-DR+ CD8 cells; however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear that these findings have any biological significance; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We revised the abstract and the manuscript text (including a section header) to reflect this more accurately (lines 39, 175, 185, 253). We noted that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We explicitly acknowledged these limitations and interpreted the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to changes in medication adherence that confound results (the authors describe one observation that may be evidence of this; lines 146-148). A randomized, blinded, or cross-over design would be more robust and would help distinguish signal from noise, given the relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased; evidence of an effect on replication through ART intensification would be strengthened by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving the management and understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We expanded the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioural biases, including potential changes in adherence, we now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research (lines 340-346).

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we highlighted this limitation and future direction in the revised manuscript (lines 340-346). 

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs part2/cancer/cancerscreening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the main outcome was the longitudinal change of a parameter within the same participant during the study. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      (6) Figure 1: the increase in DTG levels is interesting; it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate whether there is a correlation between the change in DTG concentrations and virologic, reservoir, or inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors.

      (7) Figure 2: IPDA in tissue: was this done? Single-copy assay (SCA) for HIV RNA in blood: would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA; accompanying results using a single-copy assay in plasma would help bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultra-sensitive plasma HIV RNA quantification but due to a technical issue (an inadvertent PBMC contamination during plasma separation) that affected the reliability of the results we felt uncomfortable including these data in the manuscript.

      The US RNA/total DNA ratio is difficult to interpret, since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA/total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA/total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA/total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.
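      The within-subject argument can be made explicit with a short worked equation. The notation is introduced here only for illustration: R(i,t) is US RNA and D(i,t) is total HIV DNA in participant i at day t, with day 0 as baseline and each fold change (FC) taken relative to that participant's own baseline.

```latex
\mathrm{FC}^{\mathrm{ratio}}_{i,t}
  = \frac{R_{i,t}/D_{i,t}}{R_{i,0}/D_{i,0}}
  = \frac{R_{i,t}/R_{i,0}}{D_{i,t}/D_{i,0}}
  = \frac{\mathrm{FC}^{R}_{i,t}}{\mathrm{FC}^{D}_{i,t}},
\qquad
D_{i,t} \mapsto c\,D_{i,t}\ \ \forall t
\;\Longrightarrow\;
\mathrm{FC}^{D}_{i,t}\ \text{and}\ \mathrm{FC}^{\mathrm{ratio}}_{i,t}\ \text{are unchanged.}
```

      A participant's baseline DNA level therefore enters only as a normalizer: scaling that participant's DNA values at every time point by the same constant c leaves both fold changes unchanged. The identity by itself does not exclude the possibility that participants with larger reservoirs change differently over time.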

      Reviewer #2 (Public review):

      Summary:

      This is an intensification study, in 20 individuals randomized with controls, of a double dose of a second-generation integrase inhibitor on a background of nucleoside analogue inhibitors of HIV reverse transcriptase, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with co-morbidities.

      Strengths:

      The intervention that leads to a decrease in viral reservoirs and inflammation is quite straightforward, as doubling the INSTI dose is already used, with good tolerability, in some individuals with INSTI resistance.

      This is a very well-documented study, both in blood and tissues, which is a great achievement given the difficulty of tissue sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public review):

      The introduction does a very good job of discussing the issue of whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug-drug interactions, and possibly limited penetration of antivirals into sanctuary sites of replication. As the authors point out, proving that it does not occur is likely not possible, and proving that it does occur is likely very dependent on the population studied and the design of the intervention. Whether this replication, in the absence of evolution toward resistance, has clinically significant consequences is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events, resulting in differences in virus release from specific cell types (those responsible for "second phase" decay), by blocking integration in cells that completed reverse transcription before ART initiation but have yet to be fully activated. In a PI- or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We revised the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances, and we acknowledged these limitations in the manuscript (lines 327-330). For transparency, we now provide both the full range and the IQR for all parameters in Table 1. However, we would like to stress that our statistical approach is based on a within-subject (repeated-measures) design, in which the main outcome was the longitudinal change of a parameter within the same participant during the study. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, such as duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values (that are always present). These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/total DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/total DNA ratio in the intervention group.
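      Using the same illustrative notation as a worked form of this argument (R for US RNA, D for total HIV DNA, FC for fold change from day 0), taking logarithms turns the ratio into a difference. The between-group difference for the ratio is then the US RNA difference minus the total DNA difference, so when both markers change in the same direction relative to the control group, the two terms offset each other. Strictly, the identity holds for each participant and for mean log fold changes; for group medians it is only approximate.

```latex
\log \mathrm{FC}^{\mathrm{ratio}} = \log \mathrm{FC}^{R} - \log \mathrm{FC}^{D}
\quad\Longrightarrow\quad
\Delta_{\mathrm{DTG-ctrl}}\!\left[\log \mathrm{FC}^{\mathrm{ratio}}\right]
  = \Delta_{\mathrm{DTG-ctrl}}\!\left[\log \mathrm{FC}^{R}\right]
  - \Delta_{\mathrm{DTG-ctrl}}\!\left[\log \mathrm{FC}^{D}\right]
```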

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size. 

      Although we agree that, in general, the limited sample size reduces statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the spread of the data differs between groups. At days 56 and 84, the median fold changes from baseline are indeed close, but the entire interquartile range in the DTG group stays below 1, whereas in the control group the interquartile range is wider and extends approximately equally above and below 1. This explains the difference in p values between the groups.
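      To illustrate how two groups can share the same median fold change yet give very different p values, the sketch below applies an exact two-sided Wilcoxon signed-rank test to log2 fold changes from baseline. The fold-change values are invented for illustration and are not study data, and the choice of test is an assumption: this excerpt does not state which within-group test the manuscript used.

```python
import math
from itertools import product
from statistics import median


def signed_rank_exact_p(fold_changes):
    """Exact two-sided Wilcoxon signed-rank p-value for H0: median log2 fold change = 0.

    Assumes no fold change equals exactly 1 and no ties in |log2 FC|
    (no zero or tie correction is applied). Enumerates all 2**n sign
    patterns, so it is only practical for small n (roughly n <= 20).
    """
    logs = [math.log2(fc) for fc in fold_changes]
    n = len(logs)
    # Rank the absolute log fold changes, 1 = smallest.
    order = sorted(range(n), key=lambda i: abs(logs[i]))
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    # Observed statistic: sum of ranks belonging to increases (FC > 1).
    w_plus = sum(r for r, x in zip(ranks, logs) if x > 0)
    expected = n * (n + 1) / 4
    observed_dev = abs(w_plus - expected)
    # Under H0 every rank is equally likely to carry either sign.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(rank for rank, positive in zip(range(1, n + 1), signs) if positive)
        if abs(w - expected) >= observed_dev:
            extreme += 1
    return extreme / 2 ** n


# Hypothetical day-84 / day-0 fold changes (not study data); median 0.7 in both sets.
dtg_like = [0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9]     # all below 1
control_like = [0.3, 0.5, 0.6, 0.7, 1.3, 1.6, 1.8]  # straddles 1

for label, fcs in (("DTG-like", dtg_like), ("control-like", control_like)):
    print(label, "median FC =", median(fcs), "p =", signed_rank_exact_p(fcs))
```

      Both sets have a median fold change of 0.7, but only the first lies entirely below 1, giving p = 0.015625 versus p = 0.46875 for the second. In practice a library routine such as scipy.stats.wilcoxon would normally be used; the enumeration is written out here only to make the logic visible.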

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      The statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4⁺ T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We observed statistically significant differences between the randomized groups for total DNA (p<0.001) and US RNA (p=0.01), as well as for the frequency of CD4⁺ T cells expressing TIGIT (p=0.048). We would like to stress that US RNA is a key marker of residual replication as it is very sensitive to de novo infection events. As discussed in the manuscript (lines 291-294), a newly infected CD4+ T lymphocyte can contain hundreds to thousands of US HIV RNA copies at the peak of infection. Therefore, a change in the US RNA level upon ART intensification is a very sensitive indicator of new infections. The fact that for US RNA we observed both a significant reduction in the intensified group and a significant difference between the groups is a strong indicator that some new infections had been occurring prior to intensification.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. However, the many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, outcomes measuring related parameters will move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.
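      As a rough illustration of the chance-finding concern, assuming k independent tests each performed at significance level α (an idealization that correlated reservoir and immune markers will not strictly satisfy), the probability of at least one nominally significant result when no true effect exists grows quickly with k. The value k = 20 below is an arbitrary example, not the number of tests performed in the study.

```latex
P\left(\text{at least one } p < \alpha \mid \text{no true effects}\right) = 1 - (1-\alpha)^{k},
\qquad
\alpha = 0.05,\; k = 20:\quad 1 - 0.95^{20} \approx 0.64
```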

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We highlighted this more clearly in the revised Discussion (lines 340-346).

      Reviewer #1 (Recommendations for the authors):

      (1) Lines 84-87: How would chronic immune activation/inflammation be expected to differ if viral antigen is being released from stable reservoirs rather than low-level replication?

      This is a very insightful question. Although release of viral antigens from stable reservoirs could certainly also trigger immune activation/inflammation, the reservoir cells in PWH on long-term ART are constantly being negatively selected by the immune system (PMID: 38337034; PMID: 36596305) so that after a number of years on therapy, most proviruses are either transcriptionally silent or express only a low amount of viral RNA/antigen. Recent evidence suggests that these selected cells possess specific biological properties that include mechanisms that limit proviral gene expression (PMID: 36599977; PMID: 36599978). In comparison, low-level replication would result in de novo infection of unselected, activated CD4+ cells that are expected to produce much more viral antigen than preselected reservoir cells.

      (2) Lines 249-253: There are multiple ways to explain this observation; alternatively, total proviral DNA may have declined because of transient CD4 depletion.

      As discussed above, CD4⁺ T-cell counts did not significantly decrease in either treatment group, as shown in Figure 5. The apparent decline concerns the CD4/CD8 ratio, which dropped transiently, not the absolute number of CD4⁺ T cells. Moreover, although the dynamics of total HIV DNA are indeed similar to those of the CD4/CD8 ratio (both declined transiently and then returned to baseline by day 84), the dynamics of unspliced RNA and of the unspliced RNA/total DNA ratio are clearly different, as these markers showed a sustained decrease that was maintained throughout the trial period. We also observed a significant decrease in intact HIV DNA at day 84 compared to day 0. These effects cannot easily be explained by a transient decline in CD4⁺ cells.

      (3) Lines 301-305: This is a confusing explanation for not seeing an effect in tissue. Overall, there was no change in total proviral DNA in blood between days 0 and 84 either - yet the explanation for this observation is different (249-253). Was IPDA not performed on the tissue? Wouldn't this be the preferred test for reservoir depletion?

      We thank the reviewer for bringing this point to our attention. We modified the Discussion to prevent the confusion (lines 303-305). As for the IPDA on tissue, we attempted this assay on the tissue samples using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA shearing index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity, these results were not interpretable.

    1. eLife Assessment

      Using a transposon sequencing (TN-seq) approach, the authors identified key genetic determinants of drug tolerance in Mycobacterium abscessus. Given that M. abscessus is inherently resistant to multiple antibiotics, this valuable study makes a significant contribution by uncovering how antibiotic tolerance is linked to reactive oxygen species (ROS) in this non-tuberculous mycobacterial (NTM) species. The solid findings further strengthen the growing evidence that ROS play a central role in the mechanism of antibiotic action and tolerance in mycobacteria. However, the manuscript would benefit from improved clarity of presentation and corrections in the reference section.

    2. Reviewer #1 (Public review):

      Summary:

      Persistence is a phenomenon by which genetically susceptible cells are able to survive exposure to high concentrations of antibiotics. This is a major problem, especially when treating infections caused by slow-growing mycobacteria such as M. tuberculosis and M. abscessus. Studies on the mechanisms adopted by persisting bacteria to survive and evade antibiotic killing could lead to faster and more effective treatment strategies.

      To address this, the authors used a transposon mutagenesis-based sequencing approach to identify the genetic determinants of antibiotic persistence in M. abscessus. To enrich for persisters, they employed a condition previously reported to increase persister frequency, nutrient starvation, to facilitate genetic screening for this phenotype. The M. abs transposon library was grown in nutrient-rich or nutrient-depleted conditions and exposed to TIG/LZD for 6 days, after which Tn-seq was carried out to identify genes involved in spontaneous (nutrient-rich) or starvation-induced persistence. About 60% of the persistence hits were required in both conditions. Pathway analysis revealed enrichment for genes involved in detoxification of nitrosative and oxidative stress, DNA damage, and proteostasis stress. The authors then validated the findings by constructing deletions of 5 different targets (pafA, katG, recR, blaR, Mab_1456c) and testing the persistence phenotype of these strains. Rather surprisingly, only 2 of the 5 hits (katG and pafA) exhibited a significant persistence defect compared to wild type upon exposure to TIG/LZD, and this was complemented using an integrative construct. The authors then investigated the specificity of delta-katG susceptibility against different antibiotic classes and demonstrated increased killing by rifabutin. The katG phenotype was shown to be mediated through the production of oxidative stress, which was reverted when the bacterial cells were cultured under hypoxic conditions. Interestingly, when the role of katG was tested in other clinical strains of Mab, the phenotype was observed in only one of them, suggesting that alternative anti-oxidative stress defense mechanisms may operate in some clinical strains.

      Strengths:

      While the role of ROS in antibiotic-mediated killing of mycobacterial cells has been studied to some extent, this paper presents some new findings regarding the genetic basis of M. abscessus susceptibility, especially against clinically used antibiotics, which makes it useful. The attempts to validate the observations in clinical isolates are also appreciated.

      Weaknesses:

      Amongst the 5 shortlisted candidates from the screen, only 2 showed marginal phenotypes which limits the impact of the screening approach.

      While the role of KatG mediated detoxification of ROS and involvement of ROS in antibiotic killing was well demonstrated, the lack of replication of this phenotype in some of the clinical isolates limits the significance of these findings.

    3. Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved by a more comprehensive treatment of prior work, especially a comparison of E. coli work with mycobacterial studies. Moreover, the work still has some technical issues to fix regarding the description of the methods, the supplementary material, and reference formatting.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

      Comments on revisions:

      The authors have moved this paper along nicely. I have a few general thoughts.

      (1) It would be helpful to have more references to specific figures and panels listed in the text to make reading easier.

      (2) I would suggest adding a statement about the importance of the work. From my perspective, the work shows the general nature of many statements derived from work with E. coli. This is important. The abstract says this overall, but a final sentence in the abstract would make it clear to all readers.

      (3) The paper describes properties that may be peculiar to mycobacteria. If the authors agree, I would suggest some stress on the differences from E. coli. Also, I would place more stress on novel findings. This might be done in a section called Concluding Remarks. The paper by Shee 2022 AAC could be helpful in phrasing general properties.

      (4) Several aspects still need work to be of publication quality. Examples are the materials table and the presentation of supplementary material. Reference formatting also needs attention.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript demonstrates that starvation induces persister formation in M. abscessus. The authors also used Tn-Seq to identify genes involved in persistence. They identified a role for the catalase-peroxidase KatG in preventing death from the translation inhibitors tigecycline and linezolid. They further demonstrated that a combination of these translation inhibitors leads to the generation of ROS in PBS-starved cells.

      Strengths:

      The authors used high-throughput genomics-based methods for identification of genes playing a role in persistence.

      Weaknesses:

      The findings could not be validated in clinical strains.

      Comments on revisions: No more comments for the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      Only one gene (katG) gave a strong defect, and one (Mab_1456c) exhibited a minor defect. Two of the clones (blaR and recR) did not show any persistence phenotype, and one (pafA) showed a minor phenotype.

      We have now carried out more detailed validation studies on the Tn-Seq, with analysis of time-dependent killing over 14 d. This more comprehensive analysis shows that 4 of the 5 genes analyzed do indeed have antibiotic tolerance defects under the conditions in which Tn-Seq predicted a survival defect (Revised Figure 3). In addition, we found that even before actual cell death, several mutants had delayed resumption of growth after antibiotic removal (Figure 3 Supplemental).

      Fig. 3 - Why is there such a large difference in the extent of killing of the control strain in media exposed to TIG/LZD, compared with Fig. 1C and Fig. 4? In Fig. 1C, M. abs grown in media decreases by >1 log by Day 3 and >4 log by Day 6, whereas in Fig. 3 the bacterial load decreases by <1 log by Day 3 and <2 log by Day 6. If the experimental conditions were different, this needs to be clarified, because compared with the Fig. 1C data, the katG mutant phenotype is not very different.

      We agree with the reviewer that there is variability in the timing and extent of cell death from experiment to experiment. As noted by the reviewer, in Figure 1C the largest decrement in survival is between day 1 and day 3 (also seen in Figure 6A). As they noted, in Figure 4 the largest decrement is between day 3 and day 6 (also seen in Figure 3A and Figure 5F). In each experiment with katG mutants, we carefully compare the mutant with the control strain within that experiment, which is more accurate than comparing the behavior of a mutant in one experiment to a control in another.
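      A minimal sketch of the within-experiment comparison described in this response, assuming log-kill is computed as log10(CFU at day 0 / CFU at day t) for each strain and the katG defect is read as the difference between mutant and control from the same experiment. The function name and all CFU values are hypothetical, chosen only to show the arithmetic; they are not data from the manuscript.

```python
import math

def log10_kill(cfu_day0: float, cfu_day_t: float) -> float:
    """Log10 reduction in viable counts between day 0 and day t."""
    return math.log10(cfu_day0 / cfu_day_t)

# Hypothetical CFU/mL from a single experiment in which control and mutant ran in parallel.
control = {"day0": 1e7, "day6": 1e5}
mutant = {"day0": 1e7, "day6": 1e3}

control_kill = log10_kill(control["day0"], control["day6"])
mutant_kill = log10_kill(mutant["day0"], mutant["day6"])

# The mutant phenotype is taken as extra killing relative to the matched control,
# which avoids comparing a mutant in one experiment with a control from another.
excess_kill = mutant_kill - control_kill
print(f"control {control_kill:.1f} log, mutant {mutant_kill:.1f} log, excess {excess_kill:.1f} log")
```

      Because the comparison is paired within an experiment, a control curve that happens to fall faster or slower in a given run does not by itself change the inferred mutant effect.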

      Reviewer #2 (Public review):

      Weaknesses:

      First, word-choice decisions could better conform to the published literature. Alternatively, novel definitions could be included. In particular, the data support the concept of phenotypic tolerance, not persistence.

      We appreciate the reviewer's comments; text modified.

      Second, two of the novel observations could be explored more extensively to provide mechanistic explanations for the phenomena. 

      We have added several additional experiments; these are detailed below in response to specific comments.

      Reviewer #3 (Public review):

      Weaknesses:

      The findings could not be validated in clinical strains.

      We understand the reviewer’s concern that the katG phenotype was only observed in one of the two clinical strains we studied. We feel that our findings are relevant beyond the ATCC 19977 strain for two reasons:

      (1) We have performed additional analyses of the two clinical isolates and indeed find significant accumulation of ROS following antibiotic exposure in both of these strains (revised Figure 6A).

      (2) We do in fact see a role for katG in starvation-induced antibiotic tolerance in Mabs clinical strain-2. It is not surprising that different strains from a particular species may have some different responses to stresses – for example, there is wide strain-specific variability in susceptibility to different phages within a species based on which particular phage defense modules a given strain carries (for example PMID: 37160116). We speculate that different Mabs strains may express varying levels of other antioxidant factors and note that the genes encoding several such factors were identified by our Tn-Seq screen including the peroxidases ahpC, ahpD, and ahpE. Our analysis of the genetic interactions between katG and these other factors is ongoing. 

      Comments/Suggestions

      (1) In Fig. 1E, the authors show no difference in killing of Mtb with or without adaptation in PBS. These data are contrary to the data presented in Figure 1B. They also do not align with the data for M. smegmatis and M. abscessus. Please discuss these observations in light of the Duncan model of persistence (Mol Microbiol. 2002 Feb;43(3):717-31).

      The above-referenced Duncan laboratory study found tolerance after prolonged starvation but did not examine tolerance at early time points. While some of the transcriptional and metabolic changes seen by Duncan and others are slow, other groups have described starvation responses in Mtb that are quite rapid. For example, the stringent response mediator ppGpp accumulates within a few hours after the onset of starvation in Mtb (PMID: 30906866). We suspect that a rapid signaling response such as this underlies the phenotype we observe. Regarding the difference between Mtb and other mycobacterial species, we also find it surprising that Mtb had a much more rapid starvation response. This is a clear species-specific difference that may reflect an adaptation of Mtb to the nutrient-limited physiologic niche within host macrophages.

      (2) Line 151: the authors state that they used an M. abscessus Tn mutant library of ~55,000 mutant strains. The manuscript would benefit from a description of the fraction of total TA sites covered by the mutants.

      Text modified to add this detail. There are 91,559 TA sites in the M. abscessus genome. Thus, our Tn density is ~60%.
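      The ~60% figure follows from the two numbers given in this exchange. A minimal sketch of the calculation, assuming (as the ~60% figure implies) that each of the ~55,000 mutants carries a single insertion at a distinct TA site; TA-site libraries are typically built with mariner/Himar1-family transposons, but the library construction is not described in this section, so that is an assumption here.

```python
# Values reported in the response above.
library_mutants = 55_000   # approximate number of unique Tn mutants in the library
ta_sites = 91_559          # TA dinucleotide sites in the M. abscessus genome

# Assumes one insertion per mutant at a distinct TA site,
# so density = occupied TA sites / total TA sites.
tn_density = library_mutants / ta_sites
unhit_sites = ta_sites - library_mutants

print(f"Tn density: {tn_density:.1%}; TA sites without an insertion: ~{unhit_sites:,}")
```

      At this density, roughly 36,000 TA sites carry no insertion, which is one reason Tn-Seq analyses typically assess genes by aggregating insertions across all of their TA sites rather than relying on any single site.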

      (3) Line 155: Please explain how long the cells were kept in antibiotic-containing medium.

      This technical detail was noted above on line 153 in the original text: “…and then exposed them to TIG/LZD for 6 days”. To clarify the overall conditions, we have also revised the text of the manuscript and added the detail of how long cells were passaged after removal of antibiotics.

      (4) Line 201: data not shown. Delayed resumption of growth after removal of antibiotic would help indicate drug resilience. These data could enhance the manuscript.

      Data now provided in Figure 3 Supplemental.

      (5) Figures 4C and 4F represent the kill curve. It would be good to show the data as CFU against drug concentration in place of OD600. CFU rather than OD600 better reflects growth inhibition.

      Figures 4C and 4F are measuring the minimum inhibitory concentration (MIC) to stop the overall growth of the bacterial population. While we agree that CFU could be analyzed, this would be measuring a different outcome – cell death and the minimum bactericidal concentration (MBC). In these experiments we sought to specifically examine the MIC so as to separate growth inhibition from cell death. For this we used the standard method employed by clinical microbiology laboratories for MIC, which is optical density of the culture (PMID: 10325306).
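      To make the MIC/MBC distinction concrete, here is a minimal sketch of reading an MIC from endpoint OD600 values: the MIC is taken as the lowest concentration at which that well and every higher-concentration well show no growth. The blank OD, the growth cutoff, the function name, and the concentration series are all hypothetical and are not taken from the manuscript or from the clinical method it cites.

```python
def mic_from_od(concentrations, od600, blank_od=0.04, growth_cutoff=0.05):
    """Return the lowest concentration above which no well shows growth, or None.

    A well counts as 'growth' if its blank-corrected OD600 reaches growth_cutoff.
    Both blank_od and growth_cutoff are illustrative assumptions.
    """
    readings = sorted(zip(concentrations, od600))
    mic = None
    # Walk from the highest concentration downward; stop at the first well with growth.
    for conc, od in reversed(readings):
        if od - blank_od >= growth_cutoff:
            break
        mic = conc
    return mic

# Hypothetical two-fold dilution series (ug/mL) and endpoint OD600 readings.
concs = [0.25, 0.5, 1, 2, 4, 8]
ods = [0.62, 0.58, 0.41, 0.06, 0.05, 0.04]
print("MIC:", mic_from_od(concs, ods))
```

      Nothing in this readout says whether the cells in the inhibited wells are alive or dead; answering that requires plating for CFU, which is the MBC-type measurement the response distinguishes from the MIC.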

      (6) Figure 5C. The authors should show the effect of TIG/LZD on M. abscessus ROS production without PBS adaptation. This is needed to conclude that TIG/LZD induces ROS in cells. The authors should use ROS scavengers such as thiourea, DFO, etc., to establish the contribution of ROS to bacterial killing following inhibition of transcription and translation.

      New data added (revised Figure 5 and Figure 5 Supplemental).

      (7) Line 303. Remove "note".

      Text revised. We thank the reviewer for identifying this typographical error.  

      (8) The Introduction and Discussion are very similar, and several lines are repeated.

      Text revised with overlapping content removed.

      Reviewer #1 (Recommendations for the authors):

      Fig. 1 - It appears that the same datasets for PBS-adapted cultures were plotted in A-C and D-F. Either this should be specifically mentioned in the legend, or it might be better to integrate the non-adapted plots into A-C, which would also allow easier comparison.

      We appreciate the reviewer’s suggestion; text modified with added clarification to the figure legend.

      This manuscript is focused on M. abs and the antibiotics TIG/LZD, so the Mtb data and the data using the antibiotics INH/RIF/EMB serve more as a distraction and could be removed.

      We appreciate the reviewer’s perspective. However, we wish to include these data to show the similarities (and differences) in starvation-induced tolerance between the three organisms.

      Fig. 3 - As mentioned for Fig. 1, it appears that the same dataset was used for the control in all of panels A-E. This should be explicitly stated in the figure legend.

      We appreciate the reviewer’s suggestion; text modified with added clarification to the figure legend.

      The divergent results from the clinical strains are extremely interesting. It would be helpful to determine the oxidative stress levels (similar to the CellROX data shown in 5E) to tease out whether the difference in the role of katG reflects a lack of ROS induction in these strains or the expression of alternative antioxidant defense mechanisms.

      We have performed additional CellROX analysis as suggested by the reviewer and found that ROS induction is indeed present across all three Mabs strains, but that katG is required in only one of the two clinical strains (Strain #2). These data are now included in the revised Figure 6.

      Reviewer #2 (Recommendations for the authors):

      GENERAL COMMENTS

      This is a nice piece of work that uses the pathogen Mabs as a test subject.

      The work has findings that likely apply generally to antibiotics and mycobacteria: 1) phenotypic tolerance is associated with suppression of ROS, 2) lethal protein synthesis inhibitors act via accumulation of ROS, and 3) levofloxacin behaves in an unexpected way. Each is a new observation. However, I believe that each topic requires more work to be firmly established before the study is suitable for eLife.

      Phenotypic tolerance: Association with suppression of ROS is important but expected. I would solidify the conclusion by performing several additional experiments. For example, confirm the lethal effect of ROS by reducing it with an iron chelator and a radical scavenger. There is a large literature on effects of iron uptake, levels, etc. on antibiotic lethality that could be applied to this question. In 2013 Imlay argued against the validity of fluorescent probes. Perhaps getting the same results with another probe would strengthen the conclusion.

      We have carried out additional experiments with both an iron chelator and small-molecule ROS scavengers to further test this idea, but note that these experiments have several inherent limitations: 1) These compounds have highly pleiotropic effects. For example, while N-acetyl cysteine (NAC) is an antioxidant, it also increases mycobacterial respiration and was shown to paradoxically decrease antibiotic tolerance in M. tuberculosis (PMID: 28396391). 2) The Imlay group has shown that small-molecule antioxidants are often ineffective in quenching ROS in bacteria (PMID: 388893820), making negative results difficult to interpret. Nonetheless, we present new experimental data showing that iron chelation does indeed improve the survival of antibiotic-treated Mabs (revised Figure 5). However, small-molecule antioxidants such as thiourea do not restore antibiotic tolerance and actually increase bacterial cell death, suggesting that they may affect respiration in Mabs in a manner similar to that seen for NAC in Mtb. We also note that our genetic analysis, which identified numerous other genes encoding proteins with antioxidant function (Figure 2), is a strong additional argument for the importance of ROS in antibiotic-mediated lethality.

      Regarding the concern raised by Imlay about the validity of oxidation-sensitive dyes: this concern relates to antibiotic-induced bacterial autofluorescence, which can confound analyses in some species. We have ruled this out in our analyses by using bacteria unstained with CellROX as controls, confirming that there is negligible autofluorescence in Mabs (<0.1%, Figure 5E, Figure 6A).

      Protein synthesis inhibitors: At present, this is simply an observation. More work is needed to suggest a mechanism. For example, with E. coli the aminoglycosides are protein synthesis inhibitors that also cause membrane damage. Membrane damage is known to stimulate ROS-mediated killing. Your observation needs to be extended because chloramphenicol, another protein synthesis inhibitor, blocks ROS production. The lethality may be a property of mycobacteria: does it occur with E. coli (note that rifampicin is bacteriostatic with E. coli but lethal to Mtb)?

      We agree with the reviewer that the mechanism underlying ROS accumulation following transcriptional or translational inhibition in Mabs is of significant interest. It is likely to differ from the mechanism in E. coli, because in E. coli tetracyclines and rifamycins are both bacteriostatic, whereas in Mabs they are both bactericidal. Determining the mechanism by which translation inhibitors cause ROS accumulation in Mabs is an ongoing effort in our laboratory using proteomics and metabolomics, but is outside the scope of this manuscript.

      Levofloxacin: This is also at the observational stage but is unexpected. In other studies, ROS is involved in quinolone-mediated killing of bacteria. Why is this not the case with Mabs? The observation should be solidified by showing the contrast with moxifloxacin, since this compound has been studied with mycobacteria (Shee 2022 AAC). With E. coli, quinolone structure can affect the relative contribution of ROS to killing (Malik 2007 AAC), as is also seen with Mtb (Malik 2006 AAC). What is happening in the present work with levofloxacin, an important anti-tuberculosis drug? Is there a structural explanation (compare with ofloxacin)?

      While these are interesting questions, a detailed exploration of the structure-function relationships between different fluoroquinolone antibiotics and their varying activities on Mtb and Mabs is outside the scope of this manuscript.  

      The writing is generally easy to follow. However, the concept of persistence should be changed to phenotypic tolerance, with text changes throughout. I base this suggestion on the definitions of tolerance and persistence stated in the consensus review (Balaban 2019 Nat Micro Rev). Experimentally, tolerance is seen as a gradual decline in survival following antibiotic addition; the decline is slower than that seen with wild-type cells. The data presented in this paper fit that definition. In contrast, persistence refers to a rapid drop in survival followed by a distinct plateau (Balaban 2019 Nat Micro Rev; for example, see Wu Lewis AAC 2012). Moreover, to claim persistence, it would be necessary to demonstrate subpopulation status, which is not done. The Balaban review is an attempt to bring order to the field with respect to persistence and tolerance, since the two terms are commonly used without regard for a consistent definition.
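      The two curve shapes the reviewer contrasts can be illustrated with a toy model (not fitted to the authors' data; all rate constants and the persister fraction are hypothetical). Tolerance is modeled as a homogeneous population killed exponentially at a single, reduced rate; persistence as a biphasic curve from two subpopulations, a bulk killed quickly and a small fraction killed slowly, which produces the plateau. The sketch also computes the minimum duration for killing a given fraction (for example MDK99), a tolerance metric discussed in the consensus review, for the single-rate case only.

```python
import math

def log10_survival_tolerant(t, k):
    """Homogeneous population killed at one (slower) rate k per day: log10(exp(-k*t))."""
    return -k * t / math.log(10)

def log10_survival_biphasic(t, k_bulk, k_persister, f):
    """Bulk cells killed at k_bulk; a persister fraction f killed at the much slower k_persister."""
    surviving = (1 - f) * math.exp(-k_bulk * t) + f * math.exp(-k_persister * t)
    return math.log10(surviving)

def mdk(fraction_killed, k):
    """Minimum duration (days) to kill a given fraction of a single-rate population."""
    return math.log(1 / (1 - fraction_killed)) / k

# Hypothetical parameters: tolerant k = 0.5/day; biphasic bulk 3/day, persisters 0.05/day, f = 1e-3.
for day in (1, 3, 6):
    tol = log10_survival_tolerant(day, 0.5)
    bi = log10_survival_biphasic(day, 3.0, 0.05, 1e-3)
    print(f"day {day}: tolerant {tol:.2f} log, biphasic {bi:.2f} log")
print(f"MDK99 at k = 0.5/day: {mdk(0.99, 0.5):.1f} days")
```

      In this toy setting the tolerant curve keeps falling steadily between days 3 and 6, whereas the biphasic curve has already flattened near the persister fraction, which is the plateau the reviewer asks to see before the term persistence is used.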

      We appreciate the reviewer’s suggestion; text modified in multiple places to clarify.

      Another issue requiring clarification is the relationship between resistance and tolerance. Killing by antibiotics is a two-step process, as most clearly seen with quinolones. First a reversible bacteriostatic event occurs. Resistance blocks that bacteriostatic damage. Then a lethal metabolic response to that damage occurs. Tolerance selectively blocks the second, killing event, a distinct process that often involves the accumulation of ROS. Direct antibiotic-mediated damage is an additional mode of killing that also stems from the reversible, bacteriostatic damage created by antibiotics. The authors recognize the distinction but could make it clearer. Take a look at Zheng (JJ Collins) 2020, 2022.

      Text modified to clarify this point.

      Many readers would also like to see a bit more background on Mabs. For example, does it grow rapidly? Are there features that make it a good model for studying mycobacteria or bacteria in general? The more general, the better.

      Text modified; background added.

      Below I have listed specific comments that I hope are useful in bringing the work to publication and making it highly cited.

      SPECIFIC COMMENTS

      Line 30 unexpectedly. I would delete this word because the result is expected from the ROS work of Shee et al 2022 with mycobacteria. Moreover, Zeng et al 2022 PNAS showed that ROS participates in antimicrobial tolerance, and persistence is a form of tolerance (Balaban et al, 2019, Nat Micro Rev).

      Text modified as per the reviewer's suggestion.

      Line 39 key goal: this is probably untrue in the general sense stated, since bacteriostatic antibiotics are sufficient to clear infection (Wald-Dickler 2019 Clin Infect Dis). However, it is likely to be the goal for Mtb infections.

      We agree with the reviewer that bacteriostatic antibiotics are effective in treating most types of infections and do not claim otherwise in the manuscript. However, from a clinical standpoint, eradication of the pathogen causing the infection is indeed the goal of antibiotic therapy in virtually all circumstances (with the exception of specific scenarios such as cystic fibrosis where it is recognized that the infecting organism cannot be fully eliminated). In most cases, the combination of bacteriostatic antibiotics and the host immune response is sufficient to achieve eradication. We have modified the manuscript text to reflect this nuance noted by the reviewer.

      Line 62 several: you list three, but hipAB works via ppGpp, so the sentence needs fixing.

      Text modified  

      Line 70 uncertain: this uncertainty is unreferenced. Since everything is uncertain, this vague phrase does not add to the story.

      The reviewer makes an interesting philosophical argument. However, we would submit that some aspects of biology, for example the regulation of glycolysis, are understood in great detail. By contrast, other mechanisms, such as the precise mechanisms of lethality for diverse antibiotics in different bacterial species, are far more uncertain and remain a subject of debate (for example PMID: 39910302). Text not modified.

      Line 72 somewhat controversial: I would delete this, because the points in the Science papers by Lewis and Imlay have been clarified and in some cases refuted by prior and subsequent work.

      Text modified

      Line 72 presumed: this suggests that the idea is wrong and that perhaps a different idea has replaced it. Another, more likely, view is that there is an additional mode of killing. I suggest rephrasing to be more in line with the literature.

      Text modified for clarity. In this sentence “presume” refers to the historical concept that direct target inhibition was solely responsible for antibiotic lethality. As the reviewer notes, there is now significant literature that ROS (and perhaps other secondary effects) also contribute to bacterial killing.  

      Line 73 "However" and "the following might also": this phrasing, together with "presumed", misleads the reader about your intent. I suggest rephrasing.

      See above re: line 72.

      Line 75 citations: these are inappropriate and should be changed to fit the statement. I suggest the initial paper by Collins (Kohanski 2007 Cell), a recent paper by Zhao (Zeng PNAS 2022), and a review (Drlica Expert Rev Anti-infect Therapy 2021). The present citations are fine if you want to narrow the statement to mycobacteria, but the history is that the E. coli work came first and was then generalized to mycobacteria. A mycobacterial paper for ROS is Shee 2022 AAC.

      We thank the reviewer for noticing that we inadvertently omitted several important E. coli-related references. These have been added.

      Lines 75 and 76: Conversely ... unresolved. Compelling arguments have been made that show major flaws in the two papers cited, and a large body of evidence has now accumulated showing the validity of the idea promoted by the Collins lab, beginning with Kohanski 2007 (in addition to many papers by Collins, see Hong 2019 PNAS and Zeng 2022 PNAS). It is fine if you want to counter the arguments against the Lewis and Imlay papers (summarized in Drlica & Zhao 2021 Expert Rev Anti-infect Therapy), but making a blanket statement suggests that the authors are unfamiliar with the literature.

      We agree with the reviewer that the weight of the evidence supports a role for antibiotic-induced ROS as an important mechanism for antibiotic lethality under many (though not all) conditions. We have revised the text to better reflect this nuance.

      Line 78. Advantages over what?

      Text modified

      Line 80 exposure: to finish the logic you need to show that E. coli and S. aureus persisters fail to do this.

      We thank the reviewer for their suggestion, but studying these other organisms is outside the scope of this study.

      Line 82 whereas: this misdirects the reader. It would seem that a simple "and" is better.

      Text modified

      Line 89 I think this paragraph is about the need to study Mabs, the subject of the present report. This paragraph could use a more appropriate topic sentence to guide the reader so that no guessing is involved. I suggest rephrasing this paragraph to make the case for studying Mabs more compelling.

      Text modified

      Line 96. I suggest citing several references after subinhibitory concentration of antibiotic.

      The references are in the following sentence alongside the key observations.

      Line 99. Genetic analysis: how does this phrase fit with the idea of persister cells arising stochastically?

      There are two issues: 1) We would argue that persister formation is not completely stochastic, but rather a probability that can be modified both genetically and by environment (for example hipA PMID: 6348026). 2) Even if persister formation were totally stochastic, the survival of these cells may depend on specific genes – as we indeed find in our Tn-Seq analysis of Mabs.  

      Line 106. In this paragraph you need to define persister. The consensus definition (Balaban 2019 Nat Micro Rev) is a subpopulation of tolerant cells. Tolerance is defined as the slowing or absence of killing while an antibiotic retains its ability to block growth. See Zeng 2022 PNAS for an example with rapidly growing cells. Phenotypic tolerance is the absence of killing due to environmental perturbations, most notably nutrient starvation, dormancy, and growth to stationary phase. By extension, phenotypic persistence would be the subpopulation status of phenotypically tolerant cells. If you have a different definition, it is important to state it and emphasize that you disagree with the consensus statement.

      Text modified  

      Line 109 unexpectedly. I would delete this word, because the literature leads the reader to expect this result unless you make a clear case for Mabs being fundamentally different from other bacteria with respect to how antibiotics kill bacteria (this is unlikely, see Shee 2022 AAC). Indeed, lines 111-113 state extensions of E. coli work, although suppression of ROS in phenotypic tolerance and genetic persistence has not been demonstrated.

      Text modified

      Line 124 you might add, in parentheses and with references, that a property of persisters is cross-persistence to multiple antibiotic classes. This is also true for tolerance, both genetic and phenotypic. Such an addition would support your approach.

      Text modified

      Line 128 minimal

      Text not modified. We appreciate the reviewer’s preference, but both “minimal” and “minimum” are widely accepted terms. Indeed, the Balaban et al 2019 consensus statement on definitions cited by the reviewer above also uses “minimum” (PMID: 30980069), as do IDSA clinical guidelines (PMID: 39108079).

      Line 130 is MIC somehow connected to killing or did you also measure killing? Note that blocking growth and killing cells are mechanistically distinct phenomena, although they are related. By being upstream from killing, blockage of growth will also interfere with killing.

      Text modified

      Line 133 PBS is undefined

      Text modified

      Line 134 increase in persisters ... you need to establish that these are not phenotypically tolerant cells. Do they constitute the entire population (tolerance)? Your data would be more indicative of persisters if you saw a distinct plateau with the PBS samples, as such data are often used to document persistence (retardation of killing is a property of tolerance, Balaban 2019). Fig. 1B is clearly phenotypic tolerance, as the entire population grows. Your data suggest that you are not measuring persistence as defined in the literature (Balaban 2019). Line 139 persister should be tolerance.

      Text modified

      Lines 142, 143, 144, 159, 163, 171, 181, 211, 226, 238, 246, 277, 279, 289: persistent should be tolerant.

      Text modified

      Line 146 Fig. 1E: Mtb does not show the adaptation phenomenon, and it is clearly tolerant, not persistent. This should be pointed out. As stated, you may be misleading the reader.

      Text modified  

      Line 169. Please make it clear whether these genes affect antibiotic susceptibility (MIC will affect killing because blocking growth is upstream) or whether you are dealing with tolerance (no change in MIC). These measurements are essential and should be included as a table. By antibiotic response, do you mean that antibiotics change expression levels?

      Regarding MICs, the data for MICs in control and katG mutant are presented in Figure 4C and 4F. Regarding ‘response’ we have clarified the text of this sentence.

      Line 174 Interestingly should be as expected

      Text not modified; tetracyclines do not induce ROS in E. coli and oxazolidinones have not been studied in this regard.

      Line 183 you need to include citations. You can cite the ability of chloramphenicol to block ROS-mediated killing of E. coli. That allows you to use the word unexpected.

      Text modified

      Line 199. All of the data in Fig. 3 shows tolerance, not persistence, requiring word changes in this paragraph.

      Text modified

      Line 226. The MIC experiment is important. You can add that this result solidifies the idea that blocking growth and killing cells are distinct phenomena. You can cite Shee 2022 AAC as a mycobacterial paper.

      Text modified

      Line 241. The result with levofloxacin is unexpected, because the fluoroquinolones are widely reported to induce ROS, even with mycobacteria (see Shee 2022 AAC). You need to point this out and perhaps redo the experiment to make sure it is correct.

      We appreciate the reviewer’s interest in this question. All experiments in this paper were repeated multiple times. This particular experiment was repeated 3 times, and in all replicates the katG mutant was sensitized to translation inhibitors but not to levofloxacin. Shee et al examined Mtb treated with moxifloxacin and found ROS generation, but did not assess whether an Mtb katG mutant had impaired survival. Thus, in addition to differences in i) the species studied and ii) the particular fluoroquinolone used, the two sets of experiments were designed to address different questions (ROS accumulation vs protection by katG). A cell might accumulate ROS without a katG mutant having impaired survival if genetic redundancy exists, a result we indeed see in our clinical Mabs strains under some conditions (new data included in revised Figure 6A).

      Line 269 Additional controls would bolster the conclusion: use of an antioxidant such as thiourea and an iron chelator (dipyridyl) both should reduce ROS effects.

      New experiments performed, revised Figure 5.

      Line 276 the word no is singular

      Text modified

      Line 284 this suggested ... in fact previous work suggested. This summary paragraph might go better as the first paragraph of the Discussion

      Text modified to specify that this is in reference to the work in this manuscript

      Lines 294-299 Most of this is redundant and should be deleted.

      Text modified

      Line 299 this species is vague

      Text modified

      Line 310 Do you want to discuss spoT?

      Text not modified

      Line 313 paragraph is largely redundant

      Text modified

      Line 314 controversial. As above, I would delete this, especially since it is not referenced and is unlikely to be true. If you believe it, you have the obligation to show why the ROS-lethality idea is untrue. If you are referring to Lewis and Imlay, there were almost a dozen supporting papers before 2013 and many after. This statement does not make the present work more important, so deletion costs you nothing.

      Text modified

      Line 314 direct disruption of targets. This is clearly not a general principle, because the quinolones rapidly kill while inhibition of gyrase by temperature-sensitive mutations does not (Kreuzer 1979 J.Bact; Steck 1985). Indeed, formation of drug-gyrase-DNA complexes is reversible: death is not.

      Text modified

      Line 318 as pointed out above, you have not brought this story up to date. The two papers mainly focused on Kohanski 2007, ignoring other available evidence.

      Text modified

      Line 326 you need to cite Shee 2022 AAC

      Text modified

      Line 342 the idea of mutants being protective is not novel, as several have been reported with E. coli studies. Thus, there is a general principle involved.

      We agree that this suggests a potential general principle.

      Line 344. It depends on the inhibitor. For example, aminoglycosides are translation inhibitors and they also cause the accumulation of ROS.

      We agree that ROS generation depends on the inhibitor, and indeed upon other variables including drug concentration, growth conditions, and bacterial species as well.  

      Line 347. You need to point out the considerable data showing that the absence of catalase increases killing

      Text modified

      Line 363 look at Shee 2022 AAC and Jacobs 2021 AAC

      Text modified, reference added.

      Line 585 I suggest having a colleague provide critical comments on the manuscript and acknowledge that person.

      Text not modified

    1. eLife Assessment

      The study presents important findings regarding the incidence and clinical impact of a mutation in a cardiac muscle protein and its association with the development of atrial fibrillation. The authors provide convincing evidence of electrophysiological disturbances in cells with this mutation and of its association with atrial fibrillation, which would be of interest to cardiologists. Evidence supporting the conclusion that this mutation causes atrial fibrillation would benefit from more rigorous electrophysiologic approaches.

    2. Reviewer #1 (Public review):

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Comments on revised version:

      This revised manuscript demonstrates significant improvement, notably through the inclusion of new data (Supplementary Figures 5 and 7) and expanded explanations in the main text. These additions strengthen the association between the TTN-T32756I missense variant and electrophysiological phenotypes relevant to atrial fibrillation (AF). The authors are commended for their thorough and thoughtful responses to reviewer feedback, their transparency in acknowledging limitations, and their efforts to provide mechanistic insight into the observed phenotype.

      Nonetheless, several important limitations remain and should be more explicitly addressed when framing the conclusions and selecting the final manuscript title:

      (1) While the data support a functional impact of the TTN-T32756I variant, the evidence does not yet definitively establish causality in the context of AF. Statements asserting a causal relationship should be softened and clearly framed as suggestive, pending further in vivo or patient-specific validation.

      (2) The study models the TTN-T32756I variant in a single healthy iPSC line using CRISPR/Cas9 editing. Although this provides a genetically controlled system, the absence of validation in patient-derived iPSCs or replication across multiple isogenic lines limits the generalizability and reproducibility of the findings.

      (3) The co-localization and co-immunoprecipitation (co-IP) data provide strong support for an interaction between FHL2 and the KCNQ1/KCNE1 complex. However, in the current form, the proposed mechanism remains plausible but not fully validated.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidence and clinical impact of missense variants in the Titin (TTN) gene in this population. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that Four-and-a-Half Lim domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (IKs), which is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the IKs current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients. The authors appropriately acknowledge this as a limitation in their single-center cohort.

      (2) All other concerns from a previous version of this manuscript have been adequately addressed by the authors in this revision.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Strengths:

      This work has translational potential, suggesting that targeting KCNQ1 or FHL2 could represent a novel therapeutic strategy for improving cardiac function. The findings may also have broader implications for treating patients with rare, disease-causing variants in sarcomeric proteins and underscore the importance of integrating genomic analysis with experimental evidence to advance AF research and precision medicine.

      Weaknesses

      (1) Variant Identification: It is unclear how the TTN missense variant (T32756I) was identified using REVEL, as none of the patients' parents reportedly carried the mutation or exhibited AF symptoms. Are there other TTN variants identified in the three patients carrying TTN-T32756I? Clarification on this point is necessary.  

      We thank the reviewer for their insightful comment. We have now clarified this in the Methods section.

      Line 484-491: “The TTN-T32756I variant (REVEL Score: 0.58758, Supplementary Table 1) was prioritized due to its occurrence in multiple unrelated individuals within our clinical AF cohort, despite no reported family history of AF in affected individuals. While no parental inheritance was observed, the possibility of de novo origin cannot be excluded. Furthermore, this variant is located within a region overlapping a deletion mutation recently shown to cause AF in a zebrafish model, supporting its potential pathogenicity [37]. Notably, the affected individuals did not carry additional loss-of-function TTN variants.”

      (2) Patient-Specific iPSC Lines: Since the TTN-T32756I variant was modeled using only one healthy iPSC line, it is unclear whether patient-specific iPSC-derived atrial cardiomyocytes would exhibit similar AF-related phenotypes. This limitation should be addressed.

      We have now acknowledged this limitation in the revised manuscript.

      Line 505-509: “Due to the patients' unavailability of peripheral blood mononuclear cells (PBMCs), we utilized a healthy iPSC line and introduced the TTN-T32756I variant using CRISPR/Cas9 genome editing. This approach ensures an isogenic background, thereby minimizing genetic variability and providing a controlled system to study the direct effects of the mutation.”

      (3) Hypertension as a Confounding Factor: The three patients carrying TTN-T32756I also have hypertension. Could the hypertension associated with this variant contribute secondarily to AF? The authors should discuss or rule out this possibility.

      We have now explicitly discussed this in the revised manuscript.

      Line 362-367: “Hypertension is a common comorbidity in patients with AF and could contribute to disease progression. However, all three individuals carrying TTN-T32756I exhibited early-onset AF (onset before 66 years), with one case occurring as early as 36 years. This suggests a potential two-hit mechanism, where genetic predisposition and comorbidities influence disease risk. Importantly, our iPSC model isolates the genetic effects of TTN-T32756I from other factors, supporting a direct pathogenic role.”

      (4) FHL2 and KCNQ1-KCNE1 Interaction: Immunostaining data demonstrating the colocalization of FHL2 with the KCNQ1-KCNE1 (MinK) complex in TTN-T32756I iPSC-aCMs are needed to strengthen the mechanistic findings.

      We thank the reviewer for this insightful suggestion. We agree that additional immunostaining data would further strengthen the evidence for FHL2 colocalization with the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs. In line with this, we have expanded our analysis to include both co-immunoprecipitation and confocal microscopy.  As described in the revised manuscript (Lines 282–287), the colocalization between KCNE1 and FHL2 was increased by approximately threefold in TTN-T32756I iPSC-aCMs compared with WT, supporting an enhanced interaction between these proteins (Figure 5A, Supplementary Figure 6). We are generating additional immunostaining data to validate and extend these findings, and we will incorporate them into the revised submission to further substantiate the mechanistic link proposed.

      Line 282-287: “… if TTN-T32756I increases I<sub>Ks</sub> by modulating the interaction between KCNQ1-KCNE1 and FHL2, we performed co-immunoprecipitation studies and confocal microscopy in both WT and TTN-T32756I-iPSC-aCMs. The co-localization between KCNE1 and FHL2 increased ~3 fold in TTN-T32756I-iPSC-aCMs, suggesting an increased interaction between them (Figure 5A, Supplementary Figure 7).”

      (5) Functional Characterization of FHL2-KCNQ1-KCNE1 Interaction: To further validate the proposed mechanism, additional functional assays are necessary to characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      We thank the reviewer for this valuable suggestion. We agree that additional functional assays would provide further validation of the proposed mechanism. However, we believe such in-depth characterization warrants a dedicated follow-up study and is beyond the scope of the current revision. In this work, our primary objective is to establish that the TTN missense variant can exert a detrimental effect and serve as a substrate for AF. 

      Line 418-419: “Further study is needed to validate the proposed mechanism and determine if TTNmvs in other regions are associated with AF by a similar process.”

      Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidence and clinical impact of missense variants in the Titin (TTN) gene in this population. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that Four-and-a-Half Lim domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (IKs), which is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the IKs current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients.

      We appreciate the reviewer’s comment and acknowledge the limitation of not including a non-AF control group in our clinical analysis. As noted in the revised manuscript (Lines 347–353), our cohort was derived from a single-center registry of individuals with AF and therefore lacks a matched non-AF control population for direct comparison of TTN missense variant incidence. We agree that future studies incorporating larger, multiethnic validation cohorts with both AF and non-AF individuals, as well as evaluating AF-specific measures such as arrhythmia burden and treatment response, will be essential to fully elucidate the clinical significance of TTN missense variants in AF.

      Line 347-353: “Our cohort is derived from a single-center multi-ethnic registry of individuals with AF and lacks a matched cohort of non-AF controls to compare the incidence of TTN missense variants. Further study exploring these associations in multi-ethnic, larger validation cohorts that include both AF and non-AF individuals and examining AF-specific measures such as arrhythmia burden or treatment response will be necessary to fully understand the clinical importance of TTNmvs in AF.”

      (2) The authors do not provide evidence that TTN-T32756I-iPSC-aCMs are arrhythmogenic, only that there is an increase in the Iks current and associated action potential changes. More specifically, the authors report that "compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased arrhythmic frequency," yet it is unclear what they are referring to by "arrhythmic frequency."

      We thank the reviewer for this important point and for highlighting the need for clarification. In our study, the term “arrhythmic frequency” was intended to describe the increased spontaneous beating rate, irregular action potential patterns, and abnormal calcium handling observed in TTN-T32756I iPSC-aCMs compared with WT. These findings support the concept that the AF-associated TTN-T32756I variant promotes ion channel remodeling and perturbs excitation–contraction coupling, thereby creating a potential arrhythmogenic substrate for AF. To avoid ambiguity, we have removed the term “arrhythmic frequency” and revised the text for clarity and precision (Lines 222–223).

      Lines 222-223: “Compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased frequency along with a significant reduction of the time to 50% and 90% decline of calcium transients (Figure 3G-I, Supplementary Figure 4F).”

      (3) There seem to be discrepancies regarding the impact of the TTN-T32756I variant on mechanical function. Specifically, the authors report "both reduced contraction and abnormal relaxation in TTN-T32756I-iPSC-aCMs" yet, separately report "the contraction amplitude of the mutant was also increased . . . suggesting an increased contractile force by the TTN-T32756I-iPSC-aCMs and TTN-T32756I-iPSC-CMs exhibited similar calcium transient amplitudes as the WT."

      We thank the reviewer for highlighting this critical point and apologize for the lack of clarity. We intended to distinguish between changes in contractile force and contractile dynamics. Specifically, the increased contraction amplitude observed in TTN-T32756I iPSC-aCMs reflects enhanced contractile force, whereas the reduced contraction duration and impaired relaxation reflect abnormalities in contractile kinetics. Together, these findings indicate that the TTN-T32756I variant alters both the strength and the temporal dynamics of contraction, consistent with dysfunctional mechanical performance. We have revised the text accordingly to more accurately convey these results (Lines 187–192).

      Lines 187-192: “Compared to WT, the beating frequency of the TTN-T32756I-iPSC-aCMs was significantly increased (52 ± 7.8 vs. 98 ± 7.5 beats per min, P=0.001; Figure 2C) coupled with the reduction of the contraction duration (456.5 ± 61.45 vs. 262.9 ± 48.16 msec, P=0.032; Figure 2D), the peak-to-peak time (1529 ± 195.5 vs. 636.6 ± 135.8 msec, P=0.004; Supplementary Figure 3B), and the relaxation (281.5 ± 42.95 vs. 79.40 ± 21.14 msec, P=0.003; Supplementary Figure 3A).”

      Reviewer #3 (Public review):

      Summary:

      The authors describe the abnormal contractile function and cellular electrophysiology in an iPSC model of atrial myocytes with a titin missense variant. They provide contractility data by sarcomere length imaging, calcium imaging, and voltage clamp of the repolarizing current iKs. While each of the findings is interesting, the paper comes across as too descriptive because there is no data merging to support a cohesive mechanistic story/statement, especially from the electrophysiological standpoint. There is not enough support for the title "A Titin Missense Variant Causes Atrial Fibrillation", since there is no strong causative evidence. There is some interesting clinical data regarding the variant of interest and its association with HF hospitalization, which may lead to future important discoveries regarding atrial fibrillation.

      Strengths:

      The manuscript is well written, and a wide range of experimental techniques are used to probe this atrial fibrillation model.

      Weaknesses

      (1) While the clinical data is interesting, it is essential to rule out heart failure with preserved EF as a confounder. HFpEF leads to AF due to increased atrial remodeling, so the fact that patients with this missense variant have increased HF hospitalizations does not necessarily directly support the variant as causative of AF. It could be that the variant is associated directly with HFpEF instead, and this needs to be addressed and corrected in the analyses.

      We appreciate the reviewer’s insightful comment and agree that HFpEF-related atrial remodeling could represent a potential confounder in the association between TTN missense variants and AF. The primary aim of our clinical analysis was to assess the potential significance of TTNmv in AF, recognizing the inherent limitations of retrospective observational data in establishing causality. To complement this, our in vitro studies were specifically designed to demonstrate that TTNmv can alter the electrophysiological substrate, thereby predisposing to AF independent of clinical comorbidities.

      While HFpEF is an important consideration, to our knowledge, no existing literature directly implicates TTNmv in HFpEF pathogenesis. In contrast, loss-of-function TTN variants are more commonly associated with HFrEF and dilated cardiomyopathy, and even these associations remain an area of active debate. To address potential confounding in our cohort, we adjusted for reduced ejection fraction in multivariable analyses of clinical outcomes. Additionally, we performed a sensitivity analysis excluding patients with nonischemic dilated cardiomyopathy (Supplementary Table 6). Together, these approaches mitigate the potential impact of heart failure subtypes on our findings, while our mechanistic studies strengthen the argument that TTNmv may contribute directly to AF susceptibility.

      (2) All contractility and electrophysiologic data should be done with pacing at the same rate in both control and missense variant groups, to control for the effect of cycle length on APD and calcium loading. A shorter APD cannot be claimed when the firing rate of one set of cells is much faster than the other, since shorter APD is to be expected with a quicker rate. Similarly, contractility is affected by diastolic interval because of the influence of SR calcium content on the myocyte power stroke. So the cells need to be paced at the same rate in the IonOptix for any direct comparison of contractility. The authors should familiarize themselves with the concept of electrical restitution.

      We thank the reviewer for this crucial technical comment. iPSC-derived cardiomyocytes (iPSC-CMs) are known to exhibit spontaneous automaticity due to the presence of pacemaker-like currents and reduced I<sub>K1</sub>, which enables interrogation of their intrinsic electrophysiological properties and disease-relevant remodeling. In our study, we leveraged this feature to test the hypothesis that TTN missense variants alter electrophysiological properties through ion channel remodeling. That said, we fully agree with the reviewer that pacing iPSC-CMs at a controlled cycle length is essential for minimizing rate-dependent effects on APD, calcium handling, and contractility, and would improve the interpretability of group comparisons. While iPSC-CMs with matched genetic backgrounds are expected to display broadly comparable electrophysiological profiles, biological and technical variability can influence spontaneous beating rates, thereby confounding direct comparisons. To address this, we have incorporated pacing protocols into our revised experimental design to ensure that APD and contractility measurements are obtained under identical cycle lengths, consistent with the concept of electrical restitution.

      (3) It is interesting that the firing rate of the myocytes is faster with the missense variant. This should lead to a hypothesis and investigation of abnormal automaticity or triggered activity, which may also explain the increased contractility since all these mechanisms are related to the SR's calcium clock and calcium loading. See #2 above for suggestions on how to probe calcium handling adequately. Such an investigation into impulse initiation mechanisms would be compelling in supporting the primary statement of the paper since these are actual mechanisms thought to cause AF.

      We thank the reviewer for this insightful suggestion. We agree that the faster firing rate observed in TTN-T32756I iPSC-aCMs raises the possibility of abnormal automaticity or triggered activity, both of which are highly relevant to AF pathophysiology. As these mechanisms are tightly coupled to calcium handling and the SR calcium clock, further probing of calcium cycling abnormalities would provide valuable mechanistic insights. While this level of investigation is beyond the scope of the current study, we view it as a compelling future direction that could directly link TTN missense variants to impulse initiation abnormalities contributing to AF. 

      (4) The claim of shortened APD without correcting for cycle length is problematic. Moreover, linking shortened APD in isolated cells alone to AF causation is more complicated. To have a setup for reentry, there must be a gradient of APD from short to long, and this can only be demonstrated at the tissue level, not at the cellular level, so reentry should not be invoked here. If shortened APD is demonstrated with correction of the cycle length problem, restitution curves can be made showing APD shortening at different cycle lengths. If restitution is abnormal (i.e., the APD does not shorten normally in relation to the diastolic interval), this may lead to triggered activity, which is an arrhythmogenic mechanism. This would also tie in well with the finding of abnormally elevated iKs current, since iKs is a repolarizing current directly responsible for restitution.

      We thank the reviewer for this necessary clarification. We agree that isolated cell studies cannot directly demonstrate reentrant circuits and that reentry should not be inferred solely from cellular APD data. Our observation of shortened APD and abnormal beating patterns in TTN-T32756I iPSC-aCMs suggests ion channel remodeling that may predispose to arrhythmogenic conditions. Still, we recognize that tissue-level gradients of APD are required to establish reentry as a mechanism. Accordingly, we have removed mention of “the reentrant mechanism” from the revised manuscript and limited our interpretation to the cellular findings. Future studies incorporating pacing protocols and restitution curve analyses will be valuable in determining whether abnormal APD restitution and elevated I<sub>Ks</sub> contribute to triggered activity, thereby providing a more direct mechanistic link to AF (Lines 101–105).

      Lines 101-105: “Our study showed that the TTN-T32756I iPSC-aCMs exhibited a striking AF-like EP phenotype in vitro, and transcriptomic analyses revealed that the TTNmv increases the activity of the FHL2, which then modulates the slow delayed rectifier potassium current (I<sub>Ks</sub>) to cause AF.” 

      Reviewer #1 (Recommendations for the authors):

      Electrophysiological Phenotype in Ventricular CMs: Has the iPSC line carrying TTN-T32756I been differentiated into ventricular cardiomyocytes (iPSC-vCMs)? The reported cellular phenotype in iPSC-aCMs does not seem to specifically reflect an AF phenotype. Does the variant produce similar electrophysiological alterations in iPSC-vCMs?

      We thank the reviewer for this thoughtful comment. To date, we have not differentiated the TTN-T32756I iPSC line into ventricular cardiomyocytes (iPSC-vCMs). Our current work focuses on iPSC-aCMs, where we demonstrate that the AF-associated TTN-T32756I variant induces ion channel remodeling and abnormal beating patterns, thereby creating a potential arrhythmogenic substrate relevant to AF. We agree that investigating whether this variant produces similar or distinct electrophysiological alterations in iPSC-vCMs would provide essential insights into chamber-specific effects and broaden our mechanistic understanding. We have acknowledged this as a future direction in the revised manuscript (Lines 422–425).

      Lines 422-425: “While we have not yet explored the effect of TTN-T32756I in iPSC-derived ventricular cardiomyocytes, it would be interesting to investigate whether this variant produces similar or distinct electrophysiological alterations in the ventricular cardiomyocytes.”

    1. eLife Assessment

      This study provides valuable insights into the role of the NF-kB and IKK signaling pathways in γδ T cell development and survival, using robust genetic mouse models. While the research is methodologically sound, some conclusions require further evidence because the analyses remain incomplete, particularly regarding cell-intrinsic effects and mechanistic details. Overall, the findings are significant for immunologists interested in innate-like T cell biology and advance the understanding of γδ T cell differentiation and maintenance.

    2. Reviewer #1 (Public review):

      Summary:

      The NF-kB signaling pathway plays a critical role in the development and survival of conventional alpha beta T cells. Gamma delta T cells are evolutionarily conserved T cells that occupy a unique niche in the host immune system and that develop and function in a manner distinct from conventional alpha beta T cells. Specifically, unlike conventional alpha beta T cells, a large portion of gamma delta T cells acquire functionality during thymic development, after which they emigrate from the thymus and populate a variety of mucosal tissues. Exactly how gamma delta T cells are functionally programmed remains unclear. In this manuscript, Islam et al. use a wide variety of mouse genetic models to examine the influence of the NF-kB signaling pathway on gamma delta T cell development and survival. They find that the inhibitor of kappa B kinase complex (IKK) is critical to the development of gamma delta T1 subsets, but not adaptive/naïve gamma delta T cells. In contrast, adaptive/naïve gamma delta T cells require IKK-dependent NF-kB activation for their long-term survival. They find that caspase 8-deficiency renders gamma delta T cells sensitive to RIPK1-mediated necroptosis, and they conclude that IKK repression of RIPK1 is required for the long-term survival of gamma delta T1 and adaptive/naïve gamma delta T cell subsets. These data will be invaluable in comparing and contrasting the signaling pathways critical for the development/survival of both alpha beta and gamma delta T cells.

      The conclusions of the paper are mostly well-supported by the data, but some aspects need to be clarified.

      (1) The authors appear to be excluding a significant fraction of the TCRlow gamma delta T cells from their analysis in Figure 1A. Since this population is generally enriched in CD25+ gamma delta T cells, this gating strategy could significantly impact their analysis due to the exclusion of progenitor gamma delta T cell populations.

      (2) The overall phenotype of the IKKdeltaTCD2 mice is not described in any great detail. For example, it is not clear if these mice possess altered thymocyte or peripheral T cell populations beyond that of gamma delta T cells. Given that gamma delta T cell development has been demonstrated to be influenced by alpha beta lineage thymocytes (i.e., trans-conditioning), this information could have aided in the interpretation of the data. Related to this, it would have been helpful if the authors provided a comparison of the frequencies of each of the relevant subsets, in addition to the numbers.

      (3) The manner in which the peripheral gamma delta T cell compartment was analyzed is somewhat unclear. The authors appear to have assessed both spleen and lymph node separately. The authors show representative data from only one of these organs (usually the lymph node) and show one analysis of peripheral gamma delta T cell numbers, where they appear to have summed up the individual spleen and lymph node gamma delta T cell counts. Since gamma deltaT17 and gamma deltaT1 are distributed somewhat differently in these compartments (lymph node is enriched in gamma deltaT17, while spleen is enriched in gamma deltaT1), combining these data does not seem warranted. The authors should have provided representative plots for both organs and calculated and analyzed the gamma delta T cell numbers for both organs separately in each of these analyses.

      (4) The authors make extensive use of surrogate markers in their analysis. While the markers that they choose are widely used, there is a possibility that the expression of some of these markers may be altered in some of their genetic mutants. This could skew their analysis and conclusions. A better approach would have been to employ either nuclear stains (Tbx21, RORgammaT) or intracellular cytokine staining to definitively identify functional gamma deltaT1 or gamma deltaT17 subsets.

      (5) The analysis and conclusion of the data in Figure 3A are not convincing. Because the data are graphed on a log scale, the magnitude of the rescue by kinase dead RIPK1 appears somewhat overstated. A rough calculation suggests that in type 1 gamma delta T cells, there is a ~99% decrease in gamma delta T cells in the Cre+WT strain and a ~90% decrease in the Cre+KD+ strain. Similarly, the numbers for adaptive gamma delta T cells appear to show a 95% decrease and an 85% decrease, respectively. Comparing these data to the data in Figure 5, which clearly show that kinase dead RIPK1 can completely rescue the Caspase 8 phenotype, the conclusion that gamma delta T cells require IKK activity to repress RIPK1-dependent pathways does not appear to be well-supported. In fact, the data seem more in line with a conclusion that IKK has a significant impact on gamma delta T cell survival in the periphery that cannot be fully explained by invoking Caspase8-dependent apoptosis or necroptosis. Indeed, while the authors seem to ultimately come to this latter conclusion in the Discussion, they clearly state in the Abstract that "IKK repression of RIPK1 is required for survival of peripheral but not thymic gamma delta T cells." Clarification of these conclusions and seeming inconsistencies would greatly strengthen the manuscript. With respect to the actual analysis in Figure 3A, it appears that the authors used a succession of non-parametric t-tests here without any correction for multiple comparisons. It may be helpful to determine if another analysis, such as ANOVA, may be more appropriate.

      (6) The conclusion that the alternative pathway is redundant for the development and persistence of the major gamma delta T cell subsets is at odds with a previous report demonstrating that Relb is required for gamma delta T17 development (Powolny-Budnicka, I., et al., Immunity 34: 364-374, 2011). This paper also reported the involvement of RelA in gamma delta T17 development. The present manuscript would be greatly improved by the inclusion of a discussion of these results.

      (7) The data in Figures 1C and 3A are somewhat confusing in that while both are from the lymph nodes of IKKdeltaTCD2 mice, the data appear to be quite different (In Figure 3A, the frequency of gamma delta T cells increases and there is a near complete loss of the CD27+ subset. In Figure 1A, the frequency of gamma delta T cells is drastically decreased, and there is only a slight loss of the CD27+ subset.)

    3. Reviewer #2 (Public review):

      This study presents a comprehensive genetic dissection of the role of IKK signaling in the development and maintenance of lymphoid gd T cells. By employing a variety of conditional and mutant mouse models, the authors demonstrate that IKK-dependent NF-κB activation is essential for the generation of type 1 gd T cells, while adaptive gd T cells require this pathway primarily for long-term survival. The use of multiple complementary genetic strategies, including IKK deletion and modulation of RIPK1 and CASPASE8 activity, provides robust mechanistic insight into subset-specific regulation of gd T cell homeostasis. Overall, the study provides mechanistic insight for IKK-dependent regulation of gd T cell development and peripheral maintenance. However, additional experiments can be performed to improve this manuscript and its interpretations.

      Specific Concerns:

      (1) All approaches used confer changes to the entire T cell compartment. Therefore, the authors are unable to resolve whether the observations are mediated by direct and/or indirect effects (e.g., disorganized lymphoid architecture impacting maintenance/survival/homing).

      (2) Assessment of factors that impact T cell numbers in the periphery is necessary. Are there observable changes in the proliferation, survival, and migration of γδ T cell subsets?

      (3) TCRδ chain usage, especially among type 3 γδ T cells, should be assessed.

      (4) The functional consequences of IKK signaling in γδ T cells were largely unaddressed. Cytokine analyses were performed only in the RIPK1D138N Casp8∆TCD2 model, leaving open the question of how canonical NF-κB-dependent signaling affects the long-term functionality of γδ T cells.

      (5) The authors suggest that Caspase 8 is required for the development and maintenance of type 3 γδ T cells. Although the authors discuss the limitations of assessing adult mice when interpreting these data, the experiment needed to resolve this question seems relatively straightforward to perform.

      (6) While analyses of Casp8∆TCD2 RIPK1D138N mice suggest that loss of adaptive and type 1 gamma delta T cells in Casp8∆TCD2 animals is due to necroptosis, the contribution of RIPK3 kinase activity remains unexamined. RIPK3 activity determines whether cells die via necroptosis or apoptosis in RIPK1/Caspase8-dependent signaling, and inclusion of this analysis would strengthen mechanistic insights.

      (7) Canonical NF-κB signaling through cRel alone was not evaluated, leaving a gap in our understanding of the transcriptional pathways required by γδ T cell subsets.

    4. Reviewer #3 (Public review):

      Summary

      The regulation of NF-κB signaling is complex and central to the differentiation and homeostasis of αβ T cells, which are essential to adaptive immunity. γδ T cells are a distinct population that responds to stress/injury-induced cues by producing inflammatory cytokines, representing an important bridge between innate and adaptive immunity. This study from Islam et al. demonstrates that the IKK complex, a central regulator of NF-κB signaling, plays distinct and essential roles in the differentiation and maintenance of γδ T cells. The authors use elegant murine genetic models to generate clear data that disentangle these requirements in vivo.

      Although NF-κB activity was found to be dispensable for specification of γδ T cell progenitors and the generation of adaptive γδ T cells, it was required for both the ontogeny of type 1 γδ T cells and the survival of mature adaptive γδ T cells. Subunit-specific analyses revealed parallels with αβ T cells: RELA was necessary for type 1 γδ T cell development, while maintenance of adaptive γδ T cells relied upon redundancy between REL subunits, with cREL and p50 compensating in the absence of RELA but not vice versa. These findings reflect distinct biological requirements for ontogeny versus maintenance, likely driven by differences in receptor signaling, such as TCR and TNFRSF family members. Moreover, IKK also maintained γδ T cell survival through repression of RIPK1-mediated cell death, echoing its dual role in αβ T cells, where it both prevents TNF-induced apoptosis and provides NF-κB-dependent survival signals.

      Strengths:

      The multiple, unique murine genetic models employed for detailed analysis of in vivo γδ T cell differentiation and homeostasis are a major strength of this paper. NF-κB signaling processes are devilishly complex. The conditional mutants generated for this study disentangle the requirements for the various IKK-regulated pathways in γδ T cell differentiation, cell survival, and homeostasis. Data are clearly presented and suitably interpreted, with a helpful synthesis provided in the Discussion. These data will provide a definitive account of the requirements for NF-κB signaling in γδ T cells and provide new genetic models for the community to further study the upstream signals.

      Weaknesses:

      The paper would benefit greatly from a graphical abstract summarizing the key findings and making them accessible to the general immunology or biochemistry reader. Ideally, this graphic would distinguish the requirements for NF-κB signals sustaining thymic γδ T cell differentiation from those for peripheral maintenance, taking into account the various subsets and signaling pathways involved. In addition, the authors should consider citing further literature comparing the requirements for NF-κB/necroptosis pathways in regulating other non-conventional T cell populations, such as iNKT, MAIT, or FOXP3+ Treg cells. These data might help position the requirements described here for γδ T cells relative to other subsets, with respect to homeostatic cues and transcriptional states.

      Last and least, there are multiple grammatical errors throughout the manuscript, and it would benefit from further editing. Likewise, there are some minor errors in figures (e.g., Figure 3A, add percentage for plot from IKKDT.RIPK1D138N mouse; Figure 7, "Adative").

    1. eLife Assessment

      This study is useful because it challenges the widely accepted notion that muscle stem cell numbers decline with aging, providing novel insights into population heterogeneity and identifying new surface markers for geriatric MuSCs. However, the evidence is considered incomplete due to insufficient quantitative comparisons of absolute cell numbers, limited analysis of age groups (particularly the lack of "aged" mice as opposed to geriatric mice), and the need for further functional and mechanistic validation of key subpopulations. Additional concerns that require clarification include the lack of statistical rigor in some experiments, the incomplete presentation of supporting data, and the overextension of claims relating to senescence and new marker validation. Overall, while the findings advance understanding of MuSC aging, the authors' conclusions should be strengthened with expanded experiments and more rigorous data analysis.

    2. Reviewer #1 (Public review):

      It is widely accepted that the number of muscle stem cells (MuSCs) declines with aging, leading to diminished regenerative capacity. In this study, when MuSCs were labeled with YFP at a young age, the authors found that the YFP-positive MuSC population remained stable with aging. However, VCAM1 and Pax7 expression levels were reduced in the YFP-positive MuSCs. These VCAM1-negative/low cells exhibited limited proliferative potential and reduced regenerative ability upon transplantation into MuSC-depleted mice. Furthermore, Vcam1-/low MuSCs were highly sensitive to senolysis and represented the population in which Vcam1 expression could be restored by DHT. Finally, the authors identified CD200 and CD63 as markers capable of detecting the entire geriatric MuSC population, including Vcam1-/low cells. Although numerous studies have reported an age-related decline in MuSC numbers, this study challenges that consensus. Therefore, the conclusions require further careful validation.

      Major comments:

      (1) As mentioned above, numerous studies have reported that the number of MuSCs declines with aging. The authors' claim is valid, as Pax7 and Vcam1 were widely used for these observations. However, age-related differences have also been reported even when using these markers (Porpiglia et al., Cell Stem Cell 2022; Liu et al., Cell Rep 2013). When comparing geriatric Vcam1⁺ MuSCs with young MuSCs in this study, did the authors observe any of the previously reported differences? Furthermore, would increasing the sample size in Figure 1 reveal a statistically significant difference? The lack of significance appears to result from variation within the young group. In addition, this reviewer requests the presentation of data on MuSC frequency in geriatric control mice using CD200 and CD63 in the final figure.

      (2) Can the authors identify any unique characteristics of Pax7-VCAM-1 GER1-MuSCs using only the data generated in this study, without relying on public databases? For example, reduced expression of Vcam1 and Pax7. The results of such analyses should be presented.

      (3) In the senolysis experiment, the authors state that GER1-MuSCs were depleted. However, no data are provided to support this conclusion. Quantitative cell count data would directly address this concern. In addition, the FACS profile corresponding to Figure 4D should be included.

      (4) Figure S4: It remains unclear whether DHT enhances regenerative ability through restoration of the VCAM1 expression in GER1-MuSCs, as DHT also acts on non-MuSC populations. Analyses of the regenerative ability of Senolysis+DHT mice may help to clarify this issue.

      (5) Why are there so many myonuclear transcripts detected in the single-cell RNA-seq data? Was this dataset actually generated using single-nucleus RNA-seq? This reviewer considers it inappropriate to directly compare scRNA-seq and snRNA-seq results.

    3. Reviewer #2 (Public review):

      In this study, Kim et al. explore the heterogeneity within the aged MuSC population using a mouse model that enables lineage tracing of MuSCs throughout life. The questions addressed in the manuscript are highly relevant to the fields of aging and stem cell biology, and the experimental approach overcomes limitations of earlier studies. However, some of the claims would benefit from additional data analysis, and the central claim of the identification of a "previously unrecognized subpopulation" of aged MuSCs should be evaluated in light of prior work that has also examined MuSC heterogeneity in aging.

      Specific points:

      (1) As a general comment that is transversal to multiple figures, several experiments should include a direct comparison to a young cohort. Previous studies have shown that the depletion of subpopulations with aging is observed early in the aging process, for example, the loss of Pax7-high MuSCs is observed already in 18‐month‐old mice (Li, 2019, doi: 10.15252/embj.2019102154). Using only mice at 12-14 months as the control group is therefore insufficient to claim that no changes occur with aging.

      (2) One of the central claims of the manuscript is a challenge to the notion that MuSC numbers decline with age. However, the analysis associated with the quantification of YFP+ cells needs to be expanded to support this conclusion. The authors present YFP+ cells only as a proportion of Lin-neg cells. Since FAP numbers are known to decrease with aging, a stable proportion of YFP+ cells would simply indicate that MuSCs decline at the same rate as FAPs. To more accurately assess changes in MuSC abundance, the authors should report absolute numbers of YFP+ cells normalized to tissue mass (cells/mg of muscle).

      (3) The authors emphasize that several studies use VCAM1 as a surface marker to identify MuSCs. However, many other groups rely on α7-integrin, and according to Figure 1D, the decline in ITGA7 expression within the YFP+ population is not significant. Therefore, the suggestion that MuSC numbers have been misquantified with aging would apply only to a subset of studies. If the authors can demonstrate that YFP+ cell numbers (normalized per milligram of tissue) remain unchanged in geriatric mice, the discussion should directly address the discrepancies with studies that quantify MuSCs using the Lin−/α7-integrin+ strategy.

      (4) The authors focus their attention on a VCAM-low/VCAM-neg subpopulation of MuSCs that is enriched in aging. However, the functional properties of this same population in middle-aged (or young) mice are not addressed. Thus, it remains unclear whether geriatric VCAM-low/VCAM-neg MuSCs lose regenerative potential or whether this subpopulation inherently possesses low regenerative capacity and simply expands during aging.

      (5) According to Figure 1F, the majority of MuSCs appear to fall within the category of VCAM-low or VCAM-neg (over 80% by visual estimate). It would be important to have an exact quantification of these data. As a result, the assays testing the proliferative and regenerative capacity of VCAM-low/negative cells are effectively assessing the performance of more than 80% of geriatric MuSCs, which unsurprisingly show reduced efficiency. Perhaps more interesting is the fact that a population of VCAM-high geriatric MuSCs retains full regenerative potential. However, the existence of MuSCs that preserve regenerative potential into old age has been reported in other studies (Garcia-Prat, 2020, doi: 10.1038/s41556-020-00593-7 ; Li, 2019, doi: 10.15252/embj.2019102154). At this point, the central question is whether the authors are describing the same aging-resistant subpopulations of MuSCs using a new marker (VCAM) or whether this study truly identifies a new subpopulation of MuSCs. The authors should directly compare the YFP+VCAM+ aged cells with other subpopulations that maintain regenerative potential in aging.

      (6) In Figure 3F, it is unclear from the data presentation and figure legend whether the authors are considering the average of fiber sizes in each mouse as a replicate (with three data points per condition), or applied statistical analysis directly to all individual fiber measurements. The very low p-values with n=3 are surprising. It is important to account for the fact that observations from the same mouse are correlated (shared microenvironment, mouse-specific effects) and therefore cannot be considered independent.

      (7) Regarding Figure 5, it is unclear why ITGA7, a classical surface marker for MuSCs that appears unchanged in aged YFP+ MuSCs (Fig. 1F), is considered inadequate for detecting and isolating GERI-MuSCs.
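      To make point (6) concrete, the following is a minimal sketch in Python (standard library only) of treating the mouse, rather than the fiber, as the unit of replication. The fiber areas, mouse IDs, and function names are hypothetical and are not data from the manuscript; Welch's t statistic is used purely for illustration, p-values are omitted to avoid a SciPy dependency, and a mixed-effects model with mouse as a random effect is a common alternative not shown here.

      ```python
      from math import sqrt
      from statistics import mean, stdev

      # Hypothetical fiber cross-sectional areas (arbitrary units), grouped by mouse.
      # Illustrative numbers only; they are NOT data from the manuscript.
      CONDITION_A = {
          "mouse1": [1500, 1620, 1480, 1550],
          "mouse2": [1700, 1660, 1740],
          "mouse3": [1580, 1610, 1590, 1630],
      }
      CONDITION_B = {
          "mouse4": [1300, 1350, 1280, 1320],
          "mouse5": [1420, 1390, 1450],
          "mouse6": [1360, 1340, 1380, 1370],
      }


      def per_mouse_means(fibers_by_mouse):
          """Collapse correlated fiber measurements to one summary value per mouse."""
          return [mean(areas) for areas in fibers_by_mouse.values()]


      def pooled_fibers(fibers_by_mouse):
          """Flatten all fibers, (incorrectly) treating each fiber as independent."""
          return [a for areas in fibers_by_mouse.values() for a in areas]


      def welch_t(x, y):
          """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
          vx = stdev(x) ** 2 / len(x)
          vy = stdev(y) ** 2 / len(y)
          t = (mean(x) - mean(y)) / sqrt(vx + vy)
          df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
          return t, df


      mice_a, mice_b = per_mouse_means(CONDITION_A), per_mouse_means(CONDITION_B)
      fibers_a, fibers_b = pooled_fibers(CONDITION_A), pooled_fibers(CONDITION_B)

      t_mouse, df_mouse = welch_t(mice_a, mice_b)
      t_fiber, df_fiber = welch_t(fibers_a, fibers_b)

      print(f"per-mouse: n={len(mice_a)} vs {len(mice_b)}, t={t_mouse:.2f}, df={df_mouse:.1f}")
      print(f"pooled fibers: n={len(fibers_a)} vs {len(fibers_b)}, t={t_fiber:.2f}, df={df_fiber:.1f}")
      ```

      With three mice per group, Welch's degrees of freedom cannot exceed 4, whereas pooling the 11 fibers per group yields at least 10; this inflation is one reason why very small p-values with n = 3 mice warrant scrutiny.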

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Kim et al. describes a MuSC subpopulation that loses VCam expression in geriatric muscle and shows reduced ability to contribute to muscle regeneration. They propose that this population underlies the reported decline of MuSCs in aged mice, suggesting that these cells remain present in geriatric muscle but are overlooked due to low or absent VCam expression. The identification of a subpopulation that changes with aging would be compelling and of interest to the field.

      Strengths:

      The authors employ a wide range of assays, from in vitro to in vivo systems, to characterize Vcam-low/negative cells from geriatric muscle. The loss of Vcam appears strong in geriatric mice. They further identify CD63 and CD200 as potential surface markers that remain stable with age, thereby enabling the isolation of MuSCs across different age groups.

      Weaknesses:

      Some issues remain to be resolved before establishing whether this population represents a true functional subset or explains the reported decline in MuSC numbers in aged mice. A stronger fate assessment of Vcam-low/negative cells is needed to assess their propensity for cell death in vitro and in vivo (e.g., engraftment efficiency), and whether this affects the conclusions. Comparisons include young, middle-aged, and geriatric mice, but not aged (~24 months) mice, which are needed for direct assessment of previous reports of age-related MuSC decline. The suggestion that the Vcam-low/negative population reflects senescence appears premature, given that few markers of this fate are consistently observed and that the cells do not exhibit irreversible cell-cycle exit. Finally, validation of CD63 and CD200 as reliable age-independent MuSC markers requires further testing, specifically using the Pax7-YFP tracing model and co-labeling in geriatric mice.

    1. eLife Assessment

      This useful paper examined the mechanism of planar cell polarity (PCP) using the Drosophila pupal wing, investigating how 'cellular level', 'molecular level' and 'tissue level' mechanisms intersect to establish PCP. This represents progress for the field, and the conclusions are mostly backed up by solid data. Although the manuscript is sound overall, the remaining concerns could be addressed by textual clarification of the concepts used in the manuscript.

      [Editors' note: this paper was reviewed by Review Commons and revised by the authors.]

    2. Reviewer #1 (Public review):

      The authors use inducible Fz::mKate2-sfGFP to explore "cell-scale signaling" in PCP. They reach several conclusions. First, they conclude that cell-scale signaling does not depend on limiting pools of core components (other than Fz). Second, they conclude that cell-scale signaling does not depend on microtubule orientation, and third, they conclude that cell-scale signaling is strong relative to cell-to-cell coupling of polarity.

      There are some interesting inferences that can be drawn from the manuscript, but there are also some significant challenges in interpreting the results and conclusions from the work as presented. I suggest that the authors 1) define "cell-scale signaling," as the precise meaning must be inferred, 2) reconsider some premises upon which some conclusions depend, 3) perform an essential assay validation, and 4) explain some other puzzling inconsistencies.

      Major concerns (first round of review):

      The exact meaning of cell-scale signaling is not defined, but I infer that the authors use this term to describe how what happens on one side of a cell affects another side. The remainder of my critique depends on this understanding of the intended meaning.

      The authors state that any tissue-wide directional information comes from pre-existing polarity and its modification by cell flow, such that the de novo signaling paradigm "bypasses" these events and should therefore not be responsive to any further global cues. It is my understanding that this is not a universally accepted model, and indeed, the authors' data seem to suggest otherwise. For example, the image in Fig 5B shows that de novo induction restores polarity orientation to a predominantly proximal to distal orientation. If no global cue is active, how is this orientation explained? The 6 hr condition, which has only partial polarity magnitude, is quite disordered. Do the patterns at 8 and 10 hrs become more proximally-distally oriented? It is stated that they all show swirls, but please provide adult wing images, and the corresponding orientation outputs from QuantifyPolarity, to help validate the notion that the global cues are indeed bypassed by this paradigm.

      It is implicit that, in the de novo paradigm, polarization is initiated immediately or shortly after heat shock induction. However, the results should be differently interpreted if the level of available Fz protein does not rise rapidly and then stabilize before the 6 hr time point, and instead continues to rise throughout the experiment. Western blots of the Fz::mKate2-sfGFP at time points after induction should be performed to demonstrate steady state prior to measurements. Otherwise, polarity magnitude could simply reflect the total available pool of Fz at different times after induction. Interpreting stability is complex, and could depend on the same issue, as well as the amount of recycling that may occur. Prior work from this lab using FRAP suggested that turnover occurs, and could result from recycling as well as replenishment from newly synthesized protein.

      From the Fig 3 results, the authors claim that limiting pools of core proteins do not explain cell-scale signaling, a result expected based on the lack of phenotypes in heterozygotes, but of course they do not test the possibility that Fz is limiting. They do note that some other contributing protein could be.

      In Fig 3, it is unclear why the authors chose to test dsh1/+ rather than dsh[null]/+. In any case, the statistically significant effect of Dsh dose reduction is puzzling, and might indicate that the other interpretation is correct. Ideally, a range including larger and smaller reductions would be tested. As is, I don't think limiting Dsh is ruled out.

      The data in Fig 5 are somewhat internally inconsistent, and inconsistent with the authors' interpretation. In both repolarization conditions, the authors claim that repolarization extends only to row 1, and row 1 is statistically different from non-repolarized row 1, but so too is row 3. Row 2 is not. This makes no sense, and suggests either that the statistical tests are inappropriate and/or that the data are too sparse to be meaningful. For the related boundary intensity data in Fig 6, the authors need to describe exactly how boundaries were chosen or excluded from the analysis. Ideally, all boundaries would be classified as either medio-lateral (meaning anterior-posterior) or proximal-distal depending on angle.

      If the authors believe their Fig 5 and 6 analyses, how do they explain that hairs are reoriented well beyond where the core proteins are not? This would be a dramatic finding, because as far as I know, when core proteins are polarized, prehair orientation always follows the core protein distribution. Surprisingly, the authors do not so much as comment about this. The authors should age their wings just a bit more to see whether the prehair pattern looks more like the adult hair pattern or like that predicted by their protein orientation results.

    3. Reviewer #2 (Public Review):

      This paper aims to dissect the relative importance of the various cues that establish PCP in the wing disc of Drosophila, which remains a prominent and relevant model for PCP. The authors suggest that one must consider cues at three scales (molecular, cell and tissue) and specifically design tests for the importance of cell-level cues, which they call non-local cell scale signalling. They develop clever experimental approaches that allow them to track complex stability and also to induce polarity at experimentally defined times. In a first set of experiments, they restore PCP after the global cues have disappeared (de novo polarisation) and conclude from the results that another (cell scale) cue must exist. In another set of experiments, they show that de novo repolarization is robust to the dosage of various components of core PCP, leading them to conclude that there must be an underlying cell scale polarity, which, apparently, has nothing to do with microtubule or cell shape polarity. They then describe nice evidence that de novo polarisation is relatively short range both in a polarised and unpolarised field. They conclude that there is a strong cell-intrinsic polarity that remains to be characterised.

      Major concerns (first round of review):

      (1) The first set of repolarisation experiments is performed after the global cell rearrangements that have been shown to act as global signals. However, this approach does not exclude the possible contribution of an unknown diffusible global signal.

      (2) The putative non-local cell scale signal must be more precisely defined (maybe also given a better name). It is not clear to me that one can separate cell-scale from molecular-scale signal. Local signals can redistribute within a cell (or membrane) so local signals are also cell-scale. Without a clear definition, it is difficult to interpret the results of the gene dosage experiments. The link between gene dosage and cell-scale signal is not rigorously stated. Related to this, the concluding statement of the introduction is too cryptic.

      Critique:

      The experiments described in this paper are of high quality with a sophisticated level of design and analysis. However, there needs to be some recalibration of the extent of the conclusions that can be drawn. Moreover, a limitation of this paper is that, despite the quality of their data, they cannot give a molecular hint about the nature of their proposed cell-scale signal.

    4. Reviewer #3 (Public Review):

      The manuscript by Carayon and Strutt addresses the role of cell-scale signaling during the establishment of planar cell polarity (PCP) in the Drosophila pupal wing. The authors induce locally the expression of a tagged core PCP protein, Frizzled, and observe and analyze the de novo establishment of planar cell polarity. Using this system, the authors show that PCP can be established within several hours, that PCP is robust towards variation in core PCP protein levels, that PCP proteins do not orient microtubules, and that PCP is robust towards 'extrinsic' re-polarization. The authors conclude that the polarization at the cell-scale is strongly intrinsic and only weakly affected by the polarity of neighboring cells.

      Major comments (first round of review):

      The data are clearly presented and the manuscript is well written. The conclusions are well supported by the data. 

      (1) The authors use a system to de novo establish PCP, which has the advantage of excluding global cues orienting PCP and thus to focus on the cell-intrinsic mechanisms. At the same time, the system has the limitation that it is unclear to what extent de novo PCP establishment reflects 'normal' cell scale PCP establishment, in particular because the Gal4/UAS expression system that is used to induce Fz expression will likely result in much higher Fz levels compared with the endogenous levels. The authors should briefly discuss this limitation.

      (2) Fig. 3. The authors use heterozygous mutant backgrounds to test the robustness of de novo PCP establishment towards (partial) depletion in core PCP proteins. The authors conclude that de novo polarization is 'extremely robust to variation in protein level'. Since the authors (presumably) lowered protein levels by 50%, this conclusion appears to be somewhat overstated. The authors should tune down their conclusion.

      Significance: 

      The manuscript contributes to our understanding of how planar cell polarity is established. It extends previous work by the authors (Strutt and Strutt, 2002, 2007) that already showed that induction of core PCP pathway activity by itself is sufficient to induce de novo PCP. This manuscript further explores the underlying mechanisms. The authors test whether de novo PCP establishment depends on an 'inhibitory signal', as previously postulated (Meinhardt, 2007), but do not find evidence for it. They also test whether core PCP proteins help to orient microtubules (which could enhance cell-intrinsic polarization of core PCP proteins), but, again, do not find evidence, corroborating previous work (Harumoto et al, 2010). The most significant finding of this manuscript, perhaps, is the observation that local de novo PCP establishment does not propagate far through the tissue. A limitation of the study is that the mechanisms establishing intrinsic cell-scale polarity remain unknown. The work will likely be of interest to specialists in the field of PCP.

      Summary of comments from the Reviewing Editor on the revised version:

      In the introduction, when you refer to Figure 1, the definitions of molecular, cellular and tissue scale are not very clear to outside readers. For example, when you first refer to 'cell scale' you define it as 'non-local', but many readers will probably not realise that 'non-local' means 'a mechanism that cannot be explained at the molecular scale' (because 'molecular scale = local' is only implied).

      The 'conclusion paragraph' at the end of the Introduction does not state a conclusion (it only explains which question was tested by which method).

      Minor comments that can easily be addressed by textual edits:

      – The authors do not explain why gene dosage affects constitutive but not de novo polarization. It seems to me that one would expect de novo polarization to be at least as sensitive, if not more so.

      – Unconventional terms for tissue axes (e.g. mediolateral, horizontal) are frequently used. These are sometimes difficult to parse. Please stick with the universally accepted anterior, posterior, proximal and distal.

    5. Author response:

      (1) General Statements

      Our manuscript studies mechanisms of planar polarity establishment in vivo in the Drosophila pupal wing. Specifically, we seek to understand mechanisms of ‘cell-scale signalling’ that are responsible for segregating core pathway planar polarity proteins to opposite cell edges. This is an understudied question, in part because it is difficult to address experimentally.

      We use conditional and restrictive expression tools to spatiotemporally manipulate core protein activity, combined with quantitative measurement of core protein distribution, polarity and stability. Our results provide evidence for a robust cell-scale signal, while arguing against mechanisms that depend on depletion of a limited pool of a core protein or polarised transport of core proteins on microtubules. Furthermore, we show that polarity propagation across a tissue is hard, highlighting the strong intrinsic capacity of individual cells to establish and maintain planar polarity.

      The original manuscript received three fair and thorough peer-reviews, which raised many important points. In response, we decided to embark on a full revision that attempts to answer all of the points. We have included new data to support our conclusions in Supplemental Figures 1, 2 and 5.

      Additionally in response to the reviewers we have revised the manuscript title, which is now ‘Characterisation of cell-scale signalling by the core planar polarity pathway during Drosophila wing development’.

      (2) Point-by-point description of the revisions

      We thank all of the reviewers for their thorough and thoughtful review of our manuscript. They raise many helpful points which have been extremely useful in assisting us to revise the manuscript.

      In response we have carried out a major revision of the manuscript, making numerous changes and additions to the text and also adding new experimental data. Specific changes are listed after our detailed response to each comment.

      Reviewer #1:

      […] Major points:

      The exact meaning of cell-scale signaling is not defined, but I infer that the authors use this term to describe how what happens on one side of a cell affects another side. The remainder of my critique depends on this understanding of the intended meaning.

      As the reviewer points out, it is important that the meaning of the term ‘cell-scale signalling’ is clear to the reader and in response to their comment we have had another go at defining it explicitly in the Introduction to the manuscript.

      Specifically, we use the term ‘cell-scale signalling’ to describe possible intracellular mechanisms acting on core protein segregation to opposite cell membranes during core pathway dependent planar polarisation. For example, this could be a signal from distal complexes at one side of the cell leading to segregation of proximal complexes to the opposite cell edge, or vice versa. See also our response to Reviewer #2 regarding the distinction between ‘molecular-scale’ and ‘cell-scale’ signalling. 

      Changes to manuscript: Revised definition of ‘cell-scale signalling’ in Introduction.

      The authors state that any tissue wide directional information comes from pre-existing polarity and its modification by cell flow, such that the de novo signaling paradigm "bypasses" these events and should therefore not be responsive to any further global cues. It is my understanding that this is not a universally accepted model, and indeed, the authors' data seem to suggest otherwise. For example, the image in Fig 5B shows that de novo induction restores polarity orientation to a predominantly proximal to distal orientation. If no global cue is active, how is this orientation explained?

      We assume that the reviewer’s point is that it is not universally accepted that de novo induction after hinge contraction leads to uncoupling from global cues (rather than that it is not accepted that hinge contraction remodels radial polarity to a proximodistal pattern). We are (we believe) the only lab that has used de novo induction as a tool, and we’re not aware of any debate in the literature about whether this bypasses global cues. Nevertheless, we accept that it is hard to prove there is no influence of global cues, when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al., 2020; Yu et al., 2020).

      We think that the reviewer is proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling, with polarity not coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt, 2007). Our simplest interpretation of this is that induction at these stages fails to establish the early radial pattern of core pathway polarity, and hence hinge contraction cannot reorient radial to proximodistal polarity. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing, where the forces from hinge contraction are strongest (Etournay et al., 2015).

      In this manuscript, our earliest de novo experiments begin with Fz induction at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B, referred to by the reviewer, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt, 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we are reasonably confident that this pattern is not due to a proximodistal cue present in the proximal wing.

      We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig. 5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region. We apologise for this oversight and the confusion caused and appreciate the feedback.

      The 6 hr condition, that has only partial polarity magnitude, is quite disordered. Do the patterns at 8 and 10 hrs become more proximally-distally oriented? It is stated that they all show swirls, but please provide adult wing images, and the corresponding orientation outputs from QuantifyPolarity to help validate the notion that the global cues are indeed bypassed by this paradigm.

      In all three ‘normal’ de novo conditions (6h, 8h and 10h), regardless of the time of induction, the polarity orientation patterns of Fz-mKate2 in pupal and adult wings are very similar in the experimentally analysed region (Fig. S5B-E). The strong local hair swirling agrees with previously published data (Strutt and Strutt, 2002; Strutt and Strutt, 2007). Overall, we see no evidence that the 10h de novo induction results in more proximodistally coordinated polarity than the 8h or 6h conditions. This is consistent with our contention that there is no global cue present at these stages, since such a cue would presumably have a stronger effect when core pathway activity was induced at earlier stages.

      Changes to manuscript: Added additional explanation of the ‘de novo induction’ paradigm and why we believe the resulting polarity patterns are unlikely to be influenced by any global signals in Introduction and Results section ‘Induced core protein relocalisation…’. Added quantification of polarity in the experiment region proximal to the anterior cross-vein in pupal wings (Fig.S5E-E’’’) and zoomed-out images of the surrounding region in adult wings showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing (Fig.S5B-D), arguing against an unknown proximodistal polarity cue at these stages of development.

      In the de novo paradigm, polarization is initiated immediately or shortly after heat shock induction. However, the results should be differently interpreted if the level of available Fz protein does not rise rapidly and then stabilize before the 6 hr time point, and instead continues to rise throughout the experiment. Western blots of the Fz::mKate2-sfGFP at time points after induction should be performed to demonstrate steady state prior to measurements. Otherwise, polarity magnitude could simply reflect the total available pool of Fz at different times after induction. Interpreting stability is complex, and could depend on the same issue, as well as the amount of recycling that may occur. Prior work from this lab using FRAP suggested that turnover occurs, and could result from recycling as well as replenishment from newly synthesized protein. 

      The reviewer raises an important point, which we agree could confound our experimental interpretations. As suggested we have now carried out western blotting and quantitation for Fz::mKate2-sfGFP levels and added these data to Fig.S1 (Fig. S1C,D). Quantified Fz is not significantly different between the three de novo polarity induction timings and not significantly different compared to constitutive Fz::mKate2-sfGFP expression (although there is a trend towards increasing Fz::mKate2-sfGFP protein levels with increasing induction times). These data are consistent with Fz::mKate2-sfGFP being at steady state in our experiments and that levels are sufficient to achieve normal polarity (as constitutive Fz::mKate2-sfGFP does so). Therefore it is unlikely that differing protein levels explain the differing polarity magnitudes at the different induction times. Interestingly, Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, possibly due to lower expression or increased turnover/reduced recycling.

      Changes to manuscript: Added western blot analysis of Fz::mKate2-sfGFP expression under 10h, 8h and 6h induction conditions vs endogenous Fz expression and constitutive Fz::mKate2-sfGFP expression (Fig.S1C-D) and discussed in Results section ‘Planar polarity establishment is…’.

      From the Fig 3 results, the authors claim that limiting pools of core proteins do not explain cell-scale signaling, a result expected based on the lack of phenotypes in heterozygotes, but of course they do not test the possibility that Fz is limiting. They do note that some other contributing protein could be. 

      Previously published results from our lab (Strutt et al., 2016 Cell Reports; Supplemental Fig. S6E) show that in a heterozygous fz mutant background, Fz protein levels are not affected by halving the gene dosage when compared to wt, suggesting that Fz is most likely produced in excess and is not normally limiting, but that protein that cannot form complexes may be rapidly degraded. We have now added this information to the text.

      Changes to manuscript: Added explanation in text that Fz levels had previously been shown to not be dosage sensitive in Results section ‘Planar polarity establishment is…’ and also added a caveat to the Discussion about not directly testing Fz.

      In Fig 3, it is unclear why the authors chose to test dsh1/+ rather than dsh[null]/+. In any case, the statistically significant effect of Dsh dose reduction is puzzling, and might indicate that the other interpretation is correct. Ideally, a range including larger and smaller reductions would be tested. As is, I don't think limiting Dsh is ruled out. 

      Concerning the choice of dsh allele, we appreciate the reviewer’s query regarding use of dsh[1] instead of a null, as there might be a concern that dsh[1] would give a weaker phenotype. Over more than two decades, we and others have never found any evidence that dsh[1] does not act as a ‘null’ for planar polarity in the pupal wing; furthermore, dsh[1] preserves function in Wg signalling, and we preferred to rule out any phenotypic effects due to potential cross-talk between the two pathways that might be seen using a complete null. To expand on this point, dsh[1] mutant protein is never seen at cell junctions (Axelrod, 2001; Shimada et al., 2001; our own work), and by every criterion we have used, planar polarity is completely disrupted in hemizygous or homozygous mutants (e.g. see quantifications of polarity in Warrington et al., 2017 Curr Biol).

      In terms of the broader point, whether we can rule out Dsh being limiting, we were very careful to be clear that we did not see evidence for Dsh (or other core proteins) being limiting in terms of ‘rates of core pathway de novo polarisation’. When the reviewer says ‘the statistically significant effect of Dsh dose reduction is puzzling’ we believe they are referring to the data in Fig. 3J, showing a small but significantly different reduction in stable Fz in de novo 6h conditions (also seen in 8h de novo conditions, Fig. S3I). As Dsh is known to stabilise Fz in complexes (Strutt et al., 2011 Dev Cell; Warrington et al., 2017 Curr Biol), in itself this result is not wholly surprising. Nevertheless, while this shows that halving Dsh levels does modestly reduce Fz stability, it does not alter our conclusion that halving Dsh levels does not affect Fz polarisation rate under either 6h or 8h de novo conditions.

      Unfortunately, we do not have a practical way of achieving consistent intermediate reductions in Dsh levels (e.g. a series of verified transgenes expressing at different levels). Levels of all the core proteins could be dialled down using transgenes to see when the system breaks, and indeed we have previously published that lower levels of polarity are seen if Fmi levels are <<50%, or if animals are simultaneously transheterozygous for pk, stbm and dgo, or for dsh, pk, stbm and dgo (Strutt et al., 2016 Cell Reports). However, it seems a trivial result that the ability to polarise is eventually lost if insufficient core proteins are present at the junctions. For this reason we have focused on a simple set of experiments reducing gene dosage singly by 50% under two de novo induction conditions, and have been careful to state our results cautiously. The assays we carried out were a great deal of work even for just the 5 heterozygous conditions tested.

      We believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels (see above), the system might be expected to be sensitised to further dosage reductions, and despite this we failed to see an effect on rate of polarisation.

      We note that Reviewer #3 made a similar point about whether we can rule out dosage sensitivity on the basis of 50% reductions in protein level. To address the comments of both reviewers we have now added some further narrative and caveats in the text.

      In a similar vein, Reviewer #2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added more explanation of our choice of dsh[1] as an appropriate mutant allele to use in Results section ‘Planar polarity establishment is…’. Added some narrative and caveats regarding whether lowering levels more than 50% would add to our findings in the Discussion. Revised conclusions to be more cautious including altering section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      The data in Fig 5 are somewhat internally inconsistent, and inconsistent with the authors' interpretation. In both repolarization conditions, the authors claim that repolarization extends only to row 1, and row 1 is statistically different from non-repolarized row 1, but so too is row 3. Row 2 is not. This makes no sense, and suggests either that the statistical tests are inappropriate and/or the data is too sparse to be meaningful. 

      As we’re sure the reviewer appreciates, this was an extremely complex experiment to perform and analyse. We spent a lot of time trying to find the best way to illustrate the results (finally settling on a 2D vector representation of polarity) and how to show the paired statistical comparisons between different groups. Moreover, in the end we were only able to detect generally quite modest (statistically significant) changes in cell polarity under the experimental conditions.

      However, we note that failure to see large and consistent changes in polarity is exactly the expected result if it is hard to repolarise from a boundary – and this is of course the conclusion that we draw. Conversely, if repolarisation were easy, which was our expectation at least under de novo conditions without existing polarity, then we would have expected large and highly statistically significant changes in polarity across multiple cell rows. Hence we stand by our conclusion that ‘it is hard to repolarise from a boundary of Fz overexpression in both control and de novo polarity conditions’.

      Overall, we were trying to establish three points:

      (1) to demonstrate that repolarisation occurs from a boundary of overexpression i.e. from boundary 0 to row 0

      (2) to establish whether a wave of repolarisation occurs across rows 1, 2 and 3

      (3) to determine whether it is easier to repolarise under de novo conditions than under control (already polarised) conditions.

      Taking each in turn:

      (1) To detect repolarisation from a boundary relative to the control condition, we have to compare row 0 in the repolarisation condition (Fig.5G,K) vs the control condition (Fig.5F,J). This comparison shows significant repolarisation (p=0.0014). From here on, row 0 in the repolarisation condition is our reference for repolarisation occurring.

      (2) To determine if there is a wave of repolarisation in the repolarisation condition we have to compare row 0 vs row 1 to 3 in the repolarisation condition (Fig.5K). Row 1 is not significantly different to row 0, but rows 2 and 3 are different and the vectors show obviously lower polarity than row 0. Hence no wave of repolarisation is detected over rows 1 to 3.

      (3) To determine whether it is easier to repolarise in the de novo condition, our reference for establishment of a repolarisation pattern is the repolarisation condition in rows 0 to 3. So we compare the repolarisation condition vs repolarisation in the de novo condition, row 0 vs row 0, row 1 vs row 1, row 2 vs row 2 and row 3 vs row 3. In each case no significant difference in polarity is detected, supporting our conclusion that it is not easier to repolarise in the de novo condition.

      We agree that the variations in row 3 are puzzling, but there is no evidence that this is due to propagation of polarity from row 0, and so in terms of our three questions, it does not alter our conclusions.

      Changes to manuscript: We have extensively revised the text describing the results in Fig.5 to hopefully make the reasons for our conclusions clearer and also be more cautious in our conclusions in Results section ‘Induced core protein relocalisation…’. 

      For the related boundary intensity data in Fig 6, the authors need to describe exactly how boundaries were chosen or excluded from the analysis. Ideally, all boundaries would be classified as either medio-lateral (meaning anterior-posterior) or proximal-distal depending on angle. 

      We thank the reviewer for pointing out that this was not clear.

      All boundaries were classified according to their orientation relative to the Fz over-expression boundary, generated using hh-GAL4 expressed in the wing posterior compartment. Horizontal junctions were defined as those parallel to the Fz over-expression boundary (between 0 and 45 degrees) and mediolateral junctions as those linking two horizontal junctions (between 45 and 90 degrees).
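      As an illustration only, the angle-based classification described above could be expressed as follows. This is a minimal sketch, not the code used in the analysis: the function names, the representation of each junction by its two vertex coordinates, the default boundary direction (along the x axis), and the assignment of junctions lying at exactly 45 degrees to the mediolateral class are all assumptions.

```python
import math


def junction_angle_deg(p0, p1, boundary_dir):
    """Acute angle (0-90 degrees) between junction p0->p1 and the boundary direction.

    Junctions are undirected line segments, so the raw angle difference is
    folded into [0, 90] before it is returned.
    """
    jx, jy = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = boundary_dir
    diff = abs(math.degrees(math.atan2(jy, jx)) - math.degrees(math.atan2(by, bx))) % 180.0
    return min(diff, 180.0 - diff)


def classify_junction(p0, p1, boundary_dir=(1.0, 0.0)):
    """Hypothetical helper: 'horizontal' if within 45 degrees of the boundary, else 'mediolateral'.

    Assumption: a junction at exactly 45 degrees is treated as mediolateral.
    """
    angle = junction_angle_deg(p0, p1, boundary_dir)
    return "horizontal" if angle < 45.0 else "mediolateral"


if __name__ == "__main__":
    print(classify_junction((0, 0), (1, 0.2)))   # nearly parallel to the boundary
    print(classify_junction((0, 0), (0, 1)))     # perpendicular to the boundary
```

      Because the angle is folded into the 0-90 degree range, a junction traced in either direction receives the same class.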

      Changes to manuscript: The boundary classification detailed above has been added in the Materials and Methods.

      If the authors believe their Fig 5 and 6 analyses, how do they explain that hairs are reoriented well beyond the region where the core proteins are? This would be a dramatic finding, because as far as I know, when core proteins are polarized, prehair orientation always follows the core protein distribution. Surprisingly, the authors do not so much as comment about this. The authors should age their wings just a bit more to see whether the prehair pattern looks more like the adult hair pattern or like that predicted by their protein orientation results.

      Again the reviewer makes an interesting point, and we agree that this is something that we should have more directly addressed in the manuscript.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).

      (ii) it has long been known that strong localisation of core proteins at a cell edge is not required for repolarisation of trichomes from a boundary. For instance, in Strutt & Strutt 2007 we show that clones of cells overexpressing Fz cause propagation of trichome repolarisation through pk[pk-sple] mutant tissue in which there is no detectable core protein polarity. This followed up prior observations of Adler et al., 2000 in the wing and Lawrence et al., 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging capturing the initial core protein localisation, the site of actin-rich trichome initiation and then the final orientation of the much larger microtubule-filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC) are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      Minor points:

      As the authors know, there is a model in the literature that suggests microtubule trafficking provides a global cue to orient PCP. The authors' repolarization data in Fig 4 make a reasonably convincing case against a role for microtubules in cell-scale signaling, but do not rule out a role as a global cue. The authors should be careful of language such as "...MTs and core proteins being oriented independently of each other" that could also be read as referring to a role as a global cue. 

      Thank you for pointing out that this was not clear. We have now modified the text to hopefully address this.

      Changes to manuscript: Text updated in Results section ‘Microtubules do not provide…’.

      Significance:

      There are two negative conclusions and one positive conclusion made by the authors. Provided the above points are addressed, the negative conclusions, that core proteins are not limiting and that microtubules are not involved in cell-scale signaling are solid. The positive conclusion is more nebulous - the authors say that cell-scale signaling is strong relative to cell-cell signaling - but how strong is strong? Strong relative to their prior expectations? I'm not sure how to interpret such a conclusion. Overall, we learn something from these results, though it fails to reveal anything about mechanism. These results will be of some interest to those studying PCP.

      The reviewer raises an interesting point, which is how do you compare the strength of two different processes, even if both processes affect the same outcome (in this case cell polarity). Repolarisation from a boundary has not been carefully studied at the level of core protein localisation in any previous study to our knowledge – this is one of the important novel aspects of this study. Hence there is not a baseline for defining strong repolarisation. Similarly, there has been no investigation of the nature of ‘cell-scale signalling’. This was a considerable challenge for us in writing the manuscript, and we have done our best to find appropriate language that hopefully conveys our message adequately. Minimally our work may provide a baseline for helping to define the ‘strengths’ of these processes in future studies.

      One of our main points is that we can generate an artificial boundary of Fz expression, where Fz levels are at least several fold higher than in the neighbouring cell (e.g. compare Fig.4N’ and O’) and only two rows of cells show a significant change in polarity relative to controls. Even when the tissue next to the overexpression domain is still in the process of generating polarity (de novo condition) then the boundary has little effect on polarity in neighbouring cell rows. This was a result that surprised us, and we tried to convey that by using language to suggest cell-scale signalling was stronger than cell-cell signalling i.e. stronger in terms of the ability to define the final direction of polarity.

      Changes to manuscript: In the revised manuscript we have reviewed our use of language and now avoid saying ‘strong’ but instead use terms such as ‘effective’ and ‘robust’ in e.g. Results section ‘Induced core protein relocalisation…’, the Discussion and we have also changed the title of the manuscript to avoid claiming a ‘strong’ signal.

      Reviewer #2:

      […] Critique

      The experiments described in this paper are of high quality with a sophisticated level of design and analysis. However, there needs to be some recalibration of the extent of the conclusions that can be drawn (see below). Moreover, a limitation of this paper is that, despite the quality of their data, they cannot give a molecular hint about the nature of their proposed cell-scale signal. Below are two key points that the authors may want to clarify.

      (1) The first set of repolarisation experiments is performed after the global cell rearrangements that have been shown to act as a global signal. However, this approach does not exclude the possible contribution of an unknown diffusible global signal.

      A similar point was raised by Reviewer 1. For the convenience of this reviewer, we’ll summarise the arguments against such an unknown cue again below. More broadly, both reviewers asking a similar question indicates that we have failed to lay out the evidence in sufficient detail. In our defence, we have used the same ‘de novo’ paradigm in three previous publications (Strutt and Strutt 2002, 2007; Brittle et al 2022) without attracting (overt) controversy. We have now added text to the Introduction and Results that goes into more detail, as well as more experimental evidence (Fig.S5).

      Firstly, it is worth noting that the global cues acting in the wing are poorly understood, with mostly negative evidence against particular cues accruing in recent years. This makes it a hard subject to discuss succinctly. Secondly, we accept that it is hard to prove there is no influence of global cues when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al., 2020; Yu et al., 2020).

      We think that the reviewers are proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling, with polarity not coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt, 2007). Our simplest interpretation of this is that induction at these stages fails to establish the early radial pattern of core pathway polarity, and hence hinge contraction fails to reorient radial to proximodistal polarity. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing, where the forces from hinge contraction are strongest (Etournay et al., 2015).

      In this manuscript, our earliest de novo experiments begin at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B referred to by Reviewer 1, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we feel reasonably confident in assuming that the pattern is not due to a proximodistal cue present in the proximal wing. We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig.S5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region.

      Changes to manuscript: Text extended in Introduction and Results to better explain why we believe the de novo conditions that we use most likely result in a polarity pattern that is not significantly influenced by ‘global cues’. Now show zoomed-out images of the surrounding region around the experiment region proximal to the anterior cross-vein region in adult wings, showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing, arguing against an unknown proximodistal polarity cue at these stages of development (Fig.S5B-E’’’).

      (2) The putative non-local cell scale signal must be more precisely defined (maybe also given a better name). It is not clear to me that one can separate cell-scale from molecular-scale signal.

      Local signals can redistribute within a cell (or membrane) so local signals are also cell-scale. Without a clear definition, it is difficult to interpret the results of the gene dosage experiments. The link between gene dosage and cell-scale signal is not rigorously stated. Related to this, the concluding statement of the introduction is too cryptic.

      We thank the reviewer for raising this, as again a similar comment was made by Reviewer 1, so we are clearly falling short in defining the term. We have now had another attempt in the Introduction.

      To more specifically answer the point made by the reviewer regarding molecular vs cellular, we are essentially being guided here by the prior computational modelling work, as at the biological level the details are still being worked out. A specific class of previous models only allowed ‘signals’ between core proteins to act ‘locally’, meaning within a cell junction, and within the models there was no explicit mechanism by which proteins on other junctions could ‘detect’ the polarity of a neighbouring junction (e.g. Amonlirdviman et al., 2005; Le Garrec et al., 2006; Fischer et al., 2013). Other models implicitly or explicitly encode a mechanism by which cell junctions can be influenced by the polarity of other junctions (e.g. Meinhardt, 2007; Burak and Shraiman, 2009; Abley et al., 2013; Shadkhoo and Mani, 2019), for instance by diffusion of a factor produced by localisation of particular planar polarity proteins.

      We agree with the reviewer that a cell-scale signal will depend on ‘molecules’ and thus could be called ‘molecular-scale’. Here, however, by ‘molecular-scale’ we mean signals that act over distances comparable to the size of molecules (i.e. nanometres). By ‘cell-scale’ we mean signals that act over distances comparable to the size of cells (i.e. micrometres). A caveat to our definition is that we implicitly include interactions that occur locally on cell junctions (<1 µm range) within ‘molecular-scale’. This is still a shorter range than ‘cell-scale’, which requires signals acting over the diameter of a cell (3-5 µm). Nevertheless, we think the distinction between ‘molecular-scale’ and ‘cell-scale’ is a helpful one in this context, and we have attempted to address the issue through a more careful definition of the terms.

      Changes to manuscript: Text revised in Introduction and legend to Fig.1 to more carefully define ‘cell-scale signalling’ and to distinguish it from ‘molecular-scale signalling’. Final sentence of Introduction also altered so we no longer cryptically speculate on the nature of the cell-scale signal but leave this to the Discussion.

      Minor comments. 

      Some of the (clever) genetic manipulation may need more details in the text. For example:

      - Need to specify if the hs-flp approach induces expression throughout the tissue.

      We apologise for the lack of clarity. In all the experiments, the hs-FLP transgene is present in all cells, and heat-shock results in ubiquitous expression. 

      Changes to manuscript: We have clarified this in the Results and Materials and Methods.

      - Need to specify in the text that in the unpolarised condition the tissue is both dsh and fz mutant.

      The reviewer is of course correct and we have updated this point in the text. The full genotype for the unpolarised condition is: w dsh[1] hsFLP22/y;; Act>>fz-mKate2sfGFP, fz[P21]/fz[P21] (see Table S1). This line is therefore mutant for dsh and fz, with induced expression of Fz-mKate2sfGFP.

      Changes to manuscript: We have clarified this in the relevant part of the Results.

      - Need to specify in the text that the experiment illustrated in Fig 5 is with hh-gal4. 

      As noted by the reviewer, we continued to use the same hh-GAL4 repolarisation paradigm as in Fig. 4, and this information was given in the Fig. 5 legend. However, we agree it is helpful to be explicit about this in the main text.

      Changes to manuscript: We have added this to this section of the Results.

      - Need to address a possible shortcoming of the hh experiment, that the AP boundary is a region of high tension.

      It is true that the AP boundary is under high tension in the wing disc (e.g. Landsberg et al., 2009). However, we are not aware of any evidence that this higher tension persists into the pupal wing. In separate studies we have labelled Myosin II in pupal wings (Trinidad et al 2025 Curr Biol; Tan & Strutt 2025 Nature Comms), and we have not noticed preferentially higher levels at the AP boundary. If tension were higher, we would expect cell boundaries there to appear straighter than those of surrounding cells (as seen in the wing disc), and this is not evident in our images.

      - Need to exclude the possibility of residual polarisation (e.g. of other components) in the fz1 mutant (I assume there is none).

      We use the null allele fz[P21] throughout this work, and we and others have consistently reported a complete loss of polarisation of other core proteins and downstream components in this background. The caveat is that core proteins persisting at cell junctions always appear at least slightly punctate in backgrounds mutant for other core proteins. Any automated detection algorithm will therefore always find evidence of individual cell polarity above the baseline expected for a uniform distribution. For this reason, in addition to direct measures of average cell polarity, we use lack of local coordination of polarity (variance of cell polarity angle) as a further measure of loss of polarisation. (We discuss this in the QuantifyPolarity manuscript, Tan et al., 2021, e.g. Fig. S6.)

      Changes to manuscript: We now include in the Materials and Methods section ‘Fly genetics…’ a much more extensive explanation of the evidence for specific mutant alleles being ‘null’ for planar polarity function (including dsh1 as raised by Reviewer 1), specifically that they result in no detectable planar polarisation of either other core proteins or downstream effectors, and added appropriate references.

      - Need to provide evidence that a 50% reduction in gene dosage commensurately affects protein levels.

      This is a good suggestion. In the case of Stbm, we have already published a western blot showing that a reduction in gene dosage results in reduced protein levels (Strutt et al 2016, Fig.S6). We have now performed western blots to quantify protein levels upon reduction of fmi, pk and dgo levels (we actually used EGFP-dgo for the latter, as we don’t have antibodies that can detect endogenous Dgo on western blots).

      Changes to manuscript: When presenting the dosage reduction experiments, we now refer explicitly back to Strutt et al., 2016 for Stbm, and we have added western blot data for Fmi, Pk and EGFP-Dgo in new Fig. S2.

      - I am surprised that the relationship with microtubule polarity was never investigated. Is this true? 

      We agree this is a point that needed further clarification, as Reviewer 1 made a related point regarding the two possible roles for microtubules, one being as a mediator of a global cue upstream of the core pathway, and the second (which we investigate in this manuscript) as a mediator of a cell-scale signal downstream of the core pathway.

      Both the Uemura and Axelrod groups have published on potential upstream function as a global cue mediator in the Drosophila wing (e.g. Shimada et al., 2006; Harumoto et al., 2010; Matis et al., 2014).

      Both groups have also looked at whether core pathway components could affect the orientation of microtubules (Harumoto et al., 2010; Olofsson et al., 2014; Sharp and Axelrod, 2016). Notably, Harumoto et al., 2010 observed that in 24h APF wings, loss of Fz or Stbm did not alter microtubule polarity from a proximodistal orientation. This is consistent with the microtubules aligning along the long cell axis in the absence of other cues. However, this did not rule out an instructive effect of Fz or Stbm on microtubule polarity during core pathway cell-scale signalling. The Axelrod lab manuscripts saw interesting effects of Pk protein isoforms on microtubule polarity, albeit not throughout the entire wing, which hinted at a potential role in cell-scale signalling. Taken together, this prior work motivated our directed experiments to test specifically whether the core pathway might generate cell-scale polarity by instructing microtubule polarity.

      Changes to manuscript: We have revised the Results section ‘Microtubules do not…’ to make a clearer distinction regarding possible ‘upstream’ and ‘downstream’ roles of microtubules in Drosophila core pathway planar polarity and the motivation for our experiments investigating the latter.

      - The authors suggest that polarity does not propagate as a wave. And yet the range measured in adult is longer than in the pupal wing. Explain. 

      Again an excellent point, also made by Reviewer 1, which we have now addressed explicitly in the manuscript. For the convenience of this reviewer, we lay out the reasons why we think the propagation of polarity seen in the adult is further than seen for core protein localisation.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).  

      (ii) it has long been known that a strong localisation of core proteins at a cell edge is not required for polarisation of trichome polarity from a boundary. For instance, in Strutt & Strutt 2007 we show clones of cells overexpressing Fz causing propagation through pk[pk-sple] mutant tissue where there is no detectable core protein polarity. We were following up prior observations of Adler et al 2000 in the wing and Lawrence et al 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging that captures the initial core protein localisation, the site of actin-rich trichome initiation and the final orientation of the much larger microtubule-filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC) are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      - The discussion states that the cell-intrinsic system remains to be fully characterised, implying that it has been partially characterised. What do we know about it? 

      As the reviewer probably realises, we were attempting to side-step a long speculative discussion about the various hints and ideas in the literature by grouping them under the umbrella of ‘remaining to be fully characterised’. We would argue that this current manuscript is the first to attempt to systematically investigate the nature of ‘cell-scale signalling’. The lack of prior work is probably due to two factors (i) pioneering theoretical work showed that a sufficiently strong global signal coupled with ‘local’ (i.e. confined to one cell junction) protein interactions was sufficient to polarise cells without the need to invoke the existence of a cell-scale signal; (ii) there is no easy way to identify cell-scale signals as their loss results in loss of polarity which will also occur if other (i.e. more locally acting) core pathway functions are compromised.

      The main investigation of the potential for cell-scale signalling has been another set of theory studies (Burak and Shraiman 2009; Abley et al., 2013; Shadkhoo and Mani 2019) which have considered the possibility of diffusible signals. In our present work we have further considered the possibility of a ‘depletion’ model, based on the pioneering theory work of Hans Meinhardt, and as discussed above the possibility that microtubules could mediate a cell-scale signal.

      Changes to manuscript: We have revised the Discussion to hopefully be clearer about the current state of knowledge.

      Reviewer #3:

      […] Major comments

      The data are clearly presented and the manuscript is well written. The conclusions are well supported by the data. 

      (1) The authors use a system to de novo establish PCP, which has the advantage of excluding global cues orienting PCP and thus to focus on the cell-intrinsic mechanisms. At the same time, the system has the limitation that it is unclear to what extent de novo PCP establishment reflects 'normal' cell scale PCP establishment, in particular because the Gal4/UAS expression system that is used to induce Fz expression will likely result in much higher Fz levels compared with the endogenous levels. The authors should briefly discuss this limitation. 

      We apologise if this wasn’t clear. We only used GAL4/UAS overexpression when generating an artificial boundary of Fz expression with hh-GAL4 to induce repolarisation. The de novo induction system expresses Fz::mKate2-sfGFP directly under the Act5C promoter, without using GAL4/UAS. In response to a comment from Reviewer 1, we have now carried out western blot analysis. This shows that Fz::mKate2-sfGFP levels under Act5C are actually lower than endogenous Fz levels. Because we achieve normal levels of polarity under these conditions, similar to what we measure in wild-type conditions using QuantifyPolarity, we therefore assume that Fz levels are not limiting. However, we note that lower than normal levels of Fz might sensitise the system to perturbation. This would in fact be advantageous in our study, as it might for instance be expected to reveal dosage sensitivity of other components more readily.

      Changes to manuscript: We now describe the levels of expression achieved using the de novo induction system (Fig.S1C-D) and discuss possible consequences in the relevant Results sections and Discussion.

      (2) Fig. 3. The authors use heterozygous mutant backgrounds to test the robustness of de novo PCP establishment towards (partial) depletion in core PCP proteins. The authors conclude that de novo polarization is 'extremely robust to variation in protein level'. Since the authors (presumably) lowered protein levels by 50%, this conclusion appears to be somewhat overstated. The authors should tune down their conclusion. 

      Reviewer 1 makes a similar point about whether we can argue that the lack of sensitivity to a 50% reduction in protein levels actually rules out the depletion model. To address the comments of both reviewers, we have now added further narrative and caveats in the text.

      We nevertheless believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, the system might be expected to be sensitised to further dosage reductions, and despite this we fail to see an effect on rate of polarisation.

      In a similar vein, Reviewer 2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added some narrative and caveats in the Discussion regarding whether lowering levels by more than 50% would add to our findings. Revised conclusions to be more cautious, including altering the section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      Minor comments:

      (1) Page 3. The authors mention and cite the PCA method they used to quantify cell polarity orientation and magnitude. It would help unfamiliar readers if the authors briefly described the principle of this method.

      Changes to manuscript: More details have been added in Materials & Methods.

      Significance:

      The manuscript contributes to our understanding of how planar cell polarity is established. It extends previous work by the authors (Strutt and Strutt, 2002, 2007) showing that induction of core PCP pathway activity by itself is sufficient to induce de novo PCP. This manuscript further explores the underlying mechanisms. The authors test whether de novo PCP establishment depends on an 'inhibitory signal', as previously postulated (Meinhardt, 2007), but do not find evidence for one. They also test whether core PCP proteins help to orient microtubules (which could enhance cell-intrinsic polarization of core PCP proteins), but again do not find evidence, corroborating previous work (Harumoto et al., 2010). Perhaps the most significant finding of this manuscript is the observation that local de novo PCP establishment does not propagate far through the tissue. A limitation of the study is that the mechanisms establishing intrinsic cell-scale polarity remain unknown. The work will likely be of interest to specialists in the field of PCP.

    1. eLife assessment

      This important study provides novel evidence that navigational experiences can shape perceptual scene representations. The evidence presented is incomplete and would benefit from clearer explanations of the experiment design and careful discussion of alternative interpretations such as contextual associations or familiarity. The work will be of interest to cognitive psychologists and neuroscientists working on perception and navigation.

      [Editors’ note: A revised version of this work has been published in the Journal of Cognitive Neuroscience (DOI: https://doi.org/10.1162/JOCN.a.2409).]

    2. Reviewer #1 (Public Review):

      In this study, Li et al. aim to determine the effect of navigational experience on visual representations of scenes. Participants first learn to navigate within simple virtual environments where navigation is either unrestricted or restricted by an invisible wall. Environments are matched in terms of their spatial layout and instead differ primarily in terms of their background visual features. In a later same/different task, participants are slower to distinguish between pairs of scenes taken from the same navigation condition (i.e. both restricted or both unrestricted) than different navigation conditions. Neural response patterns in the PPA also discriminate between scenes from different navigation conditions. These results suggest that navigational experience influences perceptual representations of scenes. This is an interesting study, and the results and conclusions are clearly explained and easy to follow. There are a few points that I think would benefit from further consideration or elaboration from the authors, which I detail below.

      First, I am a little sceptical of the extent to which the tasks are able to measure navigational or perceptual experience with the scenes. The training procedure seems like it wouldn't require obtaining substantial navigational experience as the environments are all relatively simple and only require participants to follow basic paths, rather than encouraging more active exploration of a more complex environment. Furthermore, in the same/different task, all images show the same view of the environment (meaning they show the exact same image in the "same environment" condition). The task is therefore really a simple image-matching task and doesn't require participants to meaningfully extract the perceptual or spatial features of the scenes. An alternative would have been to present different views of the scenes, which would have prevented the use of image-matching and encouraged further engagement with the scenes themselves. Ultimately, the authors do still find a response time difference between the navigation conditions, but the effect does appear quite small. I wonder if the design choices could be obscuring larger effects, which might have been better evident if the navigational and perceptual tasks had encouraged greater encoding of the spatial and perceptual features of the environment. I think it would be helpful for the authors to explain their reasons for not employing such designs, or to at least give some consideration to alternative designs.

      Figure 1B illustrates that the non-navigable condition includes a more complicated environment than the navigable condition, and requires following a longer path with more turns in it. I guess this is a necessary consequence of the experiment design, as the non-navigable condition requires participants to turn around and find an alternative route. Still, this does introduce spatial and perceptual differences between the two navigation conditions, which could be a confounding factor. What do the response times for the "matched" condition in the same/different task look like if they are broken down by the navigable and non-navigable environments? If there is a substantial difference between them, it could be that this is driving the difference between the matched and mismatched conditions, rather than the matching/mismatching experience itself.

      In both experiments, the authors determined their sample sizes via a priori power analyses. This is good, but a bit more detail on these analyses would be helpful. How were the effect sizes estimated? The authors say it was based on other studies with similar methodologies - does this mean the effect sizes were obtained from a literature search? If so, it would be good to give some details of the studies included in this search, and how the effect size was obtained from these (e.g., it is generally recommended to take a lower bound over studies). Or is the effect size based on standard guidelines (e.g., Cohen's d ≈ 0.5 is a medium effect size)? If so, why are the effect sizes different for the two studies?

    3. Reviewer #2 (Public Review):

      Summary:

      Li and colleagues applied virtual reality (VR) based training to create different navigational experiences for a set of visually similar scenes. They found that participants were better at visually discriminating scenes with different navigational experiences compared to scenes with similar navigational experiences. Moreover, this experience-based effect was also reflected in the fMRI data, with the PPA showing higher discriminability for scenes with different navigational experiences. Together, their results suggest that previous navigational experiences shape visual scene representation.

      Strengths:

      (1) The work has theoretical value as it provides novel evidence to the ongoing debate between visual and non-visual contributions to scene representation. While the idea that visual scene representation can encode navigational affordances is not new (e.g., Bonner & Epstein, 2017, PNAS), this study is one of the first to demonstrate that navigational experiences can causally shape visual scene representation. Thus, it serves as a strong test for the hypothesis that our visual scene representations involve encoding top-down navigational information.

      (2) The training paradigm with VR is novel and has the potential to be used by the broader community to explore the impact of experience on other categorical visual representations.

      (3) The converging evidence from behavioral and fMRI experiments consolidates the work's conclusion.

      Weaknesses:

      (1) While this work attempts to demonstrate the effect of navigational experience on visual scene representation, it's not immediately clear to what extent such an effect necessarily reflects altered visual representations. Given that scenes in the navigable condition were more explored and had distinct contextual associations than scenes in the non-navigable condition (where participants simply turned around), could the shorter response time for a scene pair with mismatched navigability be explained by the facilitation of different contextual associations or scene familiarities, rather than changes in perceptual representations? Especially when the visual similarity of the scenes was high and different visual cues might not have been immediately available to participants, the different contextual associations and/or familiarity could serve as indirect cues to facilitate participants' judgment, even if perceptual representations remained intact.

      (2) Similarly, the above-chance fMRI classification results in the PPA could also be explained by the different contextual associations and/or scene familiarities between navigable and non-navigable scenes, rather than different perceptual processes related to scene identification.

      (3) For the fMRI results, the specificity of the experience effect on the PPA is not strictly established, making the statement "such top-down effect was unique to the PPA" groundless. A significant interaction between navigational conditions and ROIs would be required to make such a claim.

      (4) For the behavioral results, the p-value of the interaction between groups and the navigational conditions was 0.05. I think this is not a convincing p-value to rule out visual confounding for the training group. Moreover, from Figure 2B, there appears to be an outlier participant in the control group who deviates dramatically from the rest of the participants. If this outlier is excluded, will the interaction become even less significant?

      (5) Experiment 1 only consists of 25 participants in each group. This is quite a small sample size for behavioral studies when there's no replication. It would be more convincing if an independent pre-registered replication study with a larger sample size could be conducted.

    1. eLife Assessment

      The reviewers have found that this manuscript is a valuable contribution, and the evidence in support of its conclusions is mostly solid. It provides novel insights and raises interesting possibilities about the functions of an understudied histone modification within the nucleosome core; however, the data are mostly descriptive and correlative, and although this has value, it is not totally persuasive. Short of additional non-genomic experiments, a more detailed analysis of the genomic data and perhaps additional data would strengthen the conclusions. The manuscript crucially needs further antibody validation to raise confidence in the data.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of H3K115ac in mouse embryonic stem cells. They report that H3K115ac localizes to regions enriched for fragile nucleosomes, CpG islands, and enhancers, and that it correlates with transcriptional activity. These findings suggest a potential role for this globular domain modification in nucleosome dynamics and gene regulation. If robust, these observations would expand our understanding of how non-tail histone modifications contribute to chromatin accessibility and transcriptional control.

      Strengths:

      (1) The study addresses a histone PTM in the globular domain, which is relatively unexplored compared to tail modifications.

      (2) The implication of a histone PTM in fragile nucleosome localization is novel and, if substantiated, could represent a significant advance for the field.

      Weaknesses:

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      Kumar et al. aimed to assess the role of the understudied H3K115 acetylation mark, which is located in the nucleosomal core. To this end, the authors performed ChIP-seq experiments of H3K115ac in mouse embryonic stem cells as well as during differentiation into neuronal progenitor cells. Subsequent bioinformatic analyses revealed an association of H3K115ac with fragile nucleosomes at CpG island promoters, as well as with enhancers and CTCF binding sites. This is an interesting study, which provides important novel insights into the potential function of H3K115ac. However, the study is mainly descriptive, and functional experiments are missing.

      Strengths:

      (1) The authors present the first genome-wide profiling of H3K115ac and link this poorly characterized modification to fragile nucleosomes, CpG island promoters, enhancers, and CTCF binding sites.

      (2) The study provides a valuable descriptive resource and raises intriguing hypotheses about the role of H3K115ac in chromatin regulation.

      (3) The breadth of the bioinformatic analyses adds to the value of the dataset.

      Weaknesses:

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region.

      It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase sensitivity and fragment length, which are indirect measures and can be subject to technical bias. Experiments showing that H3K115ac-modified nucleosomes are indeed more fragile are missing.

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      At the very least, the authors should acknowledge these limitations and provide additional validation of antibody specificity.

    4. Reviewer #3 (Public review):

      Summary:

      Kumar et al. examine the H3K115ac epigenetic mark, located on the lateral surface of the histone core domain, and present evidence that it may serve as a marker enriched at transcription start sites (TSSs) of active CpG island promoters and at polycomb-repressed promoters. They also note that the H3K115ac mark is enriched on fragile nucleosomes within nucleosome-depleted regions, at active enhancers, and at CTCF-bound sites. They propose that these observations suggest that H3K115ac contributes to nucleosome destabilization and may therefore serve as a marker of functionally important regulatory elements in mammalian genomes.

      Strengths:

      The authors present novel observations suggesting that acetylation of a residue in the histone core domain (rather than in a histone tail) may serve a functional role in promoting transcription at CpG island and polycomb-repressed promoters. They present a solid amount of confirmatory in silico data, using appropriate methodology, that supports the idea that the H3K115ac mark may destabilize nucleosomes and contribute to regulating ESC differentiation.

      Weaknesses:

      Additional experiments to confirm antibody specificity are needed. The authors use synthetic peptides for other markers (e.g., H3K122) to support the claim that the antibody is specific, but ChIP-ChIP assays are performed under cross-linked, non-denatured conditions, which preserve structure and epitope accessibility differently than synthetic peptides used for dot blots. Does the antibody give a single band in western blots of histones, and can the H3K115ac peptide block western and immunofluorescence signals of the antibody? Given that the antibody is a rabbit polyclonal, specificity is not a trivial consideration.

    5. Author response:

      Reviewer 1:

      Comment 1. The reviewer was under the impression that we did not perform biological replicates of our ChIP-seq experiments. All ChIP-seq (and ATAC-seq) experiments were performed with biological replicates, and the Pearson's correlations (all >0.9) between replicates were provided in Supplementary Table 1. We had indicated this in the text and methods but will try to make it even clearer.

      Reviewer 2:

      Comment 2. The reviewer states that our claim of H3K115ac being associated with fragile nucleosomes is based solely on MNase sensitivity and fragment length. This is not correct. Figure 3C and D show the results of sucrose gradient sedimentation experiments followed by ChIP-seq, clearly showing that H3K115ac fractionates with chromatin particles that are enriched for fragile nucleosomes and subnucleosomes. By contrast, H3K115ac is not enriched in stable mononucleosomes.

      Comment 3. The reviewer states that our H3K122ac and H3K64ac comparisons rely on publicly available datasets. We would emphasize that these are our own datasets, generated and published previously (Pradeepa et al., 2016) using exactly the same native MNase ChIP protocol as used here for H3K115ac, and processed with identical computational pipelines.

      Reviewer 3:

      Reviewer 3 is mistaken in thinking our ChIP experiments are performed under cross-linked conditions. As clearly stated in the main text and methods, all our ChIP-seq for histone modifications is done on native MNase-digested chromatin – with no cross-linking. This includes the spike-in experiment shown in Fig S1B to test H3K115ac antibody specificity against the bar-coded SNAP-ChIP® K-AcylStat Panel from Epicypher. We could not include H3K115ac bar-coded nucleosomes in that experiment since they are not available in the panel. 

      We also propose to make minor revisions in response to specific reviewer recommendations before posting a version of record. These would include:

      (1) Figure 2: the title will be changed to "H3K115ac marks CpG island promoters poised for activation" so that it matches the title of the corresponding section in the main text. See also Reviewer 1, comment 7, in the Recommendations part.

      (2) Figure S2B: the legend should read: "Gene ontology analysis for the set of genes analysed in Figure 2C"

      (3) Figure 4D: Provide the replicates for the western blot.

      (4) Figure 4A,B: Correct formatting issues.

    1. eLife Assessment

      This study provides fundamental insights into eukaryotic phosphate homeostasis by demonstrating how yeast vacuoles dynamically regulate cytosolic phosphate levels. The conclusions are convincing, supported by an elegant combination of in vitro assays and in vivo measurements. This study will be of interest to cell biologists, particularly those working in the field of phosphate metabolism.

    2. Reviewer #1 (Public review):

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyze polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool.

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit.

      Comments on Revision:

      The authors have addressed all my concerns.

    3. Reviewer #3 (Public review):

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization.

      Using their previously established system of isolated yeast vacuoles, they recapitulate findings from their own and other groups. They integrate the available data and propose a working model of feedback loops that control the level of Pi at the cellular level.

      This is a solid study, in which the biological significance of the findings is not entirely clear. The data analysis needs to be improved, and statistical significance testing needs to be included. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyse polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool. 

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including: 

      (1) Figure 3: The authors use 1 mM ZnCl2 or 1 mM MgCl2 in their assays. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca2+, as this ion is also highly concentrated in the lumen? 

      The concentrations inside vacuoles reach those values. However, given that polyP can chelate divalent metal ions, what would matter are the concentrations of free Zn<sup>2+</sup> or Mg<sup>2+</sup> inside the organelle. These are not known. This is not critical since we use those two conditions only as a convenient tool to differentiate Ppn1 and Ppn2 activity in vitro. In our initial characterisation of Ppn2 (10.1242/jcs.201061), we had also tested Mn, Co, Ca, Ni, Cu. Only Zn and Co supported activity. Ca did not. Andreeva et al. (10.1016/j.biochi.2019.06.001) reached similar conclusions and extended our results.

      (2) Regarding the concentration of 30 mM K-Pi, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one concentration in their assay. A titration is necessary here. 

      The concentration of 30 mM was not chosen arbitrarily. It is the luminal P<sub>i</sub> concentration that the vacuoles reached through polyP synthesis and hydrolysis when luminal P<sub>i</sub> entered a plateau. We consider this an upper limit because polyP kept increasing, whereas luminal P<sub>i</sub> did not. Thus, there is no physiological motivation for trying higher values. We have nevertheless added a titration to the revised version (new Fig. 3A).

      (3) What are the consequences on vacuole morphology if the cells lack Pho91? 

      We had not observed significant abnormalities during a screen of the genome-wide deletion collection of yeast (10.1371/journal.pone.0054160), nor in other experiments with pho91 mutants, which we have not included in this manuscript due to a lack of effect.

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly. 

      The situation is much more complex because Ca2+ is not the only counterion. Major pools of counterions (up to hundreds of mM) are constituted by vacuolar lysine, arginine, polyamines, Mg, Zn, etc. Their interplay with polyP is probably complex and worth treating in a dedicated project. If we wanted to go beyond the simple statement that this interplay is not understood, which is not very useful, we would have to engage in a lot of speculation. We feel that this would make the discussion lose focus and would not contribute concrete insights.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system? 

      Mammalian cells have their Pi exporter XPR1 mainly on a lysosome-like compartment (10.1016/j.celrep.2024.114316). Whether and how it functions there for Pi export from the cytosol is not entirely clear. We have addressed this situation in the revised discussion section.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes. 

      Strengths: 

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation. 

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically. 

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis, driven by ATP and the vacuolar proton gradient, highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis. 

      Weaknesses: 

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field.  

      Yeast vacuoles show all major chemical features of acidocalcisomes. They are acidified and contain high concentrations of Ca, polyP (which also makes them electron-dense), other divalent ions such as Mg, Zn, and Mn, and basic amino acids. Thus, they clearly have an acidocalcisome-like character. In addition, they have hydrolytic, lysosome-like functions and, depending on the strain background, they can be larger than acidocalcisomes described, e.g., in protists. We have elaborated on this point in the introduction of the revised version.

      Reviewer #3 (Public review): 

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization. 

      Using their previously established system of isolated yeast vacuoles, they recapitulate findings from their own and other groups. They integrate the available data and propose a working model of feedback loops that control the level of Pi at the cellular level. 

      This is a solid study, in which the biological significance of the findings is not entirely clear. The data analysis needs to be improved, and statistical significance testing needs to be included. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study. 

      It is not clear to us what the reviewer would see as a more rigorous test of the model.  

      Reviewer #1 (Recommendations for the authors): 

      (1) Figure 2: Why do the authors label the blue curve in A and B as BY and in C and D as WT? Is this a different genetic background they used here? This should be specified in the legend. 

      No, it is the same background. The figures had been reshuffled before submission, and we overlooked replacing "BY" with "WT". This has been corrected; we now consistently use WT in all figures.

      (2) Figure 4 has different scaling for the two panels, which should be labeled as A and B. I am aware that the authors do this for comparison, but it is rather confusing at first glance. I recommend having them at the same scale. 

      We chose this representation with two separate scales because the figure is primarily meant to illustrate that the shift between the pho91 and WT curves vanishes in the presence of IP7. We now highlight in the figure legend that the scales differ, to avoid confusion.  

      (3) Figure 8: I would appreciate a model with normal and low Pi concentrations in comparison, as this is what the authors worked out. 

      We have modified the figure. It now compares Pi-rich and Pi-limited scenarios.

      (4) Minor issue: Wouldn't it make more sense to show the molar concentration in the figures rather than nmol Pi/µg protein? I am aware that this would require information on the vacuole volume rather than the reaction volume, and the authors do this calculation later on. 

      It depends. We often chose this representation because it illustrates the price to pay (the metabolic input, in terms of protein that must be dedicated to this task) to sequester a certain quantity of P<sub>i</sub>. But, as we provide the corresponding P<sub>i</sub> concentration in the text, this information is accessible to the reader, too.

      Reviewer #2 (Recommendations for the authors): 

      As stated above in the weaknesses section, while functional parallels exist, canonical acidocalcisomes are structurally and chemically distinct: they are typically smaller, electron-dense, and enriched with cations, whereas yeast vacuoles are larger, multifunctional organelles with well-established roles beyond phosphate storage. Explicitly addressing these differences would strengthen the comparative framework and prevent potential confusion in interpreting the evolutionary relationships between these organelles. 

      We agree to some degree, which is the reason why we refer to vacuoles as acidocalcisome-like organelles. In fact, vacuoles share virtually all defining chemical traits of acidocalcisomes. They just have a second functional domain as hydrolytic, lysosome-like organelles. Given the plasticity of endo-lysosomal compartments (acidocalcisomes belong to this group because of their biogenesis through the AP3 pathway), this is not surprising to us. But the reviewer's comment made us realize that it is better to address this point explicitly. We have added a section to the introduction to do this.

      Reviewer #3 (Recommendations for the authors): 

      (1) Page 8: It is unclear why the authors only estimated the Pi concentration in wild-type vacuoles. This should also be done for vacuoles from other strains. 

      This information is inherent in Figure 2. PolyP-hyperaccumulating strains show the same plateau as the wild type, meaning that they also reach a luminal Pi concentration of around 30 mM, whereas vtc4 vacuoles reach only around one tenth of that increase, indicating that they remain at about 3 mM. We now mention this in the text.

      (2) The attempts to localize Pho91 through tagging are not satisfactory. The authors described different localizations for Pho91 depending on whether it was tagged at the N- or C-terminus, or N-terminally tagged and overexpressed using two strong promoters. While it is not uncommon for proteins to show different localization patterns depending on where the tag is inserted, it is possible that one of the tags reflects the localization of the endogenous protein. There is an easy way to test this, in particular when Pho91 is endogenously tagged: pho91∆ has reported phenotypes such as abnormal vacuolar morphology or increased autophagy. The authors could also measure the Pi content in vacuoles. They could then compare the phenotypes of the endogenously tagged strains with WT and a pho91∆ strain. 

      Indeed, the attempts to localise the protein through fluorescent tags are unsatisfactory, in our hands as in the hands of others. We would not have created a series of many different tagged versions (we present only a selection of these in the manuscript) if the creation of a faithful reporter for Pho91 localisation were so straightforward. Expression from the endogenous promoter yields quite low signals (which is why others have overexpressed their GFP fusion from strong promoters). But overexpression brings at least a significant part of the protein to the cell surface, where it can then function as a Pi importer, restore much of the maximal Pi uptake capacity that genuine plasma membrane transporters provide, and support normal growth of the cells (Wykoff & O'Shea, 2001). However, the localisation pattern of Pho91-GFP, likewise overexpressed from a strong promoter, does not reflect this plasma membrane localisation (see the references that the reviewer mentioned under (3)). The published overexpressed GFP fusions localise only to the vacuole, suggesting that even in this case the GFP tag may create an artefact. Therefore, we went through a large variety of Pho91 gene fusions, which led us to the conclusion that the protein is very sensitive to tags at both ends and that fusion proteins are hence unlikely to report its correct location reliably. Given this, we resorted to quantitative proteomics to clarify the issue. This quantitative experiment goes beyond the previously published proteomics analyses that the reviewer mentions under (3), which found the protein in the vacuolar fraction but did not calculate the enrichment factors, which is crucial. 

      A strong phenotype of abnormal vacuolar morphology is not apparent in our cultures. 

      (3) Moreover, Pho91 has been identified as a component enriched in vacuole-mitochondria contact sites (vCLAMP), and this localization was confirmed with GFP-Pho91 (PMID: 25026036). Likewise, PMID: 35175277 also detected Pho91 by mass spectrometry as a vacuolar protein and showed endogenously tagged GFP-Pho91 on the vacuole (co-staining with Vph1). The authors may request the strains from the authors of these papers and use them for their experiments. PMID: 17804816, the oldest of the three reports (from 2007), describes a GFP-Pho91 expressed under either the TEF or the ADH promoter that localizes to the vacuole. They also showed that the fusion protein is functional. These and other experiments led them to conclude that Pho91 exports phosphate from the vacuolar lumen to the cytoplasm. 

      We have now included these references. As argued above, we have also analysed the strains from PMID: 17804816. The clear localisation of the fusion protein to vacuoles is only visible upon overexpression, not upon expression from the endogenous locus. This construct, too, is therefore unlikely to report Pho91 localisation reliably (even though overexpression happens to lead it to the correct location). Thus, we maintain our conclusion that C- or N-terminally GFP-tagged versions of Pho91 are unreliable tools for localising the protein.

      (4) The impact of pho91∆ on Pho4-GFP nuclear localization is modest at best (an increase from 5% of cells showing Pho4-GFP in the nucleus in WT to 10% in pho91∆), and only somewhat stronger in ppn1∆/ppn2∆. This means that 90% of pho91∆ cells do not respond, and Pho4-GFP stays cytoplasmic. It is unclear how the authors can derive a meaningful conclusion from these data. Moreover, do these data really support the model, or do they rather indicate that additional factors or pathways are needed? What is the biological significance of the marginal increase from 5% to 10% of cells that respond? What happens to the cells that cannot respond? Will they die, or at least have a growth disadvantage? It would be useful to provide some functional studies. 

      We should have explained the nature of the assay better. The experiment exploits the fact that dividing yeast cells transiently fall into a state of Pi scarcity during S-phase. Since S-phase is less than a quarter of the cell cycle, only a small fraction of the cells transiently activates the PHO pathway at any given time. These cells cannot be well characterised by ensemble assays, but microscopy circumvents the background of the whole population, picks them up very clearly, and allows us to quantify them. We have adapted the respective chapter in the results section to improve the description of this experiment.

      (5) The quantification of the data is suboptimal, as in most assays the mean and standard error of the mean (SEM) are given. SEM is not really appropriate in these cases because it gives only the error of the mean and not of the entire data. Therefore, the standard deviation (SD) is needed, which reports on the variability of the data, and which is usually much larger than the SEM. Using the SD, would also allow the authors to do proper statistical analysis, which is missing entirely in this manuscript. 

      SEM also comprises the variability of the data. It is linked with the SD (SEM = SD/√n), but SEM also considers the number of experiments, n. The main goal is to compare the means, and SEM is an appropriate and frequently used tool for this because it illustrates how well the arithmetic mean may estimate the true mean of the population. Therefore, we kept the SEM but have added tests of significance for the differences shown.
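      The relationship cited in this response, SEM = SD/√n, can be illustrated with a minimal Python sketch. The replicate values below are hypothetical and not taken from the study; the sketch only shows how the SEM follows from the sample SD and the number of replicates n, which is the point on which reviewer and authors differ.

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(n)        # SEM = SD / sqrt(n)
    return sd, sem


# Hypothetical replicate measurements (illustrative only, not data from the study)
replicates = [28.0, 31.0, 30.0, 29.0, 32.0]
sd, sem = sd_and_sem(replicates)
print(f"SD = {sd:.2f}, SEM = {sem:.2f}")  # prints "SD = 1.58, SEM = 0.71"
```

      With n = 5, the SEM is the SD divided by √5 (about 2.24), so SEM error bars are less than half as wide as SD bars for the same data: the SD describes the spread of the replicates, whereas the SEM describes the precision of the mean.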

      (6) Statistical testing in Figure 7 is essential as the effects are very small. Again, are these changes big enough for a biologically meaningful response? The authors should at least discuss this. 

      Our previous time course analyses of InsPP dynamics, performed under conditions comparable to those of this study, showed that InsP8 decreases by around 50% in the first 30 min after transfer to Pi starvation (DOI: https://doi.org/10.7554/eLife.87956) and that this decline is already sufficient to trigger the PHO starvation program, as assessed by Pho4-GFP translocation into the nucleus. Thus, a 50% decrease, which is observed in ppn1 ppn2 mutants, is functionally significant. We have now also evaluated statistical significance in Fig. 7, which is given for the 50% reduction of InsP8 and 1-InsP7 in ppn1 ppn2. 

      Minor points: 

      (1) There are a number of smaller edits (use of italic or better the absence thereof, lacking information in the reference list, and some typos). 

      Thank you. We have corrected those.

      (2) The exact n should be given in the Figure legend. 

      Corrected.

      (3) Page 8, line 8: it would be nice to have a picture of the wild-type vacuoles and what you measured. 

      We now present a sample image in the new Suppl. Fig. 1.

      (4) PMID: 11779791 showed already that Pho91 cannot rescue the absence of the plasma membrane Pi transporters. This study should be at least cited. 

      This is not quite correct. The study that the reviewer mentions showed that Pho91 supports slower growth, and the authors concluded that "A synthetic lethal phenotype was observed when (all) five phosphate transporters were inactivated...". We had cited the same group and the same first author, using their later study (Wykoff et al., 2007), which had recapitulated the results from PMID: 11779791 and in addition showed quite good growth of the PHO91-expressing strain on YPD (Suppl. Fig. 2). We had obtained the strains from this group. In reproducing their experiments, we noticed that the growth of the PHO91-expressing strain that these authors had observed is due to incomplete repression of Pho84. They had overexpressed Pho84 from a galactose-inducible promoter to generate a background with a regulatable Pi transporter. This trick allowed them to conveniently manipulate the strain and reduce (but not abolish) Pho84 expression by transferring the cells from galactose to glucose for their experiments. We therefore chose a more rigorous plasmid shuffling strategy to test the individual P<sub>i</sub> transporters, which allows an assessment without the leaky background expression of Pho84 on glucose. In contrast to O'Shea and colleagues, we observed zero growth of a strain expressing only PHO91. We have revised the results section to make this discrepancy more evident and to provide a better motivation for our experiment.

      (5) It would be nice to see the actual data in Figure 6; not only a quantification. 

      We illustrate the phenotype of nuclear Pho4-GFP in panel A. Showing all the images necessary to appreciate the differences between the strains would require including many dozens of images into the figure, which would not be useful.

    1. eLife Assessment

      The mechanistic basis for the potential health benefits of NAD⁺ precursors remains incompletely understood. This manuscript provides a useful assessment of the role of SIRT1 in mediating the effects of NMN in mice fed a high-fat diet. The study addresses a key question, though some of the conclusions appear only partially supported by the presented data.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the effects of oral supplementation with nicotinamide mononucleotide (NMN) on metabolism and inflammation in mice with diet-induced obesity, and whether these effects depend on the NAD⁺-dependent enzyme SIRT1. Using control and inducible SIRT1 knockout mice, the authors show that NMN administration mitigates high-fat diet-induced weight gain, enhances energy expenditure, and normalizes fasting glucose and plasma lipid profiles in a largely SIRT1-dependent manner. However, reductions in fat mass and adipose tissue expansion occur independently of SIRT1. Comprehensive plasma proteomic analyses (O-Link and mass spectrometry) reveal that NMN reverses obesity-induced alterations in metabolic and immune pathways, particularly those related to glucose and cholesterol metabolism. Integrative network and causal analyses identify both SIRT1-dependent and -independent protein clusters, as well as potential upstream regulators such as FBXW7, ADIPOR2, and PRDM16. Overall, the study supports that NMN modulates key metabolic and immune pathways through both SIRT1-dependent and alternative mechanisms to alleviate obesity and dyslipidemia in mice.

      Strengths:

      Well-written manuscript, and state-of-the-art proteomics-based methodologies to assess NMN and SIRT1-dependent effects.

      Weaknesses:

      Unfortunately, the study design, as well as the data analysis approach taken by the authors, are flawed. This limits the authors' ability to make the proposed conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      Majeed and colleagues aimed to evaluate whether the metabolic effects of NMN in the context of a high-fat diet are SIRT1 dependent. For this, they used an inducible SIRT1 KO model (SIRT1 iKO), allowing them to bypass the deleterious effects of SIRT1 ablation during development. In line with previous reports, the authors observed that NMN prevents, to some degree, diet-induced metabolic damage in wild-type mice. When doing similar tests on SIRT1 iKO mice, the authors see that some, but not all, of the effects of NMN are abrogated. The phenotypic studies are complemented by plasma proteomic analyses evaluating the influence of the high-fat diet, SIRT1, and NMN on circulating protein profiles.

      Strengths:

      The mechanistic aspects behind the potential health benefits of NAD+ precursors have been poorly elucidated. This is in part due to the pleiotropic actions of NAD-related molecules on cellular processes. While sirtuins, most notably SIRT1, have been largely hypothesized to be key players in the therapeutic actions of NAD+ boosters, the proof for this in vivo is very limited. In this sense, this work is an important contribution to the field.

      Weaknesses:

      While the authors use a suitable methodology (SIRT1 iKO mice), the results show early on that the iKO mice themselves have notable phenotypes, which complicate the picture. The actions of NMN in WT and SIRT1 KO mice are mostly presented separately. However, this is not the right approach to evaluate and visualize SIRT1 dependency. Indeed, many of the "SIRT1-dependent" effects of NMN arise because SIRT1 deletion itself has a phenotype equivalent to or larger than that induced by NMN in wild-type mice. This would have been evident had the two genotypes been systematically plotted together. Consequently, and despite the value of the study, the results obtained with this model might not support solid claims that NMN actions depend on SIRT1. That some effects of SIRT1 deletion resemble those of NMN supplementation also makes it counterintuitive to propose that SIRT1 activation is a major driver of NMN actions. Taken at face value, one might as well conclude that NMN acts by inhibiting SIRT1. Because readouts of SIRT1 activity are not explored, it is also difficult to test the influence of NMN on SIRT1 in this experimental setting, or whether compensatory mechanisms exist.
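      A minimal worked sketch can make this argument concrete. The Python below is a hypothetical illustration, not an analysis from the manuscript: it is restricted to two genotypes and two treatments for simplicity, the four group means are invented numbers in arbitrary units, and the helper names (`nmn_effect`, `genotype_effect`, `interaction_contrast`) are illustrative names introduced here. It contrasts the NMN effect within each genotype (what separate plots show) with the genotype-by-treatment interaction that a formal test of SIRT1 dependency would target.

```python
from typing import Dict, Tuple

# (genotype, treatment) -> group mean of some readout, arbitrary units.
Means = Dict[Tuple[str, str], float]


def nmn_effect(means: Means, genotype: str) -> float:
    """NMN minus vehicle within one genotype (what separate plots show)."""
    return means[(genotype, "NMN")] - means[(genotype, "vehicle")]


def genotype_effect(means: Means) -> float:
    """Effect of SIRT1 deletion alone, in vehicle-treated mice."""
    return means[("iKO", "vehicle")] - means[("WT", "vehicle")]


def interaction_contrast(means: Means) -> float:
    """Difference between the NMN effects in iKO and WT mice (2x2 interaction).

    A non-zero value is what 'SIRT1-dependent' formally means, but it can also
    arise when deletion alone already reaches the NMN-treated level, which is
    only visible when both genotypes are compared on the same axes.
    """
    return nmn_effect(means, "iKO") - nmn_effect(means, "WT")


if __name__ == "__main__":
    # Hypothetical, invented values: deletion alone lowers the readout
    # exactly as much as NMN does in wild-type mice.
    hypothetical: Means = {
        ("WT", "vehicle"): 10.0,
        ("WT", "NMN"): 7.0,
        ("iKO", "vehicle"): 7.0,
        ("iKO", "NMN"): 7.0,
    }
    print("NMN effect in WT:", nmn_effect(hypothetical, "WT"))
    print("NMN effect in iKO:", nmn_effect(hypothetical, "iKO"))
    print("Effect of SIRT1 deletion:", genotype_effect(hypothetical))
    print("Interaction contrast:", interaction_contrast(hypothetical))
```

      With these made-up values, NMN shifts the readout by -3.0 in wild-type mice and by 0.0 in knockouts, so separate per-genotype plots would suggest SIRT1 dependency; yet deletion alone already shifts the readout by -3.0, the same as NMN in wild-type mice. In a real analysis, the interaction would be tested on individual animals, for example with a two-way ANOVA, rather than computed from group means.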

      A second weak point is that the proteomic explorations, while interesting, feel too descriptive and disconnected from the overall phenotype and from the goal of the manuscript. It would be unreasonable to ask for gain- or loss-of-function experiments based on the differentially abundant peptides. Yet a deeper exploration of whether their altered levels in circulation are consistent with changes in their expression (and, if so, in which tissues), together with a clearer discussion of their link to the observed phenotypes, would be needed, especially for changes related to SIRT1 and NMN.

      Impact on the field and further significance of the work:

      Despite the fact that, in my opinion, the authors might not have conclusively achieved their main aim, there are multiple valuable aspects in this manuscript:

      (1) It provides independent validation of the potential benefits of NAD+ boosters in the context of diet-induced metabolic complications. Previous studies using NR or NMN itself have yielded contradictory observations. Therefore, additional independent experiments are always valuable to further balance the overall picture.

      (2) The metabolic consequences of deleting SIRT1 in adulthood have been poorly explored in previous work. Therefore, irrespective of the actions of NMN, the phenotypes observed are intriguing, and the proteomic differences are also large enough to spur further research to understand the role of SIRT1 as a therapeutic target.

      (3) Regardless of the influence of SIRT1, NMN promotes plasma proteomic changes that are well worth exploring. These changes also highlight once more that the in vivo actions of NMN, like those of other NAD+ boosters, are pleiotropic. Hence, this work calls into question whether single-gene KO models are a good approach to explore the mechanisms of action of NAD+ precursors.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the effects of oral supplementation with nicotinamide mononucleotide (NMN) on metabolism and inflammation in mice with diet-induced obesity, and whether these effects depend on the NAD⁺-dependent enzyme SIRT1. Using control and inducible SIRT1 knockout mice, the authors show that NMN administration mitigates high-fat diet-induced weight gain, enhances energy expenditure, and normalizes fasting glucose and plasma lipid profiles in a largely SIRT1-dependent manner. However, reductions in fat mass and adipose tissue expansion occur independently of SIRT1. Comprehensive plasma proteomic analyses (Olink and mass spectrometry) reveal that NMN reverses obesity-induced alterations in metabolic and immune pathways, particularly those related to glucose and cholesterol metabolism. Integrative network and causal analyses identify both SIRT1-dependent and -independent protein clusters, as well as potential upstream regulators such as FBXW7, ADIPOR2, and PRDM16. Overall, the study supports the conclusion that NMN modulates key metabolic and immune pathways through both SIRT1-dependent and alternative mechanisms to alleviate obesity and dyslipidemia in mice.

      Strengths:

      A well-written manuscript that uses state-of-the-art proteomics-based methodologies to assess NMN- and SIRT1-dependent effects.

      We thank the reviewer for highlighting the state-of-the-art proteomic methods used. We report, for the first time, significant changes in the plasma proteome of mice after NMN supplementation, in both wild-type and SIRT1-KO mice, using a combination of DIA mass spectrometry and Olink.

      Weaknesses:

      Unfortunately, the study design, as well as the authors' data analysis approach, is flawed. This limits the authors' ability to draw the proposed conclusions.

      We agree that the administration of tamoxifen, along with the associated weight loss, could affect the obesity phenotype. For this reason, we ensured that both Cre-positive and Cre-negative mice received tamoxifen. Importantly, after the tamoxifen washout, the two groups had essentially the same body weight. To address this comment, we plan to perform additional statistical tests across all six experimental groups to gain insight into these dependencies. Based on the reviewer's suggestions, we will clarify the limitations of the study design and improve the data analysis approaches to provide stronger support for our conclusions in the revised version of the paper.

      Reviewer #2 (Public review):

      Summary:

      Majeed and colleagues aimed to evaluate whether the metabolic effects of NMN in the context of a high-fat diet are SIRT1-dependent. For this, they used an inducible SIRT1 KO model (SIRT1 iKO), which allowed them to bypass the deleterious effects of SIRT1 ablation during development. In line with previous reports, the authors observed that NMN prevents, to some degree, diet-induced metabolic damage in wild-type mice. When performing similar tests in SIRT1 iKO mice, the authors found that some, but not all, of the effects of NMN are abrogated. The phenotypic studies are complemented by plasma proteomic analyses evaluating the influence of the high-fat diet, SIRT1, and NMN on circulating protein profiles.

      Strengths:

      The mechanisms behind the potential health benefits of NAD+ precursors remain poorly elucidated, in part because of the pleiotropic actions of NAD-related molecules on cellular processes. While sirtuins, most notably SIRT1, have been widely hypothesized to be key players in the therapeutic actions of NAD+ boosters, in vivo proof for this is very limited. In this sense, this work is an important contribution to the field.

      We thank the reviewer for acknowledging the importance of this work to the field. In this report, we provide in vivo evidence for the effects of NAD+ boosting, and we aim to delineate the role of Sirt1 as well as the pleiotropic effects of NAD-related molecules on cellular and metabolic processes.

      Weaknesses:

      While the authors use a suitable methodology (SIRT1 iKO mice), the results show early on that the iKO mice themselves have notable phenotypes, which complicate the picture. The actions of NMN in WT and SIRT1 KO mice are mostly presented separately. However, this is not the right approach to evaluate and visualize SIRT1 dependency. Indeed, many of the "SIRT1-dependent" effects of NMN arise because SIRT1 deletion itself has a phenotype equivalent to or larger than that induced by NMN in wild-type mice. This would have been evident had the two genotypes been systematically plotted together. Consequently, and despite the value of the study, the results obtained with this model might not support solid claims that NMN actions depend on SIRT1. That some effects of SIRT1 deletion resemble those of NMN supplementation also makes it counterintuitive to propose that SIRT1 activation is a major driver of NMN actions. Taken at face value, one might as well conclude that NMN acts by inhibiting SIRT1. Because readouts of SIRT1 activity are not explored, it is also difficult to test the influence of NMN on SIRT1 in this experimental setting, or whether compensatory mechanisms exist.

      We thank the reviewer for raising this point and acknowledge the limitations of using Sirt1 iKO mice. However, inducing Sirt1 KO in adulthood is a better alternative than using a homozygous Sirt1 KO mouse model, as the latter leads to embryonic lethality and many other developmental defects (1, 2). The proteomics analysis can provide insight into the effects of SIRT1 deletion under chow and high-fat diet (HFD) conditions, as well as the effects of diet in the presence or absence of nicotinamide mononucleotide (NMN). We will discuss these limitations and, as suggested, present the results for the two genotypes together.

      A second weak point is that the proteomic explorations, while interesting, feel too descriptive and disconnected from the overall phenotype and from the goal of the manuscript. It would be unreasonable to ask for gain- or loss-of-function experiments based on the differentially abundant peptides. Yet a deeper exploration of whether their altered levels in circulation are consistent with changes in their expression (and, if so, in which tissues), together with a clearer discussion of their link to the observed phenotypes, would be needed, especially for changes related to SIRT1 and NMN.

      First, we presented the data in this manner as a proof of concept, to demonstrate the effect of diet on the plasma proteome and to corroborate our findings with those published in the literature. We then investigated the effects of NAD+ boosting and Sirt1 KO to identify significant changes. We agree with the reviewer that it would be unreasonable to validate all the differentially abundant proteins. However, we will select key proteins, assess their expression in different tissues, such as the liver, white adipose tissue (WAT), and muscle, and attempt to connect these changes with the phenotypes.

      Impact on the field and further significance of the work:

      Despite the fact that, in my opinion, the authors might not have conclusively achieved their main aim, there are multiple valuable aspects in this manuscript:

      (1) It provides independent validation of the potential benefits of NAD+ boosters in the context of diet-induced metabolic complications. Previous studies using NR or NMN itself have yielded contradictory observations. Therefore, additional independent experiments are always valuable to further balance the overall picture.

      (2) The metabolic consequences of deleting SIRT1 in adulthood have been poorly explored in previous work. Therefore, irrespective of the actions of NMN, the phenotypes observed are intriguing, and the proteomic differences are also large enough to spur further research to understand the role of SIRT1 as a therapeutic target.

      (3) Regardless of the influence of SIRT1, NMN promotes plasma proteomic changes that are well worth exploring. These changes also highlight once more that the in vivo actions of NMN, like those of other NAD+ boosters, are pleiotropic. Hence, this work calls into question whether single-gene KO models are a good approach to explore the mechanisms of action of NAD+ precursors.

      We thank the reviewer for highlighting the valuable aspects of the manuscript, and we hope the revised manuscript will further strengthen the key results.

      References:

      (1) McBurney MW, Yang X, Jardine K, Hixon M, Boekelheide K, Webb JR, Lansdorp PM, Lemieux M. The mammalian SIR2alpha protein has a role in embryogenesis and gametogenesis. Mol Cell Biol 2003; 23:38–54.

      (2) Cheng HL, Mostoslavsky R, Saito S, Manis JP, Gu Y, Patel P, Bronson R, Appella E, Alt FW, Chua KF. Developmental defects and p53 hyperacetylation in Sir2 homolog (SIRT1)-deficient mice. Proc Natl Acad Sci U S A 2003; 100:10794–10799.

    1. eLife assessment

      The authors identify a novel, conserved link between glycolytic flux and sulfur amino acid metabolism that governs fungal morphological differentiation independently of the cAMP-PKA pathway. This represents an important conceptual advance in understanding metabolic control of development and virulence. While the evidence supporting this connection is compelling, the mechanistic basis of how glycolysis regulates the Met30/Met4 axis requires further experimental clarification.

    2. Reviewer #1 (Public review):

      Summary:

      Fungal survival and pathogenicity rely on the ability to undergo reversible morphological transitions, which are often linked to nutrient availability. In this study, the authors uncover a conserved connection between glycolytic activity and sulfur amino acid biosynthesis that drives morphogenesis in two fungal model systems. By disentangling this process from canonical cAMP signaling, the authors identify a new metabolic axis that integrates central carbon metabolism with developmental plasticity and virulence.

      Strengths:

      The study integrates different experimental approaches, including genetic, biochemical, transcriptomic, and morphological analyses, and convincingly demonstrates that perturbations in glycolysis alter sulfur metabolic pathways and thus impact pseudohyphal and hyphal differentiation. Overall, this work offers new and important insights into how metabolic fluxes are intertwined with fungal developmental programs and therefore opens new perspectives to investigate morphological transitioning in fungi.

      Weaknesses:

      A few aspects could be improved to strengthen the conclusions. First, the striking transcriptomic changes observed upon 2DG treatment should be examined in the S. cerevisiae adh1 and pfk1 deletion strains, for instance through qPCR or western blot analysis of sulfur metabolism genes, to confirm that the changes observed with 2DG mirror those in the genetic mutants. Second, differences between methionine and cysteine in their ability to rescue the mutant phenotype in both species are neither mentioned nor discussed. This is especially important because there seem to be differences between S. cerevisiae and C. albicans, which might point to subtle but specific metabolic adaptations.

      The authors are also encouraged to refine several figure elements for clarity and comparability (e.g., harmonized axes in bar plots), condense the discussion to emphasize the conceptual advances over a summary of the results, and shorten figure legends.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the interplay between glycolysis and sulfur metabolism in regulating fungal morphogenesis and virulence. Using both Saccharomyces cerevisiae and Candida albicans, the authors demonstrate that glycolytic flux is essential for morphogenesis under nitrogen-limiting conditions, acting independently of the established cAMP-PKA pathway. Transcriptomic and genetic analyses reveal that glycolysis influences the de novo biosynthesis of sulfur-containing amino acids, specifically cysteine and methionine. Notably, supplementation with sulfur sources restores morphogenetic and virulence defects in glycolysis-deficient mutants, thereby linking core carbon metabolism with sulfur assimilation and fungal pathogenicity.

      Strengths:

      The work identifies a previously uncharacterized link between glycolysis and sulfur metabolism in fungi, bridging metabolic and morphogenetic regulation; this is an important conceptual advance for understanding fungal morphogenesis and pathogenicity. Demonstrating that cysteine supplementation rescues virulence defects in animal models connects basic metabolism to infection outcomes, which adds biomedical importance.

      Weaknesses:

      The proposed model that glycolytic flux modulates Met30 activity post-translationally remains speculative. While data support Met4 stabilization in met30 deletion strains, the mechanism of Met30 modulation by glycolysis is not demonstrated.

    4. Reviewer #3 (Public review):

      This study investigates the connection between glycolysis and the biosynthesis of sulfur-containing amino acids in controlling fungal morphogenesis, using Saccharomyces cerevisiae and Candida albicans as model organisms. The authors identify a conserved metabolic axis that integrates glycolysis with cysteine/methionine biosynthetic pathways to influence morphological transitions. This work broadens the current understanding of fungal morphogenesis, which has largely focused on gene regulatory networks and cAMP-dependent signaling pathways, by emphasizing the contribution of metabolic control mechanisms. However, despite the novel conceptual framework, the study provides limited mechanistic characterization of how sulfur metabolism and glycolytic blockade directly drive morphological outcomes. In particular, the rationale for selecting specific gene deletions to probe this pathway, such as Met32 (rather than Met4) or Met30, is not clearly explained, making it difficult to assess whether these targets adequately represent the metabolic nodes proposed to be critical. Further supporting data and experimental validation would strengthen the claims regarding the connections between glycolysis, sulfur amino acid metabolism, and virulence.

      Strengths:

      (1) The delineation of how glycolytic flux regulates fungal morphogenesis through a cAMP-independent mechanism is a significant advancement. The coupling of glycolysis with the de novo biosynthesis of sulfur-containing amino acids, a requirement for morphogenesis, introduces a novel and unexpected layer of regulation.

      (2) Demonstrating this mechanism in both S. cerevisiae and C. albicans strengthens the argument for its evolutionary conservation and biological importance.

      (3) The ability to rescue the morphogenesis defect through exogenous supplementation of sulfur-containing amino acids provides functional validation.

      (4) The findings with the Pfk1-deficient mutant in the murine infection model underscore the clinical significance of metabolic pathways in fungal infections.

      Weaknesses:

      (1) While the link between glycolysis and sulfur amino acid biosynthesis is established via transcriptomic and proteomic analyses, the specific regulation connecting these pathways via Met30 remains to be elucidated. For example, what are the mRNA and protein levels of Met30 in the initial analysis in Figure 2? How specific is this effect on Met30 under anaerobic versus aerobic glycolysis, especially given that the pentose phosphate pathway contributes to cell growth when glycolysis is perturbed?

      (2) Detailed metabolite profiling could have strengthened the metabolic connection and provided additional insight into changes in intermediate flux. In particular, measuring intracellular cysteine and methionine levels by metabolomic analysis would show whether they are affected and would further strengthen the claims in every section. It would also be informative to see how Met30 deletion affects cell growth; such data are not included, even though a viable heterozygous Met30 strain has been established.

      (3) Compared with the earlier bioRxiv version of this article from May 2025 (doi: https://doi.org/10.1101/2025.05.14.654021), the recent bioRxiv version (doi: https://doi.org/10.1101/2025.05.14.654021) includes some changes: Met30 deletion and chemical perturbation of glycolysis have been added as new data. Although these changes improve the illustration of the hypothesis in Figure 6, which connects glycolysis to sulfur metabolism, the gene expression and protein levels of all genes involved in the illustrated hypothesis are not consistently shown. For example, Met4 expression is not shown in some cases (Figure 4), and Met30 expression (gene expression or protein levels) is not shown in any of the profiling throughout the manuscript. This lack of consistency in profiling the same set of key genes makes the results harder to follow.

      (4) The demonstrated link between glycolysis and sulfur amino acid biosynthesis, along with its implications for virulence in C. albicans, is important for understanding fungal adaptation, as noted in the article; however, Met4 activation was not fully characterized, and no Met4 data are presented for the virulence assessment in Figure 4. Why is Met4 not included in Figure 4D and I? This is especially relevant because, according to Figure 6, Met4 activation is crucial and underlies the differences between glycolysis-active and -inactive conditions.

      (5) Similarly, the rationale for selecting Met32 to characterize sulfur metabolism is unclear. Deletion of Met32 resulted in a significant reduction in pseudohyphal differentiation; why is this effect attributed only to Met32? What happens if Met4 is deleted? The choice of Met32 rather than Met4 is not justified, even though Figure 6 explicitly hypothesizes that Met4 activation is key to the mechanism.

      (6) The comparative RT-qPCR in Figure 5 did not include sulfur metabolism genes and focused only on virulence and hyphal differentiation genes. Are there data on the levels of sulfur metabolism genes?

      (7) To validate the proposed link between sulfur metabolism and virulence, the gene sets illustrated in Figure 6 should be included consistently across all comparisons. Excluding sulfur metabolism genes from Figure 5 prevents the experiment from demonstrating the coordinated chain of glycolysis perturbation → sulfur metabolism → virulence. The same is true for other comparisons, where the lack of data on Met30, Met4, etc., makes it hard to connect the results to the hypothesis. It is also recommended to examine and report the expression of cAMP pathway genes to confirm the cAMP-independent mechanism. For example, gpa2 deletion was used to confirm the effects of cAMP supplementation, but the expression of this gene was not assessed in the RNA-seq analysis in Figure 2. Showing the expression of cAMP-related genes would help confirm that they do not contribute to the effects reported in Figure 2.

      (8) Although the NAC supplementation study has been added to the new version of the article compared with the previous bioRxiv version (May 2025), its link to sulfur metabolism is not well characterized in Figure 5 and the related datasets. Because the main focus of the manuscript is to delineate the role of sulfur metabolism, Figure 5 should include sulfur-related metabolic genes and their links to pfk1 deletion, using RT-qPCR measurements as shown for the virulence genes.

      (9) The manuscript would benefit from a more detailed introduction and from literature support for some of the findings reported, including (i) the roles of the cAMP-PKA and MAPK pathways, (ii) what is known in the literature about 2DG treatment (the roles of Snf1, HXT1, and HXT3), and (iii) how gpa2 is involved. Some sentences in the manuscript are repetitive; it would be beneficial to replace them with more relevant content in the introduction and discussion that clarifies the rationale for gene choices.

    1. eLife Assessment

      This valuable work investigates the role of protein N-glycosylation in regulating T-cell activation and function and suggests that B4GALT1 is a potential target for tumor immunotherapy. The strength of evidence is solid, although further mechanistic validation would be beneficial.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Yu et al., which investigates the role of protein N-glycosylation in regulating T-cell activation and function, is interesting. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could induce PD-1 expression and enhance the functions of CD8+ T cells both in vitro and in vivo. These results suggest that protein N-glycosylation plays important roles in regulating CD8+ T-cell function and indicate that B4GALT1 is a potential target for tumor immunotherapy.

      Strengths:

      The strength of this study is the discovery of a novel role for B4GALT1 in CD8 T cells, revealed through B4GALT1 deficiency.

      Weaknesses:

      However, the authors did not directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, or the functional outcomes of this interaction, such as enhanced TCR signaling.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNA-seq, glycoproteomics) to refine their model of how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified; identifying them would enable site-specific mutational analysis to further test the authors' model.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study by Yu et al., which investigates the role of protein N-glycosylation in regulating T-cell activation and function, is interesting. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could induce PD-1 expression and enhance the functions of CD8+ T cells both in vitro and in vivo. These results suggest that protein N-glycosylation plays important roles in regulating CD8+ T-cell function and indicate that B4GALT1 is a potential target for tumor immunotherapy.

      Strengths:

      The strength of this study is the discovery of a novel role for B4GALT1 in CD8 T cells, revealed through B4GALT1 deficiency.

      Weaknesses:

      However, the authors did not directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, or the functional outcomes of this interaction, such as enhanced TCR signaling.

      We apologize for not sufficiently highlighting our results in Fig. 5f-h. In those figures, we used FRET assays to show that the interaction between TCR and CD8 increased significantly in B4GALT1-deficient T cells. To confirm the important role of the TCR-CD8 interaction in mediating the effects of B4GALT1 on T-cell functions, such as in vitro killing of target cells, we artificially tethered TCR and CD8 using a CD8β-CD3ε fusion protein and tested its effects in both WT and B4GALT1-knockout CD8+ T cells. Our results demonstrate that this fusion protein could bypass the effect of B4GALT1 knockout in CD8+ T cells (Fig. 5g-h). Together with the results showing that B4GALT1 directly regulates the galactosylation of TCR and CD8, these results strongly support a model in which B4GALT1 modulates T-cell function mainly through galactosylation of TCR and CD8, which interferes with their interaction.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNA-seq, glycoproteomics) to refine their model of how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      We thank the reviewer for this reminder. We verified the efficiency of some gRNAs by FACS and Surveyor assays, and we will add these data to the supplementary results in the revised version.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      Please see our response to Reviewer #1.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified; identifying them would enable site-specific mutational analysis to further test the authors' model.

      We thank the reviewer for this suggestion. Unfortunately, TCR and CD8 contain multiple N-glycosylation sites (https://glycosmos.org/glycomeatlas). We are concerned that mutating all of these sites would affect not only the glycosylation of TCR and CD8 but also other essential functions of these proteins.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

      We thank the reviewer for the suggestion to examine the roles of B4GALT1 in autoimmune and infectious diseases. However, because the current manuscript focuses mainly on tumor immunology, we think these studies are best left for future work.

    1. eLife Assessment

      This study presents SegPore, a valuable new method for processing direct RNA nanopore sequencing data, which improves the segmentation of raw signals into individual bases and boosts the accuracy of modified base detection. The evidence presented to benchmark SegPore is solid, and the authors provide a fully documented implementation of the method. SegPore will be of particular interest to researchers studying RNA modifications.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe a new computational method (SegPore), which segments the raw signal from nanopore direct RNA-Seq data to improve the identification of RNA modifications. In addition to signal segmentation, SegPore includes a Gaussian Mixture Model approach to differentiate modified and unmodified bases. SegPore uses Nanopolish to define an initial segmentation, which is then refined into base and transition blocks. SegPore also includes a modification prediction model whose predictions are included in the output. The authors evaluate the segmentation in comparison with Nanopolish and Tombo (RNA002) as well as f5c and Uncalled4 (RNA004), and they evaluate the impact on m6A RNA modification detection using data with known m6A sites. In comparison with existing methods, SegPore appears to improve the ability to detect m6A, suggesting that this approach could be used to improve the analysis of direct RNA-Seq data.

      Strengths:

      SegPore addresses an important problem (signal data segmentation). By refining the signal into transition and base blocks, it appears to reduce noise, leading to improved m6A identification at the site level as well as for single-read predictions. The authors provide a fully documented implementation, including a GPU version that reduces run time. The authors also provide a detailed description of the methods, and the approach to refining segments appears to be new.

    3. Reviewer #2 (Public review):

      Summary:

      The work seeks to improve detection of RNA m6A modifications using Nanopore sequencing through improvements in raw data analysis. These improvements are said to be in the segmentation of the raw data, although the work appears to position the alignment of raw data to the reference sequence and some further processing as part of the segmentation, and result statistics are mostly shown on the 'data-assigned-to-kmer' level.

      As such, the title, abstract and introduction stating the improvement of just the 'segmentation' does not seem to match the work the manuscript actually presents, as the wording seems a bit too limited for the work involved.

      The work itself shows minor improvements in m6Anet when replacing Nanopolish's eventalign with this new approach, but clear improvements in the distributions of data assigned per kmer. These assignments were improved enough to enable m6A calling from them directly, both at the site level and at the read level.

      A large part of the improvements shown appear to stem from the addition of extra, non-base/kmer specific, states in the segmentation/assignment of the raw data, removing a significant portion of what can be considered technical noise for further analysis. Previous methods enforced assignment of (almost) all raw data, forcing a technically optimal alignment that may lead to suboptimal results in downstream processing as datapoints could be assigned to neighbouring kmers instead, while random noise that is assigned to the correct kmer may also lead to errors in modification detection.

      For an optimal alignment between the raw signal and the reference sequence, this approach may yield improvements for downstream processing using other tools.

      Additionally, the GMM used for calling the m6A modifications provides a useful, simple and understandable logic to explain the reason a modification was called, as opposed to the black-box models that are nowadays often employed for these types of tasks.

      Appraisal:

      The authors have shown their method's ability to identify noise in the raw signal and remove these values from the segmentation and alignment, reducing their influence on further analyses. Figures directly comparing the values per kmer do show a visibly improved assignment of raw data per kmer. As a replacement for Nanopolish's eventalign, it seems to have a rather limited, but positive, effect on m6Anet results. For single-read modification calling, this work does appear to improve upon CHEUI.

    4. Reviewer #3 (Public review):

      Summary:

      Nucleotide modifications are important regulators of biological function; however, until recently, their study has been limited by the availability of appropriate analytical methods. Oxford Nanopore direct RNA sequencing preserves nucleotide modifications, permitting their study; however, many different nucleotide modifications lack an available base-caller to accurately identify them. Furthermore, existing tools are computationally intensive, and their results can be difficult to interpret.

      Cheng et al. present SegPore, a method designed to improve the segmentation of direct RNA sequencing data and boost the accuracy of modified base detection.

      Strengths:

      This method is well described and has been benchmarked against a range of publicly available base callers that have been designed to detect modified nucleotides.

      Comment from the Reviewing Editor:

      The authors have provided responses to the weaknesses highlighted previously and the reviewers were not asked to comment. The authors have now requested a Version of Record.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe a new computational method (SegPore), which segments the raw signal from nanopore direct RNA-Seq data to improve the identification of RNA modifications. In addition to signal segmentation, SegPore includes a Gaussian Mixture Model approach to differentiate modified and unmodified bases. SegPore uses Nanopolish to define a first segmentation, which is then refined into base and transition blocks. SegPore also includes a modification prediction model that is included in the output. The authors evaluate the segmentation in comparison to Nanopolish and Tombo (RNA002) as well as f5c and Uncalled 4 (RNA004), and they evaluate the impact on m6A RNA modification detection using data with known m6A sites. In comparison to existing methods, SegPore appears to improve the ability to detect m6A, suggesting that this approach could be used to improve the analysis of direct RNA-Seq data.

      Strengths:

      SegPore addresses an important problem (signal data segmentation). By refining the signal into transition and base blocks, noise appears to be reduced, leading to improved m6A identification at the site level as well as for single read predictions. The authors provide a fully documented implementation, including a GPU version that reduces run time. The authors provide a detailed methods description, and the approach to refine segments appears to be new.

      Weaknesses:

      The authors show that SegPore reduces noise compared to other methods; however, the improvement in accuracy appears to be relatively small for the task of identifying m6A. To run SegPore, the GPU version is essential, which could limit the application of this method in practice.

      As discussed in Paragraph 4 of the Discussion, we acknowledge that the improvement of SegPore combined with m6Anet over Nanopolish+m6Anet in bulk in vivo analysis is modest. This outcome is likely influenced by several factors, including alignment inaccuracies caused by pseudogenes or transcript isoforms, the presence of additional RNA modifications that can affect signal baselines, and the fact that m6Anet is specifically trained on Nanopolish-derived events. Additionally, the absence of a modification-free (in vitro transcribed) control sample in the benchmark dataset makes it challenging to establish true k-mer baselines.

      Importantly, these challenges do not exist for in vitro data, where the signal is cleaner and better defined. As a result, SegPore achieves a clear and substantial improvement at the single-molecule level, demonstrating the strength of its segmentation approach and its potential to significantly enhance downstream analyses. These results indicate that SegPore is particularly well suited for benchmarking and mechanistic studies of RNA modifications under controlled experimental conditions, and they provide a strong foundation for future developments.

      We also recognize that the current requirement for GPU acceleration may limit accessibility in some computational environments. To address this, we plan to further optimize SegPore in future versions to support efficient CPU-only execution, thereby broadening its applicability and impact.

      Reviewer #2 (Public review):

      Summary:

      The work seeks to improve detection of RNA m6A modifications using Nanopore sequencing through improvements in raw data analysis. These improvements are said to be in the segmentation of the raw data, although the work appears to position the alignment of raw data to the reference sequence and some further processing as part of the segmentation, and result statistics are mostly shown on the 'data-assigned-to-kmer' level.

      As such, the title, abstract and introduction stating the improvement of just the 'segmentation' does not seem to match the work the manuscript actually presents, as the wording seems a bit too limited for the work involved.

      The work itself shows minor improvements in m6Anet when replacing Nanopolish's eventalign with this new approach, but clear improvements in the distributions of data assigned per kmer. These assignments were improved enough to enable m6A calling from them directly, both at the site level and at the read level.

      A large part of the improvements shown appear to stem from the addition of extra, non-base/kmer specific, states in the segmentation/assignment of the raw data, removing a significant portion of what can be considered technical noise for further analysis. Previous methods enforced assignment of (almost) all raw data, forcing a technically optimal alignment that may lead to suboptimal results in downstream processing as datapoints could be assigned to neighbouring kmers instead, while random noise that is assigned to the correct kmer may also lead to errors in modification detection.

      For an optimal alignment between the raw signal and the reference sequence, this approach may yield improvements for downstream processing using other tools.

      Additionally, the GMM used for calling the m6A modifications provides a useful, simple and understandable logic to explain the reason a modification was called, as opposed to the black-box models that are nowadays often employed for these types of tasks.

      Weaknesses:

      The manuscript suggests the eventalign results are improved compared to Nanopolish. While this is believably shown to be true (Table 1), the effect on the use case presented, downstream differentiation between modified and unmodified status of a base/kmer, is likely limited, because during downstream modification calling the noisy distributions are often 'good enough'. E.g. Nanopolish uses the main segmentation+alignment for a first alignment and follows up with a form of targeted local realignment/HMM test for modification calling (and for training too), decreasing the need for the near-perfect segmentation+alignment this work attempts to provide. Any tool applying a similar strategy probably largely negates the problems this manuscript aims to improve upon. Should a use case come up where this downstream optimisation is not an option, SegPore might provide the necessary improvements in raw data alignment.

      Thank you for this thoughtful comment. We agree that many current state-of-the-art (SOTA) methods perform well on benchmark datasets, but we believe there is still substantial room for improvement. Most existing benchmarks are based on limited datasets, primarily focusing on DRACH motifs in human and mouse transcriptomes. However, m6A modifications can also occur in non-DRACH motifs, where current models tend to underperform. Furthermore, other RNA modifications, such as pseudouridine, inosine, and m5C, remain less studied, and their detection is likely to benefit from more accurate and informative signal modeling.

      It is also important to emphasize that raw signal segmentation and RNA modification detection are fundamentally distinct tasks. SegPore focuses on improving the segmentation step by producing a cleaner and more interpretable signal, which provides a stronger foundation for downstream analyses. Even if RNA modification detection algorithms such as m6Anet can partially compensate for noisy segmentation in specific cases, starting from a more accurate signal alignment can still lead to improved accuracy, robustness, and interpretability—particularly in challenging scenarios such as non-canonical motifs or less characterized modifications.

      Scientific progress in this field is often incremental, and foundational improvements can have a significant long-term impact. By enhancing raw signal segmentation, SegPore contributes an essential building block that we expect will enable the development of more accurate and generalizable RNA modification detection algorithms as the community integrates it into more advanced workflows.

      Appraisal:

      The authors have shown their method's ability to identify noise in the raw signal and remove these values from the segmentation and alignment, reducing their influence on further analyses. Figures directly comparing the values per kmer do show a visibly improved assignment of raw data per kmer. As a replacement for Nanopolish's eventalign, it seems to have a rather limited, but positive, effect on m6Anet results. For single-read modification calling, this work does appear to improve upon CHEUI.

      Impact:

      With the current developments for Nanopore-based modification calling largely focusing on Artificial Intelligence, Neural Networks and the like, improvements made in interpretable approaches provide an important alternative that enables deeper understanding of the data, rather than providing a tool that plainly answers the question of whether a base is modified or not, without further explanation. The work presented is best viewed in the context of a workflow where one aims to get an optimal alignment between raw signal data and the reference base sequence for further processing; for example, as presented, as a possible replacement for Nanopolish's eventalign. Here it might enable data exploration and downstream modification calling without the need for local realignments or other approaches that re-consider the distribution of raw data around the target motif, such as a 'local' Hidden Markov Model or Neural Networks. These possibilities are useful for a deeper understanding of the data and for further tool development for modification detection beyond m6A calling.

      Reviewer #3 (Public review):

      Summary:

      Nucleotide modifications are important regulators of biological function; however, until recently, their study has been limited by the availability of appropriate analytical methods. Oxford Nanopore direct RNA sequencing preserves nucleotide modifications, permitting their study; however, many different nucleotide modifications lack an available base-caller to accurately identify them. Furthermore, existing tools are computationally intensive, and their results can be difficult to interpret.

      Cheng et al. present SegPore, a method designed to improve the segmentation of direct RNA sequencing data and boost the accuracy of modified base detection.

      Strengths:

      This method is well described and has been benchmarked against a range of publicly available base callers that have been designed to detect modified nucleotides.

      Weaknesses:

      However, the manuscript has a significant drawback in its current version. The most recent nanopore RNA base callers can distinguish between different ribonucleotide modifications, however, SegPore has not been benchmarked against these models.

      The manuscript would be strengthened by benchmarking against the rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0 dorado models, which are reported to detect m5C, m6A_DRACH, inosine_m6A and PseU.

      A clear demonstration that SegPore also outperforms the newer RNA base caller models will confirm the utility of this method.

      Thank you for highlighting this important limitation. While Dorado, the new ONT basecaller, is publicly available and supports modification-aware basecalling, suitable public datasets for benchmarking m5C, inosine, m6A, and PseU detection on RNA004 are currently lacking. Dorado’s modification-aware models are trained on ONT’s internal data, which is not publicly released. Therefore, it is currently not feasible to directly evaluate or compare SegPore’s performance against Dorado for these RNA modifications.

      We would also like to emphasize that SegPore’s primary contribution lies in raw signal segmentation, which is an upstream and foundational step in the RNA modification detection pipeline. As more publicly available datasets for RNA004 modification detection become accessible, we plan to extend our work to benchmark and integrate SegPore with modification detection tasks on RNA004 data in future studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Comments based on Author Response

      “However, it is valid to compare them on the segmentation task, where SegPore exhibits better performance (Table 1).”

      This dodges the point of the actual use case of this approach, as Nanopolish indeed does not support calling modifications for this kind of data, but the general approach it uses might, if adapted for this data, nullify the gains made in the examples presented.

      We respectfully disagree with the comment that the advantages demonstrated by SegPore could be “nullified”. Although SegPore’s performance is indeed more modest in in vivo datasets, it shows substantially better performance than CHEUI in in vitro data, clearly demonstrating that improved segmentation directly contributes to more accurate RNA modification estimation.

      It is worth noting that CHEUI relies on Nanopolish’s segmentation results for m6A detection. Despite this, SegPore outperforms CHEUI, further supporting the conclusion that segmentation quality has a meaningful impact on downstream modification calling.

      In conclusion, based on our current experimental results, SegPore is particularly well suited for RNA modification analysis from in vitro transcribed data, where its improved segmentation provides a clear advantage over existing methods.

      Further comments

      (2) “Page 3: employ models like Hidden Markov Models (HMM) to segment the signal, but they are prone to noise and inaccuracies”

      “That's the alignment/calling part, not the segmentation?”

      “Current methods, such as Nanopolish, employ models like Hidden Markov Models (HMM) to segment the signal”

      I get the impression the word 'segment' has a different meaning in this work than what I'm used to based on my knowledge around Nanopolish and Tombo, see the deeper code examples further down below.

      Additionally, in Nanopolish there is a clear segmentation step (or event detection) without any HMM, then a sort of dynamic time warping step that aligns the segments and afterwards recombines some segments into a single segment where necessary. I believe the HMM in Nanopolish is not used at all except for modification calling, but if you can point out otherwise I'm open to proof.

      Now I believe it is the meaning of 'segmenting the signal' that confuses me, and now the clarification makes it a bit odd as well:

      “Nanopolish and Tombo align the raw signal to the reference sequence to determine which portion of the signal corresponds to each k-mer. We define this process as the segmentation task, referred to as "eventalign" in Nanopolish.”

      So now it's clearly stated the raw signal is being 'aligned' and then the process is suddenly defined as the 'segmentation task', and again referred to as "eventalign". Why is it not referred to as the 'alignment task' instead?

      I understand the segmentation and alignment parts are closely connected but to me, it seems this work picks the wrong word for the problem being solved.

      “Unlike Nanopolish and Tombo, which directly align the raw signal to the reference sequence,…”

      Looking at their code, I believe both Nanopolish and Tombo actually do segment the data first (or "event detection"), then they align the segments/events they found, and finally multiple events aligned to the same section are merged. See for yourself:

      Nanopolish:

      https://github.com/jts/nanopolish/blob/master/src/nanopolish_squiggle_read.cpp

      Line 233:

```cpp
trim_and_segment_raw(fast5_data.rt, trim_start, trim_end, varseg_chunk, varseg_thresh);
event_table et = detect_events(fast5_data.rt, *ed_params);
```

      Line 270:

```cpp
// align events to the basecalled read
std::vector event_alignment = adaptive_banded_simple_event_align(*this, *this->base_model[strand_idx], read_sequence);
```

      Where event detection is further defined at line 268 here:

      https://github.com/jts/nanopolish/blob/master/src/thirdparty/scrappie/event_detection.c

      Tombo:

      https://github.com/nanoporetech/tombo/blob/master/tombo/resquiggle.py

      Line 1162 onwards shows a ‘segment_signal’ call whose results are used in a ‘find_adaptive_base_assignment’ call. ‘segment_signal’ (starting at line 1057) tries to find where the signal jumps from a series of similar values to another (the start of a base change in the pore) and stores these positions in ‘valid_cpts’; ‘find_adaptive_base_assignment’ then tries to align the resulting segment values to the expected series of values:

```python
valid_cpts, norm_signal, new_scale_values = segment_signal(
    map_res, num_events, rsqgl_params, outlier_thresh, const_scale)
event_means = ts.compute_base_means(norm_signal, valid_cpts)
dp_res = find_adaptive_base_assignment(
    valid_cpts, event_means, rsqgl_params, std_ref, map_res.genome_seq,
    start_clip_bases=map_res.start_clip_bases,
    seq_samp_type=seq_samp_type, reg_id=map_res.align_info.ID)
```

      These implementations are also why I find the choice of words for what is segmentation and what is alignment a bit confusing in this work, as both Tombo and Nanopolish do a similar, clear segmentation step (or an "event detection" step), followed by the alignment of the segments they determined. The terminology in this work appears to deviate from these.
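      To make the "segment first, then align" description concrete, here is a minimal Python sketch of the first step only. It places event boundaries with a simple sliding-window t-like statistic; this is an illustrative stand-in, not the scrappie event detector used by Nanopolish nor Tombo's segment_signal, and the window size, threshold and toy signal are hypothetical.

```python
from statistics import mean, pvariance


def window_t_stat(signal, i, w):
    """t-like score comparing the w samples before index i with the w samples after it."""
    left, right = signal[i - w:i], signal[i:i + w]
    pooled = (pvariance(left) + pvariance(right)) / w
    # A small epsilon avoids division by zero on perfectly flat windows.
    return abs(mean(left) - mean(right)) / (pooled + 1e-9) ** 0.5


def detect_change_points(signal, w=3, threshold=5.0):
    """Return indices where a new event starts (local score maxima above threshold)."""
    scores = {i: window_t_stat(signal, i, w) for i in range(w, len(signal) - w + 1)}
    change_points = []
    for i, s in scores.items():
        if s < threshold:
            continue
        # Keep only local maxima so that one level shift yields one boundary.
        neighbours = [scores.get(j, 0.0) for j in range(i - w + 1, i + w) if j != i]
        if all(s >= n for n in neighbours):
            change_points.append(i)
    return change_points


def segment_means(signal, change_points):
    """Mean current of each segment delimited by the change points."""
    bounds = [0] + change_points + [len(signal)]
    return [mean(signal[a:b]) for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy signal with three current levels.
signal = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 5.1, 4.9, 5.0, 5.05, 2.0, 2.1, 1.9, 2.0, 2.05]
cps = detect_change_points(signal)
print(cps)  # [5, 10]
print(segment_means(signal, cps))
```

      The segment means produced here are what a subsequent alignment step would match against expected k-mer levels.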

      We thank the reviewer for the detailed comments!

      First of all, we sincerely apologize for our earlier misunderstanding regarding how Nanopolish and Tombo operate. Based on a closer examination of their source code, we now recognize that both tools indeed include a segmentation step based on change-point detection methods, after which the resulting segments are aligned to the reference sequence. We have revised the relevant text in the manuscript accordingly:

      - “Current methods, such as Nanopolish, employ change-point detection methods to segment the signal and use dynamic programming methods and HMM to align the derived segments to the reference sequence,”

      - “We define this process as the segmentation and alignment task (abbreviated as the segmentation task), which is referred to as “eventalign” in Nanopolish.”

      - “In SegPore, we segment the raw signal into small fragments using a Hierarchical Hidden Markov Model (HHMM) and align the mean values of these fragments to the reference, where each fragment corresponds to a sub-state of a k-mer. By contrast, Nanopolish and Tombo use change-point–based methods to segment the signal and employ dynamic programming approaches together with profile HMMs to align the resulting segments to the reference sequence.”
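      The second step, aligning derived segments to the reference, can be sketched as a plain dynamic-programming alignment in which consecutive segments may map to the same reference position (as μ₁ and μ₂ both map to A₁ in the Figure 1A example). This is a minimal illustration assuming a squared-error cost against hypothetical expected k-mer levels; it is not Nanopolish's adaptive banded aligner, Tombo's find_adaptive_base_assignment, or SegPore's HHMM.

```python
def align_segments(seg_means, ref_levels):
    """Monotonically assign each segment to a reference position.

    Every reference position receives at least one segment, and consecutive
    segments may share a position. Returns one reference index per segment.
    """
    n, m = len(seg_means), len(ref_levels)
    INF = float("inf")
    # cost[i][j]: best squared-error cost of segments 0..i with segment i on position j.
    cost = [[INF] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = (seg_means[0] - ref_levels[0]) ** 2
    for i in range(1, n):
        for j in range(m):
            stay = cost[i - 1][j]                        # same k-mer as previous segment
            move = cost[i - 1][j - 1] if j > 0 else INF  # advance to the next k-mer
            best = min(stay, move)
            if best == INF:
                continue
            cost[i][j] = best + (seg_means[i] - ref_levels[j]) ** 2
            back[i][j] = j if stay <= move else j - 1
    if cost[n - 1][m - 1] == INF:
        raise ValueError("need at least as many segments as reference positions")
    # Trace back from the last segment on the last reference position.
    path, j = [], m - 1
    for i in range(n - 1, -1, -1):
        path.append(j)
        j = back[i][j]
    return path[::-1]


# Hypothetical expected levels for three k-mers and four observed segment means.
print(align_segments([1.0, 1.1, 5.0, 2.0], [1.0, 5.0, 2.0]))  # [0, 0, 1, 2]
```

      The first two segments are merged onto the first reference position, which is the "re-combining" of segments the reviewer describes.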

      Regarding terminology, we originally borrowed the term “segmentation” from speech processing, where it refers to dividing continuous audio signals into meaningful units. In the context of nanopore signal analysis, segmentation and alignment are often tightly coupled steps. Because of this and because our initial focus was on methodological development rather than terminology, we used the term “segmentation task” to describe the combined process of signal segmentation and alignment.

      However, we now recognize that this terminology may cause confusion. Changing every instance of “segmentation” to “segmentation and alignment” or “alignment” would require substantial rewriting of the manuscript. Therefore, in this revision, we have clearly defined “segmentation task” as referring to the combined process of segmentation and alignment. We apologize for any earlier confusion and will adopt the term “alignment” in future work for greater clarity.

      (3) I think I do understand the meaning, but I do not understand the relevance of the Aj bit in the last sentence. What is it used for?

      Based on the response and another close look at Fig1, it turns out the j refers to the extremely small numbers 1 and 2 in step 3. You may want to improve the readability of these.

      Thank you for the suggestion. We have added subscripts to all nucleotides in the reference sequence in Figure 1A and revised the legend to clarify the notation and improve readability. Specifically, we now include the following explanation:

      “For example, Aⱼ denotes the base ‘A’ at the j-th position on the reference sequence. In this example, A₁ and A₂ refer to the first and second occurrences of ‘A’ in the reference sequence, respectively. Accordingly, μ₁ and μ₂ are aligned to A₁, while μ₃ is aligned to A₂”.

      (6) “We chose to use the poly(A) tail for normalization because it is sequence-invariant, i.e., all poly(A) tails consist of identical k-mers, unlike transcript sequences, which vary in composition. In contrast, using the transcript region for normalization can introduce biases: for instance, reads with more diverse k-mers (having inherently broader signal distributions) would be forced to match the variance of reads with more uniform k-mers, potentially distorting the baseline across k-mers.”

      While the next part states there was a benchmark showing SegPore still works without this normalization, I think this answer does not touch upon the underlying issue I'm trying to point out here.

      - The biases mentioned here, due to more diverse (or different) subsets of k-mers in a read, indeed affect the variance of the signal overall.

      - As I pointed out in my earlier remark here, this can be resolved using an approach of 'general normalization', 'mapping to expected signal', 'Theil-Sen fitting of scale and offset', 're-mapping to expected signal', as Tombo and Nanopolish have implemented.

      - Alternatively, one could use the reference sequence (using the read mapping information) and base the expected signal mean and standard deviation on that instead.

      - The stability of the polyA tail as an indicator of the variation in the rest of the signal seems a questionable assumption to me. A 'noisy' pore could introduce a large standard deviation in the polyA tail without increasing the signal deviations induced by the variety of k-mers; rather, it would be representative of the deviations measured within a single k-mer segment. I would expect this discrepancy from a worn-out pore, so I'd imagine reads sequenced later in a run to give worse results with this method.
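      The 'Theil-Sen fitting of scale and offset' step mentioned above can be sketched as follows: fit observed event means against expected k-mer levels with the Theil-Sen estimator (median of pairwise slopes, then median residual), which tolerates outlying events such as modified positions. This sketch covers only that fitting step, not the full Tombo or Nanopolish normalization procedure, and the expected levels and observations are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(expected, observed):
    """Robust fit of observed ~ scale * expected + shift (Theil-Sen estimator)."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(expected, observed), 2)
              if x2 != x1]
    scale = median(slopes)
    # Intercept: median residual after removing the fitted slope.
    shift = median(y - scale * x for x, y in zip(expected, observed))
    return scale, shift


def rescale(observed, scale, shift):
    """Map observed event means back onto the model's scale."""
    return [(y - shift) / scale for y in observed]


# Hypothetical expected k-mer levels and observed event means with one outlier.
expected = [80, 90, 100, 110, 120]
observed = [165, 185, 500, 225, 245]
scale, shift = theil_sen(expected, observed)
print(scale, shift)  # 2.0 5.0
```

      Because both slope and intercept are medians, the single outlying event (500) does not pull the fit; in an iterative scheme, the rescaled events would then be re-mapped to the expected signal and the fit repeated.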

      In the current version it is not the statement that is unclear, it is the underlying assumption of how this works that I question.

      We thank the reviewer for raising this important point and for the insightful discussion. Our choice of using the poly(A) tail for normalization is based on the working hypothesis that the poly(A) signal reflects overall pore-level variability and provides a stable reference for signal scaling. We find this to be a practical and effective approach in most experimental settings.

      We agree that more sophisticated strategies, such as “general normalization” or iterative fitting to the expected signal (as implemented in Tombo and Nanopolish), could in principle generate a "better" normalization. However, these approaches are significantly more challenging to implement in practice. This is because signal normalization and alignment are mutually dependent processes: baseline estimates for k-mers influence alignment accuracy, while alignment accuracy, in turn, affects baseline calculation. This interdependence becomes even more complex in the presence of RNA modifications, which alter signal distributions and further confound model fitting.

      It is worth noting that this limitation is already evident in our results. As shown in Figure 4B (first and second k-mers), Nanopolish produces more dispersed baselines than SegPore, even for these unmodified k-mers, suggesting inherent limitations in its normalization strategy. Ideally, baselines for the same k-mer should remain highly consistent across different reads.

      In contrast, poly(A)-based normalization offers a simpler and more robust solution that avoids this circular dependency. Because poly(A) sequences are compositionally homogeneous, they enable reliable estimation of scaling parameters without assumptions about k-mer composition or modification state. Regarding the reviewer’s concern about pore instability, we mitigate this issue by including only high-quality, confidently mapped reads in our analysis, which reduces the likelihood of incorporating signals from degraded or “noisy” pores.
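      The composition argument for poly(A)-based normalization can be illustrated with two toy reads from the same pore that differ only in k-mer composition. Standardizing with the transcript region assigns the same k-mer different levels in the two reads, whereas standardizing with the poly(A) segment does not. This is a minimal sketch with hypothetical reads and levels; it models only composition, not pore wear.

```python
from statistics import mean, pstdev


def zscore(signal, ref):
    """Standardize every sample of `signal` with the mean/std of `ref`."""
    mu, sd = mean(ref), pstdev(ref)
    return [(x - mu) / sd for x in signal]


def first_kmer_level(read, n_polya, use_polya):
    """Standardized level of the first transcript sample (a 100 pA k-mer here)."""
    ref = read[:n_polya] if use_polya else read[n_polya:]
    return zscore(read, ref)[n_polya]


# Hypothetical toy reads from the same pore (identical gain and offset).
polya = [90.0, 91.0, 89.0, 90.0]
read1 = polya + [100.0] * 4 + [120.0] * 4   # balanced k-mer composition
read2 = polya + [100.0] * 7 + [120.0] * 1   # dominated by the 100 pA k-mer

for use_polya in (True, False):
    print(use_polya,
          first_kmer_level(read1, 4, use_polya),
          first_kmer_level(read2, 4, use_polya))
```

      With poly(A) statistics both reads give the 100 pA k-mer the same level; with transcript statistics the levels differ (about -1.0 versus -0.38) purely because of composition.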

      We fully agree that exploring more advanced normalization strategies is an important direction for future work, and we plan to investigate such approaches as the field progresses.

      (8) “In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis or the segmentation task.”

      Picking only one descriptor rather than two alternatives would be easier to follow (and I'd prefer the first).

      Thank you for the suggestion. We have revised the sentence to:

      “In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis, which also represents the final output of the segmentation and alignment task.”

      (9) “Additionally, a complete explanation of how the weighted mean is computed is provided in Section 5.3 of Supplementary Note 1. It is derived from signal points that are assigned to a given 5mer.”

      I believe there's no more mention of a weighted mean, and I don't get any hits when searching for 'weight'. Is that intentional?

      We apologize for the misplacement of the formulas. We have updated Section 5.3 of Supplementary Note 1 to clarify the definition of the weighted mean. Because multiple current signal segments may be aligned to a single k-mer, we computed the weighted mean for each k-mer across these segments, where the weight corresponds to the number of data points assigned to the “curr” state in each event.
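      Read literally, this weighting is equivalent to pooling all “curr” samples of the aligned segments and taking their mean. A minimal sketch, assuming each segment is summarized by the mean of its “curr” samples and their count; the exact formula is the one in Supplementary Note 1, Section 5.3, which is not reproduced here.

```python
def kmer_weighted_mean(events):
    """Weighted mean current of one k-mer.

    events: (mean_of_curr_points, n_curr_points) for each segment aligned to the k-mer.
    """
    total = sum(n for _, n in events)
    if total == 0:
        raise ValueError("no 'curr' points assigned to this k-mer")
    return sum(m * n for m, n in events) / total


# Two hypothetical segments aligned to the same k-mer.
print(kmer_weighted_mean([(100.0, 10), (110.0, 30)]))  # 107.5
```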

      (17) Response: We revised the sentence to clarify the selection criteria: “For selected 5mers that exhibit both a clearly unmodified and a clearly modified signal component, SegPore reports the modification rate at each site, as well as the modification state of that site on individual reads.”

      So is this the same set described on page 13 ln 343 or not?

      “Due to the differences between human (Supplementary Fig. S2A) and mouse (Supplementary Fig. S2B), only six 5mers were found to have m6A annotations in the test data's ground truth (Supplementary Fig. S2C). For a genomic location to be identified as a true m6A modification site, it had to correspond to one of these six common 5mers and have a read coverage of greater than 20.”

      I struggle to interpret the 'For selected 5mers' part, as I'm not sure if this is a selection I'm supposed to already know at this point in the text or if it's a set just introduced here. If the latter, removing the word 'selected' would clear it up for me.

      We apologize for the confusion. What we mean is that when pooling signals aligned to the same k-mer across different genomic locations and reads, only a subset of k-mers exhibit a bimodal distribution — one peak corresponding to the unmodified state and another to the modified state. Other k-mers show a unimodal distribution, making it impossible to reliably estimate modification levels. We refer to the subset of k-mers that display a bimodal distribution as the “selected” k-mers.
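      The bimodality criterion can be sketched with a two-component Gaussian mixture fitted by EM. How SegPore decides that a k-mer is bimodal is not specified here; this sketch assumes a BIC comparison between one and two components, and treats the component with the higher mean as "modified" purely for illustration (the direction of the shift depends on the k-mer). The event means are hypothetical.

```python
import math
from statistics import mean, pvariance


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_lik(xs, w, mu, var):
    """Log-likelihood of xs under a 1-D Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w[k] * normal_pdf(x, mu[k], var[k]) for k in range(len(w)))
        total += math.log(max(p, 1e-300))
    return total


def fit_gmm2(xs, iters=200, min_var=1e-6):
    """EM for a two-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    w, mu, var = [0.5, 0.5], [min(xs), max(xs)], [pvariance(xs)] * 2
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = max(sum(p), 1e-300)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-300)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
    return w, mu, var


def is_bimodal(xs):
    """True if BIC prefers two Gaussian components over one (assumed criterion)."""
    n = len(xs)
    ll1 = log_lik(xs, [1.0], [mean(xs)], [pvariance(xs)])
    w, mu, var = fit_gmm2(xs)
    ll2 = log_lik(xs, w, mu, var)
    bic1 = 2 * math.log(n) - 2 * ll1   # parameters: mean, variance
    bic2 = 5 * math.log(n) - 2 * ll2   # parameters: 2 means, 2 variances, 1 weight
    return bic2 < bic1, (w, mu, var)


# Hypothetical event means for one k-mer pooled across reads and sites.
xs = [99.5, 100.0, 100.5, 99.8, 100.2, 99.9, 100.1, 109.7, 110.0, 110.3]
bimodal, (w, mu, var) = is_bimodal(xs)
shifted = max(range(2), key=lambda k: mu[k])
print(bimodal, round(w[shifted], 3))  # True 0.3
```

      In this picture the weight of the shifted component plays the role of a modification rate, and each read's responsibility for that component gives a per-read modification probability, which is what makes a mixture-based call easy to interpret.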

      The “selected k-mers” described on page 13, line 343, must additionally have ground-truth labels available in both the training and test datasets. There are 10 k-mers with ground-truth annotations in the training data and 11 in the test data, but only 6 of these k-mers are shared between the two datasets; therefore, only those 6 overlapping k-mers are retained for evaluation. These 6 k-mers satisfy both criteria: (1) exhibiting a bimodal distribution and (2) having ground-truth annotations in both training and test sets.
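      Together with the coverage requirement quoted from page 13 (read coverage greater than 20), these criteria amount to set intersections followed by a per-site filter. A minimal sketch with hypothetical k-mer labels and site records; the six 5mers actually used in the manuscript are not reproduced here.

```python
def evaluation_kmers(bimodal, train_annotated, test_annotated):
    """k-mers that are bimodal and have ground-truth labels in both datasets."""
    return set(bimodal) & set(train_annotated) & set(test_annotated)


def evaluation_sites(sites, kmers, min_coverage=20):
    """Keep (site_id, kmer, coverage) records on an evaluation k-mer with coverage > min_coverage."""
    return [site for site, kmer, cov in sites if kmer in kmers and cov > min_coverage]


# Hypothetical k-mer sets and sites.
kmers = evaluation_kmers(
    bimodal={"GGACT", "GAACT", "AGACT"},
    train_annotated={"GGACT", "GAACT", "AGACT", "TGACT"},
    test_annotated={"GGACT", "GAACT", "GGACA"},
)
sites = [("chr1:100", "GGACT", 25), ("chr1:200", "GGACT", 20), ("chr2:50", "GGACA", 50)]
print(sorted(kmers), evaluation_sites(sites, kmers))  # ['GAACT', 'GGACT'] ['chr1:100']
```

      Note that a site with coverage of exactly 20 is excluded, matching the "greater than 20" wording.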

      To improve clarity, we have removed the term “selected” from the sentence.

      (21) “Tombo used the "resquiggle" method to segment the raw signals, and we standardized the segments using the poly(A) tail to ensure a fair comparison (See preprocessing section in Materials and Methods).”

      In the Materials and Methods:

      “The raw signal segment corresponding to the poly(A) tail is used to standardize the raw signal for each read.”

      I cannot find more detailed information here on what the standardization does; do you perhaps mean to refer to Supplementary Note 1, Section 3?

      Thank you for pointing this out. Yes, the standardization procedure is described in detail in Supplementary Note 1, Section 3. Tombo itself does not segment and align the raw signal on the absolute pA scale, which can result in very large variance in the derived events if the raw signal is used directly. To ensure a fair comparison, we therefore applied the same preprocessing steps to Tombo’s raw signals as we did for SegPore, using only the event boundary information from Tombo while standardizing the signal in the same way.

      We have revised the sentence for clarity as follows:

      “Tombo used the "resquiggle" method to segment the raw signals, but the resulting signals are not reported on the absolute pA scale. To ensure a fair comparison with SegPore, we standardized the segments using the poly(A) tail in the same way as SegPore (See preprocessing section in Materials and Methods).”

      (22A) The table shown does help show that the benchmark is unlikely to be 'cheated'. However, I am surprised to see the Avg std for Nanopolish and Tombo going up instead of down, as I'd expect the transition values to increase the std; hence, removing them should decrease these values. So why does this table show the opposite?

      I believe this table is in neither the main text nor the supplement; would it not be a good idea to cover this point somewhere in the work?

      Thank you for this insightful comment. In response, we carefully re-examined our analysis and identified a bug in the code related to boundary removal for Nanopolish. We have now corrected this issue and included the updated results in Supplementary Table S1 of the revised manuscript. As shown in the updated table, the average standard deviations decrease after removing the boundary regions for both Nanopolish and Tombo.

      We have now included this table in Supplementary Table S1 in the revised manuscript and added the following clarification:

      “It is worth noting that the data points corresponding to the transition state between two consecutive 5-mers are not included in the calculation of the standard deviation in SegPore’s results in Table 1. However, their exclusion does not affect the overall conclusion, as there are on average only ~6 points per 5-mer in the transition state (see Supplementary Table S1 for more details).”
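
The reviewer's expectation can be checked on a toy example. The sketch below assumes each event's samples carry a flag marking transition-state points; this (value, flag) layout and the numbers are hypothetical and do not reflect SegPore's output format.

```python
from statistics import pstdev


def average_std(events, drop_transition):
    """Mean of per-event standard deviations.

    events: one list per 5-mer event of (current_value, is_transition) pairs.
    """
    stds = []
    for samples in events:
        values = [v for v, is_trans in samples if not (drop_transition and is_trans)]
        stds.append(pstdev(values))
    return sum(stds) / len(stds)


# Toy events: stable base-state samples plus transition points that drift
# toward the neighbouring 5-mer's level (values are made up).
events = [
    [(100.0, False), (101.0, False), (99.0, False), (100.5, False), (106.0, True), (108.0, True)],
    [(110.0, False), (109.5, False), (110.5, False), (104.0, True)],
]
with_transitions = average_std(events, drop_transition=False)
without_transitions = average_std(events, drop_transition=True)
print(f"{with_transitions:.2f} -> {without_transitions:.2f}")
```

Dropping points that lie between two current levels typically lowers the spread, as here, but this is not guaranteed in general: removing points that sit close to an event's mean would raise it.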

      (22B) As mentioned in 2), I'm happy there's a clear definition of what is meant but I found the chosen word a bit odd.

      We apologize for the earlier unclear terminology. We now refer to it as the segmentation and alignment task, abbreviated as the segmentation task.

      (23) Reading back, I can gather that from the earlier text, but the summary of what is being tested is this:

      “including Tombo, MINES (31), Nanom6A (32), m6Anet, Epinano (33), and CHEUI (20).”

      Next, the identifier "Nanopolish+m6Anet" is, aside from the figure itself, only mentioned in the discussion. Adding a line explaining that "Nanopolish+m6Anet" is the default way of running m6Anet, and that "SegPore+m6Anet" replaces the Nanopolish part of m6Anet with SegPore, rather than jumping straight to "SegPore+m6Anet", would clarify where this identifier came from.

      Thank you for the helpful suggestion. We have added the identifier to the revised manuscript as follows:

      “Given their comparable methodologies and input data requirements, we benchmarked SegPore against several baseline tools, including Tombo, MINES (31), Nanom6A (32), m6Anet, Epinano (33), and CHEUI (20). By default, MINES and Nanom6A use eventalign results generated by Tombo, while m6Anet, Epinano, and CHEUI rely on eventalign results produced by Nanopolish. In Fig. 3C, ‘Nanopolish+m6Anet’ refers to the default m6Anet pipeline, whereas ‘SegPore+m6Anet’ denotes a configuration in which Nanopolish’s eventalign results are replaced with those from SegPore.”

      (24) For completeness I'd expect tickmarks and values on the y-axis as well.

      Thank you for the suggestion. We have updated Figures 3A and 3B in the revised manuscript to include tick marks and values on the y-axis as requested.

      (25) Considering this statement and looking back at Figures 3A and 3B, wouldn't this be easier to observe if the histograms/KDEs were plotted with overlap in a single figure?

      We appreciate the suggestion. However, we believe that overlaying Figures 3A and 3B into a single panel would make the visualization cluttered and more difficult to interpret.

      (29) Please change the sentence in the text to make that clear. As it is written now (while it's the same number of motifs, so one might guess it), it does not seem to refer to that particular set of motifs and could be a new selection of 6 motifs.

      We appreciate the suggestion and have revised the sentence for clarity as follows:

      “We evaluated m6A predictions using two approaches: (1) SegPore’s segmentation results were fed into m6Anet, referred to as SegPore+m6Anet, which works for all DRACH motifs, and (2) direct m6A predictions from SegPore’s Gaussian Mixture Model (GMM), which is limited to the six selected 5-mers shown in Supplementary Fig. S2C that exhibit clearly separable modified and unmodified components in the GMM (see Materials and Methods for details).”

      (31) I think we have a different interpretation of the word 'leverage', or perhaps what it applies to. I'd say it leverages the jiggling if there's new information drawn from the jiggling behaviour. It's taking it into account if it filters for it. The HHMM as far as I understand tries to identify the jiggles, and ignore their values for the segmentation etc. So while one might see this as an approach that "leverages the hypothesis", I don't see how this HHMM "leverages the jiggling property" itself.

      Thank you for the helpful suggestion. We have replaced the word “leverages” with “models” in the revised manuscript.

      New points

      pg6ln166: “…we extract the aligned raw signal segment and reference sequence segment from Nanopolish's events [...] we extract the raw signal segment corresponding to the transcript region for each input read based on Nanopolish's poly(A) detection results.”

      It is not clear why different approaches are applied to these two cases in this part of the text.

      Thank you for pointing this out. The two approaches refer to different preprocessing strategies for in vivo and in vitro data.

      For in vivo data, a large proportion of reads do not span the full-length transcript and often map only to a portion of the reference sequence. Moreover, because a single gene can generate multiple transcript isoforms, a read may align equally well to several possible transcripts. Therefore, we extract only the raw signal segment that corresponds to the mapped portion of the transcript for each read.

      In contrast, for in vitro data, the transcript sequence is known precisely. As a result, we can directly extract all raw signals following the poly(A) tail and align them to the complete reference sequence.

      pg10ln259: “An important distinction from classical global alignment algorithms is that one or multiple base blocks may align with a single 5mer.”

      If there were usually a 1:1 mapping, the alignment algorithm would be more or less a direct match, so I think multiple blocks aligning to a single 5mer is actually quite common.

      Thank you for the comment. The “classical global alignment algorithm” here refers to the Needleman–Wunsch algorithm used for sequence alignment. Our intention was to highlight the conceptual difference between traditional sequence alignment and nanopore signal alignment. In classical sequence alignment, each base typically aligns to a single position in the reference. In contrast, in nanopore signal alignment, one or multiple signal segments — corresponding to varying dwell times of the motor protein — can align to a single 5-mer.

      We have revised the sentence as follows:

      “An important distinction from classical global alignment algorithms (Needleman–Wunsch algorithm)……”
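
To make the distinction concrete, the following is a minimal sketch of a monotone global alignment in which each signal block is assigned to exactly one 5-mer, consecutive blocks may share a 5-mer, and every 5-mer receives at least one block. The squared difference between a block's mean current and a 5-mer's reference mean is an assumed cost chosen for illustration; SegPore's actual scoring is described in the Methods and may differ.

```python
def align_blocks(block_means, kmer_means):
    """Return, for each signal block, the index of the 5-mer it is aligned to."""
    n, m = len(block_means), len(kmer_means)
    if n < m:
        raise ValueError("need at least one block per 5-mer")
    inf = float("inf")
    # cost[i][j]: best total cost with block i assigned to 5-mer j (0-based).
    cost = [[inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = (block_means[0] - kmer_means[0]) ** 2
    for i in range(1, n):
        for j in range(m):
            stay = cost[i - 1][j]                        # block i extends 5-mer j
            move = cost[i - 1][j - 1] if j > 0 else inf  # block i starts 5-mer j
            best, prev = (stay, j) if stay <= move else (move, j - 1)
            if best < inf:
                cost[i][j] = best + (block_means[i] - kmer_means[j]) ** 2
                back[i][j] = prev
    # Trace back from the last block, which must sit on the last 5-mer.
    path = [m - 1]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


kmer_means = [90.0, 110.0, 100.0]  # hypothetical reference levels (pA)
block_means = [89.5, 90.8, 109.0, 111.2, 110.1, 99.4]
print(align_blocks(block_means, kmer_means))  # [0, 0, 1, 1, 1, 2]
```

Unlike Needleman–Wunsch, this sketch has no gap move: a block can only extend the current 5-mer or start the next one, which is what lets several blocks collapse onto a single 5-mer.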

      pg13ln356: "dwell time" is not defined or used before this point. I guess it's effectively the number of raw samples per segment, but this should be clarified.

      Thank you for pointing this out. We have now added a clear definition of dwell time in the text as follows:

      "such as the normalized mean μ_i, standard deviation σ_i, dwell time l_i (number of data points in the event)."

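Assuming event boundaries are already available (from SegPore's segmentation, or from Tombo's resquiggle in the comparison setup described earlier) and the signal has already been standardized, the three features can be computed as below. The half-open boundary convention, the use of the population standard deviation, and the toy signal are assumptions of this sketch.

```python
from statistics import fmean, pstdev


def event_features(signal, boundaries):
    """Return (mean, std, dwell_time) for each event.

    boundaries: sorted sample indices; event i spans signal[boundaries[i]:boundaries[i + 1]].
    """
    feats = []
    for start, end in zip(boundaries, boundaries[1:]):
        seg = signal[start:end]
        # Dwell time is the number of data points in the event.
        feats.append((fmean(seg), pstdev(seg), len(seg)))
    return feats


signal = [0.1, 0.3, 0.2, 1.5, 1.4, 1.6, 1.5, -0.4, -0.6]  # toy standardized signal
print(event_features(signal, [0, 3, 7, 9]))
```
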
      pg13ln358: “Feature vectors from 80% of the genomic locations were used for training, while the remaining 20% were set aside for validation.”

      I assume these are selected randomly but this is not explicitly stated here and should be.

      Yes, they are randomly selected. We have revised the sentence as follows:

      “Feature vectors from a randomly selected 80% of the genomic locations were used for training, while the remaining 20% were set aside for validation.”
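
A minimal sketch of such a split, assuming genomic locations are represented as hypothetical (chromosome, position) tuples and that a fixed seed is used for reproducibility; neither detail is taken from the manuscript.

```python
import random


def split_locations(locations, train_frac=0.8, seed=0):
    """Randomly split genomic locations (not individual reads) into train/validation sets."""
    locs = sorted(set(locations))  # de-duplicate and fix an order before shuffling
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    rng.shuffle(locs)
    cut = int(round(train_frac * len(locs)))
    return locs[:cut], locs[cut:]


locations = [("chr1", pos) for pos in range(100, 200, 5)]  # 20 hypothetical sites
train, val = split_locations(locations)
print(len(train), len(val))  # 16 4
```

Splitting at the level of locations rather than reads keeps all reads from one site on the same side of the split, which avoids leaking site-specific signal into the validation set.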

      pg18ln488: The manuscript now evaluates RNA004 and compares against f5c and Uncalled4. It mentions the differences between RNA004 and RNA002, namely kmer size and current levels, but does not explain where the starting reference model values for the RNA004 model come from: In pg18ln492 they state "RNA004 provides reference values for 9mers", then later they seem to use a 5mer parameter table (pg19ln508), are they re-using the same table from RNA002 or did they create a 5mer table from the 9mer reference table?

      We apologize for the confusion. The reference model table for RNA004 9-mers is obtained from f5c (the array named ‘rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data’ in https://raw.githubusercontent.com/hasindu2008/f5c/refs/heads/master/src/model.h).

      Author response image 1.

      We have revised the subsection header “5-mer parameter table” in the Methods to “5-mer & 9-mer parameter table” to highlight this and added a paragraph about how to obtain the 9-mer parameter table:

      “In the RNA004 data analysis (Table 2), we obtained the 9-mer parameter table from the source code of f5c (version 1.5). Specifically, we used the array named ‘rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data’ from the following file: https://raw.githubusercontent.com/hasindu2008/f5c/refs/heads/master/src/model.h (accessed on 17 October 2025).”

      Also, on page 18, line 195, we added the following sentence:

      “The 9-mer parameter table in pA scale for RNA004 data provided by f5c (see Materials and Methods) was used in the analysis.”

      pg19ln520: “Additionally, due to the differences of the k-mer motifs between human and mouse (Supplementary Fig. S2), six shared 5mers were selected to demonstrate SegPore's performance in modification prediction directly.”

      "the differences" - in occurrence rates, as I gather from the supplementary figure, but it would be good to explicitly state it in this sentence itself too.

      Thank you for the helpful suggestion. We agree that the original sentence was vague. The main reason for selecting only six 5-mers is the difference in the availability of ground truth labels for specific k-mer motifs between human and mouse datasets. We have revised the sentence accordingly:

      “Additionally, due to the differences in the availability of ground truth labels for specific k-mer motifs between human and mouse (Supplementary Fig. S2), six shared 5-mers were selected to directly demonstrate SegPore’s performance in modification prediction.”

      pg24ln654: “SegPore codes current intensity levels”

      "codes" is meant to be "stores" I guess? Perhaps "encodes"?

      Thank you for the suggestion. We have now replaced it with “encodes” in the revised manuscript.

      Lastly, regarding the feedback in the other reviewers' comments:

      The 'HMM' mentioned in line 184 looks fine to me: the HHMM consists of two HMMs in a hierarchical setup, and the text now refers to one of these HMM layers. If this were to be changed, the text would need to state the layer (e.g. "the outer HHMM layer") throughout.

      We agree with this assessment and believe that the term “inner HMM” is accurate in this context, as it correctly refers to one of the two HMM layers within the HHMM structure. Therefore, we have decided to retain the current terminology.

      Reviewer #3 (Recommendations for the authors):

      I recommend the publication of this manuscript, provided that the following comments are addressed.

      Page 5, Preprocessing: You comment that the poly(A) tail provides a stable reference that is crucial for the normalisation of all reads. How would this step handle reads that have interrupted poly(A) tails (e.g. in the case of mRNA vaccines that employ a linker sequence)? Or cell types that express TENT4A/B, which can include transcripts with non-A residues in the poly(A) tail: https://www.science.org/doi/full/10.1126/science.aam5794.

      It depends on Nanopolish’s ability to reliably detect the poly(A) tail. In general, the poly(A) region produces a long stretch of signals fluctuating around a current level of ~108.9 pA (RNA002) with relatively stable variation, which allows it to be identified and used for normalization.

      For in vivo data, if the poly(A) tail is interrupted (e.g., due to non-A residues or linker sequences), two scenarios are possible:

      (1) The poly(A) tail may not be reliably detected, in which case the corresponding read will be excluded from our analysis.

      (2) Alternatively, Nanopolish may still recognize the initial uninterrupted portion of the poly(A) signal, which is typically sufficient in length and stability to be used for signal normalization.

      For in vitro data, the poly(A) tails are uninterrupted, so this issue does not arise.

      All analyses presented in this study are based exclusively on reads with reliably detected poly(A) tails.

      Page 7, 5mer parameter table: r9.4_180mv_70bps_5mer_RNA is an older kmer model (>2 years). How does your method perform with the newer RNA kmer models that do permit the detection of multiple ribonucleotide modifications? Addressing this comment would be beneficial, however I understand that it would require the generation of new data, as limited RNA004 datasets are available in the public domain.

      “r9.4_180mv_70bps_5mer_RNA” is the most widely used k-mer model for RNA002 data. Regarding the newer k-mer models, we believe the reviewer is referring to the “modification basecalling” models available in Dorado, which are specifically designed for RNA004 data. At present, SegPore can perform RNA modification estimation only on RNA002 data, as this is the platform for which suitable training data and ground truth annotations are available. Evaluating SegPore’s performance with the newer RNA004 modification models would require new datasets containing known modification sites generated with RNA004 chemistry. Since such data are currently unavailable, we have not yet been able to assess SegPore under these conditions. This represents an important future direction for extending and validating our method.

      The Methods and Results sections contain redundant information; please streamline these sections to reduce the redundancy.

      We thank the reviewer for this suggestion and acknowledge that there is some overlap between the Methods and Results sections. However, we feel that removing these parts could compromise the clarity and readability of the manuscript, especially given that Reviewer 2 emphasized the need for clearer explanations. We therefore decided to retain certain methodological descriptions in the Results section to ensure that key steps are understandable without requiring the reader to constantly cross-reference the Methods.

      Minor comments

      Please be consistent when referring to k-mers and 5-mers (sometimes denoted as 5mers - please change to 5-mers throughout).

      We have revised the manuscript to ensure consistency and now use “5-mers” throughout the text.

      Introduction

      Lines 80 - 112: Please condense this section to roughly half the length (1-2 paragraphs). In general, the results described in the introduction should be very brief, as they are described in full in the results section.

      Thank you for the suggestion. We have condensed the original three paragraphs into a single, more concise paragraph as follows:

      "SegPore is a novel tool for direct RNA sequencing (DRS) signal segmentation and alignment, designed to overcome key limitations of existing approaches. By explicitly modeling motor protein dynamics during RNA translocation with a Hierarchical Hidden Markov Model (HHMM), SegPore segments the raw signal into small, biologically meaningful fragments, each corresponding to a k-mer sub-state, which substantially reduces noise and improves segmentation accuracy. After segmentation, these fragments are aligned to the reference sequence and concatenated into larger events, analogous to Nanopolish’s “eventalign” output, which serve as the foundation for downstream analyses. Moreover, the “eventalign” results produced by SegPore enhance interpretability in RNA modification estimation. While deep learning–based tools such as m6Anet classify RNA modifications using complex, non-transparent features (see Supplementary Fig. S5), SegPore employs a simple Gaussian Mixture Model (GMM) to distinguish modified from unmodified nucleotides based on baseline current levels. This transparent modeling approach improves confidence in the predictions and makes SegPore particularly well-suited for biological applications where interpretability is essential."

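The following minimal sketch shows how a two-component GMM can yield both per-read modification states and a site-level modification rate, assuming the unmodified and modified components for a 5-mer have already been estimated. The component parameters, the mixing weight, the 0.5 posterior threshold, and averaging posteriors to obtain the rate are illustrative choices, not necessarily SegPore's exact procedure.

```python
import math


def normal_pdf(x, mu, sd):
    # Density of N(mu, sd^2) evaluated at x.
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def modification_calls(read_means, unmod, mod, weight_mod=0.5, threshold=0.5):
    """Return per-read posteriors, per-read calls and the site modification rate.

    unmod, mod: (mean, sd) of the unmodified and modified components (hypothetical).
    """
    posteriors = []
    for x in read_means:
        p_mod = weight_mod * normal_pdf(x, *mod)
        p_unmod = (1.0 - weight_mod) * normal_pdf(x, *unmod)
        total = p_mod + p_unmod
        posteriors.append(p_mod / total if total > 0 else weight_mod)
    calls = [p >= threshold for p in posteriors]  # per-read modification state
    rate = sum(posteriors) / len(posteriors)      # site-level modification rate
    return posteriors, calls, rate


# Toy event means (pA) for one site, one value per read; not real data.
read_means = [118.9, 119.4, 124.8, 125.3, 119.1]
posteriors, calls, rate = modification_calls(read_means, unmod=(119.0, 1.5), mod=(125.0, 1.5))
print(calls, round(rate, 2))
```

Because the rate comes from explicit component means and variances, each call can be traced back to how far a read's current level sits from the two baselines.
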
      Line 104: Please change "normal adenosine" to "adenosine".

      We have revised the manuscript as requested and replaced all instances of “normal adenosine” with “adenosine” throughout the text.

      Materials and Methods

      Line 176: Please reword "...we standardize the raw current signals across reads, ensuring that the mean and standard deviation of the poly(A) tail are consistent across all reads." To "...we standardize the raw current signals for each read, ensuring that the mean and standard deviation are consistent across the poly(A) tail region."

      We have changed the sentence as requested.

      “Since the poly(A) tail provides a stable reference, we standardize the raw current signals for each read, ensuring that the mean and standard deviation are consistent across the poly(A) tail region.”
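
A minimal sketch of per-read standardization, assuming the poly(A) boundaries for the read are already known (e.g., from Nanopolish's poly(A) detection) and that the target poly(A) mean and standard deviation are fixed constants. The 108.9 pA target echoes the approximate RNA002 poly(A) level mentioned in the response on interrupted poly(A) tails; the target standard deviation and the toy read are placeholders, not SegPore's values.

```python
from statistics import fmean, pstdev

TARGET_MEAN = 108.9  # approximate RNA002 poly(A) current level (pA)
TARGET_STD = 1.0     # hypothetical placeholder for the target spread


def standardize_read(signal, polya_start, polya_end,
                     target_mean=TARGET_MEAN, target_std=TARGET_STD):
    """Affinely rescale a whole read so its poly(A) segment has the target mean and std."""
    polya = signal[polya_start:polya_end]
    mu, sd = fmean(polya), pstdev(polya)
    if sd == 0:
        raise ValueError("poly(A) segment has zero variance")
    scale = target_std / sd
    return [(x - mu) * scale + target_mean for x in signal]


raw = [510.0, 520.0, 515.0, 505.0, 600.0, 430.0]  # toy read; first 4 samples are poly(A)
out = standardize_read(raw, 0, 4)
print([round(v, 2) for v in out])
```

Because the same affine map is applied to every sample of the read, differences between transcript events are preserved while read-to-read offsets and gains are removed.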

      Line 182: Please describe the RNA translocation hypothesis, as this is the first mention of it in the text. Also, why is the Hierarchical Hidden Markov model perfect for addressing the RNA translocation hypothesis? Explain more about how the HHMM works and why it is a suitable choice.

      We have revised the sentence as requested:

      “The RNA translocation hypothesis (see details in the first section of Results) naturally leads to the use of a hierarchical Hidden Markov Model (HHMM) to segment the raw current signal.”

      The motivation for the HHMM is explained in detail in the first section of Results, “RNA translocation hypothesis”. As illustrated in Figure 2, the sequencing data suggest that RNA molecules may translocate back and forth (often referred to as jiggling) while passing through the nanopore. This behavior results in complex current fluctuations that are challenging to model with a simple HMM. The HHMM provides a natural framework to address this because it can model signal dynamics at two levels. The outer HMM distinguishes between two major states: base states (where the signal corresponds to a stable sub-state of a k-mer) and transition states (representing transitions from one base state to the next). Within each base state, an inner HMM models finer signal variation using three states, “curr”, “prev”, and “next”, corresponding to the current k-mer sub-state and its neighboring k-mer sub-states. This hierarchical structure captures both the stable signal patterns and the stochastic translocation behavior, enabling more accurate and biologically meaningful segmentation of the raw current signal.
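
The inner layer can be sketched as a three-state HMM decoded with the Viterbi algorithm. All numbers here are assumptions for illustration: Gaussian emissions centred on hypothetical prev/curr/next current levels with a shared standard deviation, a transition matrix that favours staying in the same state, and a start distribution favouring "curr". The outer base/transition layer and SegPore's parameter estimation are not shown.

```python
import math

STATES = ("prev", "curr", "next")


def viterbi_inner(xs, levels, sd=1.0, stay=0.8):
    """Most likely prev/curr/next label for each sample of one base-state block.

    levels: dict mapping each state to a mean current level (hypothetical values).
    stay: probability of remaining in the same state; the rest is split evenly.
    """
    switch = (1.0 - stay) / 2.0
    log_t = {(a, b): math.log(stay if a == b else switch) for a in STATES for b in STATES}

    def log_emit(state, x):
        z = (x - levels[state]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

    log_start = {"prev": math.log(0.1), "curr": math.log(0.8), "next": math.log(0.1)}
    score = {s: log_start[s] + log_emit(s, xs[0]) for s in STATES}
    back = []
    for x in xs[1:]:
        ptr, new = {}, {}
        for b in STATES:
            a_best = max(STATES, key=lambda a: score[a] + log_t[(a, b)])
            ptr[b] = a_best
            new[b] = score[a_best] + log_t[(a_best, b)] + log_emit(b, x)
        back.append(ptr)
        score = new
    # Trace back the best path from the most likely final state.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


levels = {"prev": 95.0, "curr": 100.0, "next": 105.0}
xs = [100.2, 99.8, 95.1, 94.9, 100.1, 104.8, 105.2, 100.0]  # toy jiggling trace
labels = viterbi_inner(xs, levels)
print(labels)
```

In this toy trace, the brief excursions to the previous and next 5-mer levels are labelled prev and next rather than being treated as new events, which is the kind of back-and-forth movement the inner HMM is meant to absorb.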

      Line 184: do you mean HHMM? Please be consistent throughout the text.

      As explained in the previous response, the HHMM consists of two layers: an outer HMM and an inner HMM. The term “HMM” in line 184 is meant to be read together with “inner” at the end of line 183, forming the phrase “inner HMM.” It seems the reviewer may have overlooked this when reading the text.

      Line 203: please delete: "It is obviously seen that".

      We have removed the phrase “It is obviously seen that” from the sentence as requested. The revised sentence now reads:

      “The first part of Eq. 2 represents the emission probabilities, and the second part represents the transition probabilities.”

      Line 314, GMM for 5mer parameter table re-estimation: "Typically, the process is repeated three to five times until the 5mer parameter table stabilizes." How is the stabilisation of the 5mer parameter table quantified? What is a reasonable cut-off that would demonstrate adequate stabilisation of the 5mer parameter table? Please add details of this to the text.

      We have revised the sentence to clarify the stabilization criterion as follows:

      “Typically, the process is repeated three to five times until the 5-mer parameter table stabilizes (when the average change of mean values of all 5-mers is less than 5e-3).”
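
One reading of this stopping rule is sketched below: the mean absolute change of the per-5-mer means between consecutive iterations. Whether the average uses absolute or signed differences, the `reestimate` callback, and the toy update rule and values are assumptions; only the 5e-3 threshold comes from the text.

```python
def mean_abs_change(old_means, new_means):
    """Average absolute change of 5-mer mean values between two parameter tables."""
    keys = old_means.keys() & new_means.keys()
    return sum(abs(new_means[k] - old_means[k]) for k in keys) / len(keys)


def iterate_until_stable(table, reestimate, tol=5e-3, max_iter=10):
    """Apply reestimate() until the average change drops below tol.

    Returns the final table and the number of iterations performed.
    """
    for it in range(1, max_iter + 1):
        new_table = reestimate(table)
        if mean_abs_change(table, new_table) < tol:
            return new_table, it
        table = new_table
    return table, max_iter


# Toy update that moves each mean 90% of the way toward a fixed point
# (hypothetical 5-mer means in pA, not values from the manuscript).
TARGET = {"GGACT": 120.0, "GAACT": 115.0}


def toy_reestimate(table):
    return {k: v + 0.9 * (TARGET[k] - v) for k, v in table.items()}


final, n_iter = iterate_until_stable({"GGACT": 121.0, "GAACT": 114.0}, toy_reestimate)
print(n_iter)  # 4
```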

      Results

      Line 377: Please edit to read "Traditional base calling algorithms such as Guppy and Albacore assume that the RNA molecule is translocated unidirectionally through the pore by the motor protein."

      We have revised the sentence as:

      “In traditional basecalling algorithms such as Guppy and Albacore, we implicitly assume that the RNA molecule is translocated through the pore by the motor protein in a monotonic fashion, i.e., the RNA is pulled through the pore unidirectionally.”

      Line 555, m6A identification at the site level: "For six selected m6A motifs, SegPore achieved an ROC AUC of 82.7% and a PR AUC of 38.7%, earning the third best performance compared with deep learning methods m6Anet and CHEUI (Fig. 3D)." So SegPore performs third best of all deep learning methods. Do you recommend its use in conjunction with m6Anet for m6A detection? Please clarify in the text. This will help to guide users to possible best practice uses of your software.

      Thank you for the suggestion. We have added a clarification in the revised manuscript to guide users.

      “For practical applications, we recommend taking the intersection of m6A sites predicted by SegPore and m6Anet to obtain high-confidence modification sites, while still benefiting from the interpretability provided by SegPore’s predictions.”
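
The recommended intersection amounts to a set operation on site identifiers. In this sketch each site is keyed by a hypothetical (transcript ID, position) tuple after each tool's own threshold has been applied; parsing the tools' output files into such tuples is not shown.

```python
def high_confidence_sites(segpore_sites, m6anet_sites):
    """Sites called modified by both tools, sorted for stable output."""
    return sorted(set(segpore_sites) & set(m6anet_sites))


segpore = [("ENST_A", 1520), ("ENST_A", 1877), ("ENST_B", 402)]  # hypothetical calls
m6anet = [("ENST_A", 1877), ("ENST_B", 402), ("ENST_C", 95)]     # hypothetical calls
print(high_confidence_sites(segpore, m6anet))  # [('ENST_A', 1877), ('ENST_B', 402)]
```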

      Figures.

      Figure 1A please refer to poly(A) tail, rather than polyA tail.

      We have updated it to poly(A) tail in the revised manuscript.

    1. eLife Assessment

      This valuable study provides insights into the role of Pten mutations in SHH-medulloblastoma, by using mouse models to resolve the effects of heterozygous vs homozygous mutations on proliferation and cell death throughout tumorigenesis. The experiments presented are convincing, with rigorous quantifications and orthogonal experimentation provided throughout, and the models employing sporadic oncogene induction, rather than EGL-wide genetic modifications, represent an advancement in experimental design. However, the study remains limited, such that the biological conclusions do not extend greatly from those in the extant literature. This could be addressed with additional experimentation focused on cell cycle kinetic changes at early stages, as well as greater characterization of macrophage phenotypes (e.g., microglia vs circulating monocytes). The work will be of interest to medical biologists studying general cancer mechanisms, as the function of Pten may be similar across tumor types.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates how Pten loss influences the development of medulloblastoma using mouse models of Shh-driven MB. Previous studies have shown that Pten heterozygosity can accelerate tumorigenesis in models where the entire GNP compartment has MB-promoting mutations, raising questions about how Pten levels and context interact, especially when cancer-causing mutations are more sporadic. Here, the authors create an allelic series combining sporadic, cell-autonomous induction of SmoM2 with Pten loss in granule neuron progenitors. In their models, Pten heterozygosity does not significantly impact tumor development, whereas complete Pten loss accelerates tumour onset. Notably, Pten-deficient tumours accumulate differentiated cells and show reduced cell death and decreased macrophage infiltration. At early stages, before tumour establishment, they observe EGL hyperplasia and more pre-tumour cells in S phase, leading them to suggest that Pten loss initially drives proliferation but later shifts towards differentiation and accumulation of death-resistant, postmitotic cells. Overall, this is a well-executed and technically elegant study that confirms and extends earlier findings with more refined models. The phenotyping is strong, but the mechanistic insight is limited, especially with respect to dosage effects and macrophage biology.

      Strengths:

      The work is carefully executed, and the models, which use sporadic oncogene induction rather than EGL-wide genetic manipulations, represent an advance in experimental design. The deeper phenotyping, including single-cell RNA-seq and target validation, adds rigor.

      Weaknesses:

      The biological conclusions largely confirm findings from previous studies (Castellino et al, 2010; Metcalf et al, 2013), showing that germline or conditional Pten heterozygosity accelerates tumorigenesis and generates tumors with a very similar phenotype, including abundant postmitotic cells and reduced cell death.

      The second stated goal, to understand why Pten dosage might matter, remains underdeveloped. The difference between earlier models using EGL-wide SmoA1 or Ptch loss versus sporadic cell-autonomous SmoM2 induction and Pten loss in this study could reflect model-specific effects or non-cell-autonomous contributions from Pten-deficient neighbouring cells in the EGL, for example. However, the study does not explore these possibilities. For instance, examining germline Pten loss in the sporadic SmoM2 context could have provided insight into whether dosage effects are cell-autonomous or dependent on the context.

      The observations on macrophages are intriguing but preliminary. The reduction in Iba1+ cells could reflect changes in microglia, barrier-associated macrophages, or infiltrating peripheral macrophages, but these populations are not distinguished. Moreover, the functional relevance of these immune changes for tumor initiation or progression remains unexplored.

    3. Reviewer #2 (Public review):

      The authors sought to answer several questions about the role of the tumor suppressor PTEN in SHH-medulloblastoma formation: namely, whether Pten loss increases metastasis, why Pten loss accelerates tumor growth, and how single-copy vs double-copy loss affects tumorigenesis. Using an elegant mouse model, the authors found that Pten mutations do not increase metastasis in a SmoD2-driven SHH-medulloblastoma mouse model, based on extensive characterization of the presence of spinal cord metastases. Upon examining the cellular phenotype of Pten-null tumors in the cerebellum, the authors made the interesting and puzzling observation that Pten loss increased the differentiation state of the tumor, with fewer cycling cells, seemingly in contrast to the higher penetrance and decreased latency of tumor growth.

      The authors then examined the rate of cell death in the tumor. Interestingly, Pten-null tumors had fewer dying cells, as assessed by TUNEL. In addition, the tumors expressed differentiation markers NeuN and SyP, which are rare in SHH-MB mouse models. This reduction in dying cells is also evident at earlier stages of tumor growth. By looking shortly after Pten-loss induction, the authors found that Pten loss had an immediate impact on increasing the proliferative state of GCPs, followed by enhancing the survival of differentiated cells. These two pro-tumor features together account for the increased penetrance and decreased latency of the model. While heterozygous loss of Pten also promoted proliferation, it did not protect against cell death.

      Interestingly, loss of Pten alone in GCPs caused an increase in cerebellar size throughout development. The authors suggest that Pten normally constrains GCP proliferation, although they did not check whether reduced cell death is also contributing to cerebellum size.

      Lastly, the authors examined macrophage infiltration and found that there was less macrophage infiltration in the Pten-null tumors. Using scRNA-seq, they suggest that the observed reduction in macrophages might be due to an immunosuppressive tumor microenvironment.

      This mouse model will be of high relevance to the medulloblastoma community, as current models do not reflect the heterogeneity of the disease. In addition, the elegant experimentation into Pten function may be relevant to cancer biologists outside of the medulloblastoma field.

      Strengths:

      The in-depth characterisation of the mouse model is a major strength of the study, including multiple time points and quantifications. The single-cell sequencing adds a nice molecular feature, and this dataset may be relevant to other researchers with specific questions of Pten function.

      Weaknesses:

      One weakness of the study was the examination of the macrophage phenotype, which did not include quantification (only single images), so it is difficult to assess whether this reduction of macrophages holds true across multiple samples. Future studies will also be needed to assess whether Pten-mutated patient medulloblastomas also have a differentiation phenotype, but this is difficult to assess given the low number of samples worldwide.

    1. eLife Assessment

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find evidence for sensory preconditioning in male mice. They also find that, in these mice, CaMKII-positive neurons in the dorsal hippocampus encode the audio-visual association that forms in stage 1 of the task. The evidence in support of these findings is convincing. This important study will be of interest to researchers in the fields of learning and memory and/or hippocampus function.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Pinho et al. presents a novel behavioral paradigm for investigating higher-order conditioning in mice. The authors developed a task that creates associations between light and tone sensory cues, driving mediated learning. They observed sex differences in task acquisition, with females demonstrating faster mediated learning compared to males. Using fiber photometry and chemogenetic tools, the study reveals that the dorsal hippocampus (dHPC) plays a central role in encoding mediated learning. These findings are crucial for understanding how environmental cues, which are not directly linked to positive/negative outcomes, contribute to associative learning. Overall, the study is well-designed, with robust results, and the experimental approach aligns with the study's objectives.

      Strengths:

      The authors develop a robust behavioral paradigm to examine higher-order associative learning in mice.

      They discover a sex-specific component influencing mediated learning.

      Using fiber photometry and chemogenetic techniques, the authors identify the dorsal hippocampus, but not the ventral hippocampus, as playing a crucial role in encoding mediated learning.

    3. Reviewer #2 (Public review):

      Pinho et al. developed a new auditory-visual sensory preconditioning procedure in mice. They observed sex differences in this task, with male, but not female, mice acquiring preconditioned fear. Using photometry, they observed activation of the dorsal and ventral hippocampus during sensory preconditioning (tone + light) and direct conditioning (light + shock). Finally, the authors combined their sensory preconditioning task with DREADDs. They found that inhibition of CaMKII-positive cells in the dorsal hippocampus, but not the ventral hippocampus, during the preconditioning phase impaired the formation of sensory preconditioned fear. However, inhibiting the same cells during phase two (light + shock) had no effect.

      Strengths:

      (1) The authors develop a robust auditory-visual sensory preconditioning protocol in male mice. Research on the neurobiology of sensory preconditioning has primarily used rats as subjects. The development of a mouse protocol will be very beneficial to the field, allowing researchers to take advantage of the many transgenic mouse lines.

      (2) They find sex differences in the acquisition of sensory preconditioning, highlighting the importance of adapting behavioral procedures to each sex.

      (3) They identify the dorsal (but not ventral) hippocampus as a key region for the integration of sensory information during the preconditioning phase, furthering our understanding of the role of the hippocampus in integrating experience.

      Comments on the revisions:

      Thank you for addressing my concerns in considerable detail. I have no more suggestions for the authors.

    4. Reviewer #3 (Public review):

      Summary:

      Pinho et al. investigated the role of the dorsal vs. ventral hippocampus and of gender differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. Importantly, the authors validated the sensory preconditioning procedure in male mice. The authors found no evidence of sensory preconditioning in female mice, but rather a generalization effect, stressing the importance of gender differences in fear learning. After validation of a sensory preconditioning procedure in male mice using light and tone neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. parvalbumin-positive-only neurons in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found increased activity of all neurons, PV+-only neurons, and CaMKII+ neurons in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons (but not PV+-only neurons) in the dorsal (but not ventral) hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues). This manipulation had no effect on the direct association between the light cue and the mild foot shock.

      This set of data (1) validates sensory preconditioning in male mice and stresses the importance of taking gender effects into account; (2) validates the recruitment of the dorsal and ventral hippocampi during preconditioning and conditioning phases; and (3) further establishes the specific role of CaMKII+ neurons in the dorsal, but not ventral, hippocampus in the formation of an association between two neutral stimuli, but not between a neutral stimulus and a mild foot shock.

      Strengths:

      The authors developed a sensory preconditioning procedure in male mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a gender effect in the formation of light-cue association. The authors took advantage of fiber photometry and chemogenetics to target sub-regions of the hippocampus in a cell-specific manner and to investigate their role during different phases of a sensory preconditioning procedure, and developed a DeepLabCut-based strategy to assess freezing fear responses.

      Weaknesses:

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning; however, there are a few weaknesses that should be addressed in future studies:

      (1) This study found a generalization effect in female mice only. While the authors attempted to neutralize this effect, the mechanism underlying this gender effect and whether female mice can display evidence for mediated learning has yet to be determined.

      (2) One of the main effects from which the conclusion of this study derives (i.e., a deficit in mediated learning in male mice when CaMKII+ neurons are inhibited in the dorsal HPC during the preconditioning phase) lies in the absence of a significant difference between the freezing response before and during tone cue presentation in the Probe Test Tone phase when CaMKII+ neurons are chemogenetically inhibited (cf. Fig. 4, Panel B, DPCd group). The fear response before tone cue presentation in this group (DPCd) seems higher than in the Controls_d and DPTd groups and could have masked a mediated learning effect.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study by Pinho et al. presents a novel behavioral paradigm for investigating higher-order conditioning in mice. The authors developed a task that creates associations between light and tone sensory cues, driving mediated learning. They observed sex differences in task acquisition, with females demonstrating faster mediated learning compared to males. Using fiber photometry and chemogenetic tools, the study reveals that the dorsal hippocampus (dHPC) plays a central role in encoding mediated learning. These findings are crucial for understanding how environmental cues, which are not directly linked to positive/negative outcomes, contribute to associative learning. Overall, the study is well-designed, with robust results, and the experimental approach aligns with the study's objectives.

      Strengths: 

      (1) The authors develop a robust behavioral paradigm to examine higher-order associative learning in mice. 

      (2) They discover a sex-specific component influencing mediated learning, with females exhibiting enhanced learning abilities. 

      (3) Using fiber photometry and chemogenetic techniques, the authors identify the dorsal hippocampus, but not the ventral hippocampus, as playing a crucial role in encoding mediated learning.

      We appreciate the strengths highlighted by the Reviewer and the valuable and complete summary of our work.

      Weaknesses: 

      (1) The study would be strengthened by further elaboration on the rationale for investigating specific cell types within the hippocampus.  

      We thank the Reviewer for highlighting this important point. In the revised manuscript, we have added new information (Page 11, Lines 27-34) that specifically explains the rationale for studying a possible cell-type-specific involvement in sensory preconditioning.

      (2) The analysis of photometry data could be improved by distinguishing between early and late responses, as well as enhancing the overall presentation of the data.  

      In response to the Reviewer's comment, we have included new panels in Figure 3E and the whole of Supplementary Figure 4, which separate the photometry data across the different preconditioning and conditioning sessions, respectively. Overall, these data suggest that there are no major changes in cell activity in either hippocampal region across sessions, as a similar light-tone-induced enhancement of activity is observed. These findings have been incorporated in the Results Section (Page 12, Lines 13-15, 19-20 and 35-36).

      (3) The manuscript would benefit from revisions to improve clarity and readability.

      In response to this fair comment, we have gone through the text to improve clarity and readability.

      Reviewer #2 (Public review): 

      Summary: 

      Pinho et al. developed a new auditory-visual sensory preconditioning procedure in mice and examined the contribution of the dorsal and ventral hippocampus to learning in this task. Using photometry, they observed activation of the dorsal and ventral hippocampus during sensory preconditioning and conditioning. Finally, the authors combined their sensory preconditioning task with DREADDs to examine the effect of inhibiting specific cell populations (CaMKII and PV) in the DH on the formation and retrieval/expression of mediated learning.

      Strengths: 

      The authors provide one of the first demonstrations of auditory-visual sensory preconditioning in male mice. Research on the neurobiology of sensory preconditioning has primarily used rats as subjects. The development of a robust protocol in mice will be beneficial to the field, allowing researchers to take advantage of the many transgenic mouse lines. Indeed, in this study, the authors take advantage of a PV-Cre mouse line to examine the role of hippocampal PV cells in sensory preconditioning. 

      We thank the Reviewer for their effort and for highlighting the strengths of our work.

      Weaknesses: 

      (1) The authors report that sensory preconditioning was observed in both male and female mice. However, their data only supports sensory preconditioning in male mice. In female mice, both paired and unpaired presentations of the light and tone in stage 1 led to increased freezing to the tone at test. In this case, fear to the tone could be attributed to factors other than sensory preconditioning, for example, generalization of fear between the auditory and visual stimulus.

      We thank the Reviewer for this comment. At first, we hypothesized that female mice were somehow able to associate light and tone even though they were presented separately during the preconditioning sessions. Thus, we designed new experiments (shown in Supplementary Figure 2D) to test whether the data would be congruent with our initial hypothesis or with fear generalization, as proposed by the Reviewer. We performed a new experiment comparing a Paired group with two additional control groups: (i) an Unpaired group in which we increased the time between the light and tone presentations, and (ii) an experimental group in which the light was absent during conditioning. The new results clearly indicate the presence of fear generalization in female mice, as we found a significant cue-induced increase in freezing responses in all the experimental groups tested. In accordance with the Reviewer's suggestion, we conclude that mediated learning is not reliably observed in female mice using the protocol described (i.e. with 2 conditioning sessions). These new results led us to reorganize the structure and figures of the manuscript to focus on male mice in the Main Figures, while showing the data from female mice in the Supplementary Figures. Overall, our data reveal the need for behavioral protocols adapted to each sex, demonstrating sex differences in sensory preconditioning; this point has been added to the Discussion Section (Page 15, lines 12-37).

      (2) In the photometry experiment, the authors report an increase in neural activity in the hippocampus during both phase 1 (sensory preconditioning) and phase 2 (conditioning). In the subsequent experiment, they inhibit neural activity in the DH during phase 1 (sensory preconditioning) and the probe test, but do not include inhibition during phase 2 (conditioning). It was not clear why they did not go on to investigate the role of the hippocampus during phase 2 conditioning. Sensory preconditioning could occur due to the integration of the tone and shock during phase two, or due to retrieval and chaining of the tone-light-shock memories at test. These two possibilities cannot be differentiated based on the data. Given that we do not know at which stage the mediated learning is occurring, it would have been beneficial to additionally include inhibition of the DH during phase 2.

      Following the Reviewer's valuable comment, we have conducted a new experiment in which we chemogenetically inhibited the CaMKII-positive neurons of the dHPC during conditioning to explore their involvement in mediated learning formation. Notably, inhibition of the principal neurons of the dHPC during conditioning does not impair the formation of the mediated learning in our hands. These new results are now shown in Supplementary Figure 7G and added to the Results section (Page 13, Lines 19-23).

      (3) In the final experiment, the authors report that inhibition of the dorsal hippocampus during the sensory preconditioning phase blocked mediated learning. While this may be the case, the failure to observe sensory preconditioning at test appears to be due more to an increase in baseline freezing (during the stimulus off period), rather than a decrease in freezing to the conditioned stimulus. Given the small effect, this study would benefit from an experiment validating that administration of J60 inhibited DH cells. Further, given that the authors did not observe any effect of DREADD inhibition in PV cells, it would also be important to validate successful cellular silencing in this protocol.  

      In response to the Reviewer's comments, we have performed new experiments to validate the use of J60 to inhibit hippocampal cells. These are shown in Supplementary Figure 7 E-F for CaMKII-positive neurons, where J60 administration tends to decrease the frequency of calcium events in both the dHPC and vHPC. Furthermore, in Supplementary Figure 8 B-C we show that J60 is also able to modify calcium events in PV-positive interneurons. Although the best method to validate the use of DREADDs (i.e. to inhibit hippocampal cell activity) would be electrophysiological recordings, we lack this technique in our laboratory. Thus, to address the Reviewer's comment, we combined DREADD modulation through J60 administration with photometry recordings, which confirmed several trends. In addition, a similar approach has been used in another preprint from our lab (https://doi.org/10.1101/2025.08.29.673009), in which J60 administration increased phospho-PDH, a marker of neuronal inhibition, in the dHPC, as well as in other experiments conducted by a collaborating lab, where a modulation of SOM-positive interneuron activity was observed upon J60 administration (PhD defense of Miguel Sabariego, University Pompeu Fabra, Barcelona).

      Reviewer #3 (Public review): 

      Summary: 

      Pinho et al. investigated the role of the dorsal vs. ventral hippocampus and gender differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. The authors first found sex differences in the preconditioning phase of a sensory preconditioning procedure, where males required more preconditioning training than females for mediated learning to manifest, and where females displayed evidence of mediated learning even when the neutral stimuli were never presented together within the session.

      After validation of a sensory preconditioning procedure in mice using light and tone neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. parvalbumin-positive-only neurons in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found increased activity of all neurons, as well as PV+-only neurons, in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons in the dorsal, but not ventral, hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues), but not the direct association between the light cue and the mild foot shock. This set of data: (1) validates mediated learning in mice using a sensory preconditioning protocol, and stresses the importance of taking sex effects into account; (2) validates the recruitment of the dorsal and ventral hippocampi during preconditioning and conditioning phases; and (3) further establishes the specific role of CaMKII+ neurons in the dorsal but not ventral hippocampus in the formation of an association between two neutral stimuli, but not between a neutral stimulus and a mild foot shock.

      Strengths: 

      The authors developed a sensory preconditioning procedure in mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a sex effect in the formation of light-cue association. The authors took advantage of fiber photometry and chemogenetics to target sub-regions of the hippocampus in a cell-specific manner and to investigate their role during different phases of a sensory preconditioning procedure.

      We thank the Reviewer for the extensive summary of our work and for recognizing the value of some of our findings.

      Weaknesses: 

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning; however, there are several weaknesses that should be noted:

      (1) This work first validates mediated learning in a sensory preconditioning procedure using light and tone cues as neutral stimuli and a mild foot shock as the unconditioned stimulus, in both males and females. They found interesting sex differences at the behavioral level, but then only focused on male mice when recording and manipulating the hippocampus. The authors do not address sex differences at the neural level. 

      We appreciate the Reviewer's comment. Indeed, prompted by other Reviewer comments during this revision process (see Point 1 of Reviewer #2), we performed an additional experiment revealing that, with the described protocol, female mice show fear generalization rather than mediated learning responding. These data point to the need for sex-specific changes in the behavioral protocols used to measure sensory preconditioning. The revised version of the manuscript, while highlighting these sex differences in behavioral performance (see Supplementary Figure 2), focuses on male mice and, accordingly, all photometry and chemogenetic experiments are performed in male mice. In future studies, once we have a sensory preconditioning paradigm that reliably works in female mice, it will be very interesting to study whether the same hippocampal mechanisms mediating this behavior in male mice are also observed in female mice.

      (2) As expected in fear conditioning, the range of inter-individual differences is quite high. Mice that did not develop a strong light-->shock association, as evidenced by a lower percentage of freezing during the Probe Test Light phase, should show a low percentage of freezing during the Probe Test Tone phase. It would be interesting to test for a correlation between the levels of freezing during the mediated vs. direct test phases.

      Following the Reviewer's suggestion, we generated a new set of data correlating mediated and direct fear responses. As shown in Supplementary Figure 3, there is a significant correlation between mediated and direct learning in male mice (i.e. individuals that freeze more in the direct learning test also express more fear in the mediated learning test). In contrast, this correlation is absent in female mice, further confirming what we have explained above. We have highlighted this new analysis in the Results section (Page 11, Lines 20-24).

      (3) The use of a synapsin promoter to transfect neurons in a non-specific manner does not provide much information. The authors applied a more specific approach to target PV+ neurons only, and it would have been more informative to keep with this cell-specific approach, for example by also looking at somatostatin+ interneurons.

      The idea behind using a pan-neuronal promoter was to assess in general terms how neuronal activity in the hippocampus is engaged during the different phases of light-tone sensory preconditioning. However, the Reviewer's comment is very pertinent and, as suggested, we have generated new data targeting CaMKII-positive neurons (see Point 4 below). Finally, although it could be extremely interesting, we believe that targeting different interneuron subtypes is beyond the scope of the present work. However, we have added this to the Discussion Section as a future perspective/limitation of our study (Page 17, Lines 9-24).

      (4) The authors observed event-related Ca2+ transients in hippocampal pan-neurons and PV+ interneurons using fiber photometry. They then used chemogenetics to inhibit CaMKII+ hippocampal neurons, which does not logically follow. This does not undermine the main finding that CaMKII+ neurons of the dorsal, but not ventral, hippocampus are involved in the preconditioning, but not conditioning, phase. However, recording CaMKII+ neurons (using fiber photometry) in mice running the same task would be more informative, as it would indicate when these neurons are recruited during the different phases of sensory preconditioning. Then applying optogenetics to cancel the observed event-related transients (e.g., during the presentation of light and tone cues, or during the foot shock presentation) would be more appropriate.

      We have generated new photometry data to analyze the activity of CaMKII-positive neurons during the preconditioning phase and confirm their engagement during the light-tone pairings. Thus, we infused a CaMKII-GCaMP calcium sensor into the dHPC and vHPC of mice and recorded its activity during the 6 preconditioning sessions. The new results can be found in Figure 3 and are explained in the Results section (Page 12, Lines 26-36). The results clearly show engagement of CaMKII-positive neurons during the light-tone pairings in both the dHPC and vHPC. Finally, although performing optogenetic manipulations would be very elegant, we hope to have convinced the Reviewer that our chemogenetic results are sufficient to demonstrate the involvement of the dHPC in the formation of mediated learning in the Light-Tone sensory preconditioning paradigm. Nevertheless, we have added this to the Discussion Section as a future perspective/limitation of our study (Page 17, Lines 9-24).

      (5) Probe tests always start with the "Probe Test Tone", followed by the "Probe Test Light". "Probe Test Tone" consists of an extinction session, which could affect the freezing response during "Probe Test Light" (e.g., Polack et al. (http://dx.doi.org/10.3758/s13420-013-0119-5)). Preferably, adding a group of mice with a Probe Test Light with no Probe Test Tone could help clarify this potential issue. The authors should at least discuss the possibility that the tone extinction session prior to the "Probe Test Light" could have affected the freezing response to the light cue. 

      We appreciate the comment raised by the Reviewer. We think that our direct learning responses are quite robust in all of our experiments and, thus, a possible extinction effect caused by the tone presentation should not affect direct learning. Nevertheless, as this is an important point, we have addressed it in the Discussion Section (Page 17, Lines 12-14).

      Reviewer #4 (Public review): 

      Summary 

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find clear evidence for sensory preconditioning in male but not female mice. They also find that, in the male mice, CaMKII-positive neurons in the dorsal hippocampus: (1) encode the audio-visual association that forms in stage 1 of the task, and (2) retrieve/express sensory preconditioned fear to the auditory stimulus at test. These findings are supported by evidence that ranges from incomplete to convincing. They will be valuable to researchers in the field of learning and memory. 

      We appreciate the summary of our work and all the constructive comments raised by the Reviewer, which have greatly improved the clarity and quality of our manuscript.  

      Abstract 

      Please note that sensory preconditioning doesn't require the stage 1 stimuli to be presented repeatedly or simultaneously. 

      The reviewer is right, and we have corrected and changed that information in the revised abstract.  

      "Finally, we combined our sensory preconditioning task with chemogenetic approaches to assess the role of these two hippocampal subregions in mediated learning."  This implies some form of inhibition of hippocampal neurons in stage 2 of the protocol, as this is the only stage of the protocol that permits one to make statements about mediated learning. However, it is clear from what follows that the authors interrogate the involvement of hippocampal sub-regions in stages 1 and 3 of the protocol - not stage 2. As such, most statements about mediated learning throughout the paper are potentially misleading (see below for a further elaboration of this point). If the authors persist in using the term mediated learning to describe the response to a sensory preconditioned stimulus, they should clarify what they mean by mediated learning at some point in the introduction. Alternatively, they might consider using a different phrase such as "sensory preconditioned responding". 

      Considering the Reviewer's arguments, we have modified the text in the Abstract and throughout the main text. Moreover, based on a comment from Reviewer #2 (Point 2), we have generated new data demonstrating that the dHPC does not seem to be involved in mediated learning formation during Stage 2, as its inhibition does not impair sensory preconditioned responding. These new data can be seen in Supplementary Figure 7G.

      Introduction 

      "Low-salience" is used to describe stimuli such as tone, light, or odour that do not typically elicit responses that are of interest to experimenters. However, a tone, light, or odour can be very salient even though they don't elicit these particular responses. As such, it would be worth redescribing the "low-salience" stimuli in some other terms. 

      Throughout the revised version of the manuscript, we have replaced the term "low-salience" with "innocuous stimuli" or omitted the adjective altogether where we felt it was unnecessary.

      "These higher-order conditioning processes, also known as mediated learning, can be captured in laboratory settings through sensory preconditioning procedures2,6-11."  Higher-order conditioning and mediated learning are not interchangeable terms: e.g., some forms of second-order conditioning are not due to mediated learning. More generally, the use of mediated learning is not necessary for the story that the authors develop in the paper and could be replaced for accuracy and clarity. E.g., "These higher-order conditioning processes can be studied in the laboratory using sensory preconditioning procedures2,6-11." 

      Following the Reviewer's proposal, we have modified the text.

      In reference to Experiment 2, it is stated that: "However, when light and tone were separated on time (Unpaired group), male mice were not able to exhibit mediated learning response (Figure 2B) whereas their response to the light (direct learning) was not affected (Figure 2D). On the other hand, female mice still present a lower but significant mediated learning response (Figure 2C) and normal direct learning (Figure 2E). Finally, in the No-Shock group, both male (Figure 2B and 2D) and female mice (Figure 2C and 2E) did not present either mediated or direct learning, which also confirmed that the exposure to the tone or light during Probe Tests do not elicit any behavioral change by themselves as the presence of the electric footshock is required to obtain a reliable mediated and direct learning responses."  The absence of a difference between the paired and unpaired female mice should not be described as "significant mediated learning" in the latter. It should be taken to indicate that performance in the females is due to generalization between the tone and light. That is, there is no sensory preconditioning in the female mice. The description of performance in the No-shock group really shouldn't be in terms of mediated or direct learning: that is, this group is another control for assessing the presence of sensory preconditioning in the group of interest. As a control, there is no potential for them to exhibit sensory preconditioning, so their performance should not be described in a way that suggests this potential. 

      All these comments are very pertinent and were also raised by Reviewer #2 (Point 1, see above). In the revised version of the manuscript, we have carefully changed our interpretation of the results where necessary (e.g. in the case of the No-Shock group). In addition, we have generated new data confirming that, under similar conditions (i.e. 2 conditioning sessions in our SPC), female mice show fear generalization rather than reliable sensory preconditioned responding. In our opinion, this does not rule out mediated learning in female mice but suggests that adapted protocols must be used for each sex. These results led us to change the organization of the Figures, and we hope the Reviewer will agree with all the proposed changes. In addition, we have rewritten a paragraph in the Discussion Section to explain these sex differences (see Page 15, lines 12-37).

      Methods - Behavior 

      I appreciate the reasons for testing the animals in a new context. This does, however, raise other issues that complicate the interpretation of any hippocampal engagement: e.g., exposure to a novel context may engage the hippocampus for exploration/encoding of its features - hence, it is engaged for retrieving/expressing sensory preconditioned fear to the tone. This should be noted somewhere in the paper given that one of its aims is to shed light on the broader functioning of the hippocampus in associative processes. 

      This general issue - that the conditions of testing were such as to force engagement of the hippocampus - is amplified by two further features of testing with the tone. The first is the presence of background noise in the training context and its absence in the test context. The second is the fact that the tone was presented for 30 s in stage 1 and then continuously for 180s at test. Both changes could have contributed to the engagement of the hippocampus as they introduce the potential for discrimination between the tone that was trained and tested. 

      We have now added these pertinent comments in a “Study limitations” paragraph in the Discussion Section (Page 17, Lines 9-24). Indeed, the different changes of context (including the presence of background noise) were introduced because, while setting up the paradigm, we encountered problems of fear generalization (also in male mice). Similarly, the differences in cue exposure between the preconditioning phase and the test phase were based on important differences between previous protocols used in rats and how mice respond. Indeed, mice were not able to adapt their behavioral responses when shorter cue-exposure windows were used, as clearly occurs in rats [1].

      Results - Behavior 

      The suggestion of sex differences based on differences in the parameters needed to generate sensory preconditioning is interesting. Perhaps it could be supported through some set of formal analyses. That is, the data in supplementary materials may well show that the parameters needed to generate sensory preconditioning in males and females are not the same. However, there needs to be some form of statistical comparison to support this point. As part of this comparison, it would be neat if the authors included body weight as a covariate to determine whether any interactions with sex are moderated by body weight.  

      Regarding the comparison between male and female mice, although the Reviewer's comments are pertinent and interesting, we think that, given the new data, it is not appropriate to compare both sexes, as we still have to optimize the SPC protocol for female mice. 

      What is the value of the data shown in Figure 1 given that there are no controls for unpaired presentations of the sound and light? In the absence of these controls, the experiment cannot have shown that "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" as implied by its title. Minimally, this experiment should be relabelled. 

      Based on the new data generated with female mice, we have decided to remove Figure 1 and reorganize the structure of the manuscript. We hope that the Reviewer will agree that this has improved the clarity of the manuscript.  

      "Altogether, this data confirmed that we successfully set up an LTSPC protocol in mice and that this behavioral paradigm can be used to further study the brain circuits involved in higher-order conditioning." Please insert the qualifier that LTSPC was successfully established in male mice. There is no evidence of LTSPC in female mice. 

      We fully agree with the Reviewer and our new findings further confirm this issue. Thus, we have changed the statement in the revised version of the manuscript.  

      Results - Brain 

      "Notably, the inhibition of CaMKII-positive neurons in the dHPC (i.e. J60 administration in DREADD-Gi mice) during preconditioning (Figure 4B), but not before the Probe Test 1 (Figure 4B), fully blocked mediated, but not direct learning (Figure 4D)." The right panel of Figure 4B indicates no difference between the controls and Group DPC in the percent change in freezing from OFF to ON periods of the tone. How does this fit with the claim that CaMKII-positive neurons in the dorsal hippocampus regulate associative formation during the session of tone-light exposures in stage 1 of sensory preconditioning? 

      To improve the quality of the figures and avoid redundancy between panels, we have decided to remove all the percentage-of-change panels in the new version of the manuscript. However, regarding the issue raised by the Reviewer, in our opinion the inhibition of the dHPC clearly impaired mediated learning, as these animals did not change their behavior when the tone appeared (i.e. there was no significant increase in freezing between OFF and ON periods), in contrast with the other two groups. The percentage-of-change graphs (old version of the manuscript) were an alternative way of showing the presence of tone- or light-induced responses in each experimental group. A significant effect (shown by the # symbol) meant that, in that specific experimental group, there was a significant change in behavior (freezing) when the cue (tone or light) appeared compared with when there was no cue (OFF period). Thus, in the old panel 4B commented on by the Reviewer, the absence of significance in the group in which the dHPC was inhibited during preconditioning, compared with the other groups, in which a clear significant effect was observed, indicates, in our opinion, an impairment of mediated learning formation. However, to avoid any confusion, we have slightly modified the text to mention strictly what is analyzed and/or shown in the graphs and, as mentioned, the percentage-of-change graphs have been removed.  

      Discussion 

      "When low salience stimuli were presented separated on time or when the electric footshock was absent, mediated and direct learning were abolished in male mice. In female mice, although light and tone were presented separately during the preconditioning phase, mediated learning was reduced but still present, which implies that female mice are still able to associate the two low-salience stimuli." 

      This doesn't quite follow from the results. The failure of the female unpaired mice to withhold their freezing to the tone should not be taken to indicate the formation of a light-tone association across the very long interval that was interpolated between these stimulus presentations. It could and should be taken to indicate that, in female mice, freezing conditioned to the light simply generalized to the tone (i.e., these mice could not discriminate well between the tone and light). 

      As discussed above, we fully agree with the Reviewer, and the whole manuscript has been modified accordingly. 

      "Indeed, our data suggests that when hippocampal activity is modulated by the specific manipulation of hippocampal subregions, this brain region is not involved during retrieval."  Does this relate to the results that are shown in the right panel of Figure 4B, where there is no significant difference between the different groups? If so, how does it fit with the results shown in the left panel of this figure, where differences between the groups are observed? 

      "In line with this, the inhibition of CaMKII-positive neurons from the dorsal hippocampus, which has been shown to project to the retrosplenial cortex56, blocked the formation of mediated learning." 

      Is this a reference to the findings shown in Figure 4B and, if so, which of the panels exactly? That is, one panel appears to support the claim made here while the other doesn't. In general, what should the reader make of data showing the percent change in freezing from stimulus OFF to stimulus ON periods? 

      In our opinion, as pointed out above, the percentage-of-change graphs were an alternative way of showing the presence of tone- or light-induced behavioral responses in each experimental group. A significant effect (shown by the # symbol) meant that, in that specific experimental group, there was a significant change in behavior (freezing) when the cue (tone or light) appeared compared with when there was no cue (OFF period). Thus, in the old panel 4B commented on by the Reviewer, the absence of significance in the group in which the dHPC was inhibited during preconditioning, compared with the other groups, in which a clear significant effect was observed, indicates, in our opinion, an impairment of mediated learning formation. In the revised version of the manuscript, we have rephrased these sentences to stick to what the graphs show and, as explained, the percentage-of-change graphs have been removed.

      Reviewer #1 (Recommendations for the authors): 

      The authors may address the following questions: 

      (1) The study identifies major sex differences in the conditioning phase, with females showing faster learning. Since hormonal fluctuations can influence learning and behavior, it would be helpful for the authors to comment on whether they tracked the estrous cycle of the females and whether any potential effects of the cycle on mediated learning were considered. 

      This is a relevant and important point raised by the Reviewer. In our study, we did not track the estrous cycle to investigate whether the cycle has any effect on mediated learning, which could be an interesting project in itself. Although the revised version of the manuscript provides new information regarding mediated learning performance in male and female mice, we agree with the Reviewer that sex hormones may account for the observed sex differences. However, the aim of the present work was to explore potential sex differences in mediated learning responding rather than to investigate the specific mechanisms behind them. 

      For this reason, and to avoid adding further complexity to the present study, we did not check the estrous cycle in female mice or testosterone levels in male mice, nor did we measure sex hormones during the different phases of the sensory preconditioning task. Indeed, we think that checking the estrous cycle in female mice would still not be enough to establish the role of sex hormones, because androgen levels in male mice would also need to be checked. In line with this, meta-analyses of the neuroscience literature using mice as research subjects [2-4] have revealed that data collected from female mice (regardless of the estrous cycle) did not vary more than data from males. In conclusion, we think that using randomized, mixed cohorts of male and female mice (as in the present study) provides the same degree of variability in both sexes. Nevertheless, we have added a sentence pointing to this possibility in the Discussion Section (Page 15, lines 32-37). 

      (2) The rationale for including parvalbumin (PV) cells in the study could be clarified. Is there prior evidence suggesting that this specific cell type is involved in mediated learning? This could apply to sensory stimuli not used in the current study.

      In the revised version of the manuscript, we have better clarified why we targeted PV interneurons, specifically mentioning previous studies [5] (see Page 11, Lines 27-34). 

      (3) The photometry recordings from the dHPC during the preconditioning phase, shown in Figure 3, are presented as average responses. It would be beneficial to separate the early vs. late trials to examine whether there is an increase in hippocampal activity as the associative learning progresses, rather than reporting the averaged data. Additionally, to clarify the dynamics of the dHPC in associative learning, the authors could compare the magnitude of photometry responses when light and tone stimuli are presented individually in separate sessions versus when they are presented closely in time to facilitate associative learning.

      Following the Reviewer’s suggestion, we have now included a new Supplementary Figure 4, which splits the photometry data by preconditioning and conditioning session. Overall, these data suggest that there are no major changes in cell activity in either hippocampal region across sessions, as a similar light-tone-induced enhancement of activity is observed. There is only an interesting trend in the activity of Pan-Neurons around light onset during the conditioning sessions. All this is now included in the Results Section (Page 12, Lines 13-15).

      (4) The authors note that PV cell responses recorded with GCaMP were similar to general hippocampal neurons, yet chemogenetic manipulations of PV cells did not impact behavior. A more detailed discussion of this discrepancy would be helpful. 

      As suggested by the Reviewer, we have included additional Discussion to explain the potential discrepancy between the activity of PV interneurons assessed by photometry and the effects of their chemogenetic modulation (see Page 16, Lines 27-33).   

      (5) All fiber photometry recordings were conducted in male mice. Given the sex differences observed in associative learning, the authors could expand the study to include dHPC responses in females during both preconditioning and conditioning sessions. 

      We appreciate the Reviewer's comment. Indeed, prompted by comments from other Reviewers in this revision (see Point 1 of Reviewer #2), we are not yet sure that we have an optimal protocol to study mediated learning in female mice, owing to sex-specific changes related to fear generalization. Thus, although the revised version of the manuscript highlights these sex differences in behavioral performance (see Supplementary Figure 2), it focuses on male mice and, accordingly, all photometry and chemogenetic experiments were performed exclusively in male mice. In future studies, once we are confident that we have a sensory preconditioning paradigm that works in female mice, it will be very interesting to study whether the same hippocampal mechanisms that mediate this behavior in male mice are also observed in female mice. 

      Minor Comments: 

      (1) In the right panel of Figure 2A, females received only one conditioning session, so the "x2" should be corrected to "x1" conditioning to accurately reflect the data. 

      We thank the Reviewer for the comment that has been addressed in the revised version of the manuscript.  

      (2) The overall presentation of Figure 3 could be improved. For example, the y-axis in Panel B could be cut to a maximum of 3 rather than 6, which would better highlight the response data. Alternatively, including heatmap representations of the z-score responses could enhance clarity and visual impact.  

      We thank the Reviewer for the comment, which has been addressed by providing a new format for Figures 2 and 3 in the revised version of the manuscript.   

      (3) There are several grammatical errors throughout the manuscript. It is recommended that the authors use a grammar correction tool to improve the overall writing quality and readability.  

      We have tried to correct the grammar throughout the manuscript.  

      Reviewer #2 (Recommendations for the authors):  

      (1) In the abstract the authors write that sensory preconditioning requires the "repeated and simultaneous presentation of two low-salience stimuli such as a light and a tone". Previous research has shown that sensory preconditioning can still occur if the two stimuli are presented serially, rather than simultaneously. Further, the tone and the light are not necessarily "low-salience", for example, they can be loud or bright. It would be better to refer to them as innocuous. 

      In the revised version of the abstract, we have included the modifications suggested by the Reviewer.   

      (2) The authors develop a novel automated tool for assessing freezing behaviour in mice that correlates highly with both manual freezing and existing, open-source freeze estimation software (ezTrack). The authors should explain how the new program differs from ezTrack, or if it provides any added benefit over this existing software. 

      We have added new information in the Results Section (Page 10, Lines 13-20) to better explain how the new tool for quantifying freezing could improve on existing software.  

      (3) In Experiment 1, the authors report a sex difference in levels of freezing between male and female mice when they are only given one session of sensory preconditioning. This should be supported by a statistical comparison of levels of freezing between male and female mice. 

      Based on the new results obtained with female mice, we have decided to remove the original Figure 1 of the manuscript, as it is not meaningful to compare male and female mediated learning responses without an optimal protocol for female mice.  

      (4) Why did the authors choose to vary the duration of the stimuli across preconditioning, conditioning, and testing? During preconditioning, the light-tone compound was 30s, in conditioning the light was 10s, and at test both stimuli were presented continuously for 3 min. Did the level of freezing vary across the three-minute probe session? There is some evidence that rodents can learn the timing of stimuli and it may be the case that freezing was highest at the start of the test stimulus, when it most closely resembled the conditioned stimulus. 

      Differences in cue exposure between the preconditioning phase and the test phase were based on important differences between previous protocols used in rats and how mice respond. Indeed, mice were not able to adapt their behavioral responses when shorter cue-exposure windows were used, as clearly occurs in rats [1]. In addition, we have added new graphs showing the time course of the behavioral responses (see Figures 1 and 4 and Supplementary Figure 2), which correlate with the quantification of freezing shown as the percentage of freezing during ON and OFF periods.   

      (5) The title of Experiment 1 "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" - this experiment does not demonstrate mediated learning; it merely shows that animals will freeze more in the presence of a stimulus as compared with no stimulus. This experiment lacks the necessary controls to claim mediated learning (which are presented in Experiment 2) and should therefore be retitled something more appropriate.

      As stated above, based on the new results obtained with female mice, we have decided to remove the original Figure 1 of the manuscript, as it is not meaningful to compare male and female mediated learning responses without an optimal protocol for female mice.   

      (6) In Figure 2, why does the unpaired group show less freezing to the tone than the paired group given that the tone was directly paired with the shock in both groups? 

      We believe the Reviewer may have referred to the tone in error (i.e. there are no differences in the freezing observed to the tone) and may instead be referring to the freezing induced by the Light in the direct learning test. In this case, it is true that direct learning (i.e. percentage of freezing) seems to be slightly lower in the unpaired group than in the paired one, which could be due to latent inhibition caused by the different cue exposure in the paired and unpaired experimental groups. However, direct learning in both groups is clear and significant, and there are no significant differences between them, which makes it difficult to draw any further conclusion. 

      (7) The stimuli in the design schematics are quite small and hard to see, they should be enlarged for clarity. The box plots also looked stretched and the colour difference between the on and off periods is difficult to discern. 

      We have made several important modifications to the Figures to address the Reviewer's comments and improve their quality.   

      (8) The authors do not include labels for the experimental groups (paired, unpaired, no shock) in Figures 2B, 2D, 2C, and 2E. This made it very difficult to interpret the figure.  

      Following this suggestion, Figure 2 has been changed accordingly. 

      (9) The levels of freezing during conditioning should be presented for all experiments.  

      We have generated a new Supplementary Figure 9 to show the freezing levels during conditioning sessions. 

      (10) In the final experiment, the authors wrote that mice were injected with J60 or saline, but I could not find the data for the saline animals.  

      In the Results and Methods section, we have included a sentence to better explain this issue. In addition, we have added a new Supplementary Figure 7 to show the performance of all control groups.  

      (11) Please list the total number of animals (per group, per sex) for each experiment.  

      In the revised version of the manuscript, we have added this information in each Figure Legend.  

      Reviewer #3 (Recommendations for the authors): 

      I found this study very interesting, despite a few weaknesses. I have several minor comments to add, hoping that it would improve the manuscript: 

      (1) The terminology used is not always appropriate/consistent. I would use "freely moving fiber photometry" or simply "fiber photometry" as calcium imaging conventionally refers to endoscopic or 2-photon calcium imaging. 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript. 

      (2) "Dorsal hippocampus mediates light-tone sensory preconditioning task in mice" suggests that a brain region mediates a task. I would rather suggest, e.g. "Dorsal hippocampus mediates light-tone association in mice" 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript.

      (3) As you are using low-salience stimuli, it would be better to also inform the readership with the light intensity used for the light cue, for replicability purposes. 

      In the Methods section (Page 5, Line 30), we have added new information regarding the visual stimuli used. 

      (4) If the authors didn't use a background noise during the probe tests, the tone cue could have been perceived as being louder/clearer by mice. Couldn't it have inflated the freezing response for the tone cue?  

      This is an interesting comment, although we do not have any data that directly address the Reviewer's suggestion. However, the background noise was necessary to set up the protocol and to vary different aspects of the context throughout the paradigm, which was needed to avoid fear generalization in mice. In addition, as demonstrated before [6], background noise is important to prevent another auditory cue (i.e. the tone) from inducing fear responses by itself, as the transition from noise to silence signals danger to animals. 

      (5) "salience" is usually used for the intensity of a stimulus, not for an association or pairing. Rather, we usually refer to the strength of an association. 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript.

      (6) Figure 3, panel A. "RCaMP Neurons", maybe "Pan-Neurons" would be more appropriate, as PV+ inter-neurons are also neurons. 

      We thank the Reviewer for this comment that has been corrected accordingly.

      (7) Figure 4, panel A, please add the AAV injected, and the neurons labelled in your example slice. 

      We thank the Reviewer for this comment that has been corrected accordingly.

      References

      (1) Wong, F. S., Westbrook, R. F. & Holmes, N. M. 'Online' integration of sensory and fear memories in the rat medial temporal lobe. eLife 8 (2019). https://doi.org/10.7554/eLife.47085

      (2) Prendergast, B. J., Onishi, K. G. & Zucker, I. Female mice liberated for inclusion in neuroscience and biomedical research. Neurosci Biobehav Rev 40, 1-5 (2014). https://doi.org/10.1016/j.neubiorev.2014.01.001

      (3) Becker, J. B., Prendergast, B. J. & Liang, J. W. Female rats are not more variable than male rats: a meta-analysis of neuroscience studies. Biol Sex Differ 7, 34 (2016). https://doi.org/10.1186/s13293-016-0087-5

      (4) Shansky, R. M. Are hormones a "female problem" for animal research? Science 364, 825-826 (2019). https://doi.org/10.1126/science.aaw7570

      (5) Busquets-Garcia, A. et al. Hippocampal CB1 Receptors Control Incidental Associations. Neuron 99, 1247-1259 e1247 (2018). https://doi.org/10.1016/j.neuron.2018.08.014

      (6) Pereira, A. G., Cruz, A., Lima, S. Q. & Moita, M. A. Silence resulting from the cessation of movement signals danger. Curr Biol 22, R627-628 (2012). https://doi.org/10.1016/j.cub.2012.06.015

    1. eLife Assessment

      This Research Advance manuscript further elucidates the roles of SMC5/6 loader proteins and associated factors in the silencing of extrachromosomal circular DNA by the SMC5/6 complex. Although the findings are largely in line with expectations, they are valuable: they represent a meaningful advance beyond the recent study from the same laboratories (PMC9708086) and validate the previous model that distinct SMC5/6 subcomplexes, SIMC1-SLF2 and SLF1/2, separately control its transcriptional repression and DNA repair activities on extrachromosomal DNA. Solid evidence is presented for a role of SIMC1/SLF2 in localizing the SMC5/6 complex to plasmid DNA, and for the distinct requirements of this localization compared with the recruitment of SMC5/6 to chromosomal DNA lesions.

    2. Reviewer #1 (Public review):

      SMC5/6 is a highly conserved complex able to dynamically alter chromatin structure, thereby playing critical roles in genome stability and integrity, including homologous recombination and telomere maintenance. In recent years, a number of studies have revealed the importance of SMC5/6 in restricting viral expression, which is in part related to its ability to repress transcription from circular DNA. In this context, Oravcova and colleagues recently reported how SMC5/6 is recruited by two mutually exclusive complexes (orthologs of yeast Nse5/6) to SV40 LT-induced PML nuclear bodies (SIMC/SLF2) and DNA lesions (SLF1/2). In the current work, the authors extend this study, providing some new results.

    3. Reviewer #2 (Public review):

      Oravcová et al. present data supporting a role for SIMC1/SLF2 in silencing plasmid DNA via the SMC5/6 complex. Their findings are of interest, and they provide further mechanistic detail of how the SMC5/6 complex is recruited to disparate DNA elements. In essence, the present report builds on the authors' previous paper in eLife in 2022 (PMID: 36373674, "The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers") by showing the role of SIMC1/SLF2 in localisation of the SMC5/6 complex to plasmid DNA, and the distinct requirements compared with recruitment to DNA damage foci.

    4. Reviewer #3 (Public review):

      This study by the Boddy and Otomo laboratories further characterizes the roles of SMC5/6 loader proteins and related factors in SMC5/6-mediated repression of extrachromosomal circular DNA. The work shows that mutations engineered at an AlphaFold-predicted protein-protein interface formed between the loader SLF2/SIMC1 and SMC6 (similar to the interface in the yeast counterparts observed by cryo-EM) prevent co-IP of the respective proteins. The mutations in SLF2 also hinder plasmid DNA silencing when expressed in SLF2-/- cell lines, suggesting that this interface is needed for silencing. SIMC1 is dispensable for recruitment of SMC5/6 to sites of DNA damage, while SLF1 is required, thus separating the functions of the two loader complexes. Preventing SUMOylation (with a chemical inhibitor) increases transcription from plasmids but does not do so in SLF2-deleted cell lines, indicating that SMC5/6 silences plasmids in a SUMOylation-dependent manner. Expression of LT is sufficient for increased expression, and again, it is not additive or synergistic with SIMC1 or SLF2 deletion, indicating that LT prevents silencing by directly inhibiting SMC5/6. In contrast, PML bodies appear dispensable for plasmid silencing.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      SMC5/6 is a highly conserved complex able to dynamically alter chromatin structure, thereby playing critical roles in genome stability and integrity, including homologous recombination and telomere maintenance. In recent years, a number of studies have revealed the importance of SMC5/6 in restricting viral expression, which is in part related to its ability to repress transcription from circular DNA. In this context, Oravcova and colleagues recently reported how SMC5/6 is recruited by two mutually exclusive complexes (orthologs of yeast Nse5/6) to SV40 LT-induced PML nuclear bodies (SIMC/SLF2) and DNA lesions (SLF1/2). In the current work, the authors extend this study, providing some new results. However, taken as a whole, the story lacks unity and does not delve into the molecular mechanisms responsible for the silencing process. One has the feeling that the story is somewhat incomplete, bringing together results that are not directly connected.

      Please see the introductory overview above.

      (1) In the first part of the work, the authors confirm previous conclusions about the relevance of a conserved domain defined by the interaction of SIMC and SLF2 for their binding to SMC6, and extend the structural analysis to the modelling of the SIMC/SLF2/SMC complex by AlphaFold. Their data support a model in which this conserved surface of SIMC/SLF2 interacts with SMC at the back of SMC6's head domain, and specific mutations confirm the relevance of this interaction site. These results are interesting but confirm a previous, more complete structural analysis in yeast (Li et al. NSMB 2024). In any case, they reveal the conservation of the interaction. My major concern is the lack of connection with the rest of the article. This structure does not help to explain the transcriptional silencing process reported later, beyond its relevance for recruiting SMC5/6 to its targets, which was already demonstrated in the previous study.

      Demonstrating the existence of a conserved interface between the Nse5/6-like complexes and SMC6 in both yeast and human is foundationally important, not confirmatory, and was not revealed in our previous study. It remains unclear how this interface regulates SMC5/6 function, but yeast studies suggest a potential role in inhibiting the SMC5/6 ATPase cycle. Nevertheless, the precise function of Nse5/6 and its human orthologs in SMC5/6 regulation remains undefined, largely due to technical limitations in available in vivo analyses. The SIMC1/SLF2/SMC6 complex structure likely extends to the SLF1/2/SMC6 complex, suggesting a unifying function of the Nse5/6-like complexes in SMC5/6 regulation, albeit in the distinct processes of ecDNA silencing and DNA repair. There have been no studies to date (including this one) showing that SIMC1-SLF2 is required for SMC5/6 recruitment to ecDNA. Our previous study showed that SIMC1 was needed for SMC5/6 to colocalize with SV40 LT antigen at PML NBs. Here we show that SIMC1 is required for ecDNA repression in the absence of PML NBs, which was not anticipated.

      (2) In the second part of the work, the authors focus on the functionality of the different complexes. The authors demonstrate that SMC5/6's role in transcriptional silencing is specific to its interaction with SIMC/SLF2, whereas SMC5/6's role in DNA repair depends on SLF1/2. These results are largely expected from previous work. The authors already demonstrated that SLF1/2, but not SIMC/SLF2, are recruited to DNA lesions. Accordingly, they observe here that SMC5/6 recruitment to DNA lesions requires SLF1/2 but not SIMC/SLF2. Likewise, the authors already demonstrated that SIMC/SLF2, but not SLF1/2, targets SMC5/6 to PML NBs. Taking into account the evidence that connects SMC5/6's viral resistance at PML NBs with transcriptional repression, the observed requirement of SIMC/SLF2 but not SLF1/2 in plasmid silencing is somewhat expected. This does not mean that the expectation need not be confirmed experimentally. However, the study falls short in advancing the mechanistic understanding, despite some interesting results such as the dispensability of the PML NBs or the antagonistic role of the SV40 large T antigen. It would have been interesting to explore how LT overcomes SMC5/6-mediated repression: Does LT prevent SIMC/SLF2 from interacting with SMC5/6? Or does it prevent SMC5/6 from binding the plasmid? Is the transcription-dependent plasmid topology altered in cells lacking SIMC/SLF2? And in cells expressing LT? In its current form, the study is confirmatory and preliminary. In agreement with this, the cartoons modelling the results here and in the previous work look basically the same.

      Our previous study only examined the localization of SLF1 and SIMC1 at DNA lesions. The localization of these subcomplexes alone should not be used to define their roles in SMC5/6 localization. Indeed, the field is split in terms of whether Nse5/6-like complexes are required for ecDNA binding/loading, or regulation of SMC5/6 once bound. 

      We agree that determining the potential mechanism of action of LT in overcoming SMC5/6-based repression is an important next step. We believe it is unlikely to involve blocking of the SMC5/6–SIMC1/SLF2 interface, since SIMC1-SLF2 is required for SMC5/6 to localize at LT-induced foci. Answering this will require the identification of any direct interactions with SMC5/6 subunits, and better methods for assessing SMC5/6 loading and activity on ecDNAs. Unlike HBx, Vpr, and BNRF1, LT does not appear to induce degradation of SMC5/6, making it a more complex and interesting challenge. Also, the dispensability of PML NBs in plasmid silencing versus viral silencing raises multiple important questions about SMC5/6’s repression mechanism.

      (3) There are some points about the presented data that need to be clarified.

      Thank you, we have addressed these points below, within the Recommendations for authors section.

      Reviewer #2 (Public review):

      Oravcová et al. present data supporting a role for SIMC1/SLF2 in silencing plasmid DNA via the SMC5/6 complex. Their findings are of interest, and they provide further mechanistic detail of how the SMC5/6 complex is recruited to disparate DNA elements. In essence, the present report builds on the authors' previous paper in eLife in 2022 (PMID: 36373674, "The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers") by showing the role of SIMC1/SLF2 in localisation of the SMC5/6 complex to plasmid DNA, and the distinct requirements as compared to recruitment to DNA damage foci. Although the findings of the manuscript are of interest, we are not yet convinced that the new data presented here represent a compelling new body of work and would better fit the format of a "research advance" article. In their previous paper, Oravcová et al. show that the recruitment of SMC5/6 to SV40 replication centres is dependent on SIMC1, and specifically, that it is dependent on SIMC1 residues adjacent to neighbouring SLF2.

      We agree. We submitted this manuscript as a “Research Advance”, not as a standalone research article, given that it is an extension of our previous “Research Article” (1).

      Other comments

      (1) The mutations chosen in Figure 1 are quite extensive, with 5 amino acids per mutant. In addition, they are in many cases 'opposite' changes, e.g., positive charge to negative charge. Is the effect lost if single mutations to an alanine are made?

      The mutations were chosen to test and validate the predicted SIMC1-SLF2-SMC6 structure i.e. the contact point between the conserved patch of SIMC1-SLF2 and SMC6. Multiple mutations and charge inversions increased the chance of disrupting the extensive interface. In this respect, the mutations were successful and informative, confirming the requirement of this region in specifically contacting SMC6. Whilst alanine scanning mutations are possible, we believe that they would not add to, or detract from, our validation of the predicted SIMC1-SLF2-SMC6 interface.

      (2) In Figure 2c, it isn't clear from the data shown that the 'SLF2-only' mutations in SMC6 result in a substantial reduction in SIMC1/SLF2 binding.

      To clarify the difference between wild-type and SLF2-only mutations in SIMC1-SLF2 interaction, we have performed an image volume analysis. This shows that the SLF2-facing SMC6 mutant reduces its interaction with SIMC1 (to 44% of WT) and SLF2 (to 21% of WT). The reduction in both SIMC1 and SLF2 interaction with SMC6 SLF2-facing mutant is expected, since SIMC1 and SLF2 are an interdependent heterodimer.  

      Author response table 1.

      (3) In the GFP reporter assays (e.g. Figure 3), median fluorescence is reported - was there any observed difference in the percentage of cells that are GFP positive?

      Yes. As expected when the GFP plasmid is not actively repressed, the percentage of GFP-positive cells differs between cell lines, following the same trend as GFP intensity.

      (4) The potential role of the large T antigen as an SMC5/6 evasion factor is intriguing. However, given the role of the large T antigen as a transcriptional activator, caution is required when interpreting enhanced GFP fluorescence. Antagonism of the SMC5/6 complex in this context might be further supported by ChIP experiments in the presence or absence of large T. Can large T functionally substitute for HBx or HIV-Vpr?

      We agree, the potential role of LT in SMC5/6 antagonism is interesting. We did state in the text “While LT is known to be a promiscuous transcriptional activator (2,3) that does not rule out a co-existing role in antagonizing SMC5/6. Indeed, these findings are reminiscent of HBx from HBV and Vpr of HIV-1, both of which are known promiscuous transcriptional activators that also directly antagonize SMC5/6 to relieve transcriptional repression (4-10).” We have tried ChIP experiments, but found these to be unreliable in assessing SMC5/6 association with plasmid DNA. Given the many disparate targets of LT, HBx and Vpr (other than SMC5/6), it seems unlikely that LT could functionally substitute for HBx and Vpr in supporting HBV and HIV-1 infections. Whilst certainly an interesting future question, we believe it is beyond the scope of this study.

      (5) In Figure 5c, the apparent molecular weight of large T and SMC6 appears to change following transfection of GFP-SMC5 - is there a reason for this?

      We are not certain as to what causes the molecular weight shift, but it is not specifically related to GFP-SMC5 transfection. Rather, it appears to be a general effect of the pulldown. Indeed, a very weak “background” band of LT is seen in the GFP only pulldown, which also runs at a “higher” molecular weight, as in the GFP-SMC5 pulldown. We believe that the effect is instead related to gel mobility in the wells that contain post pulldown proteins and different buffers. We have also seen similar effects using different protein-protein interaction pairs.

      Reviewer #3 (Public review):

      Summary:

      This study by the Boddy and Otomo laboratories further characterizes the roles of SMC5/6 loader proteins and related factors in SMC5/6-mediated repression of extrachromosomal circular DNA. The work shows that mutations engineered at an AlphaFold-predicted protein-protein interface formed between the loader SLF2/SIMC1 and SMC6 (similar to the interface in the yeast counterparts observed by cryo-EM) prevent co-IP of the respective proteins. The mutations in SLF2 also hinder plasmid DNA silencing when expressed in SLF2-/- cell lines, suggesting that this interface is needed for silencing. SIMC1 is dispensable for recruitment of SMC5/6 to sites of DNA damage, while SLF1 is required, thus separating the functions of the two loader complexes. Preventing SUMOylation (with a chemical inhibitor) increases transcription from plasmids but not in SLF2-deleted cell lines, indicating that SMC5/6 silences plasmids in a SUMOylation-dependent manner. Expression of LT is sufficient for increased expression, and again, not additive or synergistic with SIMC1 or SLF2 deletion, indicating that LT prevents silencing by directly inhibiting SMC5/6. In contrast, PML bodies appear dispensable for plasmid silencing.

      Strengths:

      The manuscript defines the requirements for plasmid silencing by SMC5/6 (an interaction of Smc6 with the loader complex SLF2/SIMC1, SUMOylation activity) and shows that SLF1 and PML bodies are dispensable for silencing. Furthermore, the authors show that LT can overcome silencing, likely by directly binding to (but not degrading) SMC5/6.

      Weaknesses:

      (1) Many of the findings were expected based on recent publications.

      There have been no manuscripts describing the role of SIMC1-SLF2 in ecDNA silencing. There have been studies describing SLF2’s roles in ecDNA silencing, but these suggested SLF2 had an SLF1 independent role, with no mention of an alternate Nse5-like cofactor. Our earlier study in eLife (1) described the identification of SIMC1 as an Nse5-like cofactor for SLF2 but did not test potential roles of the complex in ecDNA silencing. Also, the apparent dispensability of PML NBs in plasmid silencing (in U2OS cells) was unexpected based on recent publications. Finally, SV40 LT has not previously been implicated in SMC5/6 inhibition, which may occur through novel mechanisms.

      (2) While the data are consistent with SIMC1 playing the main function in plasmid silencing, it is possible that SLF1 contributes to silencing, especially in the absence of SIMC1. This would potentially explain the discrepancy with the data reported in ref. 50. SLF2 deletion has a stronger effect on expression than SIMC1 deletion in many but not all experiments reported in this manuscript. Double mutant/deletion experiments would be useful to explore this possibility.

      It is interesting to note that the data in ref. 50 (11) is also at odds with that in ref. 45 (8) in terms of defining a role for SLF1 in the silencing of unintegrated HIV-1 DNA. The Irwan study showed that SLF1 deficient cells exhibit increased expression of a reporter gene from unintegrated HIV-1, whereas the Dupont study found that SLF1 deletion, unlike SLF2 deletion, has no effect. It is unclear what the basis of this discrepancy is. In line with the Dupont study, we found no effect of SLF1 deletion on plasmid expression (Figure 4B), whereas SLF2 deletion increased reporter expression (Figure 3A/B). It is possible that SLF1 could support some plasmid silencing in the absence of SIMC1, especially considering the gross structural similarity in their C-terminal Nse5-like domains. However, we have been unable to generate double-knockout SIMC1 and SLF1 cells to test such a possibility, and shSLF1 has been ineffective. 

      (3) SLF2 is part of both types of loaders, while SLF1 and SIMC1 are specific to their respective loaders. Did the authors observe differences in phenotypes (growth, sensitivities to DNA damage) when comparing the mutant cell lines or their construction? This should be stated in the manuscript.

      We have not observed significant differences in the growth rates of each cell line, and DNA damage sensitivities are as yet untested.   

      (4) It would be desirable to have control reporter constructs located on the chromosome for several experiments, including the SUMOylation inhibition (Figures 5A and 5-S2) and LT expression (Figure 5D) to exclude more general effects on gene expression.

      We have repeated all GFP reporter assays using integrated versus episomal plasmid DNA. A seminal study by Decorsière et al. (6) showed that SMC5/6 degradation by HBx of HBV increased transcription of episomal but not chromosomally integrated reporters. In line with this data, the deletion of SLF2 does not notably impact the expression of our GFP reporter construct when it is genomically integrated (Figure 3—figure supplement 1C).  

      Somewhat surprisingly, given the generally transcriptionally repressive roles of SUMO, inhibition of the SUMO pathway with SUMOi did not significantly impact the expression of our genomically integrated GFP reporter, in contrast to the episomal plasmid (Figure 5—figure supplement 1C). Finally, the expression of SV40 LT, which enhances plasmid reporter expression (Figure 5D), also did not notably affect expression of the same reporter when located in the genome (Figure 5—figure supplement 3B). This is an interesting result, which is in line with an early study showing that HBx of HBV induces transcription from episomal, but not chromosomally integrated, reporters (12). This further suggests that SV40 LT acts similarly to other early viral proteins like HBx and Vpr to counteract or bypass SMC5/6 restriction, amongst their multifaceted functions. Clearly, further analyses are needed to define mechanisms of LT in counteracting SMC5/6, but they do not appear to include complex degradation as seen with HBx and Vpr.

      (5) Figure 5A: There appears to be an increase in GFP in the SLF2-/- cells with SUMOi? Is this a significant increase?

      No significant difference was found between WT, SIMC1-/- or SLF2-/- when treated with SUMOi (p > 0.05). The p-value is 0.0857 when comparing SLF2-/- to WT in the SUMOi condition. This is described in the figure legend to Figure 5.

      (6) The expression level of SLF2 mut1 should be tested (Figure 3B).

      Full length SLF2 (WT or mutants) has been undetectable by western analyses. However, truncated SLF2 mut1 expresses well and binds SIMC1 but not SMC6 (Figure 1C). Moreover, full length SLF2 mut1 expression was confirmed by qPCR – showing a somewhat higher expression level than SLF2 WT (Figure 3—figure supplement 1B).  

      Reviewer #1 (Recommendations for the authors):

      There are some points about the presented data that need to be clarified.

      (1) Figures 3, 4B, and 5. The authors should rule out the possibility that the reported effects on transcription were due to alterations in plasmid number. This is particularly important, taking into account the importance of SMC5/6 in DNA replication.

      We used qPCR to assess plasmid copy number versus genomic DNA in our cell lines, testing at 72 hours post transfection to avoid any impact of cytosolic DNA (13). Our qPCR data show that there is no significant impact on plasmid copy number across our cell lines (i.e., WT and SLF2 null). SMC5/6 has a positive role in DNA replication progression on the genome (e.g. (14)), so loss of SMC5/6 “targeting” in SIMC1 and SLF2 null cells would be unlikely to promote replication fork progression per se.
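
      As an illustration only, one common way to express plasmid copy number relative to a genomic reference locus is the comparative Ct (2^-ΔCt) method, sketched below. The response does not state which quantification method was used. The function names and Ct values here are hypothetical, and the sketch assumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_plasmid_copies(ct_plasmid: float, ct_genomic: float) -> float:
    """Plasmid copies per copy of a genomic reference locus, via 2**-(dCt).

    Assumes both amplicons double every cycle (~100% efficiency); this is
    a hypothetical helper, not the authors' analysis code.
    """
    delta_ct = ct_plasmid - ct_genomic
    return 2.0 ** (-delta_ct)


def fold_change_vs_wt(ct_plasmid_ko, ct_genomic_ko, ct_plasmid_wt, ct_genomic_wt):
    """Knockout/WT ratio of relative plasmid copies (the 2**-ddCt form)."""
    ko = relative_plasmid_copies(ct_plasmid_ko, ct_genomic_ko)
    wt = relative_plasmid_copies(ct_plasmid_wt, ct_genomic_wt)
    return ko / wt


# Hypothetical Ct values for illustration only; not data from this study.
wt = relative_plasmid_copies(ct_plasmid=18.0, ct_genomic=23.0)
ko = relative_plasmid_copies(ct_plasmid=18.2, ct_genomic=23.1)
fold = fold_change_vs_wt(18.2, 23.1, 18.0, 23.0)
print(wt, ko, fold)
```

      A fold change near 1 indicates similar plasmid copy number in the two cell lines. A statistical comparison across biological replicates would still be needed.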

      (2) Figure S1A. In contrast to the statement in the text, the SIMC1-combo control is affected in its binding to SLF2; however, it is not affected in its binding to SMC6. This is somewhat unexpected because it suggests that the solenoid-like structure is not required for SMC6 binding, just specific patches at either SIMC or SLF2. This should be commented on.

      We appreciate the reviewer’s observation regarding the discrepancy between Figure S1A and the text. This was our oversight. The data show that SLF2 recovery was reduced in the pull-down with the SIMC1 combo control mutant, while SLF2 expression was unchanged. Because SLF2 or SIMC1 variants that fail to associate typically show poor expression (1), these findings suggest that the SIMC1 combo control mutant associates with SLF2, albeit more weakly. Since the mutations were introduced into surface residues of SIMC1, it is not immediately clear how they would weaken the interaction or destabilize the complex. In contrast, SMC6 was fully recovered with the SIMC1 combo control mutant, indicating that the SIMC1–SMC6 interaction remains stable without stoichiometric SLF2. This may reflect direct recognition of a SIMC1 binding epitope or stabilization of its solenoid structure by SMC6, although this interpretation remains uncertain given the unstable nature of free SIMC1 and SLF2. Alternatively, SMC6 may have co-sedimented with the SIMC1 combo control mutant together with SLF2, which was initially retained but subsequently lost during washing, whereas SMC6 remained due to its limited solubility in the absence of other SMC5/6 subunits. While further mechanistic analysis will require purified SMC5/6 components, our data support the AlphaFold-based model by demonstrating that SIMC1 mutations on the non–SMC6-contacting surface retain association with SMC6. The text has been revised accordingly.

      (3) The SLF2-only mutant has alterations that affect interactions with both SLF2 and SIMC1. Is it not another Mixed mutant?

      We appreciate the reviewer’s observation regarding the discrepancy between the mutant name (“SLF2-only”) and its description (“while N947 forms salt bridges with SIMC1”). The previous statement was inaccurate due to a misinterpretation of several AlphaFold models. Across these models, the SIMC1–SLF2 interface residues remain largely consistent, but the SIMC1 residue R470 exhibits positional variability, contacting N947 in some models but not in others. Given this variability and the absence of an experimental structure, we have revised the text to avoid overinterpretation. Because the N947 side chain is oriented toward SLF2 and consistently forms polar contacts with the H1148 side chain and G1149 backbone, we have renamed this mutant “SLF2-facing,” which more accurately describes its modeled environment. The other mutants are likewise renamed “SIMC1-facing” and “SIMC1–SLF2-groove-facing,” providing a clearer and more consistent description of the interface.

      (4) The SLF2-only mutant still displays clear interactions with SMC6. Can this be explained with the AlphaFold model?

      SIMC1 may contribute more substantially to SMC6 binding than SLF2, consistent with our mutagenesis results. However, the energetic contributions of individual residues or proteins cannot be quantitatively inferred from structural models alone. Comprehensive experimental and computational analyses would be required to address this point.

      (5) The conclusions about the role of SUMOylation are vague; its general role in transcriptional repression is already known, and the authors already demonstrated that SIMC interacts with SUMO pathway factors. Concerning the epistatic effect, the experiment should be done at a lower inhibitor concentration; at 100 nM there is not much margin for a further increase according to the kinetics analysis in Figure S5.

      The SUMO pathway is indeed thought to be generally repressive for transcription. Notably, in response to a suggestion from Reviewer 3 (public review point 4), we have repeated several of our GFP expression assays using cells with the GFP reporter plasmid integrated into the genome (please see Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B). This type of integrated reporter does not show elevated expression following inhibition of the SMC5/6 complex, unlike ecDNAs (6,10). Interestingly, SUMOi, LT expression, and SLF2 knockout also did not notably impact the expression of our integrated GFP reporter (Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B), unlike that of the plasmid (ecDNA) reporter. Given the “general” inhibitory effect of SUMO on transcription, the SUMOi result was not expected, and it opens further interesting avenues for study.

      In Figure 5—figure supplement 1A, the increase in reporter expression with 100 nM SUMOi is well below that seen at the highest SUMOi dose, leaving margin for further induction. If the ~3-4-fold induction of GFP expression in SLF2 null cells were independent of SUMOylation, we would expect it to further increase GFP expression in SUMOi-treated cells. The impact of SUMOylation on GFP reporter expression remains “vague”, but our data indicate that SMC5/6 operates within SUMO’s “umbrella” function and provides a starting point for more mechanistic dissection.

      (6) Figure 5C. Why is the size different between Input versus GFP-PD?

      Please see our response to this question above (Reviewer #2, point 5).

      Reviewer #2 (Recommendations for the authors):

      If further data could be provided to extend on that which is presented, then publication as a 'standalone research article' may be appropriate, but not in its present form.

      We submitted this manuscript as a “Research Advance”, not as a standalone research article, given that it was an extension of our previous research article (1).

      Reviewer #3 (Recommendations for the authors):

      (1) The term 'LT' should be defined in the title

      We have updated the title accordingly.  

      (2) This reviewer found the nomenclature of the SMC6 mutants confusing (SIMC1-only...). Either rephrase or define more clearly in the text and the figures.

      We agree with the reviewer and have renamed the mutants as “SIMC1-facing,” “SLF2-facing,” and “SIMC1–SLF2-groove-facing.”

      (3) The authors could better emphasize that LT blocks silencing in trans (not only on its cognate target sequence in cis). This is consistent with the observed direct binding to SMC5/6.

      We appreciate the suggestion to further emphasize the impact of LT on plasmid silencing. We did not want to overstate its impact at this time because we do not know if it directly binds SMC5/6 or indeed affects SMC5/6 function more broadly. LT expression, like HBx, does cause induction of a DNA damage response, but we cannot at this point tie that response to SMC5/6 inhibition alone.

      (4) Figure 5 S1: the merge looks drastically different. Is DAPI omitted in the wt merge image?

      Thank you for noting this issue. We have corrected the image, which was impacted by the use of an underexposed DAPI image.  

      (5) Figure 1: how is the structure in B oriented relative to A? A visual guide would be helpful.

      We have added arrows to indicate the view orientation and rotational direction to turn A to B.

      (6) Line 126, unclear what "specificity" here means.

      We have revised the sentence without this word, which now starts with “To confirm the SIMC1-SMC6 interface, we introduced….”

      (7) Line 152, The statement implies that the conserved residues are needed for loader subunits interactions ('mediating the SIMC1-SLF2 interaction"). Does Figure 1C not show that the residues are not important? Please clarify.

      Thank you for noting this writing error. We have corrected the sentence to provide the intended meaning. It now reads “Collectively, these results confirm that the conserved surface patch of SIMC1-SLF2 is essential for SMC6 binding.”

      References

      (1) Oravcova M, Nie M, Zilio N, Maeda S, Jami-Alahmadi Y, Lazzerini-Denchi E, Wohlschlegel JA, Ulrich HD, Otomo T, Boddy MN. The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers. Elife. 2022;11. PMCID: PMC9708086

      (2) Sullivan CS, Pipas JM. T antigens of simian virus 40: molecular chaperones for viral replication and tumorigenesis. Microbiol Mol Biol Rev. 2002;66(2):179-202. PMCID: PMC120785

      (3) Gilinger G, Alwine JC. Transcriptional activation by simian virus 40 large T antigen: requirements for simple promoter structures containing either TATA or initiator elements with variable upstream factor binding sites. J Virol. 1993;67(11):6682-8. PMCID: PMC238107

      (4) Qadri I, Conaway JW, Conaway RC, Schaack J, Siddiqui A. Hepatitis B virus transactivator protein, HBx, associates with the components of TFIIH and stimulates the DNA helicase activity of TFIIH. Proc Natl Acad Sci U S A. 1996;93(20):10578-83. PMCID: PMC38195

      (5) Aufiero B, Schneider RJ. The hepatitis B virus X-gene product trans-activates both RNA polymerase II and III promoters. EMBO J. 1990;9(2):497-504. PMCID: PMC551692

      (6) Decorsiere A, Mueller H, van Breugel PC, Abdul F, Gerossier L, Beran RK, Livingston CM, Niu C, Fletcher SP, Hantz O, Strubin M. Hepatitis B virus X protein identifies the Smc5/6 complex as a host restriction factor. Nature. 2016;531(7594):386-9. 

      (7) Murphy CM, Xu Y, Li F, Nio K, Reszka-Blanco N, Li X, Wu Y, Yu Y, Xiong Y, Su L. Hepatitis B Virus X Protein Promotes Degradation of SMC5/6 to Enhance HBV Replication. Cell Rep. 2016;16(11):2846-54. PMCID: PMC5078993

      (8) Dupont L, Bloor S, Williamson JC, Cuesta SM, Shah R, Teixeira-Silva A, Naamati A, Greenwood EJD, Sarafianos SG, Matheson NJ, Lehner PJ. The SMC5/6 complex compacts and silences unintegrated HIV-1 DNA and is antagonized by Vpr. Cell Host Microbe. 2021;29(5):792-805 e6. PMCID: PMC8118623

      (9) Felzien LK, Woffendin C, Hottiger MO, Subbramanian RA, Cohen EA, Nabel GJ. HIV transcriptional activation by the accessory protein, VPR, is mediated by the p300 co-activator. Proc Natl Acad Sci U S A. 1998;95(9):5281-6. PMCID: PMC20252

      (10) Diman A, Panis G, Castrogiovanni C, Prados J, Baechler B, Strubin M. Human Smc5/6 recognises transcription-generated positive DNA supercoils. Nat Commun. 2024;15(1):7805. PMCID: PMC11379904

      (11) Irwan ID, Bogerd HP, Cullen BR. Epigenetic silencing by the SMC5/6 complex mediates HIV-1 latency. Nat Microbiol. 2022;7(12):2101-13. PMCID: PMC9712108

      (12) van Breugel PC, Robert EI, Mueller H, Decorsiere A, Zoulim F, Hantz O, Strubin M. Hepatitis B virus X protein stimulates gene expression selectively from extrachromosomal DNA templates. Hepatology. 2012;56(6):2116-24. 

      (13) Lechardeur D, Sohn KJ, Haardt M, Joshi PB, Monck M, Graham RW, Beatty B, Squire J, O'Brodovich H, Lukacs GL. Metabolic instability of plasmid DNA in the cytosol: a potential barrier to gene transfer. Gene Ther. 1999;6(4):482-97. 

      (14) Gallego-Paez LM, Tanaka H, Bando M, Takahashi M, Nozaki N, Nakato R, Shirahige K, Hirota T. Smc5/6-mediated regulation of replication progression contributes to chromosome assembly during mitosis in human cells. Mol Biol Cell. 2014;25(2):302-17. PMCID: PMC3890350

    1. eLife Assessment

      This paper provides potentially valuable insight into why memory consolidation may differ between children (5-7 years of age) and adults. The work hints at developmental differences in neural engagement during the retrieval of recent and remote memories. However, there are several major concerns with the analyses that are not alleviated by the included controls, and as such the evidence supporting the authors' main claims remains incomplete.

    2. Reviewer #2 (Public review):

      Summary:

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children.

      Comments on latest version:

      I carefully reviewed not only the responses to my own reviews but also the responses to the concerns raised by the other reviewers. While the authors addressed some of the concerns raised in the process, I think many substantive concerns remain.

      While I appreciate the authors' sub-sample analysis to control for re-exposure to stimuli in children versus adults, the authors only performed this analysis on memory performance and univariate activation, but they did not run this on the main focus of interest, which was the pattern analysis. I think this is critical to run, as these measures would be the ones most sensitive to repetition and are the foundation for the major claims of the manuscript.

      Also, I still agree that the authors should do an analysis that subsets (matches) the number of trials. While they highlight problems with the loss of statistical power and introduced variability, it is these two very same factors that could potentially be driving these differences.

      As part of their efforts to resolve some concerns about their analysis pipeline, the authors show that similar effects do not emerge for incorrectly remembered items. While this is helpful, it would be important to do direct comparisons of subsequently remembered and forgotten items.

      There is a major concern that the white matter control ROIs are showing session effects, and even those for the contrasts of interest are marginally significant (p = 0.08). This raises significant concerns about the ability to interpret the authors' main signal of interest. While I appreciate many of the other control analyses, this one analysis is quite worrisome.

      Similarly, for the item-related analysis, the results should look clearly different, but the authors report p-values hovering around significance. Indeed, for these analyses to be true controls, perhaps they should directly control across conditions (i.e., use the item reinstatement as a confound control statistically).
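
      One possible reading of this suggestion is sketched below under stated assumptions. The idea is to remove the linear contribution of item-level reinstatement from each trial's category-level (gist) reinstatement before comparing conditions. The array names and simulated values are hypothetical, and this is not the authors' pipeline. Entering item reinstatement as a covariate in the main model is an alternative.

```python
import numpy as np


def residualize(y, covariate):
    """Return y with the linear (intercept + slope) effect of `covariate` removed."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


# Hypothetical per-trial similarity values; not data from the study.
rng = np.random.default_rng(1)
item_reinstatement = rng.normal(0.3, 0.1, size=60)
category_reinstatement = 0.5 * item_reinstatement + rng.normal(0.0, 0.05, size=60)

category_resid = residualize(category_reinstatement, item_reinstatement)
# Ordinary least-squares residuals are uncorrelated with the covariate by construction.
print(np.corrcoef(category_resid, item_reinstatement)[0, 1])
```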

      The across-run comparisons are a nice addition to the revision, and although they are similar to the within-run conditions, I would recommend that, when combining these signals, a factor for within- versus across-run comparisons be included, and that the authors show there are no interactions with this factor.
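
      Below is a minimal sketch of such a model. It assumes trial-level similarity values and 0/1 coding for condition (remote vs recent) and run type (across- vs within-run pair). The names and the noise-free toy values are hypothetical, and a real analysis would add participant-level random effects. The interaction coefficient is the term the reviewer asks to be shown to be negligible.

```python
import numpy as np


def fit_run_type_model(similarity, condition, across_run):
    """OLS fit of similarity ~ condition * run_type; returns the four coefficients."""
    condition = np.asarray(condition, dtype=float)
    across_run = np.asarray(across_run, dtype=float)
    design = np.column_stack([
        np.ones_like(condition),   # intercept
        condition,                 # 1 = remote, 0 = recent (hypothetical coding)
        across_run,                # 1 = across-run pair, 0 = within-run pair
        condition * across_run,    # interaction of interest
    ])
    coefs, *_ = np.linalg.lstsq(design, np.asarray(similarity, dtype=float), rcond=None)
    names = ["intercept", "condition", "run_type", "condition_x_run_type"]
    return dict(zip(names, coefs))


# Noise-free toy data: within-run pairs get a constant boost (as autocorrelation
# might produce), while the condition effect is the same for both run types.
condition = np.array([0, 0, 1, 1, 0, 0, 1, 1])
across_run = np.array([0, 1, 0, 1, 0, 1, 0, 1])
similarity = 0.5 - 0.2 * condition - 0.1 * across_run
result = fit_run_type_model(similarity, condition, across_run)
print(result)
```

      In this toy case, the constant within-run offset is absorbed by the run-type term and the interaction is zero. A non-zero interaction in real data would indicate that the condition effect depends on whether patterns came from the same run.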

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, PHG, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though still in this revised paper I have substantive concerns about how the analyses were performed. While scene-specific reinstatement decreased for remote memories in both children and adults, claims about its presence cannot be made given the analyses. Gist-level reinstatement was observed in children but not adults, but I also have concerns about this analysis. Broadly, the behavioral and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and take a step towards characterizing how.

      Strengths: 

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.  

      Weaknesses: 

      As noted above and in my review of the original submission, the pattern similarity analysis for both item and category-level reinstatement were performed in a way that is not interpretable given concerns about temporal autocorrelation within scanning run. Unfortunately, these issues remain of concern in this revision because they were not rectified. Most of my review focuses on this analytic issue, though I also outline additional concerns.

      (1) The pattern similarity analyses are largely uninterpretable due to how they were performed. 

      (a) First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, and which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, which is not possible given the design. 
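
      The temporal-autocorrelation concern can be illustrated with a toy simulation. It assumes voxel-wise AR(1) noise with an arbitrary coefficient (0.8, not fitted to any data) and no task signal at all. Patterns sampled back-to-back in a run correlate strongly, while patterns far apart do not. Within-run comparisons whose temporal lags differ by condition can therefore produce apparent "reinstatement" differences from noise alone.

```python
import numpy as np


def simulate_ar1_patterns(n_timepoints, n_voxels, phi, rng):
    """Voxel-wise AR(1) noise with no task signal (a hypothetical stand-in for BOLD noise)."""
    data = np.empty((n_timepoints, n_voxels))
    # Start from the stationary distribution so every time point has unit variance.
    data[0] = rng.standard_normal(n_voxels)
    innovation_sd = np.sqrt(1.0 - phi ** 2)
    for t in range(1, n_timepoints):
        data[t] = phi * data[t - 1] + innovation_sd * rng.standard_normal(n_voxels)
    return data


def mean_pattern_correlation(data, lag):
    """Average Pearson correlation between spatial patterns `lag` time points apart."""
    rs = [np.corrcoef(data[t], data[t + lag])[0, 1] for t in range(data.shape[0] - lag)]
    return float(np.mean(rs))


rng = np.random.default_rng(0)
noise = simulate_ar1_patterns(n_timepoints=200, n_voxels=300, phi=0.8, rng=rng)
short_lag = mean_pattern_correlation(noise, lag=1)   # "back-to-back" events
long_lag = mean_pattern_correlation(noise, lag=30)   # events far apart within a run
print(f"lag 1: {short_lag:.2f}, lag 30: {long_lag:.2f}")
```

      Under this model, the expected pattern correlation at lag k is roughly phi**k. This is why restricting comparisons to different runs, where the noise is independent, removes the confound.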

      To remedy this, in the revision the authors have said they will refrain from making conclusions about the presence of scene-specific reinstatement (i.e., reinstatement above baseline). While this itself is an improvement over the original manuscript, I still have several concerns. First, this was not done thoroughly, and at times conclusions and interpretations still seem to imply or assume the presence of scene reinstatement (e.g., lines 979-985, "our research supports the presence of scene-specific reinstatement in 5-to-7-year-old children"; line 1138). 

      We thank the reviewers for pointing out inconsistencies in our writing. We agree that we cannot make any claims about the baseline level of scene-specific reinstatement. To reiterate, our focus is on the changes in reinstatement over time (30 minutes, 24 hours, and two weeks after learning), which showed a robust decrease. Importantly, scene-specific reinstatement indices for recent items, tested on different days, did not significantly differ, as indicated by non-significant main effects of Session (all p > .323) and Session x ROI interactions (all p > .817) in either age group. This supports our claim that temporal autocorrelation is stable and consistent across conditions and that the observed decline in scene-specific reinstatement reflects a time-dependent change in remote retrieval. We have revised the highlighted passages accordingly, emphasizing the delay-related decrease in scene-specific reinstatement rather than its absolute magnitude. 

      Second, the authors' logic for the neural-behavioural correlations in the PLSC analysis involved restricting to regions that showed significant reinstatement for the gist analysis, which cannot be done for the analogous scene-specific reinstatement analysis. This makes it challenging to directly compare these two analyses since one was restricted to a small subset of regions and only children (gist), while scene reinstatement included both groups and all ROIs. 

      We thank the reviewer for pointing this out and want to clarify that it was not our intention to directly compare these analyses. For the neural-behavioral correlations, we included only those regions that showed above-baseline gist-like representations, whereas for scene-specific reinstatement, we included all regions because no such baseline could be established. The primary aim of the PLSC analysis was to identify a set of regions that, after a stringent permutation and bootstrapping procedure, form a latent variable explaining a significant proportion of variance in behavioral performance across all participants. 
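
      For readers less familiar with the method, the core of a behavioral PLSC can be sketched in a few lines of Python. This is a generic textbook version, not the authors' pipeline: it assumes one row per participant, z-scores the neural and behavioral measures, takes the singular value decomposition of their cross-correlation matrix, and tests the first latent variable by permuting behavior across participants. The bootstrap step mentioned above (used to assess the reliability of each region's salience) is omitted for brevity, and the function names and simulated data are hypothetical.

```python
import numpy as np


def zscore(a):
    """Column-wise z-score (sample standard deviation)."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def behavioral_pls(X, Y):
    """Generic behavioral PLSC.

    X: (n_subjects, n_regions) neural measures; Y: (n_subjects, n_behav) scores.
    Returns singular values, brain saliences, and behavior saliences.
    """
    n = X.shape[0]
    R = zscore(Y).T @ zscore(X) / (n - 1)  # behavior-by-region correlation matrix
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return s, Vt.T, U


def permutation_p(X, Y, n_perm=1000, seed=0):
    """p value for the first singular value, permuting behavior across subjects."""
    rng = np.random.default_rng(seed)
    s_obs = behavioral_pls(X, Y)[0][0]
    null = np.array([behavioral_pls(X, Y[rng.permutation(len(Y))])[0][0]
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= s_obs)) / (n_perm + 1)


# Hypothetical data: 60 participants, 5 regions that all track one memory score.
rng = np.random.default_rng(2)
behavior = rng.standard_normal((60, 1))
neural = 0.6 * behavior + rng.standard_normal((60, 5))
s, brain_sal, behav_sal = behavioral_pls(neural, behavior)
p = permutation_p(neural, behavior, n_perm=500)
print(f"first singular value = {s[0]:.2f}, permutation p = {p:.3f}")
```

      Because the permutation breaks the subject-level pairing of brain and behavior while keeping each variable's distribution intact, a small p value indicates that the latent variable captures more brain-behavior covariance than expected by chance.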

      Third, it is also unclear whether children and adults' values should be directly comparable given pattern similarity can be influenced by many factors like motion, among other things. 

      We thank the reviewer for raising this important point. In our multivariate analysis, we included confounding regressors specifically addressing motion-related artefacts. Following recent best practices for mitigating motion-related confounding factors in both adult and pediatric fMRI data (Ciric et al., 2017; Esteban et al., 2020; Jones et al., 2021; Satterthwaite et al., 2013), we implemented the most effective motion correction strategies. 

      Importantly, our group × session interaction analysis focuses on relative changes in reinstatement over time rather than on absolute levels of pattern similarity in children versus adults. This approach controls for potential baseline differences and instead examines whether the magnitude of delay-related changes differs across groups. We believe this justifies the comparison and ensures that our conclusions are not driven by group-level differences in baseline similarity or motion artifacts.

      My fourth concern with this analysis relates to the lack of regional specificity of the effects. All ROIs tested showed a virtually identical pattern: "Scene-specific reinstatement" decreased across delays, and was greater in children than adults. I believe control analyses are needed to ensure artifacts are not driving these effects. This would greatly strengthen the authors' ability to draw conclusions from the "clean" comparison of day 1 vs. day 14. (A) The authors should present results from a control ROI that should absolutely not show memory reinstatement effects (e.g., white matter?). Results from the control ROI should look very different - should not differ between children and adults, and should not show decreases over time. 

      (C) If the same analysis was performed comparing the object cue and immediately following fixation (rather than the fixation and the immediately following scene), the results should look very different. I would argue that this should not be an index of reinstatement at all since it involves something presented visually rather than something reinstated (i.e., the scene picture is not included in this comparison). If this control analysis were to show the same effects as the primary analysis, this would be further evidence that this analysis is uninterpretable and hopelessly confounded. 

      We appreciate the reviewer’s suggestion to strengthen the interpretation of our findings by including appropriate control analyses to rule out non-memory-related artifacts. In response, we conducted several control analyses, detailed below, which collectively support the specificity of the observed reinstatement effects. The results are reported in the manuscript (lines 593-619).

      We checked that item reinstatement for incorrectly remembered trials did not show any session-related decline in any ROI. This indicates that the reinstatement for correctly remembered items is memory-related (see Fig. S5 for details). 

      We conducted additional analyses on three subregions of the corpus callosum (the body, genu, and splenium). The results of the linear mixed-effects models revealed no significant group effect (all p > .426), indicating no differences between children and adults. All three ROIs did show a significant main effect of Session (all p < .001); however, post hoc analyses indicated that this effect was driven by differences between the recent and the Day 14 remote condition. The main contrasts of interest (recent vs. Day 1 remote and Day 1 remote vs. Day 14 remote) were not significant (all p > .080; see Table S10.4), suggesting that, unlike in other ROIs, there was no delay-related decrease in scene-specific reinstatement in these white matter regions.

      We then repeated our analysis using the same procedure but replaced the “scene” time window with the “object” time window. The rationale for this control is that comparing the object cue to the immediately following fixation period should not reflect scene reinstatement, as the object and the reinstated scene rely on distinct neural representations. Accordingly, we did not expect a delay-related decrease in this index. Consistent with this expectation, the object-fixation similarity index, although also influenced by temporal autocorrelation, did not reveal any significant effect of session or delay in any ROI (all p > .059; see Tables S9 and S9.1).

      Together, these control analyses provide converging evidence that our findings are not driven by global or non-specific signal changes. We believe that these control analyses strengthen our interpretation about delay-related decrease in scene-specific reinstatement index. 

      (B) Do the recent items from day 1 vs. day 14 differ? If so, this could suggest something is different about the later scans (and if not, it would be reassuring). 

      The recent items tested on Day 1 and Day 14 do not differ (all p > .323). This holds across all ROIs.

      (b) For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). The authors in their response letter have indicated that because the patterns being correlated are not derived from events in close temporal proximity, they should not suffer from the issue of temporal autocorrelation. This is simply not true. For example, see the paper by Prince et al. (eLife 2022; on GLMsingle). This is not the main point of Prince et al.'s paper, but it includes a nice figure that shows that, using standard modelling approaches, the correlation between (same-run) patterns can be artificially elevated for lags as long as ~120 seconds (and can even be artificially reduced after that; Figure 5 from that paper) between events. This would affect many of the comparisons in the present paper. The cleanest way to proceed is to simply drop the within-run comparisons, which I believe the authors can do and yet they have not. Relatedly, in the response letter the authors say they are focusing mainly on the change over time for reinstatement at both levels including the gist-type reinstatement; however, this is not how it is discussed in the paper. They in fact are mainly relying on differences from zero, as children show some "above baseline" reinstatement while adults do not, but I believe there were no significant differences over time (i.e., the findings the authors said they would lean on primarily, as they are arguably the most comparable).  

      We thank the reviewer for this important comment regarding the potential inflation of similarity values due to within-run comparisons.

      To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials. This approach restricted comparisons to pairs of trials from different runs (run 1-run 2, run 2-run 3, run 1-run 3). This analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults’ vlPFC for remote Day 1 memories (p = .029), and in both children’s and adults’ LOC for remote Day 1 memories (p < .02). A significant Session effect in both mPFC and vlPFC (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement for the long delay (Day 14) compared to the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects previously found with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC, particularly for remote memories, although the within-run vlPFC effect (short delay: p = .038; long delay: p = .047) did not survive correction for multiple comparisons. Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (as shown by Prince et al., 2022, eLife), we believe that our design mitigates this risk through the consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly and now report the combined analysis. We also report the cross-run and within-run analyses separately in the Supplementary Materials (Tables S12.1, S12.2), showing that the within-run results converge with the cross-run results and thus strengthen rather than dilute the findings. 
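
      The distinction between the pair sets discussed here can be made concrete with a short Python sketch that enumerates which trial pairs enter each analysis. The helper name and the run layout (three runs of four trials) are hypothetical and chosen only to keep the counts small. Within-run and cross-run pairs partition the full set, so the combined analysis adds the within-run pairs back in; those are exactly the pairs the reviewer flags as susceptible to temporal autocorrelation, which is why reporting the cross-run results separately remains informative.

```python
from itertools import combinations


def trial_pairs(runs, mode="cross"):
    """Index pairs (i, j), i < j, of trials entering a similarity analysis.

    mode="within": only pairs of trials from the same run
    mode="cross":  only pairs of trials from different runs
    mode="all":    both sets combined
    """
    if mode not in ("within", "cross", "all"):
        raise ValueError(f"unknown mode: {mode}")
    pairs = []
    for i, j in combinations(range(len(runs)), 2):
        same_run = runs[i] == runs[j]
        if mode == "all" or (mode == "within") == same_run:
            pairs.append((i, j))
    return pairs


runs = [1] * 4 + [2] * 4 + [3] * 4  # hypothetical: 3 runs x 4 trials
for mode in ("within", "cross", "all"):
    print(mode, len(trial_pairs(runs, mode)))
# within 18
# cross 48
# all 66
```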

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in mPFC and vlPFC. These effects, based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted accordingly in the Discussion. 

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. In their response letter and the revised paper, the authors do provide a bit of reasoning as to why this is the most sensible. However, it is still not clear to me whether this is really "reinstatement" which (in my mind) entails the re-evoking of a neural pattern initially engaged during perception. Rather, could this be a shared neural state that is category specific? 

      We thank the reviewer for raising this important conceptual point about whether our findings reflect reinstatement in the classical sense, namely the reactivation of perceptual neural patterns, or a shared, category-specific state.

      While traditional definitions of reinstatement emphasize item-specific reactivation (e.g., Ritchey et al., 2013; Xiao et al., 2017), it is increasingly recognized that memory retrieval can also involve the reactivation of abstracted, generalized, or gist-like representations, especially as memories consolidate. Our analysis follows this view and aims to capture how memory representations evolve over time, particularly in development.

      Several studies support this broader notion of gist-like reinstatement. For instance, Chen et al. (2017) showed that while event-specific patterns were reinstated across the default mode network and medial temporal lobe, inter-subject recall similarity exceeded encoding-retrieval similarity, suggesting transformation and abstraction beyond perceptual reinstatement. Zhuang et al. (2021) further showed that loss of neural distinctiveness in the MTL over time predicted false memories, linking neural similarity to representational instability. This aligns with our finding that greater gist-like reinstatement is associated with lower memory accuracy.

      Ye et al. (2020) discuss how memory representations are reshaped after encoding, becoming more differentiated, integrated, or weakened depending on task goals and neural resources. While their work focuses on adults, our previous findings in the same sample (Schommartz et al., 2023) suggest that children’s neural systems are structurally immature, making children more likely to rely on gist-based consolidation (see Fandakova et al., 2019). Adults, by contrast, may retain more item-specific traces.

      Relatedly, St-Laurent & Buchsbaum (2019) show that with repeated encoding, neural memory representations become increasingly distinct from perception, suggesting that reinstatement need not mimic perception. We agree that reinstatement does not always reflect reactivation of low-level sensory patterns, particularly over long delays or in developing brains.

      Finally, while we did not correlate retrieval patterns directly with perceptual encoding patterns, we assessed neural similarity among retrieved items within vs. between categories, based on non-repeated, independently sampled trials. This approach is intended to capture the structure and delay-related transformation of mnemonic representations, especially in terms of how they become more schematic or gist-like over time. Our findings align conceptually with the results of Kuhl et al. (2012), who used MVPA to show that older and newer visual memories can be simultaneously reactivated during retrieval, with greater reactivation of older memories interfering with retrieval accuracy for newer memories. Their work highlights how overlapping category-level representations in ventral temporal cortex can reflect competition among similar memories, even in the absence of item-specific cues. In our developmental context, we interpret the increased neural similarity among category members in children as possibly reflecting such representational overlap or competition, where generalized traces dominate over item-specific ones. This pattern may reflect a shift toward efficient but less precise retrieval, consistent with developmental constraints on memory specificity and consolidation.
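
      The within- versus between-category similarity measure described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes one activation pattern per trial (e.g., from the fixation period), Pearson correlation followed by a Fisher z-transform, and a simple difference of means. The function name and the `cross_run_only` flag, which drops same-run pairs as the reviewer recommends, are hypothetical. The simulated data contain a shared category prototype, so the index should come out positive.

```python
import numpy as np
from itertools import combinations


def category_similarity_index(patterns, categories, runs, cross_run_only=True):
    """Mean Fisher-z similarity of same-category minus different-category pairs.

    patterns: (n_trials, n_voxels) trial-wise activation estimates
    categories, runs: per-trial labels
    cross_run_only: if True, skip every pair of trials from the same run
    """
    r = np.corrcoef(patterns)  # trial-by-trial Pearson r
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))  # Fisher z-transform
    within, between = [], []
    for i, j in combinations(range(len(categories)), 2):
        if cross_run_only and runs[i] == runs[j]:
            continue
        (within if categories[i] == categories[j] else between).append(z[i, j])
    return float(np.mean(within) - np.mean(between))


# Hypothetical data: 4 categories x 2 trials per category in each of 3 runs.
rng = np.random.default_rng(1)
n_cat, per_cat_per_run, n_runs, n_vox = 4, 2, 3, 150
prototypes = rng.standard_normal((n_cat, n_vox))
cats, runs, pats = [], [], []
for run in range(n_runs):
    for c in range(n_cat):
        for _ in range(per_cat_per_run):
            cats.append(c)
            runs.append(run)
            pats.append(0.5 * prototypes[c] + rng.standard_normal(n_vox))
pats = np.array(pats)
index_cross = category_similarity_index(pats, cats, runs)
index_all = category_similarity_index(pats, cats, runs, cross_run_only=False)
print(f"cross-run index = {index_cross:.3f}, all-pairs index = {index_all:.3f}")
```

      On real data, comparing the cross-run and all-pairs versions is one direct check on how much same-run pairs contribute to the effect.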

      In this context, we view our findings as evidence of memory trace reorganization — from differentiated, item-level representations toward more schematic, gist-like neural patterns (Sekeres et al., 2018), particularly in children. Our cross-run analyses further confirm that this is not an artifact of same-run correlations or low-level confounds. We have clarified this distinction and interpretation throughout the revised manuscript (see lines 144-158; 1163-1170).

      In any case, I think additional information should be added to the text to clarify that this definition differs from others in the literature. The authors might also consider using some term other than reinstatement. Again (as I noted in my prior review), the finding of no category-level reinstatement in adults is surprising and confusing given prior work and likely has to do with the operationalization of "reinstatement" here. I was not quite sure about the explanation provided in the response letter, as category-level reinstatement is quite widespread in the brain for adults and is robust to differences in analytic procedures etc. 

      We agree that our operationalization of "reinstatement" differs from more conventional uses of the term, which typically involve direct comparisons between encoding and retrieval phases, often with item-level specificity. As our analysis is based on similarity among retrieval-phase trials (fixation-based activation patterns) and focuses on within- versus between-category neural similarity, we agree that the term reinstatement may suggest a stronger encoding–retrieval mapping than we are claiming.

      To avoid confusion and overstatement, we have revised the terminology throughout the manuscript: we now refer to our measure as “gist-like representations” rather than “gist-like reinstatement.” This change better reflects the nature of our analysis: we are capturing shared neural patterns among category-consistent memories that may reflect reorganized or abstracted traces, especially after a delay and in development.

      As the reviewer rightly points out, category-level reinstatement is well documented in adults (e.g., Kuhl & Chun, 2014; Tompary et al., 2020; Tompary & Davachi, 2017). The absence of such effects in our adult group may indeed reflect differences in study design, particularly our use of non-repeated, cross-trial comparisons based on fixation events. It may also reflect different consolidation strategies, with adults preserving more differentiated or item-specific representations while children form more schematic or generalizable representations, a pattern consistent with our interpretation and supported by prior work (Fandakova et al., 2019; Sekeres et al., 2018).

      We have updated the relevant sections of the manuscript (Results, Discussion (particularly lines 1163-1184), and figure captions) to clarify this terminology shift and explicitly contrast our approach with more standard definitions of reinstatement. We hope this revision provides the needed conceptual clarity while preserving the integrity of our developmental findings.

      (3) Also from a theoretical standpoint, I'm still a bit confused as to why gist-based reinstatement would involve reinstatement of the scene gist, rather than the object's location (on the screen) gist. Were the locations on the screen similar across scene backgrounds from the same category? It seems like a different way to define memory retrieval here would be to compare the neural patterns when cued to retrieve the same vs. similar (at the "gist" level) vs. different locations across object-scene pairs. This is somewhat related to a point from my review of the initial version of this manuscript, about how scene reinstatement is not necessary. The authors state that participants were instructed to reinstate the scene, but that does not mean they were actually doing it. The point that what is being measured via the reinstatement analyses is actually not necessary to perform the task should be discussed in more detail in the paper. 

      We appreciate the reviewer’s thoughtful theoretical question regarding whether our measure of “gist-like representations” might reflect reinstatement of spatial (object-location) gist, rather than scene-level gist. We would like to clarify several key points about our task design and interpretation:

      (1) Object locations were deliberately varied and context dependent.

      In our stimulus set, each object was embedded in a rich scene context, and the locations were distributed across six distinct possible areas within each scene, with three possible object placements per location. These placements were manually selected to ensure realistic and context-sensitive positioning of objects within the scenes. Importantly, locations were not fixed across scenes within a given category. For example, objects placed in “forest” scenes could appear in different screen locations across different scene exemplars (e.g., one in the bottom-left side, another floating above). Therefore, the task did not introduce a consistent spatial schema across exemplars from the same scene category that could give rise to a “location gist.”

      (2) Scene categories provided consistent high-level contextual information.

      By contrast, the scene categories (e.g., farming, forest, indoor, etc.) provided semantically coherent and visually rich contextual backgrounds that participants could draw upon during retrieval. This was emphasized in the instruction phase, where participants were explicitly encouraged to recall the whole scene based on the stories they created during learning (not just the object or its position). While we acknowledge that we cannot directly verify the reinstated content, this instruction aligns with prior studies showing that scene and context reinstatement can occur even without direct task relevance (e.g., Kuhl & Chun, 2014; Ritchey et al., 2013).

      (3) Our results are unlikely to reflect location-based reinstatement.

      If participants had relied on a “location gist” strategy, we would have expected greater neural similarity across scenes with similar spatial layouts, regardless of category. However, our design avoids this confound by deliberately varying locations across exemplars within categories. Additionally, our categorical neural similarity measure contrasted within-category vs. between-category comparisons, making it sensitive to shared contextual or semantic structure rather than simply shared screen positions.

      Considering this, we believe that the neural similarity observed in the mPFC and vlPFC in children at long delay reflects the emergence of scene-level, gist-like representations, rather than low-level spatial regularities. Nevertheless, we now clarify this point in the manuscript and explicitly discuss the limitation that reinstatement of scene context was encouraged but not required for successful task performance.

      Future studies could dissociate spatial and contextual components of reinstatement more directly by using controlled spatial overlap or explicit location recall conditions. However, given the current task structure, location-based generalization is unlikely to account for the category-level similarity patterns we observe.

      (2) Inspired by another reviewer's comment, it is unclear to me to what extent age group differences can be attributed to differences in age/development versus memory strength. I liked the other reviewer's suggestions about how to identify and control for differences in memory strength, which I don't think the authors actually did in the revision. They instead showed evidence that memory strength does seem to be lower in children, which indicates this is an interpretive confound. For example, I think the reviewer's suggestion of performing analyses on subsets of participants who were actually matched in initial learning/memory performance would have been very informative. As it is, the authors didn't really control for memory strength adequately in my opinion, and as such their conclusions about children vs. adults could be reframed as conclusions about people with weak vs. strong memories. This is obviously a big drawback given what the authors want to conclude. Relatedly, I'm not sure the DDM was incorporated as the reviewer was suggesting; at minimum I think the authors need to do more work in the paper to explain what this means and why it is relevant. (I understand putting it in the supplement rather than the main paper, but I still wanted to know more about what it added from an interpretive perspective.) 

      We appreciate the reviewer’s thoughtful concerns regarding potential confounding effects of memory strength on the observed age group differences. This is indeed a critical issue when interpreting developmental findings.

      While we agree that memory strength differs between children and adults (our own DDM-based analysis confirms this, mirroring differences observed in accuracy), we would like to emphasize that these differences are not incidental but rather reflect developmental changes in the underlying memory system. Given the known maturation of both structural and functional memory-related brain regions, particularly the hippocampus and prefrontal cortex, we believe it would be theoretically inappropriate to control for memory strength entirely, as doing so would remove variance that is central to the age-related neural effects we aim to understand.

      To address the reviewer's concern empirically, we conducted an additional control analysis in which we subsampled children to include only those who reached the learning criterion after two cycles (N = 28 of 49 children; see Tables S1.1 and S1.2, Figure S1, and Table S9.1), thereby selecting a high-performing subgroup. Importantly, this subsample replicated the behavioral and neural results of the full group. This further suggests that the observed age group differences are not merely driven by differences in memory strength.

      As mentioned above, the results of the DDM support our behavioral findings, showing that children have lower drift rates for evidence accumulation, consistent with weaker or less accessible memory representations. While these results are reported in the Supplementary Materials (section S2.1, Figure S2, Table S2), we agree that their interpretive relevance should be explained more clearly in the main text. We have therefore updated the Discussion to state explicitly how the DDM results provide converging evidence for our interpretation that developmental differences in memory quality, not merely strategy or task performance, underlie the observed neural differences (see lines 904-926).
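
      Because the reviewer asked what the drift rate adds interpretively, a small simulation may help. The Python sketch below implements a textbook two-boundary drift diffusion process with Euler steps; the boundary, noise, step size, and the two drift values are arbitrary illustrative assumptions, not the fitted parameters from the supplement. Holding everything else fixed, a lower drift rate (slower, noisier accumulation of evidence from memory) yields both lower accuracy and longer decision times. This is why a drift difference is read as a difference in the quality of the retrieved evidence rather than in response caution (captured by the boundary) or non-decision time.

```python
import numpy as np


def simulate_ddm(drift, boundary=1.0, noise=1.0, dt=0.001, n_trials=2000,
                 max_t=5.0, seed=0):
    """Euler simulation of a symmetric two-boundary drift diffusion process.

    Evidence starts at 0 and accumulates until it reaches +boundary (correct)
    or -boundary (error). Returns (proportion correct, mean decision time in s)
    over trials that reached a boundary before max_t.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    rt = np.full(n_trials, np.nan)
    choice = np.zeros(n_trials)  # +1 correct, -1 error, 0 undecided
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(max_t / dt) + 1):
        x[active] += drift * dt + noise * np.sqrt(dt) * rng.standard_normal(active.sum())
        hit_upper = active & (x >= boundary)
        hit_lower = active & (x <= -boundary)
        choice[hit_upper] = 1
        choice[hit_lower] = -1
        rt[hit_upper | hit_lower] = step * dt
        active &= ~(hit_upper | hit_lower)
        if not active.any():
            break
    decided = choice != 0
    return float(np.mean(choice[decided] == 1)), float(np.nanmean(rt))


acc_hi, rt_hi = simulate_ddm(drift=2.0)  # stronger memory evidence
acc_lo, rt_lo = simulate_ddm(drift=0.5)  # weaker memory evidence
print(f"high drift: accuracy {acc_hi:.2f}, mean decision time {rt_hi:.2f} s")
print(f"low drift:  accuracy {acc_lo:.2f}, mean decision time {rt_lo:.2f} s")
```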

      In sum, we view memory strength not as a confound to be removed, but as a meaningful and theoretically relevant factor in understanding the emergence of gist-like representations in children. We have clarified this interpretive stance in the revised manuscript and now discuss the role of memory strength more explicitly in the Discussion.

      (3) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). Precuneus also interestingly seems to show numerically recent>remote (values mostly negative), whereas most other regions show the opposite. This difference from zero (in either direction) or lack thereof seems important to the message. In response to this comment on the original manuscript, the authors seem to have confirmed that hippocampal activity was greater during retrieval than implicit baseline. But this was not really my question - I was asking whether hippocampus is (and other ROIs in this same figure are) differently engaged for recent vs. remote memories.

      We thank the reviewer for bringing up this important point. Our previous analysis showed that the anterior and posterior hippocampus, the anterior parahippocampal gyrus, and the precuneus all exhibited activation significantly above zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Based on your suggestion, our additional analysis showed: 

      (i) The linear mixed-effects model for correctly remembered items showed no significant interaction effects (group x session x memory age (recent, remote)) for the anterior hippocampus (all p > .146; see Table S7.1).

      (ii) For the posterior hippocampus, we observed a significant main effect of group (F(1,85) = 5.62, p = .038), showing significantly lower activation in children compared to adults (b = .03, t = -2.34, p = .021). No other main or interaction effects were significant (all p > .08; see Table S7.1).

      (iii) For the anterior PHG, which also showed no significant remote > recent difference, the model confirmed that there was no difference between remote and recent items across age groups and delays (all p > .194; Table S7.1). 

      Moreover, when comparing recent and remote hippocampal activation directly, there were no significant differences in either group (all FDR-adjusted p > .116; Table S7.2), supporting the conclusion that hippocampal involvement was stable across delays for successfully retrieved items. 
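
      The response reports FDR-adjusted p values without naming the procedure. The Benjamini-Hochberg step-up adjustment is the most common choice, and a minimal NumPy version is sketched below on the assumption that this is what was used; the function name and the example p values are hypothetical. It multiplies each sorted p value by m/rank, enforces monotonicity from the largest p value downward, and caps the result at 1.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum over larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


print([round(float(q), 4) for q in bh_adjust([0.01, 0.04, 0.03, 0.005])])
# [0.02, 0.04, 0.04, 0.02]
```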

      In contrast, analysis of unsuccessfully remembered items showed that hippocampal activation was not significantly different from zero in either group (all FDR-adjusted p > .052; Fig. S2.1, Table S7.1), indicating that hippocampal engagement was specific to successful memory retrieval.

      To formally test whether hippocampal activation differs between remembered and forgotten items, we ran a linear mixed-effects model with Group, Memory Success (remembered vs. forgotten), and ROI (anterior vs. posterior hippocampus) as fixed effects. This model revealed a robust main effect of memory success (F(1,1198) = 128.27, p < .001), showing that hippocampal activity was significantly higher for remembered compared to forgotten items (b = .06, t(1207) = 11.29, p < .001; Table S7.3). 

      As the reviewer noted, precuneus activation was numerically higher for recent vs. remote items, and this was confirmed in our analysis. While both recent and remote retrieval elicited significantly above-zero activation in the precuneus (Table S7.2), activation for recent items was significantly higher than for remote items, consistent across both age groups.

      Taken together, these analyses support the conclusion that hippocampal involvement in successful retrieval is sustained across delays, while other ROIs such as the precuneus may show greater engagement for more recent memories. We have now updated the manuscript text (lines 370-390) and supplementary materials to reflect these findings more clearly and to clarify the distinction between activation relative to baseline and memory-age-related modulation.

      (4) Related to point 3, the claims about hippocampus with respect to multiple trace theory feel very unsupported by the data. I believe the authors want to conclude that children's memory retrieval shows reliance on hippocampus irrespective of delay, presumably because this is a detailed memory task. However the authors have not really shown this; all they have shown is that hippocampal involvement (whatever it is) does not vary by delay. But we do not have compelling evidence that the hippocampus is involved in this task at all. That hippocampus is more active during retrieval than implicit baseline is a very low bar and does not necessarily indicate a role in memory retrieval. If the authors want to make this claim, more data are needed (e.g., showing that hippocampal activity during retrieval is higher when the upcoming memory retrieval is successful vs. unsuccessful). In the absence of this, I think all the claims about multiple trace theory supporting retrieval similarly across delays and that this is operational in children are inappropriate and should be removed. 

      We thank the reviewer for pointing this out. We agree that an additional analysis of hippocampal activity during successful and unsuccessful memory retrieval is warranted, as it provides stronger support for our claim that strong, detailed memories rely on the hippocampus during retrieval in both children and adults. Our previously presented results on the remote > recent univariate signal difference in the hippocampus (p. 14-18; lines 433-376, Fig. 3A) show that this difference does not vary between children and adults, or between Day 1 and Day 14. Our further analysis showed that both the anterior and the posterior hippocampus exhibited activation significantly above zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Following the reviewer's suggestion, our additional analyses showed:

      (i) For forgotten items, we did not observe activation significantly above zero in either the anterior or the posterior hippocampus, for recent or remote memories, on Day 1 or Day 14, in either age group (all p > .052, FDR-corrected; see Table S7.1, Fig. S2.1).

      (ii) Having established no difference between recent and remote activation within and across sessions (Day 1, Day 14), we fitted another linear mixed-effects model with group x memory success (remembered, forgotten) x region (anterior hippocampus, posterior hippocampus) as fixed effects and subject as a random effect. The model showed no significant memory success x region interaction (F(1,1198) = 1.12, p = .289) and no significant group x memory success x region interaction (F(1,1198) = .017, p = .895). However, we observed a significant main effect of memory success (F(1,1198) = 128.27, p < .001), indicating significantly higher hippocampal activation for remembered than for forgotten items (b = .06, t = 11.29, p < .001; see Table S7.3).

      (iii) Given the comparatively low number of incorrect trials for recent items in the adult group, we reran this analysis for remote items only. Again, the model showed no significant memory success x region interaction (F(1,555) = .72, p = .398) and no significant group x memory success x region interaction (F(1,555) = .14, p = .705). However, we observed a significant main effect of memory success (F(1,555) = 68.03, p < .001), indicating significantly higher hippocampal activation for remembered than for forgotten remote items (b = .07, t = 8.20, p < .001; see Table S7.3).
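      The FDR correction mentioned in (i) can be illustrated with a minimal sketch. The response does not name the FDR procedure, so we assume, for illustration only, the common Benjamini-Hochberg step-up adjustment; the function name `bh_adjust` and the p-values below are hypothetical and are not taken from Table S7.1.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step down from the largest p-value so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected p-values for 8 tests
# (2 hippocampal ROIs x 2 memory ages x 2 sessions); not data from Table S7.1
raw = [0.010, 0.020, 0.030, 0.040, 0.200, 0.300, 0.500, 0.900]
adjusted = bh_adjust(raw)
significant = [q < 0.05 for q in adjusted]
```

In this toy set, the four smallest p-values all adjust to .08, so none would survive correction at .05.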

      Taken together, our results indicate that significant hippocampal activation was observed only for correctly remembered items in both children and adults, regardless of memory age and session. For forgotten items, we did not observe any significant hippocampal activation in either group at either delay. Moreover, hippocampal activation was significantly higher for remembered than for forgotten memories. This evidence supports our conclusions regarding the Multiple Trace and Trace Transformation Theories, suggesting that the hippocampus supports retrieval similarly across delays, and provides novel evidence that this process is operational in both children and adults. This also aligns with Contextual Binding Theory, as well as with empirical evidence from Sekeres, Winokur, & Moscovitch (2018), among others. We have added this information to the manuscript.

      (5) There are still not enough methodological details in the main paper to make sense of the results. Some of these problems were addressed in the revision but others remain. For example, a couple of things that were unclear: that initially learned locations were split, where half were tested again at day 1 and the other half at day 14; what specific criterion was used to pick the 'well-learned' associations that were used for comparisons at different delay periods (object-scene pairs that participants remembered accurately in the last repetition of learning? Or across all of learning?).

      We thank the reviewer for pointing this out. Before testing, the object-scene associations initially learned on Day 0 were split into two halves based on their categories. Specifically, half of the pairs from the first set and half of the pairs from the second set of 30 object-scene associations were used to create the set of 30 remote pairs for Day 1 testing. The same procedure was applied to the remaining pairs to create the set of remote object-scene associations for Day 14 retrieval. We aimed to distribute the pair categories equally between the testing sets. We have added this information to the Methods section of the manuscript (see p. 47, lines 1237-1243). In addition, the sets of associations for the delayed tests on Day 1 and Day 14 were not selected based on learning accuracy. Notably, an analysis of variance revealed no difference in learning accuracy between the two sets created for the delayed tests in either age group (children: p = .23; adults: p = .06). These results indicate that the sets comprised items learned with comparable accuracy in both age groups.
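      The category-balanced split of learned pairs into the Day 1 and Day 14 test sets can be sketched as follows. This is a hypothetical illustration, not the authors' script: the `(pair_id, category)` layout, the category labels, and the random shuffling within each category are assumptions; it simply shows how halving every category keeps the two delay sets balanced.

```python
import random
from collections import defaultdict


def split_by_category(pairs, seed=0):
    """Split (pair_id, category) tuples into two test sets, balanced by category.

    Within each category, pair IDs are shuffled and divided in half; with an
    odd count, the extra pair goes to the second set.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for pair_id, category in pairs:
        by_category[category].append(pair_id)

    day1_set, day14_set = [], []
    for category in sorted(by_category):  # sorted for reproducibility
        ids = list(by_category[category])
        rng.shuffle(ids)
        half = len(ids) // 2
        day1_set.extend(ids[:half])
        day14_set.extend(ids[half:])
    return day1_set, day14_set


# Hypothetical example: 60 learned pairs spread over 6 categories (10 per category)
pairs = [(f"pair{i:02d}", f"category{i % 6}") for i in range(60)]
day1_set, day14_set = split_by_category(pairs)
```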

      (6) I still find the revised Introduction a bit unclear. I appreciated the added descriptions of different theories of consolidation, though the order of presented points is still a bit hard to follow. Some of the predictions I also find a bit confusing as laid out in the introduction. (1) As noted in the paper, multiple trace theory predicts that hippocampal involvement will remain high provided the memories retained are sufficiently detailed. The authors, however, also predict that children will rely more on gist (than detailed) memories than adults, which would seem to imply (combined with the MTT idea) that they should show reduced hippocampal involvement over time (while in adults, it should remain high). However, the authors' actual prediction is that the hippocampus will show stable involvement over time in both kids and adults. I'm having a hard time reconciling these points. (2) With respect to the extraction of gist in children, I was confused by the link to Fuzzy Trace Theory, given that the children in the present study are a bit young to be showing the kind of gist extraction shown in the Brainerd & Reyna data. Would 5-7 year olds not be more likely to show reliance on verbatim traces under that framework? Also, from a phrasing perspective, I was confused about whether gist-like information was something different from just gist in this sentence: "children may be more inclined to extract gist information at the expense of detailed or gist-like information." (p. 8) - is this a typo?

      We thank the reviewer for this thoughtful observation. 

      Our hypothesis of stable hippocampal engagement over time was based primarily on Contextual Binding Theory (Yonelinas et al., 2019) and on the MTT, supported by the evidence provided by Sekeres et al. (2018), which posit that the hippocampus continues to support retrieval when contextual information is preserved, even for older, consolidated memories. Given that our object-location associations were repeatedly encoded and tied to specific scene contexts, we believe that successful retrieval of both recent and remote memories likely involved contextual reinstatement, leading to sustained hippocampal activity. Also in accordance with the MTT and the related TTT, different memory representations, including detailed and gist-like memories, may coexist. Therefore, we suggest that children may rely not on highly detailed, item-specific memories but rather on sufficiently contextualized schematic traces, which still engage the hippocampus. This distinction is now made clearer in the Introduction (see lines 223-236).

      We appreciate the reviewer’s point regarding Fuzzy Trace Theory (FTT; Brainerd & Reyna, 2002). Indeed, in classic FTT, young children are thought to rely more on verbatim traces due to immature gist extraction mechanisms (primarily for verbal material). However, we use the term “gist-like representations” to refer to schematic or category-level retrieval that emerges through structured, repeated learning, as in our task. This form of abstraction may not require full semantic gist extraction in the FTT sense; instead, it may reflect consolidation-driven convergence onto shared category-level representations, especially when strategic resources are limited. We now clarify this distinction and have revised the ambiguous sentence containing the typo (“at the expense of detailed or gist-like information”) to better reflect our intended meaning (see p. 8).

      (7) For the PLSC, if I understand this correctly, the profiles were defined for showing associations with behaviour across age groups. (1) As such, is it not "double dipping" to then show that there is an association between brain profile and behaviour; must this not be true by definition? If I am mistaken, it might be helpful to clarify this in the paper. (2) In addition, I believe that for the univariate and scene-specific reinstatement analyses these profiles were defined across both age groups. I assume this doesn't allow for separate definition of profiles across the two groups (i.e., a kind of "interaction"). If this is the case, it makes sense that there would not be big age differences... the profiles were defined for showing an association across all subjects. If the authors wanted to identify distinct profiles in children and adults, they may need to run another analysis.

      We thank the reviewer for this thoughtful comment. 

      (1) We agree that showing the correlation between the latent variable and behavior may be redundant, as the relationship is already embedded in the PLSC solution and quantified by the explained variance. Our intention was merely to visualize the strength of this relationship. In hindsight, we agree that this could be misinterpreted, and we have removed the additional correlation figure from the manuscript.

      We also see the reviewer’s point that, given the shared latent profile across groups, it is expected that the strength of the brain-behavior relationship does not differ between age groups. Instead, to investigate group differences more appropriately, we examined whether children and adults differed in their expression of the shared latent variable (i.e., brain scores). This analysis revealed that children showed significantly lower brain scores than adults at both the short delay, t(83) = -4.227, p = .0001, and the long delay, t(74) = -5.653, p < .001, suggesting that while the brain-behavior profile is shared, its expression varies by group. We have added this clarification to the Results section (p. 19-20) of the revised manuscript.
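      To make the distinction between the brain-behavior association and the expression of a shared latent variable concrete, the sketch below runs a behavioral PLSC on a pooled sample and then compares brain scores between groups. It is a minimal sketch on simulated data, assuming a single SVD of the behavior-by-ROI correlation matrix computed across both groups and a Welch t-test on the brain scores; the authors' toolbox, permutation settings, and exact test are not described here, so the function names and parameters are hypothetical.

```python
import numpy as np


def zscore(a):
    """Column-wise z-scores (subjects in rows)."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def behavior_plsc(brain, behavior):
    """Behavioral PLSC on the pooled sample.

    brain: subjects x ROIs, behavior: subjects x behavioral measures.
    Returns singular values, brain saliences (ROIs x LVs) and brain scores
    (subjects x LVs). The sign of each LV is arbitrary, as in any SVD.
    """
    X, Y = zscore(brain), zscore(behavior)
    R = Y.T @ X / (X.shape[0] - 1)  # behavior x ROI correlation matrix
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    saliences = Vt.T
    brain_scores = X @ saliences  # each subject's expression of each LV
    return s, saliences, brain_scores


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent groups."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Simulated data only: 45 "children" and 40 "adults", 6 ROIs, 1 memory score
rng = np.random.default_rng(1)
brain = rng.normal(size=(85, 6))
brain[45:] += 0.5  # adults shifted upward on every ROI
memory = brain[:, :2].sum(axis=1, keepdims=True) + rng.normal(size=(85, 1))
s, saliences, brain_scores = behavior_plsc(brain, memory)
t, df = welch_t(brain_scores[:45, 0], brain_scores[45:, 0])
```

Because the LV sign from an SVD is arbitrary, only the magnitude of the group difference in brain scores is interpretable unless a sign convention is fixed.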

      (2) Regarding the second point, we agree with the reviewer that defining the PLS profiles across both age groups inherently limits the ability to detect group-specific associations, as the resulting latent variables represent shared patterns across the full sample. To address this, we conducted additional PLS analyses separately within each age group to examine whether distinct neural upregulation profiles (remote > recent) emerge for short and long delay conditions.

      These within-group analyses, however, were based on smaller subsamples, which reduced statistical power, especially when using bootstrapping to assess the stability of the profiles. For the short delay, although some regions reached significance, the overall latent variables did not reach conventional thresholds for stability (all p > .069), indicating that the profiles were not robust. This suggests that within-group PLS analyses may be underpowered to detect subtle effects, particularly when modelling neural upregulation (remote > recent), which may be inherently small.
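      The stability issue in small subsamples can be illustrated with bootstrap ratios (salience divided by its bootstrap standard error), a stability measure commonly used in PLSC. The sketch below is a hypothetical, simplified version on simulated data: it resamples subjects, aligns only the arbitrary sign of each latent variable (rather than using a Procrustes rotation), and does not include the permutation test usually used for latent-variable p-values.

```python
import numpy as np


def plsc_saliences(brain, behavior):
    """Brain saliences (ROIs x LVs) from SVD of the behavior x ROI correlation matrix."""
    X = (brain - brain.mean(axis=0)) / brain.std(axis=0, ddof=1)
    Y = (behavior - behavior.mean(axis=0)) / behavior.std(axis=0, ddof=1)
    R = Y.T @ X / (X.shape[0] - 1)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt.T


def bootstrap_ratios(brain, behavior, n_boot=500, seed=0):
    """Salience / bootstrap standard error for every ROI and LV."""
    rng = np.random.default_rng(seed)
    reference = plsc_saliences(brain, behavior)
    n = brain.shape[0]
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        v = plsc_saliences(brain[idx], behavior[idx])
        # SVD signs are arbitrary: flip each LV to agree with the reference
        signs = np.sign(np.sum(v * reference, axis=0))
        signs[signs == 0] = 1.0
        samples.append(v * signs)
    se = np.stack(samples).std(axis=0, ddof=1)  # ROIs x LVs
    return reference / se


# Simulated within-group sample: 40 subjects, 6 ROIs; only ROI 0 tracks memory
rng = np.random.default_rng(2)
brain = rng.normal(size=(40, 6))
memory = 2.0 * brain[:, [0]] + rng.normal(size=(40, 1))
bsr = bootstrap_ratios(brain, memory)
```

With fewer subjects, the bootstrap standard errors would be expected to widen and the ratios to shrink, which corresponds to the loss of stability described above.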

      Nonetheless, when we applied PLSC separately within each group in an exploratory analysis, relating recent and remote activity levels against the implicit baseline (rather than the remote > recent contrast) to memory performance, we observed significant and stable latent variables in both children and adults. This suggests that such contrasts against baseline may be more sensitive and better suited to detecting meaningful brain-behavior relationships within age groups. We have added this clarification to the Results section of the manuscript to highlight the limitations of within-group contrasts for neural upregulation.

      Author response image 1.

      (3) Also, as for differences between short delay brain profile and long delay brain profile for the scene-specific reinstatement - there are 2 regions that become significant at long delay that were not significant at a short delay (PC, and CE). However, given there are ceiling effects in behaviour at the short but not long delay, it's unclear if this is a meaningful difference or just a difference in sensitivity. Is there a way to test whether the profiles are statistically different from one another?

      We thank the reviewer for this comment. To better characterize reinstatement profiles when memory accuracy is high, we added the immediate (30-minute) delay condition as a third reference point, given that scene-specific reinstatement data are available for this time point. Interestingly, the immediate reinstatement profile revealed a different set of significant regions, with distinct expression patterns compared with both the short and long delay conditions. This supports the view that scene-specific reinstatement is not static but is dynamically reorganized over time.

      Regarding the ceiling effect at short delay, we acknowledge this as a potential limitation. However, we note that our primary analyses were conducted across both age groups combined, and not solely within high-performing individuals. As such, the grouping may mitigate concerns that ceiling-level performance in a subset of participants unduly influenced the overall reinstatement profile. Moreover, we observed variation in neural reinstatement despite ceiling-level behavior, suggesting that the neural signal retains sensitivity to consolidation-related processes even when behavioral accuracy is near-perfect.

      While we agree that formal statistical comparisons of reinstatement profiles across delays (e.g., using representational profile similarity or interaction tests) could be an informative direction, we feel that this goes beyond the scope of the current manuscript. 

      (4) As I mentioned above, it also was not ideal in my opinion that all regions were included for the scene-specific reinstatement due to the authors' inability to have an appropriate baseline and therefore define above-chance reinstatement. It makes these findings really challenging to compare with the gist reinstatement ones. 

      We appreciate the reviewer’s comment and agree that the lack of a clearly defined baseline for scene-specific reinstatement limits our ability to determine whether these values reflect above-chance reinstatement. However, we would like to clarify that we do not directly compare the magnitude of scene-specific reinstatement to that of gist-like reinstatement in our analyses or interpretations. These two analyses serve complementary purposes: the scene-specific analysis captures trial-unique similarity (within-item reinstatement), while the gist-like analysis captures category-level representational structure (across items). Because they differ not only in baseline assumptions but also in analytical scope and theoretical interpretation, our goal was not to compare them directly, but rather to explore distinct but co-existing representational formats that may evolve differently across development and delay.

      (8) I would encourage the authors to be specific about whether they are measuring/talking about memory representations versus reinstatement, unless they think these are the same thing (in which case some explanation as to why would be helpful). For example, especially under the Fuzzy Trace framework, couldn't someone maintain both verbatim and gist traces of a memory yet rely more on one when making a memory decision? 

      We thank the reviewer for pointing out the importance of conceptual clarity when referring to memory representations versus reinstatement. We agree that these are distinct but related concepts: in our framework, memory representations refer to the neural content stored as a result of encoding and consolidation, whereas reinstatement refers to the reactivation of those representations during retrieval. Thus, reinstatement serves as a proxy for the underlying memory representation — it is how we measure or infer the nature (e.g., specificity, abstraction) of the stored content.

      Under Fuzzy Trace Theory, it is indeed possible for both verbatim and gist representations to coexist. Our interpretation is not that children lack verbatim traces, but rather that they are more likely to rely on schematic or gist-like representations during retrieval, especially after a delay. Our use of neural pattern similarity (reinstatement) reflects which type of representation is being accessed, not necessarily which traces exist in parallel.

      To avoid ambiguity, we have revised the manuscript to more explicitly distinguish between reinstatement (neural reactivation) and the representational format (verbatim vs. gist-like), especially in the framing of our hypotheses and interpretation of age group differences.

      (9) With respect to the learning criteria - it is misleading to say that "children needed between two to four learning-retrieval cycles to reach the criterion of 83% correct responses" (p. 9). Four was the maximum, and looking at the Figure 1C data it appears as though there were at least a few children who did not meet the 83% minimum. I believe they were included in the analysis anyway? Please clarify. Was there any minimum imposed for inclusion?

      We thank the reviewer for pointing this out. As stated in the Methods section (p. 50, lines 1326-1338), “These cycles ranged from a minimum of two to a maximum of four.<…> The cycles ended when participants provided correct responses to 83% of the trials or after the fourth cycle was reached.” We have corrected the corresponding wording in the Results section (lines 286-289) to reflect this more accurately. Indeed, five children did not reach the 83% criterion but achieved a final performance between 70 and 80% after the fourth learning cycle. These participants were included in the analysis for two main reasons:

      (1) The 83% threshold was established during piloting as a guideline for how many learning-retrieval cycles to allow, not a strict learning criterion. It served to standardize task continuation, rather than to exclude participants post hoc.

      (2) The performance of these five children was still well above chance level (33%), indicating meaningful learning. Excluding them would have biased the sample toward higher-performing children and reduced the ecological validity of our findings. Including them ensures a more representative view of children’s performance under extended learning conditions.
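      The stopping rule quoted from the Methods, together with the comparison to chance, can be sketched as below. The per-cycle accuracies, the 30 trials per cycle, and the function names are hypothetical; we also assume three response alternatives, consistent with the stated 33% chance level, and that the criterion is only checked from the second cycle onward.

```python
from math import comb

CRITERION = 0.83  # proportion correct that ends learning (Methods)
MIN_CYCLES, MAX_CYCLES = 2, 4


def cycles_run(accuracy_per_cycle):
    """Apply the stopping rule to per-cycle accuracies.

    Returns (number of cycles run, whether the criterion was reached).
    Assumes at least two cycles are always run, as stated in the Methods.
    """
    for cycle, acc in enumerate(accuracy_per_cycle[:MAX_CYCLES], start=1):
        if cycle >= MIN_CYCLES and acc >= CRITERION:
            return cycle, True
    return min(len(accuracy_per_cycle), MAX_CYCLES), False


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical child who plateaus below criterion after the fourth cycle
print(cycles_run([0.50, 0.63, 0.70, 0.73]))  # (4, False)
# Hypothetical 21/30 correct (70%) against a three-alternative chance level of 1/3
print(binomial_upper_tail(21, 30, 1 / 3) < 0.001)  # True
```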

      (10) For the gist-like reinstatement PLSC analysis, results are really similar at short and long delays, and yet some of the text seems to imply specificity to the long delay. One is a trend and one is significant (p. 31), but surely these two associations would not be statistically different from one another?

      We agree with the reviewer that the associations at short and long delays appeared similar. While a formal comparison (e.g., using a Z-test for dependent correlations) would typically be warranted, in the reanalyzed dataset only the long delay profile remains statistically significant, which limits the interpretability of such a comparison. 
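      For readers who want to see what such a formal comparison involves, below is a minimal sketch of the Fisher r-to-z test for two correlations. Note the simplifying assumption: this version treats the two correlations as coming from independent samples, whereas the short- and long-delay associations here come from largely overlapping participants, for which a test for dependent correlations would be more appropriate. The correlations and sample sizes in the example are hypothetical.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and its two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical values, not taken from the manuscript
z, p = compare_independent_correlations(r1=0.30, n1=85, r2=0.35, n2=76)
```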

      (11) As a general comment, I had a hard time tying all of the (many) results together. For example adults show more mature neocortical consolidation-related engagement, which the authors say is going to create more durable detailed memories, but under multiple trace theory we would generally think of neocortical representations as providing more schematic information. If the authors could try to make more connections across the different neural analyses, as well as tie the neural findings in more closely with the behaviour & back to the theoretical frameworks, that would be really helpful.  

      We thank the reviewer for this valuable suggestion. We have revised the discussion section to more clearly link the behavioral and neural findings and to interpret them in light of existing consolidation theories for better clarity. 

      Reviewer #2 (Public Review): 

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. 

      We thank the reviewer for the positive evaluation.

      Comments on the revised version: 

      I carefully reviewed not only the responses to my own comments but also those to the other reviewers. While the authors addressed some of the concerns raised in the process, I think many substantive concerns remain.

      Regarding Reviewer 1: 

      The authors point out that the retrieval procedure is the same over time and similarly influenced by temporal autocorrelations, which makes their analysis okay. However, there is a fundamental problem as to whether they are actually measuring reinstatement or only differences in temporal autocorrelation (or some non-linear combination of both). The authors further argue that the stimuli are processed in a more memory-driven than perception-driven manner; however, I think there is no evidence for that, and perception and memory processes should be considered on a continuum rather than as discrete processes. Thus, I agree with Reviewer 1 that these analyses should be removed.

      We thank the reviewer for raising this important question. We would like to clarify a few key points regarding temporal autocorrelation and reinstatement.

      During the fixation window, participants were instructed to reinstate the scene and location associated with the cued object from memory. This task was familiar to them, as they had been trained in retrieving locations within scenes. Our analysis aims to compare the neural representations during this retrieval phase with those when participants view the scene, in order to assess how these representations change in similarity over time, as memories become less precise.
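      The comparison described above can be sketched as a trial-wise pattern similarity between the fixation-period (retrieval) pattern and the scene-viewing pattern of the same trial within an ROI. This is a minimal, hypothetical illustration on simulated data: the array layout, the Pearson correlation with Fisher z-transform, and the absence of a different-item baseline are assumptions rather than a description of the authors' pipeline. The quantity of interest is the relative change between recent and remote items.

```python
import numpy as np


def item_reinstatement(fixation, scene):
    """Fisher-z Pearson correlation between matching trials of two patterns.

    fixation, scene: trials x voxels arrays for one ROI; row i of each array
    belongs to the same trial (cued retrieval vs. later scene viewing).
    """
    f = fixation - fixation.mean(axis=1, keepdims=True)
    s = scene - scene.mean(axis=1, keepdims=True)
    r = (f * s).sum(axis=1) / np.sqrt((f ** 2).sum(axis=1) * (s ** 2).sum(axis=1))
    return np.arctanh(r)


# Simulated ROI patterns: 20 trials x 50 voxels per condition. Remote retrieval
# patterns are noisier copies of their scene patterns (a toy assumption).
rng = np.random.default_rng(3)
scene_recent = rng.normal(size=(20, 50))
scene_remote = rng.normal(size=(20, 50))
fix_recent = scene_recent + 0.5 * rng.normal(size=(20, 50))
fix_remote = scene_remote + 1.5 * rng.normal(size=(20, 50))
recent_z = item_reinstatement(fix_recent, scene_recent)
remote_z = item_reinstatement(fix_remote, scene_remote)
delay_change = remote_z.mean() - recent_z.mean()  # relative change across delay
```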

      We acknowledge that temporal proximity can lead to temporal autocorrelation. However, evidence suggests that temporal autocorrelation is consistent and stable across conditions (Gautama & Van Hulle, 2004; Woolrich et al., 2004). Shinn & Lagalwar (2021) further demonstrated that temporal autocorrelation is highly reliable at both the subject and regional levels. Given that we analyze regions of interest (ROIs) separately, potential spatial variability in temporal autocorrelation is not a major concern.

      The absence of a difference in item-specific reinstatement for recent items between Day 1 and Day 14 (these were merged for the subsequent delay-related comparisons) also suggests that the reinstatement measure was stable for recent items, even when sampled on two different testing days.

      Importantly, we interpret the relative change in the reinstatement index rather than its absolute value.

      In addition, when we conducted the same analysis for incorrectly retrieved memories, we did not observe any delay-related decline in reinstatement (see p. 25, lines 623-627). This suggests that the delay-related changes in reinstatement are specific to correctly retrieved memories. 

      Finally, our control analysis examining reinstatement between object and fixation time points (as suggested by Reviewer 1) revealed no delay-related effects in any ROI (see p. 24, lines 605-612), further highlighting the specificity of the observed delay-related change in item reinstatement.

      We emphasize that temporal autocorrelation should be similar across all retrieval delays due to the identical task design and structure. Therefore, any observed decrease in reinstatement with increasing delay likely reflects a genuine change in the reinstatement index, rather than differences in temporal autocorrelation. Since our analysis includes only correctly retrieved items, and there is no perceptual input during the fixation window, this process is inherently memory-based, relying on mnemonic retrieval rather than sensory processing.

      We respectfully disagree with the reviewer's assertion that retrieval during the fixation period cannot be considered more memory-driven than perception-driven. At this time point, participants had no access to actual images of the scene, making it necessary for them to rely on mnemonic retrieval. The object cue likely triggered pattern completion for the learned object-scene association, forming a unique memory if remembered correctly (Horner & Burgess, 2013). This process is inherently mnemonic, as it is based on reconstructing the original neural representation of the scene (Kuhl et al., 2012; Staresina et al., 2013).

      While perception and memory processes can indeed be viewed as a continuum, some cognitive processes are predominantly memory-based, involving reconstruction rather than reproduction of previous experiences (Bartlett, 1932; Ranganath & Ritchey, 2012). In our task, although the retrieved material is based on previously encoded visual information, the process of recalling this information during the fixation period is fundamentally mnemonic, as it does not involve visual input. Our findings indicate that the similarity between memory-based representations and those observed during actual perception decreases over time, suggesting a relative change in the quality of the representations. However, this does not imply that detailed representations disappear; they may still be robust enough to support correct memory recall. Previous studies examining encoding-retrieval similarity have shown similar findings (Pacheco Estefan et al., 2019; Ritchey et al., 2013).

      We do not claim that perception and memory processes are entirely discrete, nor do we suggest that only perception is involved when participants see the scene. Viewing the scene indeed involves recognition processes, updating retrieved representations from the fixation period, and potentially completing missing or unclear information. This integrative process demonstrates the interrelation of perception and memory, especially in complex tasks like the one we employed.

      In conclusion, our task design and analysis support the interpretation that the fixation period is primarily characterized by mnemonic retrieval, facilitated by cue-triggered pattern completion, rather than perceptual processing. We believe this approach aligns with the current understanding of memory retrieval processes as supported by the existing literature.

      The authors seem to have a design that would allow for across run comparisons, however, they did not include these additional analyses. 

      Thank you for pointing this out. We ran an additional cross-run comparison. These results and further details are reported in our response to Reviewer 1.

      To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials, restricting comparisons to pairs of non-overlapping runs (run1-run2, run2-run3, run1-run3). This analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults’ vlPFC for remote Day 1 memories (p = .029), and in both children’s and adults’ LOC for remote Day 1 memories (p < .02). A significant Session effect in both prefrontal regions (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement at the long delay (Day 14) compared with the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects previously found with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC, particularly for remote memories, although the within-run vlPFC effects did not survive correction for multiple comparisons (short delay: p = .038; long delay: p = .047). Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (Prince et al., 2022), we believe that our design mitigates this risk through the consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly and now report the combined analysis. We also report the cross-run and within-run analyses separately in the Supplementary Materials (Tables S12.1, S12.2), showing that the within-run results converge with the cross-run results and thus strengthen rather than dilute the findings.
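      The restriction of gist-like (same-category, different-item) comparisons to trials from different runs can be sketched as follows. The `(trial_id, run, category)` tuples and the function name are hypothetical; the sketch only shows how dropping same-run pairs removes the comparisons most exposed to within-run autocorrelation, and how setting `cross_run_only=False` recovers the combined within- and cross-run pair set.

```python
from itertools import combinations


def gist_pairs(trials, cross_run_only=True):
    """Pairs of distinct trials that share a category.

    trials: list of (trial_id, run, category) tuples.
    If cross_run_only is True, keep only pairs whose trials come from different runs.
    """
    pairs = []
    for (id_a, run_a, cat_a), (id_b, run_b, cat_b) in combinations(trials, 2):
        if cat_a != cat_b:
            continue  # gist-like reinstatement compares items of the same category
        if cross_run_only and run_a == run_b:
            continue  # drop same-run pairs
        pairs.append((id_a, id_b))
    return pairs


# Hypothetical trials across three runs
trials = [
    ("t1", 1, "kitchen"), ("t2", 1, "kitchen"), ("t3", 2, "kitchen"),
    ("t4", 3, "beach"), ("t5", 2, "beach"), ("t6", 3, "kitchen"),
]
cross_run = gist_pairs(trials)                       # 6 pairs
combined = gist_pairs(trials, cross_run_only=False)  # 7 pairs, adds ("t1", "t2")
```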

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in mPFC and vlPFC. These effects, based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted accordingly in the discussion.

      (1) The authors did not satisfy my concerns about different amounts of re-exposures to stimuli as a function of age, which introduces a serious confound in the interpretation of the neural data. 

      (2) Regarding Reviewer 1's point about different number of trials being entered into analysis, I think a more formal test of sub-sampling the adult trials is warranted. 

      (1) We thank the reviewer for pointing this out. Overall, children needed 2 to 4 learning cycles to improve their performance and reach the learning criterion, compared with 2 learning cycles in adults. To address the different amounts of re-exposure to stimuli between the age groups, we subsampled the child group to only those children who reached the learning criterion after 2 learning cycles. For this purpose, we excluded from the analysis 21 children who needed 3 or 4 learning cycles. This resulted in 39 young adults and 28 children being included in the subsequent analysis.

      (i) We reran the behavioral analysis with the subsampled dataset (see Supplementary Materials, Table S1.1, Fig. S1, Table S1.2). This analysis replicated the previous findings of less robust memory consolidation in children across all time delays.

      (ii) We reran the univariate analysis (see Supplementary Materials, Table S9.1). This analysis also fully replicated the previous findings, indicating that including child participants with greater exposure to the material during learning did not affect the group differences in the univariate neural results.

      These subsampled results indicate that the amount of re-exposure to stimuli during encoding does not affect consolidation-related changes in memory retrieval at the behavioral and neural levels in children and adults across all time delays. We have added this information to the manuscript (lines 343-348 and 420-425). 

      (2) We appreciate Reviewer 1's suggestion to perform a formal test by sub-sampling the adult trials to match the number of trials in the child group. However, we believe that this approach may not be optimal for the following reasons:

      (i) Loss of Statistical Power: Sub-sampling the adult trials would reduce the number of trials available, potentially leading to a substantial loss of statistical power and of the ability to detect meaningful effects, particularly because the adult group is intended to serve as a robust control or comparison group.

      (ii) Added Variability: Sub-sampling could introduce variability that complicates the interpretation of results, particularly if the sub-sampled trials do not fully capture the variability inherent in the original adult data.

      (iii) Robustness of Existing Findings: We have already addressed potential concerns about unequal trial numbers by conducting analyses that control for the number of learning cycles, as detailed in our supplementary materials. These analyses have shown that the observed effects are consistent, suggesting that the differences in trial numbers do not critically influence our findings.

      Given these considerations, we hope the reviewer understands our rationale and agrees that the current analysis is robust and appropriate for addressing the research questions.
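      For reference, a trial-matched sub-sampling test of the kind Reviewer 1 suggested could be sketched as below. It assumes that per-trial reinstatement scores are available for each adult and that k is the trial count of a matched child; the function and parameter names are hypothetical, and this is not an analysis reported in the manuscript. Repeating the draw many times yields a distribution of sub-sampled estimates rather than a single subset.

```python
import random
import statistics

def subsampled_means(trial_scores, k, n_iter=1000, seed=0):
    """Repeatedly draw k trials without replacement and return the mean of each draw.

    trial_scores: hypothetical per-trial reinstatement scores for one adult.
    k: hypothetical target trial count, e.g. that of an age-matched child.
    """
    if not 0 < k <= len(trial_scores):
        raise ValueError("k must be between 1 and the number of available trials")
    rng = random.Random(seed)  # seeded so that the draws are reproducible
    return [statistics.fmean(rng.sample(trial_scores, k)) for _ in range(n_iter)]
```

      Comparing the spread of these sub-sampled means with the estimate from all trials shows how strongly an adult estimate depends on trial count; the seed and number of iterations are arbitrary choices.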

      I also still fundamentally disagree with the use of global signals when comparing children to adults, and think this could very much skew the results. 

      We thank the reviewer for raising this important issue. To address this concern comprehensively, we have taken the following steps:

      (1) Overview of the literature supporting global signal regression (GSR). A growing body of methodological and empirical research supports the inclusion of global signal regression as part of best-practice denoising pipelines, particularly when analyzing pediatric fMRI data. Studies such as Ciric et al. (2017), Parkes et al. (2018), J. D. Power et al. (2012, 2014), Power et al. (2012), and Thompson et al. (2016) show that GSR improves the removal of motion-related artifacts. Critically, pediatric-specific studies (Disselhoff et al., 2025; Graff et al., 2022) conclude that pipelines including GSR are most effective for signal recovery and artifact removal in younger children. Graff et al. (2022) demonstrated that, among various pipelines, GSR yielded the best noise reduction in 4–8-year-olds. Additionally, Li et al. (2019) and Qing et al. (2015) emphasized that GSR reduces artifactual variance without distorting the spatial structure of neural signals. Ofoghi et al. (2021) demonstrated that global signal regression helps mitigate non-neuronal noise sources, including respiration, cardiac activity, motion, vasodilation, and scanner-related artifacts. Based on these and other recent findings, we consider GSR particularly beneficial for denoising the pediatric fMRI data in our study.

      (2) Empirical comparison of pipelines with and without GSR. We reran the entire first-level univariate analysis using a pipeline that excluded global signal regression. The resulting activation maps (see Supplementary Figures S3.2, S4.2, S5.2, and S9.2) differed notably from those of the original pipeline. Specifically, group differences in regions such as the mPFC, cerebellum, and posterior PHG no longer reached significance, and the overall pattern of results appeared noisier. 

      (3) Evaluation of the pipeline differences. To further evaluate the impact of GSR, we conducted the following analyses:

      (a) Global signal is stable across groups and sessions. A linear mixed-effects model showed no significant main effects or interactions involving group or session on the global signal (F-values < 2.62, p > .11), suggesting that the global signal was not group- or session-dependent in our sample. 

      (b) Noise Reduction Assessment via Contrast Variability. We compared the variability (standard deviation and IQR) of contrast estimates across pipelines. Both SD (b = .070, p < .001) and IQR (b = .087, p < .001) were significantly reduced in the GSR pipeline, especially in children (p < .001) compared to adults (p = .048). This suggests that GSR reduces inter-subject variability in children, likely reflecting improved signal quality.

      (c) Residual Variability After Regressing Global Signal. We regressed out global signal post hoc from both pipelines and compared the residual variance. Residual standard deviation was significantly lower for the GSR pipeline (F = 199, p < .001), with no interaction with session or group, further indicating that GSR stabilizes the signal and attenuates non-neuronal variability.
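      To make the pipeline comparison concrete, here is a toy sketch assuming voxel time series stored as equal-length lists of floats. It computes the global signal as the mean across voxels at each time point, regresses it out of each voxel by ordinary least squares with an intercept, and summarizes the spread (SD and IQR) of per-participant contrast estimates. All function names are hypothetical and this is not the authors' pipeline: in a real first-level analysis the global signal would typically enter the GLM as one nuisance regressor alongside the other confounds, and the mixed-effects tests reported above are not reproduced here.

```python
import statistics

def global_signal(data):
    """Mean across voxels at each time point; data is a list of equal-length voxel time series."""
    return [statistics.fmean(values) for values in zip(*data)]

def regress_out(y, g):
    """Residuals of y after ordinary least squares on an intercept plus the regressor g."""
    mean_y, mean_g = statistics.fmean(y), statistics.fmean(g)
    ss_g = sum((gi - mean_g) ** 2 for gi in g)
    beta = 0.0 if ss_g == 0 else sum(
        (gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, y)) / ss_g
    # y - (intercept + beta * g), with intercept = mean_y - beta * mean_g
    return [yi - mean_y - beta * (gi - mean_g) for yi, gi in zip(y, g)]

def apply_gsr(data):
    """Remove the global signal from every voxel time series."""
    g = global_signal(data)
    return [regress_out(voxel, g) for voxel in data]

def spread(estimates):
    """(SD, IQR) of contrast estimates across participants."""
    sd = statistics.stdev(estimates)
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return sd, q3 - q1
```

      A voxel whose time course is a scaled copy of the global signal is reduced to zeros by `apply_gsr`, and every residual series sums to zero and is uncorrelated with the global signal, which are the defining properties of the least-squares fit.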

      Conclusion

      In summary, while we understand the reviewer’s concern, we believe the empirical and theoretical support for GSR, especially in pediatric samples, justifies its use in our study. Nonetheless, to ensure full transparency, we provide full results from both pipelines in the Supplementary Materials and have clarified our reasoning in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Some figures are still missing descriptions of what everything on the graph means; please clarify in captions. 

      We thank the reviewer for pointing this out. We have made the necessary adjustments to the graph annotations. 

      (2) The authors conclude they showed evidence of neural reorganization of memory representations in children (p. 41). But the gist is not greater in children than adults, and also does not differ over time, so I was confused about what this claim was based on. 

      We thank the reviewer for raising this question. Our results indicate that gist-like reinstatement was significantly higher in children than in adults in the mPFC, and that children’s gist-like reinstatement indices were significantly higher than zero (see pp. 27-28). These results support our claim of neural reorganization of memory representations in children. We hope this clarifies the issue. 

      References

      Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge University Press.

      Brainerd, C. J., & Reyna, V. F. (2002). Fuzzy-Trace Theory: Dual Processes in Memory, Reasoning, and Cognitive Neuroscience (pp. 41–100). https://doi.org/10.1016/S0065-2407(02)80062-3

      Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115–125. https://doi.org/10.1038/nn.4450

      Ciric, R., Wolf, D. H., Power, J. D., Roalf, D. R., Baum, G. L., Ruparel, K., Shinohara, R. T., Elliott, M. A., Eickhoff, S. B., Davatzikos, C., Gur, R. C., Gur, R. E., Bassett, D. S., & Satterthwaite, T. D. (2017). Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. NeuroImage, 154, 174–187. https://doi.org/10.1016/j.neuroimage.2017.03.020

      Disselhoff, V., Jakab, A., Latal, B., Schnider, B., Wehrle, F. M., Hagmann, C. F., Held, U., O’Gorman, R. T., Fauchère, J.-C., & Hüppi, P. (2025). Inhibition abilities and functional brain connectivity in school-aged term-born and preterm-born children. Pediatric Research, 97(1), 315–324. https://doi.org/10.1038/s41390-024-03241-0

      Esteban, O., Ciric, R., Finc, K., Blair, R. W., Markiewicz, C. J., Moodie, C. A., Kent, J. D., Goncalves, M., DuPre, E., Gomez, D. E. P., Ye, Z., Salo, T., Valabregue, R., Amlien, I. K., Liem, F., Jacoby, N., Stojić, H., Cieslak, M., Urchs, S., … Gorgolewski, K. J. (2020). Analysis of task-based functional MRI data preprocessed with fMRIPrep. Nature Protocols, 15(7), 2186–2202. https://doi.org/10.1038/s41596-020-0327-3

      Fandakova, Y., Leckey, S., Driver, C. C., Bunge, S. A., & Ghetti, S. (2019). Neural specificity of scene representations is related to memory performance in childhood. NeuroImage, 199, 105–113. https://doi.org/10.1016/j.neuroimage.2019.05.050

      Gautama, T., & Van Hulle, M. M. (2004). Optimal spatial regularisation of autocorrelation estimates in fMRI analysis. NeuroImage, 23(3), 1203–1216. https://doi.org/10.1016/j.neuroimage.2004.07.048

      Graff, K., Tansey, R., Ip, A., Rohr, C., Dimond, D., Dewey, D., & Bray, S. (2022). Benchmarking common preprocessing strategies in early childhood functional connectivity and intersubject correlation fMRI. Developmental Cognitive Neuroscience, 54, 101087. https://doi.org/10.1016/j.dcn.2022.101087

      Horner, A. J., & Burgess, N. (2013). The associative structure of memory for multi-element events. Journal of Experimental Psychology: General, 142(4), 1370–1383. https://doi.org/10.1037/a0033626

      Jones, J. S., the CALM Team, & Astle, D. E. (2021). A transdiagnostic data-driven study of children’s behaviour and the functional connectome. Developmental Cognitive Neuroscience, 52, 101027. https://doi.org/10.1016/j.dcn.2021.101027

      Kuhl, B. A., Bainbridge, W. A., & Chun, M. M. (2012). Neural Reactivation Reveals Mechanisms for Updating Memory. Journal of Neuroscience, 32(10), 3453–3461. https://doi.org/10.1523/JNEUROSCI.5846-11.2012

      Kuhl, B. A., & Chun, M. M. (2014). Successful Remembering Elicits Event-Specific Activity Patterns in Lateral Parietal Cortex. Journal of Neuroscience, 34(23), 8051–8060. https://doi.org/10.1523/JNEUROSCI.4328-13.2014

      Li, J., Kong, R., Liégeois, R., Orban, C., Tan, Y., Sun, N., Holmes, A. J., Sabuncu, M. R., Ge, T., & Yeo, B. T. T. (2019). Global signal regression strengthens association between resting-state functional connectivity and behavior. NeuroImage, 196, 126–141. https://doi.org/10.1016/j.neuroimage.2019.04.016

      Ofoghi, B., Chenaghlou, M., Mooney, M., Dwyer, D. B., & Bruce, L. (2021). Team technical performance characteristics and their association with match outcome in elite netball. International Journal of Performance Analysis in Sport, 21(5), 700–712. https://doi.org/10.1080/24748668.2021.1938424

      Pacheco Estefan, D., Sánchez-Fibla, M., Duff, A., Principe, A., Rocamora, R., Zhang, H., Axmacher, N., & Verschure, P. F. M. J. (2019). Coordinated representational reinstatement in the human hippocampus and lateral temporal cortex during episodic memory retrieval. Nature Communications, 10(1), 2255. https://doi.org/10.1038/s41467-019-09569-0

      Parkes, L., Fulcher, B., Yücel, M., & Fornito, A. (2018). An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage, 171, 415–436. https://doi.org/10.1016/j.neuroimage.2017.12.073

      Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59(3), 2142–2154. https://doi.org/10.1016/j.neuroimage.2011.10.018

      Power, J. D., Mitra, A., Laumann, T. O., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2014). Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage, 84, 320–341. https://doi.org/10.1016/j.neuroimage.2013.08.048

      Power, S. D., Kushki, A., & Chau, T. (2012). Intersession Consistency of Single-Trial Classification of the Prefrontal Response to Mental Arithmetic and the No-Control State by NIRS. PLoS ONE, 7(7), e37791. https://doi.org/10.1371/journal.pone.0037791

      Prince, J. S., Charest, I., Kurzawski, J. W., Pyles, J. A., Tarr, M. J., & Kay, K. N. (2022). Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11. https://doi.org/10.7554/eLife.77599

      Qing, Z., Dong, Z., Li, S., Zang, Y., & Liu, D. (2015). Global signal regression has complex effects on regional homogeneity of resting state fMRI signal. Magnetic Resonance Imaging, 33(10), 1306–1313. https://doi.org/10.1016/j.mri.2015.07.011

      Ranganath, C., & Ritchey, M. (2012). Two cortical systems for memory-guided behaviour. Nature Reviews Neuroscience, 13(10), 713–726. https://doi.org/10.1038/nrn3338

      Ritchey, M., Wing, E. A., LaBar, K. S., & Cabeza, R. (2013). Neural Similarity Between Encoding and Retrieval is Related to Memory Via Hippocampal Interactions. Cerebral Cortex, 23(12), 2818–2828. https://doi.org/10.1093/cercor/bhs258

      Satterthwaite, T. D., Elliott, M. A., Gerraty, R. T., Ruparel, K., Loughead, J., Calkins, M. E., Eickhoff, S. B., Hakonarson, H., Gur, R. C., Gur, R. E., & Wolf, D. H. (2013). An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage, 64, 240–256. https://doi.org/10.1016/j.neuroimage.2012.08.052

      Schommartz, I., Lembcke, P. F., Pupillo, F., Schuetz, H., de Chamorro, N. W., Bauer, M., Kaindl, A. M., Buss, C., & Shing, Y. L. (2023). Distinct multivariate structural brain profiles are related to variations in short- and long-delay memory consolidation across children and young adults. Developmental Cognitive Neuroscience, 59. https://doi.org/10.1016/J.DCN.2022.101192

      Sekeres, M. J., Winocur, G., & Moscovitch, M. (2018). The hippocampus and related neocortical structures in memory transformation. Neuroscience Letters, 680, 39–53. https://doi.org/10.1016/j.neulet.2018.05.006

      Shinn, L. J., & Lagalwar, S. (2021). Treating Neurodegenerative Disease with Antioxidants: Efficacy of the Bioactive Phenol Resveratrol and Mitochondrial-Targeted MitoQ and SkQ. Antioxidants, 10(4), 573. https://doi.org/10.3390/antiox10040573

      Staresina, B. P., Alink, A., Kriegeskorte, N., & Henson, R. N. (2013). Awake reactivation predicts memory in humans. Proceedings of the National Academy of Sciences, 110(52), 21159–21164. https://doi.org/10.1073/pnas.1311989110

      St-Laurent, M., & Buchsbaum, B. R. (2019). How Multiple Retrievals Affect Neural Reactivation in Young and Older Adults. The Journals of Gerontology: Series B, 74(7), 1086–1100. https://doi.org/10.1093/geronb/gbz075

      Thompson, G. J., Riedl, V., Grimmer, T., Drzezga, A., Herman, P., & Hyder, F. (2016). The Whole-Brain “Global” Signal from Resting State fMRI as a Potential Biomarker of Quantitative State Changes in Glucose Metabolism. Brain Connectivity, 6(6), 435–447. https://doi.org/10.1089/brain.2015.0394

      Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e5. https://doi.org/10.1016/j.neuron.2017.09.005

      Tompary, A., Zhou, W., & Davachi, L. (2020). Schematic memories develop quickly, but are not expressed unless necessary. PsyArXiv.

      Woolrich, M. W., Behrens, T. E. J., Beckmann, C. F., Jenkinson, M., & Smith, S. M. (2004). Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage, 21(4), 1732–1747. https://doi.org/10.1016/j.neuroimage.2003.12.023

      Xiao, X., Dong, Q., Gao, J., Men, W., Poldrack, R. A., & Xue, G. (2017). Transformed Neural Pattern Reinstatement during Episodic Memory Retrieval. The Journal of Neuroscience, 37(11), 2986–2998. https://doi.org/10.1523/JNEUROSCI.2324-16.2017

      Ye, Z., Shi, L., Li, A., Chen, C., & Xue, G. (2020). Retrieval practice facilitates memory updating by enhancing and differentiating medial prefrontal cortex representations. eLife, 9, 1–51. https://doi.org/10.7554/ELIFE.57023

      Yonelinas, A. P., Ranganath, C., Ekstrom, A. D., & Wiltgen, B. J. (2019). A contextual binding theory of episodic memory: systems consolidation reconsidered. Nature Reviews Neuroscience, 20(6), 364–375. https://doi.org/10.1038/s41583-019-0150-4

      Zhuang, L., Wang, J., Xiong, B., Bian, C., Hao, L., Bayley, P. J., & Qin, S. (2021). Rapid neural reorganization during retrieval practice predicts subsequent long-term retention and false memory. Nature Human Behaviour, 6(1), 134–145. https://doi.org/10.1038/s41562-021-01188-4

    1. eLife Assessment

      This study presents fundamental insights into overcoming resistance in hormone receptor-positive breast cancer by demonstrating that sustained CDK4/6 inhibitor treatment, either alone or in combination with CDK2 inhibitors, significantly suppresses the growth of drug-resistant cancer cells. The findings are supported by compelling evidence from both in vitro cell line experiments and in vivo mouse models, highlighting the therapeutic potential of maintaining CDK4/6 inhibitors beyond disease progression. Additionally, the identification of cyclin E overexpression as a key driver of resistance offers a target that will be of value for future therapeutic strategies, potentially improving outcomes for patients with advanced breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      In the research manuscript submitted to eLife (Manuscript ID eLife-RP-RA-2024-104545) titled "Therapeutic benefits of maintaining CDK4/6 inhibitors and incorporating CDK2 inhibitors beyond progression in breast cancer", the authors showed that 1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolonging the G1 phase; 2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors; 3) the addition of endocrine therapy augments the efficacy of CDK4/6i maintenance; 4) adding CDK2i to CDK4/6i as a second-line treatment can suppress the growth of resistant cells; and 5) cyclin E is a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths:

      To support their complex proposal, the authors combined several kinds of live-cell markers, timed in situ hybridization, IF, and immunoblotting. The authors clearly recognize the problem of resistance to CDK4/6i + ET therapy and demonstrate how to overcome it.

      Weaknesses:

      None.

      Comments on revisions:

      In response to the reviewers' questions and comments, the authors have revised the manuscript accordingly and sufficiently addressed the differences between their study and previous works on CDK4/6 and CDK2 combination therapy as a second-line approach.

    3. Reviewer #2 (Public review):

      Summary:

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths:

      The study demonstrated that CDK4/6 induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Comments on revisions:

      The authors have comprehensively addressed all the questions I raised.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer. However, several issues need to be addressed before considering its publication.

      Strengths:

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer.

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance.

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments.

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research.

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Comments on revisions:

      The authors made a significant effort to improve the manuscript. My comments were sufficiently addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase;

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance; 

      (4) Addition of CDK2i to CDK4/6i treatment as second-line treatment can suppress the growth of resistant cells; 

      (5) The role of cyclin E as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths: 

      To support their complex proposal, the authors combined several kinds of live-cell markers, timed in situ hybridization, IF, and immunoblotting. The authors clearly recognize the problem of resistance to CDK4/6i + ET therapy and demonstrate how to overcome it. 

      Weaknesses: 

      The authors need to distinguish their results from what has already been achieved by other researchers. 

      Reviewer #2 (Public review): 

      Summary: 

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths: 

      The study demonstrated that CDK4/6 induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses: 

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section. 

      We greatly appreciate the reviewer for raising this important question. In the revised manuscript, we have updated the methods section (“Drug-resistant cell lines”) to more precisely describe how the drug-resistant cell lines were established. 

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated. 

      We appreciate the reviewer’s insightful question about the rationale for selecting MCF-7 cells to generate CDK6 knockout cell lines. This choice was guided by prior studies highlighting the significant role of CDK6 in mediating resistance to CDK4/6 inhibitors (21-24). Moreover, we observed a 4.6-fold increase in CDK6 expression in CDK4/6i resistant MCF-7 cells compared to their drug-naïve counterparts (Supplementary Figure 3A). While we did not detect notable differences in CDK4/6 activity between wild-type and CDK6 knockout cells under CDK4/6 inhibitor treatment, these findings point to a potential non-canonical function of CDK6 in conferring resistance to CDK4/6 inhibitors.  

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section. 

      We sincerely thank the reviewer for bringing this to our attention. In the revised manuscript, we have explicitly stated the number of replicates and mice used for each experiment as appropriate in figure legends and relevant text to ensure transparency and clarity. 

      (4) Could this treatment approach be extended to triple-negative breast cancer?

      We greatly appreciate the reviewer’s inquiry about extending our findings to triple-negative breast cancer (TNBC). Based on the data presented in Figure 1 and Supplementary Figure 2, which include the TNBC cell line MDA-MB-231, we expect that the benefits of maintaining CDK4/6 inhibitors could indeed be applicable to TNBC with an intact Rb/E2F pathway. Additionally, our recent paper (25) indicates a similar mechanism in TNBC.

      Reviewer #3 (Public review):

      Summary: 

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths: 

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer. 

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance. 

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments. 

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research. 

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses: 

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer. 

      We thank the reviewer for this crucial comment. In the revised Discussion, we have broadened our exploration of clinical translation. Specifically, we emphasize that ongoing CDK4/6 inhibition, although it does not fully halt resistant tumors, significantly slows their growth and may offer a therapeutic window when combined with ET and CDK2 inhibition. We also note that these approaches may work best for patients without Rb loss or newly acquired resistance-driving mutations, and that cyclin E overexpression could serve as a biomarker to inform patient selection. Together, these points highlight that our findings provide a mechanistic understanding and a potential framework for clinical trials testing maintenance CDK4/6i with selective addition of CDK2i as a second-line strategy in drug-resistant HR+/HER2- breast cancer.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable. 

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have expanded our Discussion to more explicitly synthesize the molecular mechanisms of resistance and how CDK2 inhibitors counteract them. Specifically, we describe how sustained CDK4/6 inhibition drives a non-canonical route of Rb degradation, resulting in inefficient E2F activation and prolonged G1 phase progression. We also highlight the role of c-Myc in amplifying E2F activity and promoting resistance, and we show that continued ET mitigates this effect by suppressing c-Myc. Importantly, we demonstrate that CDK2 inhibition alone cannot fully suppress the growth of resistant cells, but when combined with CDK4/6 inhibition, it produces durable repression of E2F and Myc target gene programs and significantly delays the G1/S transition. Finally, we identify cyclin E overexpression as a key mechanism of escape from dual CDK4/6i + CDK2i therapy, suggesting its potential as a biomarker for patient stratification. Together, these findings provide a detailed mechanistic rationale for how CDK2 inhibition can overcome specific pathways of resistance in HR<sup>+</sup>/HER2<sup>-</sup> breast cancer.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript. 

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we emphasize the long-term efficacy and safety considerations of sustained CDK4/6 inhibition. Clinical trial and retrospective data have shown that continued CDK4/6i therapy can extend progression-free survival in selected patients while maintaining a favorable safety profile (26-28). We have updated the Discussion to highlight these findings more explicitly, underscoring that while prolonged CDK4/6 inhibition slows but does not fully arrest tumor growth, it remains a clinically viable strategy when balanced against its manageable toxicity profile.

      Reviewer #1 (Recommendations for the authors): 

      It is well known that combination therapy with CDK4/6i and ET has therapeutic benefits in ER(+) HER2(-) advanced breast cancer. However, drug resistance is a problem, and no second-line therapy to address it has been established. Although some of these results have already been reported, the authors confirmed them using live-cell markers and further demonstrated, in detail, how this resistance might be overcome. This part is considered novel.

      Overall, this research manuscript is eligible to be accepted with the appropriate addressing of questions.

      (1) The effects and biochemical changes of combination therapy with CDK4/6i and CDK2i have already been described in several papers. The authors need to highlight the differences between their research and that of other researchers.

      We thank the reviewer for the opportunity to clarify the novelty of our findings in the context of prior studies on CDK4/6i and CDK2i combination therapy. In the revised manuscript, we have updated the Discussion section to more clearly delineate how our work extends and differs from existing research.

      Specifically, we now state:

      Page 12: The combination of CDK4/6i and ET has reshaped treatment for HR<sup>+</sup>/HER2<sup>-</sup> breast cancer (1-8). However, resistance commonly emerges, and no consensus second-line standard is established. Our data show that continued CDK4/6i treatment in drug-resistant cells engages a non-canonical, proteolysis-driven route of Rb inactivation, yielding attenuated E2F output and a pronounced delay in G1 progression (Figure 7G). Concurrent ET further deepens this blockade by suppressing c-Myc-mediated E2F amplification, thereby prolonging G1 and slowing population growth. Importantly, CDK2 inhibition alone was insufficient to control resistant cells. Robust suppression of CDK2 activity and resistant-cell growth required CDK2i in combination with CDK4/6i, consistent with prior reports supporting dual CDK targeting (9-16). Moreover, cyclin E, and in some contexts cyclin A, blunted the efficacy of the CDK4/6i and CDK2i combination by reactivating CDK2. Together, these findings provide a mechanistic rationale for maintaining CDK4/6i beyond progression and support testing ET plus CDK4/6i with the strategic addition of CDK2i, as evidenced by concordant in vitro and in vivo results.

      (2) Regarding Figures 3H and 3I, I wonder whether these are live-cell imaging results or whether the authors counted each signal on IF-stained slides fixed at defined time points. If live-cell imaging was used, the authors need to present the methods.

      We appreciate the reviewer’s question. Figures 3H and 3I derive from a live–fixed correlative pipeline rather than purely live imaging or independently timed IF slides. We first imaged asynchronously proliferating cells live for ≥48 h to (i) segment/track nuclei with H2B fluorescence, (ii) define mitotic exit (t = 0 at anaphase), and (iii) record CDK2 activity using a CDK2 KTR in the last live frame. Immediately after the live acquisition, we pulsed EdU (10 µM, 15 min) and fixed the same wells, photobleached fluorescent proteins (3% H₂O₂ + 20 mM HCl, 2 h, RT) to prevent crosstalk, and then performed click-chemistry EdU detection, IF for phospho-Rb (Ser807/811) and total Rb, and RNA FISH for E2F1. Fixed-cell readouts (p-Rb positivity, EdU incorporation, E2F1 mRNA puncta) were mapped back to each single cell’s live-derived time since mitosis and/or CDK2 activity, enabling the kinetic plots shown in Fig. 3H–I.

      To ensure transparency and reproducibility, we added detailed methods describing this workflow in the “Immunofluorescence and mRNA fluorescence in situ hybridization (FISH)” section under a dedicated “live–fixed pipeline” paragraph, and we cross-referenced acquisition and analysis parameters in “Live- and fixed-cell image acquisition” and “Image processing and analysis.” These updates specify: EdU pulse/fix conditions, photobleaching, antibodies/probes, imaging hardware and channels, segmentation/tracking, mitosis alignment, background correction, and how fixed readouts were binned/quantified as functions of time after mitosis and CDK2 activity.

      (3) Regarding Figure 3F, were all seven images obtained from the same field? The authors need to describe in detail the meaning of the white image and of the yellow and blue images in the bottom row.

      Thank you for raising this point. All seven panels in Fig. 3F are from the same field of view. The top row shows the raw channels (Hoechst, p-Rb, total Rb, and E2F1 RNA FISH). The bottom row shows the corresponding processed outputs from that field: (i) nuclear segmentation, (ii) phosphorylated Rb-status classification, and (iii) cell boundaries used for single-cell RNA-FISH quantification. We have revised the figure legend to make this explicit.

      (4) The authors showed E2F mRNA by ISH, but Rb does not suppress E2F mRNA; rather, it suppresses E2F protein activity, so the authors need to confirm E2F at the protein level.

      We sincerely appreciate the reviewer’s thoughtful suggestion to examine E2F1 at the protein level. In our study, we focused on E2F1 mRNA expression because it is a well-established and biologically meaningful readout of E2F1 transcriptional activity. Due to its autoregulatory nature (17), the release of active E2F1 protein from Rb induces the transcription of E2F1 itself, creating a positive feedback loop. As a result, E2F1 mRNA abundance serves as a direct and reliable proxy for E2F1 protein activity (18-20). Thus, quantifying E2F1 mRNA provides a biologically relevant and mechanistic indicator of Rb-E2F pathway status. To clarify this rationale, we have updated the Results section and added references supporting our use of E2F1 mRNA as a readout for E2F1 activity.

      (5) Is it possible to synchronize cells (nocodazole shake-off or double thymidine block) in the presence of CDK4/6i? If so, the authors need to demonstrate the delay in G1 progression via immunoblotting.

      We thank the reviewer for this constructive suggestion. To address it, we performed nocodazole synchronization followed by release and monitored cell-cycle progression in the presence or absence of CDK4/6 inhibition.

      Specifically, we added the following new datasets to the revised manuscript:

      Fig. 3L: Live single-cell trajectories of CDK4/6 and CDK2 activities alongside the Cdt1-degron reporter after 14 hours of nocodazole (250 nM) treatment and release. We compared the averaged traces of CDK4/6 and CDK2 activities and Cdt1 intensity in parental cells (gray) and resistant cells with (red) and without (blue) CDK4/6i maintenance. These data show suppressed and delayed CDK2 activation, as well as a right-shifted S-phase entry, particularly under continuous CDK4/6 inhibition.

      Fig. 3M: Fixed-cell EdU pulse-labeling at 4, 6, 8, 12, 16, and 24 h post-release further confirms a significant delay in S-phase entry and prolonged G1 duration in CDK4/6i-maintained cells compared with naïve and withdrawn conditions.

      Together, these results directly demonstrate the delay in G1 progression following synchronized mitotic exit under CDK4/6 inhibition.

      (6) In Figure 5C the authors showed a violin plot of c-Myc level. Is this Immunohistochemical staining? The authors need to clarify the methods.

      Thank you for flagging this. The c-Myc measurements in Fig. 5C are from immunofluorescence (IF), not IHC. We now state this explicitly in the legend.

      (7) Regarding immunofluorescence and the tracing of live-cell reporters, the authors need to clarify the methods (excitation and emission settings), the names of the instruments, and the software used.

      To address this, we have expanded the “Live-cell, fixed-cell, and tumor tissue image acquisition” section in the Materials and Methods.

      (8) Line 475 (SF1A): the authors need to correct typos, e.g., “Naïve Naïve.”

      We greatly appreciate the reviewer’s attention to this detail and have ensured all typos have been addressed.  

      (9) The authors need to unify “Cdt1-degron” (legends) vs. “Cdt1 degron” (figures).

      We greatly appreciate your attention to this discrepancy. Language referring to the Cdt1 degron has been unified between figures and legends. 

      Reviewer #3 (Recommendations for the authors):

      (1) While the manuscript discusses the selection of doses for CDK4/6 inhibitors and CDK2 inhibitors, there is a lack of detailed data on the dose-response relationship. Additional data on the effects of different doses would be beneficial. 

      We appreciate the reviewer’s important comment. To address it, we performed additional dose–response experiments testing a range of CDK4/6i and CDK2i concentrations. These analyses revealed a clear synergistic interaction between the two inhibitors. The new data are now presented in Figure 6G and Supplementary Figure 8F of the revised manuscript.

      (2) In clinical trials, the criteria for patient selection are crucial for interpreting study outcomes. A detailed description of the patient selection criteria should be provided.  

      We thank the reviewer for bringing this important point to our attention. In the revised manuscript, we have clarified the patient selection criteria relevant to the interpretation of clinical outcomes. Specifically, we note that retrospective analyses suggest patients with indolent disease and no prior chemotherapy may benefit most from continued CDK4/6i plus ET. Moreover, our data and others’ indicate that clinical benefit is expected in tumors retaining an intact Rb/E2F axis, while resistance-driving alterations (e.g., Rb loss, PIK3CA, ESR1, FGFR1–3, HER2, FAT1 mutations) are likely to limit efficacy. Finally, we highlight cyclin E overexpression as a potential biomarker of resistance to combined CDK4/6i and CDK2i, underscoring the need for biomarker-guided patient stratification. These additions provide a more detailed framework for patient selection in future clinical applications.

      References

      (1) Finn RS, Crown JP, Lang I, Boer K, Bondarenko IM, Kulyk SO, et al. The cyclin-dependent kinase 4/6 inhibitor palbociclib in combination with letrozole versus letrozole alone as first-line treatment of oestrogen receptor-positive, HER2-negative, advanced breast cancer (PALOMA-1/TRIO-18): a randomised phase 2 study. Lancet Oncol 2015;16:25-35

      (2) Finn RS, Martin M, Rugo HS, Jones S, Im S-A, Gelmon K, et al. Palbociclib and Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2016;375:1925-36

      (3) Turner NC, Slamon DJ, Ro J, Bondarenko I, Im S-A, Masuda N, et al. Overall Survival with Palbociclib and Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2018;379:1926-36

      (4) Dickler MN, Tolaney SM, Rugo HS, Cortés J, Diéras V, Patt D, et al. MONARCH 1, A Phase II Study of Abemaciclib, a CDK4 and CDK6 Inhibitor, as a Single Agent, in Patients with Refractory HR(+)/HER2(-) Metastatic Breast Cancer. Clin Cancer Res 2017;23:5218-24

      (5) Johnston S, Martin M, Di Leo A, Im S-A, Awada A, Forrester T, et al. MONARCH 3 final PFS: a randomized study of abemaciclib as initial therapy for advanced breast cancer. npj Breast Cancer 2019;5:5

      (6) Hortobagyi GN, Stemmer SM, Burris HA, Yap Y-S, Sonke GS, Hart L, et al. Overall Survival with Ribociclib plus Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2022;386:942-50

      (7) Slamon DJ, Neven P, Chia S, Fasching PA, De Laurentiis M, Im S-A, et al. Overall Survival with Ribociclib plus Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2020;382:514-24

      (8) Im S-A, Lu Y-S, Bardia A, Harbeck N, Colleoni M, Franke F, et al. Overall Survival with Ribociclib plus Endocrine Therapy in Breast Cancer. New England Journal of Medicine 2019;381:307-16

      (9) Pandey K, Park N, Park KS, Hur J, Cho YB, Kang M, et al. Combined CDK2 and CDK4/6 Inhibition Overcomes Palbociclib Resistance in Breast Cancer by Enhancing Senescence. Cancers (Basel) 2020;12

      (10) Freeman-Cook K, Hoffman RL, Miller N, Almaden J, Chionis J, Zhang Q, et al. Expanding control of the tumor cell cycle with a CDK2/4/6 inhibitor. Cancer Cell 2021;39:1404-21 e11

      (11) Dietrich C, Trub A, Ahn A, Taylor M, Ambani K, Chan KT, et al. INX-315, a selective CDK2 inhibitor, induces cell cycle arrest and senescence in solid tumors. Cancer Discov 2023

      (12) Al-Qasem AJ, Alves CL, Ehmsen S, Tuttolomondo M, Terp MG, Johansen LE, et al. Co-targeting CDK2 and CDK4/6 overcomes resistance to aromatase and CDK4/6 inhibitors in ER+ breast cancer. NPJ Precis Oncol 2022;6:68

      (13) Kudo R, Safonov A, Jones C, Moiso E, Dry JR, Shao H, et al. Long-term breast cancer response to CDK4/6 inhibition defined by TP53-mediated geroconversion. Cancer Cell 2024

      (14) Arora M, Moser J, Hoffman TE, Watts LP, Min M, Musteanu M, et al. Rapid adaptation to CDK2 inhibition exposes intrinsic cell-cycle plasticity. Cell 2023;186:2628-43 e21

      (15) Kumarasamy V, Wang J, Roti M, Wan Y, Dommer AP, Rosenheck H, et al. Discrete vulnerability to pharmacological CDK2 inhibition is governed by heterogeneity of the cancer cell cycle. Nature Communications 2025;16:1476

      (16) Dommer AP, Kumarasamy V, Wang J, O'Connor TN, Roti M, Mahan S, et al. Tumor Suppressors Condition Differential Responses to the Selective CDK2 Inhibitor BLU-222. Cancer Res 2025

      (17) Johnson DG, Ohtani K, Nevins JR. Autoregulatory control of E2F1 expression in response to positive and negative regulators of cell cycle progression. Genes & Development 1994;8:1514-25

      (18) Chung M, Liu C, Yang HW, Koberlin MS, Cappell SD, Meyer T. Transient Hysteresis in CDK4/6 Activity Underlies Passage of the Restriction Point in G1. Mol Cell 2019;76:562-73 e4

      (19) Kim S, Leong A, Kim M, Yang HW. CDK4/6 initiates Rb inactivation and CDK2 activity coordinates cell-cycle commitment and G1/S transition. Sci Rep 2022;12:16810

      (20) Yang HW, Chung M, Kudo T, Meyer T. Competing memories of mitogen and p53 signalling control cell-cycle entry. Nature 2017;549:404-8

      (21) Yang C, Li Z, Bhatt T, Dickler M, Giri D, Scaltriti M, et al. Acquired CDK6 amplification promotes breast cancer resistance to CDK4/6 inhibitors and loss of ER signaling and dependence. Oncogene 2017;36:2255-64

      (22) Li Q, Jiang B, Guo J, Shao H, Del Priore IS, Chang Q, et al. INK4 Tumor Suppressor Proteins Mediate Resistance to CDK4/6 Kinase Inhibitors. Cancer Discov 2022;12:356-71

      (23) Ji W, Zhang W, Wang X, Shi Y, Yang F, Xie H, et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death & Disease 2020;11:760

      (24) Wu X, Yang X, Xiong Y, Li R, Ito T, Ahmed TA, et al. Distinct CDK6 complexes determine tumor cell response to CDK4/6 inhibitors and degraders. Nature Cancer 2021;2:429-43

      (25) Kim S, Son E, Park HR, Kim M, Yang HW. Dual targeting CDK4/6 and CDK7 augments tumor response and anti-tumor immunity in breast cancer models. J Clin Invest 2025

      (26) Ravani LV, Calomeni P, Vilbert M, Madeira T, Wang M, Deng D, et al. Efficacy of Subsequent Treatments After Disease Progression on CDK4/6 Inhibitors in Patients With Hormone Receptor-Positive Advanced Breast Cancer. JCO Oncol Pract 2025;21:832-42

      (27) Martin JM, Handorf EA, Montero AJ, Goldstein LJ. Systemic Therapies Following Progression on First-line CDK4/6-inhibitor Treatment: Analysis of Real-world Data. Oncologist 2022;27:441-6

      (28) Kalinsky K, Bianchini G, Hamilton E, Graff SL, Park KH, Jeselsohn R, et al. Abemaciclib Plus Fulvestrant in Advanced Breast Cancer After Progression on CDK4/6 Inhibition: Results From the Phase III postMONARCH Trial. J Clin Oncol 2025;43:1101-12

    1. eLife Assessment

      The study provides important mechanistic insight into the transcriptional control of γδT17 development, elegantly demonstrating how HEB and Id3 act sequentially and cooperatively to regulate γδT17 cell specification and maturation. The study provides compelling evidence that advances the understanding of E-Id protein dynamics in thymic T cell specification. The work is comprehensive, technically rigorous, and conceptually clear, and will be of interest to immunologists, developmental biologists, and those studying the molecular underpinnings of physiological outcomes.

    2. Reviewer #1 (Public review):

      The authors use flow cytometry and scRNA-seq to identify and characterize the defect in gdT17 cell development from HEB f/f, Vav-icre (HEB cKO), and Id3 germline-deficient mice. HEB cKO mice showed defects in the gdT17 program at an early stage, and failed to properly upregulate expression of Id3 along with other genes downstream of TCR signaling. Id3KO mice showed a later defect in maturation. The results together indicate HEB and Id3 act sequentially during gdT17 development. The authors further showed that HEB and TCR signaling synergize to upregulate Id3 expression in the Scid-adh DN3-like T cell line. Analysis of previously published ChIP-seq data revealed binding of HEB (and Egr2) at overlapping regulatory regions near Id3 in DN3 cells.

      The study provides insight into mechanisms by which HEB and Id3 act to mediate gdT17 specification and maturation. The work is well performed and clearly presented. We only have minor comments.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Selvaratnam et al. defines how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus. Using conditional HEB ablation driven by Vav-Cre, flow cytometry, scRNA-seq, and reanalysis of ChIP-seq data, the authors provide evidence for a sequential model in which HEB and TCR-induced Egr2 cooperatively upregulate Id3, enabling gdT17 maturation and limiting diversion to the ab lineages. The work provides an important mechanistic insight into how the E/ID-protein axis coordinates gd T cell specification and effector maturation.

      Strengths include:

      (1) The proposed model that HEB primes, TCR induces, and Id3 stabilizes gdT17 cells in embryonal development is elegant and consistent with the findings.

      (2) The choice of animal models and the study of a precise developmental window.

      (3) The cross-validation of flow, scRNA-seq, and ChIP-seq reanalyses strengthens the conclusions.

      (4) The study clarifies the dual role of Id3: first as an HEB-dependent maturation factor for gdT17 cells, and second as a suppressor of diversion to the ab lineages.

      Weaknesses:

      (1) The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors.

      (2) E2A expression is not affected in HEB-deficient cells, raising the question of partial compensation, a point that should be specifically discussed.

      (3) All experiments are done at E18, when fetal gdT17 development predominates. The discussion could address whether these mechanisms extend to neonatal or adult gdT17 subsets.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this manuscript have addressed a key concept in T cell development: how early thymic gd T cell subsets are specified, and which elements govern the specification of gd T17 cells versus other gd T cell subsets or ab T cell subsets. They show that the transcriptional regulator HEB/Tcf12 plays a critical role in specifying the gd T17 lineage and, intriguingly, that it upregulates the inhibitor Id3, which is later required for further gd T17 maturation.

      Strengths:

      The conclusions drawn by the authors are amply supported by a detailed analysis of various stages of T cell maturation in WT and KO mouse strains at the single cell level, both phenotypically, by flow cytometry for various diagnostic surface markers, and transcriptionally, by single cell sequencing. Their conclusions are balanced and well supported by the data and citations of previous literature.

      Weaknesses:

      I actually found this work to be quite comprehensive. I have a few suggestions for additional analyses the authors could explore that are unrelated to the predominant conclusions of the manuscript, but I failed to find major flaws in the current work.

      I note that HEB is expressed in many hematopoietic lineages from the earliest progenitors and throughout T cell development. It is also noteworthy that abortive gamma and delta TCR rearrangements have been observed in early NK cells and ILCs, suggesting that, particularly in early thymic development, specification of these lineages may have lower fidelity. It might prove interesting to see whether their single-cell sequencing or flow data reveal changes in the frequency of these other T-cell-related lineages. Is it possible that HEB is playing a role not only in the fidelity of gdT17 cell specification, but also perhaps in the separation of T cells from NK cells and ILCs or the frequency of DN1, DN2, and DN3 cells? Perhaps their single-cell sequencing data or flow analyses could examine the frequency of these cells? That minor caveat aside, I find this to be an extremely exciting body of work.

    1. eLife Assessment

      This useful manuscript reports findings indicating that cell cycle progression and cytokinesis both play a role in the transition of early to late neural stem cell fates. The imaging data are solid and mostly support the conclusions. However, experimental details are missing, the method of quantitation could be improved, and orthogonal approaches are needed to confirm the findings, which are based on loss-of-function approaches and are not sufficient to support some of the authors' conclusions. Lastly, there is no investigation of the underlying mechanism linking the cell cycle or cytokinesis to the changes (or lack thereof) of early and late NSC fates.

    2. Reviewer #1 (Public review):

      Summary:

      Drosophila larval type II neuroblasts generate diverse types of neurons by sequentially expressing different temporal identity genes during development. Previous studies have shown that the transition from early temporal identity genes (such as Chinmo and Imp) to late temporal identity genes (such as Syp and Broad) depends on the activation of the expression of EcR by Seven-up (Svp) and progression through the G1/S transition of the cell cycle. In this study, Chaya and Syed examined whether the expression of Syp and EcR is regulated by cell cycle and cytokinesis by knocking down CDK1 or Pav, respectively, throughout development or at specific developmental stages. They find that knocking down CDK1 or Pav either in all type II neuroblasts throughout development or in single type II neuroblast clones after larval hatching consistently leads to failure to activate the late temporal identity genes Syp and EcR. To determine whether the failure to activate Syp and EcR is due to impaired Svp expression, they also examined Svp expression using a Svp-lacZ reporter line. They find that Svp is expressed normally in CDK1 RNAi neuroblasts. Further, knocking down CDK1 or Pav after Svp activation still leads to loss of Syp and EcR expression. Finally, they also extended their analysis to type I neuroblasts. They find that knocking down CDK1 or Pav, either at 0 hours or at 42 hours after larval hatching, also results in loss of Syp and EcR expression in type I neuroblasts. Based on these findings, the authors conclude that cell cycle progression and cytokinesis are required for the transition from early to late temporal identity genes in both types of neuroblasts. These findings add mechanistic details to our understanding of the temporal patterning of Drosophila larval neuroblasts.

      Strengths:

      The data presented in the paper are solid and largely support their conclusion. Images are of high quality. The manuscript is well-written and clear.

      Weaknesses:

      The quantifications of the expression of temporal identity genes and the interpretation of some of the data could be more rigorous.

      (1) Expression of temporal identity genes may not be just positive or negative. Therefore, it would be more rigorous to quantify the expression of Imp, Syp, and EcR based on the staining intensity rather than simply counting the number of neuroblasts that are positive for these genes, which can be very subjective. Or the authors should define clearly what qualifies as "positive" (e.g., a staining intensity at least 2x background).

      (2) The finding that inhibiting cytokinesis without affecting nuclear divisions by knocking down Pav leads to the loss of expression of Syp and EcR does not support their conclusion that nuclear division is also essential for the early-late gene expression switch in type II NSCs (at the bottom of the left column on page 5). No experiments were done to specifically block the nuclear division in this study. This conclusion should be revised.

      (3) Knocking down CDK1 in single random neuroblast clones does not make the CDK1 knockdown neuroblast develop in the same environment (except still in the same brain) as wild-type neuroblast lineages. It does not help address the concern whether "type 2 NSCS with cell cycle arrest failed to undergo normal temporal progression is indirectly due to a lack of feedback signaling from their progeny", as discussed (from the bottom of the right column on page 9 to the top of the left column on page 10). The CDK1 knockdown neuroblasts do not divide to produce progeny and thus do not receive a feedback signal from their progeny as wild-type neuroblasts do. Therefore, it cannot be ruled out that the loss of Syp and EcR expression in CDK1 knockdown neuroblasts is due to the lack of the feedback signal from their progeny. This part of the discussion needs to be clarified.

      (4) In Figure 2I, there is a clear EcR staining signal in the clone, which contradicts the quantification data in Figure 2J that EcR is absent in Pav RNAi neuroblasts. The authors should verify that the image and quantification data are consistent and correct.

    3. Reviewer #2 (Public review):

      Summary:

      Neural stem cells produce a wide variety of neurons during development. The regulatory mechanisms of neural diversity are based on the spatial and temporal patterning of neural stem cells. Although the molecular basis of spatial patterning is well-understood, the temporal patterning mechanism remains unclear. In this manuscript, the authors focused on the roles of cell cycle progression and cytokinesis in temporal patterning and found that both are involved in this process.

      Strengths:

      They conducted RNAi-mediated disruption of cell cycle progression and cytokinesis. As they expected, both disruptions affected temporal patterning in NSCs.

      Weaknesses:

      Although the authors showed clear results, they need to provide additional data to sufficiently support their conclusion.

      For example, they need to identify type II NSCs using molecular markers (Ase/Dpn).

      The authors are encouraged to provide a more detailed explanation of each experiment. The current version of the manuscript is difficult for non-expert readers to understand.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chaya and Syed focuses on understanding the link between cell cycle and temporal patterning in central brain type II neural stem cells (NSCs). To investigate this, the authors perturb the progression of the cell cycle by delaying the entry into M phase and preventing cytokinesis. Their results convincingly show that temporal factor expression requires progression of the cell cycle in both Type 1 and Type 2 NSCs in the Drosophila central brain. Overall, this study establishes an important link between the two timing mechanisms of neurogenesis.

      Strengths:

      The authors provide solid experimental evidence for the coupling of cell cycle and temporal factor progression in Type 2 NSCs. The quantified phenotype shows an all-or-none effect of cell cycle block on the emergence of subsequent temporal factors in the NSCs, strongly suggesting that both nuclear division and cytokinesis are required for temporal progression. The authors also extend this phenotype to Type 1 NSCs in the central brain, providing a generalizable characterization of the relationship between cell cycle and temporal patterning.

      Weaknesses:

      One major weakness of the study is that the authors do not explore the mechanistic relationship between the cell cycle and temporal factor expression. Although their results are quite convincing, they do not provide an explanation as to why Cdk1 depletion affects Syp and EcR expression but not the onset of svp. This result suggests that at least a part of the temporal cascade in NSCs is cell-cycle independent, which isn't addressed or sufficiently discussed.

    1. eLife Assessment

      The manuscript by Hensley and Yildiz studies the mechanical behavior of kinesin under conditions where the z-component of the applied force is minimized. The important study shows that much of the mechanical information gleaned from the traditional "one bead" with attached kinesin approach was probably profoundly influenced by the direction of the applied force. The data are convincing, but in some cases the amount of data collected appears to be smaller than optimal.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Hensley and Yildiz studies the mechanical behavior of kinesin under conditions where the z-component of the applied force is minimized. This is accomplished by tethering the kinesin to the trapped bead with a long double-stranded DNA segment as opposed to directly binding the kinesin to the large bead. It complements several recent studies that have used different approaches to looking at the mechanical properties of kinesin under low z-force loads. The study shows that much of the mechanical information gleaned from the traditional "one bead" with attached kinesin approach was probably profoundly influenced by the direction of the applied force. The authors speculate that when moving small vesicle cargos (particularly membrane-bound ones), the direction of resisting force on the motor has much less of a z-component than might be experienced if the motor were moving large organelles like mitochondria.

      Strengths:

      The approach is sound and provides an alternative method to examine the mechanics of kinesin under conditions where the z-component of the force is lessened. The data show that kinesin has very different mechanical properties compared to those extensively reported using the "single-bead" assay, where the molecule is directly coupled to a large bead, which is then trapped.

      Weaknesses:

      My primary concern is that in some of the studies, there are not enough data points to be totally convincing. This is particularly apparent in the low z-force condition of Figure 1C and in Figure 2B.

      The substoichiometric binding of kinesins to multivalent DNA complicates the interpretation of the data.

    3. Reviewer #2 (Public review):

      This short report by Hensley and Yildiz explores kinesin-1 motility under more physiological load geometries than previous studies. Large Z-direction (or radial) forces are a consequence of certain optical trap experimental geometries, and likely do not occur in the cell. Use of a long DNA tether between the motor and the bead can alleviate Z-component forces. The authors perform three experiments. In the first, they use two assay geometries - one with kinesin attached directly to a bead and the other with kinesin attached via a 2 kbp DNA tether - with a constant-position trap to determine that reducing the Z component of force leads to a difference in stall time but not stall force. In the second, they use the same two assay geometries with a constant-force trap to replicate the asymmetric slip bond of kinesin-1; reducing the Z component of force leads to a small but uniform change in the run lengths and detachment rates under hindering forces but not assisting forces. In the third, they connect two or three kinesin molecules to each DNA, and measure a stronger scaling in stall force and time when the Z component of force is reduced. They conclude that kinesin-1 is a more robust motor than previously envisaged, where much of its weakness came from the application of axial force. If forces are instead along the direction of transport, kinesin can hold on longer and work well in teams. The experiments are rigorous, and the data quality is very high. There is little to critique or discuss. The improved dataset will be useful for modeling and understanding multi-motor transport. The conclusions complement other recent works that used different approaches to low-Z component kinesin force spectroscopy, and provide strong value to the kinesin field.

      Major comments:

      (1) Kinesin-1 is covalently bound to a DNA oligo, which then attaches to the DNA chassis by hybridization. This oligo is 21 nt long with a relatively low GC content. At what force does this oligo dehybridize? Can the authors verify that their stall force measurements are not cut short by the oligo detaching from the chassis?

      (2) In Figure 1, a justification should be provided for excluding events below 1.5 pN; the threshold appears arbitrary.

      (3) In Figure 2B, is the difference in velocity statistically significant?

      (4) The number of measurements for each experimental data point should be provided in the corresponding figure caption. Error bars are reported as SEM, but N is not given in the caption.

    4. Reviewer #3 (Public review):

      Summary:

      Hensley et al. present an important study of the force-detachment behaviour of kinesin-1, the best-characterised motor protein. One of the key techniques used to characterise kinesins is in vitro optical trapping of purified proteins, which has provided remarkable insights into the biochemical and mechanical mechanisms of motor proteins under single- and multi-motor conditions. This study adapts the approach of Urbanska et al., tethering kinesin-1 to a bead via DNA under both single- and multi-motor conditions; the bead is then trapped to characterise run length, processivity, and stall behaviour under unloaded and loaded (both assisting and hindering) conditions. The new approach reduces the vertical or z-force and thus provides insights into the role of horizontal or x-forces acting on the motor. Based on their method of imposing dominant horizontal forces on the motor and their data, they conclude that kinesin-1 exhibits a higher asymmetry in its force-detachment kinetics, is less slippery, and exhibits slip-bond behaviour, particularly under hindering loads. Under assisting loads, similar slip-bond kinetics ensue, but detachment from the microtubule is far more sensitive. To demonstrate the implications of their method and data, they conduct a multi-motor assay and show that multiple kinesin-1 motors can generate significantly higher forces, almost proportional to motor number. Overall, this is important work, and the data are compelling.

      Strengths:

      The method of DNA-tethered motor trapping is effective in reducing vertical forces and can be easily optimised for other motors and protein characterisation. The major strength of the paper is characterising kinesin-1 under low z-forces, which is likely to reflect the physiological scenario. They report that kinesin-1 is more robust and less prone to premature detachment. The motors exhibit higher stall rates and times. Under hindering and assisting loads, kinesin-1 detachment is more asymmetric and sensitive, and under low z-force, slip-bond kinetics prevail. Another achievement of this paper is the demonstration of a multi-motor kinesin-1 assay using their low z-force method, showing that multiple kinesin-1 motors are capable of generating higher forces (up to 15 pN, nearly proportional to motor number), thus opening an avenue to study multiple-motor coordination.

      Weaknesses:

      The method of DNA-tethered motor trapping to enable low z-force is not entirely novel; it is adapted from Urbanska et al. (2021) for use in conventional optical trapping laboratories without reliance on microfluidics. However, I appreciate that they have fully established it here to share with the community. The authors could strengthen their methods section by clearly reporting protein molecular weights, protein labelling, and the DNA ladders shown in the supplementary information. What organism is the protein from? Presumably human, but this should be specified in the methods. While the figures show beautiful data and exemplary traces, the total number of molecules or events analysed is not consistently reported. Overall, certain methodological details should be expanded to ensure reproducibility.

      The major limitation of the study is its overgeneralisation, starting with the title. I recommend that the title be specific to kinesin-1. The study uses two constructs: a truncated K560 for conventional high-force assays, and full-length Kif5b for the low z-force method. However, for the multi-motor assay, the authors use K560 with the rationale of preventing autoinhibition due to binding with DNA, but the same concern would also have limited the single-molecule characterisation. Overall, the data generated are clear, high-quality, and exciting in the low z-force conditions. But why have the authors not compared or validated their findings with the truncated K560 construct? This is especially important in the force-feedback experiments and in comparison with Andreasson et al. and Carter et al., who use Drosophila kinesin-1. Could kinesin-1 from different organisms exhibit different force-detachment kinetics? It is quite possible. Similarly, the authors test backward slipping of Kif5b and K560 and measure dwell times in multi-motor assays. Why not detail the backward-slipping kinetics of Kif5b and any step-size effects under low z-forces? For instance, with the traces they already have, the authors could determine slip times, distances, and frequencies in horizontal-force experiments. Overall, the manuscript could be strengthened by analysing both constructs more fully.

      Appraisal and impact:

      This study contributes important evidence to the ongoing debate on kinesin-1 force-detachment kinetics. The authors conclude that kinesin-1 exhibits a slip-bond interaction with the microtubule under increasing forces, while other recent studies (Noell et al. and Kuo et al.), which also use low z-force setups, conclude catch-bond behaviour under hindering loads. I find the results not fully aligned with their interpretation. The first comparison of low z-forces in their setup with Noell et al. (2024), based on stall times, does not hold, because it is an apples-to-oranges comparison: their data show a stall time constant of 2.52 s, which is comparable to the 3 s reported by Noell et al., but the comparison is made with a weighted average of 1.49 s. The authors do report that detachment rates are lower under low z-force conditions in unloaded scenarios, so it is unwarranted to rule out catch-bond-like behaviour entirely. That said, their data quality is good and does show that higher hindering forces lead to higher detachment rates. However, on closer inspection, the range of 0-5 pN shows either a decrease or no change in detachment rate, which suggests that below a hindering-force threshold, catch-bond-like or ideal-bond-like behaviour is possible, followed by slip-bond behaviour; this would be a remarkably well-resolved feature. Under assisting loads, the slip-bond character is consistent, as expected. Overall, the study contributes to an important discussion in the biophysical community and is needed, but it requires cautious framing, particularly because the data do not distinguish motor trapping in a high microtubule-affinity state from genuine bond strengthening.

    1. eLife Assessment

      This important manuscript presents a thorough analysis of the evolution of Major Histocompatibility Complex gene families across primates. A key strength of this analysis is the use of state-of-the-art phylogenetic methods to estimate rates of gene gain and loss, accounting for the notorious difficulty of properly assembling MHC genomic regions. Overall, the evidence for the authors' conclusions – that there is considerable diversity in how MHC diversity is deployed across species – is compelling.

    2. Joint Public Review:

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust.

      Editorial note:

      The authors have responded to the previous reviews and the Assessment was updated without involving the reviewers again.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust insofar as the data might be complete enough to draw such conclusions.

      Weaknesses:

      One concern about the present work is that it relies on public databases to draw inferences about gene loss, which is potentially risky if the publicly available sequence data are incomplete. To say, for example, that a particular MHC gene copy is absent in a taxon (e.g., Class I locus F absent in guenons according to Figure 1), we need to trust that its absence from the available databases is an accurate reflection of its absence in the genome of the actual organisms. This may be a safe assumption, but it rests on the completeness of genome assemblies (and gene annotations?) or on researchers having uploaded the relevant data. This reviewer would have been far more comfortable had the authors engaged in some active spot-checking, doing the lab work to try to confirm absences at least for some loci and some species. Without this, a reader is left to wonder whether apparent gene loss simply reflects imperfect databases, which then undercuts confidence in estimates of rates of gene loss.

      Indeed, just because a locus has not been confirmed in a species does not necessarily mean that it is absent. As we explain in the Figure 1 caption, only a few species have had their genomes extensively studied (gray background), and only for these species does the absence of a point in this figure mean that a locus is absent. The white background rows represent species that are not extensively studied, and we point out that for these species the absence of a point does not mean that a locus is absent, only that it has not yet been discovered. We have also added a parenthetical to the text to explain this (line 156): “Only species with rows highlighted in gray have had their MHC regions extensively studied (and thus only for these rows is the absence of a gene symbol meaningful).”

      While we agree that spot-checking may be a helpful next step, one of the goals of this manuscript is to collect and synthesize the enormous volume of MHC evolution research in the primates, which will serve as a jumping-off point for other researchers to perform important wet lab work.

      Some context is useful for comparing rates of gene turnover in MHC to those at other loci. Changes in gene copy number, duplications, and losses of duplicates seem to be common across many loci and many organisms; is MHC exceptional in this regard, or merely behaving like any moderately large gene family? I would very much have liked to see comparable analyses done for other gene families (immune, like TLRs, or non-immune), and quantitative comparisons of evolutionary rates between MHC versus other genes. Does MHC gene composition evolve any faster than a random gene family? At present readers may be tempted to infer this, but evidence is not provided.

      Our companion paper (Fortier and Pritchard, 2025) demonstrates that the MHC is a unique locus in many regards, such as its evidence for deep balancing selection and its excess of disease associations. Thus, we expect that it is evolving faster than any random gene family. It would be interesting to repeat this analysis for other gene families, but that is outside of the scope of this project. Additionally, allele databases for other gene families are not nearly as developed, but as more alleles become available for other polymorphic families, a comparable analysis could become possible.

      We have added a paragraph to the discussion (lines 530-546) to clarify that we do not know for certain whether the MHC gene family is evolving rapidly compared to other gene families.

      While on the topic of making comparisons, the authors make a few statements about relative rates. For instance, lines 447-8 compare gene topology of classical versus non-classical genes, and line 450 states that classical genes experience more turnover. But no quantitative values are given for these rates to allow numerical comparisons, no confidence intervals are provided (these are needed, given that they are estimates), and no formal statistical comparisons are made to confirm that rates differ between types of genes.

      More broadly, the paper uses sophisticated phylogenetic methods, but without taking advantage of macroevolutionary comparative methods that allow model-based estimation of macroevolutionary rates. I found the lack of quantitative measurements of rates of gene gain/loss to be a weakness of the present version of the paper, and something that should be readily remedied. When claiming that MHC Class I genes "turn over rapidly" (line 476) - what does rapidly mean? How rapidly? How does that compare to rates of genetic turnover at other families? Quantitative statements should be supported by quantitative estimates (and their confidence intervals).

      These statements refer to qualitative observations, so we cannot provide numerical values. We simply conclude that certain gene groups evolve faster or slower based on the species and genes present in each clade. It is difficult to provide estimates because of the incomplete sampling of genes that survived to the present day. In addition, the presence or absence of various orthologs in different species still needs to be confirmed, at which point it might be useful to be more quantitative. We have also added a paragraph to the discussion to address this concern and advocate for similar analyses of other gene families in the future when more data is available (lines 530-546).

      The authors refer to 'shared function of the MHC across species' (e.g. line 22); while this is likely true, they are not here presenting any functional data to confirm this, nor can they rule out neofunctionalization or subfunctionalization of gene duplicates. There is evidence in other vertebrates (e.g., cod) of MHC evolving appreciably altered functions, so one may not safely assume the function of a locus is static over long macroevolutionary periods, although that would be a plausible assumption at first glance.

      Indeed, we cannot assume that the function of a locus is static across time, especially for the MHC region. In our research, we read hundreds of papers that each focused on a small number of species or genes and gathered some information about them, sometimes based on functional experiments and sometimes on measures such as dN/dS. These provide some indication of a gene’s broad classification in a species or clade, even if the evidence is preliminary. Where possible, we used this preliminary evidence to assign genes the descriptors “classical,” “non-classical,” “dual characteristics,” “pseudogene,” “fixed,” or “unfixed.” Sometimes multiple individuals and haplotypes were analyzed, so we could even assign a minimum number of gene copies present in a species. We have aggregated all of these references into Supplementary Table 1 (for Class I/Figure 1) and Supplementary Table 2 (for Class II/Figure 2), along with specific details about which data points in these figures each reference supports. We realize that many of these classifications are based on a small number of individuals or indirect measures, so they may change in the future as more functional data is generated.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive understanding of the evolutionary history of the Major Histocompatibility Complex (MHC) gene family across primate species. Specifically, they sought to:

      (1) Analyze the evolutionary patterns of MHC genes and pseudogenes across the entire primate order, spanning 60 million years of evolution.

      (2) Build gene and allele trees to compare the evolutionary rates of MHC Class I and Class II genes, with a focus on identifying which genes have evolved rapidly and which have remained stable.

      (3) Investigate the role of often-overlooked pseudogenes in reconstructing evolutionary events, especially within the Class I region.

      (4) Highlight how different primate species use varied MHC genes, haplotypes, and genetic variation to mount successful immune responses, despite the shared function of the MHC across species.

      (5) Fill gaps in the current understanding of MHC evolution by taking a broader, multi-species perspective using (a) phylogenomic analytical computing methods such as Beast2, Geneconv, BLAST, and the much larger computing capacities that have been developed and made available to researchers over the past few decades, (b) literature review for gene content and arrangement, and genomic rearrangements via haplotype comparisons.

      (6) The authors' overall conclusion, based on their analyses and results, is that 'different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response'.

      Strengths:

      Essentially, much of the information presented in this paper is already well known in the field of MHC genomic and genetic research, with few new conclusions and with insufficient acknowledgement of past studies. Nevertheless, while MHC evolution is a well-studied area, this paper potentially adds some originality through its comprehensive, cross-species evolutionary analysis of primates, its focus on pseudogenes, and the modern, large-scale methods employed. Its originality lies in its broad evolutionary scope across the primate order, supported by solid methodological and phylogenetic analyses.

      The main strengths of this study are the use of large publicly available databases for primate MHC sequences, the intensive computing involved, the phylogenetic tool Beast2 to create multigene Bayesian phylogenetic trees using sequences from all genes and species, separated into Class I and Class II groups to provide a backbone of broad relationships to investigate subtrees, and the presentation of various subtrees as species and gene trees in an attempt to elucidate the unique gene duplications within the different species. The study provides some additional insights with summaries of MHC reference genomes and haplotypes in the context of a literature review to identify the gene content and haplotypes known to be present in different primate species. The phylogenetic overlays or ideograms (Figures 6 and 7) in part show the complexity of the evolution and organisation of the primate MHC genes via the orthologous and paralogous gene and species pathways progressively from the poorly-studied NWM, across a few moderately studied ape species, to the better-studied human MHC genes and haplotypes.

      Weaknesses:

      The title 'The Primate Major Histocompatibility Complex: An Illustrative Example of Gene Family Evolution' suggests that the paper will explore how the Major Histocompatibility Complex (MHC) in primates serves as a model for understanding gene family evolution. The term 'Illustrative Example' in the title would be appropriate if the paper aimed to use the primate MHC as a clear and representative case to demonstrate broader principles of gene family evolution. That is, the MHC gene family would not be just one instance of gene family evolution but would serve as a well-studied, insightful example that can highlight key mechanisms and concepts applicable to other gene families. However, this is not the case: the paper only covers specific details of primate MHC evolution without drawing broader lessons for other gene families. So, the term 'Illustrative Example' is too broad or generalizing. In this case, a term like 'Case Study' or simply 'Example' would be more suitable. Perhaps 'An Example of Gene Family Diversity' would be more precise. Also, an explanation or 'reminder' is suggested that this study is not about the origins of the MHC genes in the earliest jawed vertebrates per se (~600 mya), but concerns a subset of species that emerged relatively late (~60 mya) in the evolutionary divergent pathways of the MHC genes, systems, and various vertebrate species.

      Thank you for your input on the title; we have changed it to “A case study of gene family evolution” instead.

      Thank you also for pointing out the potential confusion about the time span of our study. We have added “Having originated in the jawed vertebrates,” to a sentence in the introduction (lines 38-39). We have also added the sentence “Here, we focus on the primates, spanning approximately 60 million years within the over 500-million-year evolution of the family (Flajnik, 2010).” to be more explicit about the context for our work (lines 59-61).

      Phylogenomics. Particular weaknesses in this study are the limitations and problems associated with providing phylogenetic gene and species trees to try and solve the complex issue of the molecular mechanisms involved with imperfect gene duplications, losses, and rearrangements in a complex genomic region such as the MHC that is involved in various effects on the response and regulation of the immune system. A particular deficiency is drawing conclusions based on a single exon of the genes. Different exons present different trees. Which are the more reliable? Why were introns not included in the analyses? The authors attempt to overcome these limitations by including genomic haplotype analysis, duplication models, and the supporting or contradictory information available in previous publications. They succeed in part with this multidiscipline approach, but much is missed because of biased literature selection. The authors should include a paragraph about the benefits and limitations of the software that they have chosen for their analysis, and perhaps suggest some alternative tools that they might have tried comparatively. How were problems with Bayesian phylogeny such as computational intensity, choosing probabilities, choosing particular exons for analysis, assumptions of evolutionary models, rates of evolution, systemic bias, and absence of structural and functional information addressed and controlled for in this study?

      We agree that different exons have different trees, which is exactly why we repeated our analysis for each exon in order to compare and contrast them. In particular, the exons encoding the binding site of the resulting protein (exons 2 and 3 for Class I and exon 2 for Class II) show evidence for trans-species polymorphism and gene conversion. These phenomena lead to trees that do not follow the species tree and are fascinating in and of themselves, which we explore in detail in our companion paper (Fortier and Pritchard, 2025). Meanwhile, the non-peptide-binding extracellular-domain-encoding exon (exon 4 for Class I and exon 3 for Class II) is comparably sized to the binding-site-encoding exons and provides an interesting functional contrast. As this exon is likely less affected by trans-species polymorphism, gene conversion, and convergent evolution, we present results from it most often in the main text, though we occasionally touch on differences between the exons. See lines 191-196, 223-226, and 407-414 for some examples of how we discuss the exons in the text. Additionally, all trees from all of these exons can be found in the supplement. 

      We agree that introns would be valuable to study in this context. Even though the non-binding-site-encoding exons are probably *less* affected by trans-species polymorphism, gene conversion, and convergent evolution, they are still functional. The introns, however, experience much more relaxed selection, if any, and comparing their trees to those for the exons would be valuable and illuminating. We did not generate intron trees for two reasons. Most importantly, there is a dearth of data available for the introns; in the databases we used, intron data were often available only for human, chimpanzee, and sometimes macaque, and only for a small subset of the genes. This limitation is at odds with the comprehensive, many-gene-many-species approach which we feel is the main novelty of this work. Secondly, the introns that *are* available are difficult to align. Even aligning the exons across such a highly diverged set of genes and pseudogenes was difficult and required manual effort. The introns proved even more difficult to align across genes. In the future, when more intron data is available and sufficient effort is put into aligning them, it will be possible and desirable to do a comparable analysis. We also added a sentence to the “Data” section to briefly explain why we did not include introns (lines 134-135).

      We explain our Bayesian phylogenetics approach in detail in the Methods (lines 650-725), including our assumptions and our solutions to challenges specific to this application. For further explanation of the method itself, we suggest reading the original BEAST and BEAST2 papers (Drummond & Rambaut (2007), Drummond et al. (2012), Bouckaert et al. (2014), and Bouckaert et al. (2019)). Known structural and functional information helped us validate the alignments we used in this study, but the fact that such information is not fully known for every gene and species should not affect the method itself.

      Gene families as haplotypes. In the Introduction, the MHC is referred to as a 'gene family', and in paragraph 2, it is described as being united by the 'MHC fold', despite exhibiting 'very diverse functions'. However, the MHC region is more accurately described as a multigene region containing diverse, haplotype-specific Conserved Polymorphic Sequences, many of which are likely to be regulatory rather than protein-coding. These regulatory elements are essential for controlling the expression of multiple MHC-related products, such as TNF and complement proteins, a relationship demonstrated over 30 years ago. Non-MHC-fold loci such as TNF, complement, POU5F1, lncRNAs, TRIM genes, LTA, LTB, NFkBIL1, etc., are present across all MHC haplotypes and play significant roles in regulation. Evolutionary selection must act on genotypes, considering both paternal and maternal haplotypes, rather than on individual genes alone. While it is valuable to compile databases for public use, their utility is diminished if they perpetuate outdated theories like the 'birth-and-death model'. The inclusion of prior information or assumptions in a statistical or computational model, typically in Bayesian analysis, is commendable, but such priors should be based on genotypic data rather than older models. A more robust approach would consider the imperfect duplication of segments, the history of their conservation, and the functional differences in inheritance patterns. Additionally, the MHC should be examined as a genomic region, with ancestral haplotypes and sequence changes or rearrangements serving as key indicators of human evolution after the 'Out of Africa' migration, and with disease susceptibility providing a measurable outcome. There are more than 7000 different HLA-B and -C alleles at each locus, which suggests that there are many thousands of human HLA haplotypes to study. In this regard, the studies by Dawkins et al. (1999 Immunol Rev 167,275) and Shiina et al. (2006 Genetics 173,1555) on human MHC gene diversity and disease hitchhiking (haplotypes), and by Sznarkowska et al. (2020 Cancers 12,1155) on the complex regulatory networks governing MHC expression, both in terms of immune transcription factor binding sites and regulatory non-coding RNAs, should be examined in greater detail, particularly in the context of MHC gene allelic diversity and locus organization in humans and other primates.

      Thank you for these comments. To clarify that the MHC “region” is different from (and contains) the MHC “gene family” as we describe it, we changed a sentence in the abstract (lines 8-10) from “One large gene family that has experienced rapid evolution is the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” to “One large gene family that has experienced rapid evolution lies within the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” We know that the region is complex and contains many other genes and regulatory sequences; Figure 1 of our companion paper (Fortier and Pritchard, 2025) depicts these in order to show the reader that the MHC genes we focus on are just one part of the entire region.

      We love the suggestion to look at the many thousands of alleles present at each of the classical loci. This is the focus of our complementary paper (Fortier and Pritchard, 2025), which explores variation at the allele level. In the current paper, we look mainly at the differences between genes and the use of different genes in different species.

      Diversifying and/or concerted evolution. Both this and past studies highlight that diversifying or balancing selection is the dominant force in MHC evolution. This is primarily because the extreme polymorphism observed in MHC genes is advantageous for populations in terms of pathogen defence. Diversification increases the range of peptides that can be presented to T cells, enhancing the immune response. The peptide-binding regions of MHC genes are highly variable, and this variability is maintained through selection for immune function, especially in the face of rapidly evolving pathogens. In contrast, concerted evolution, which typically involves the homogenization of gene duplicates through processes like gene conversion or unequal crossing-over, seems to play a minimal role in MHC evolution. Although gene duplication events have occurred in the MHC region, leading to the expansion of gene families, the resulting paralogs often undergo divergent evolution rather than being kept similar or homogeneous by concerted evolution. Therefore, unlike gene families such as ribosomal RNA genes or histone genes, where concerted evolution leads to highly similar copies, MHC genes display much higher levels of allelic and functional diversification. Each MHC gene copy tends to evolve independently after duplication, acquiring unique polymorphisms that enhance the repertoire of antigen presentation, rather than undergoing homogenization through gene conversion. Also, in some populations with high polymorphism or genetic drift, allele frequencies may become similar over time without the influence of gene conversion. This similarity can be mistaken for gene conversion when it is simply due to neutral evolution or drift, particularly in small populations or bottlenecked species. Moreover, gene conversion might contribute to greater diversity by creating hybrids or mosaics between different MHC genes. In this regard, can the authors indicate what percentage of the genes in their study have been homogenised by gene conversion compared to those that have been diversified by gene conversion?

      We appreciate the summary, and we feel we have appropriately discussed both gene conversion and diversifying selection in the context of the MHC genes. Because we cannot know for sure when and where gene conversion has occurred, we cannot quantify percentages of genes that have been homogenized or diversified.  

      Duplication models. The phylogenetic overlays or ideograms (Figures 6 and 7) show considerable imperfect multigene duplications, losses, and rearrangements, but the paper's Discussion provides no in-depth consideration of the various multigenic models or mechanisms that can be used to explain the occurrence of such events. How do their duplication models compare to those proposed by others? For example, their text simply says on line 292, 'the proposed series of events is not always consistent with phylogenetic data'. How, why, when? Duplication models for the generation and extension of the human MHC class I genes as duplicons (extended gene or segmental genomic structures) by parsimonious imperfect tandem duplications with deletions and rearrangements in the alpha, beta, and kappa blocks were already formulated in the late 1990s and extended to the rhesus macaque in 2004 based on genomic haplotypic sequences. These studies were based on genomic sequences (genes, pseudogenes, retroelements), dot plot matrix comparisons, and phylogenetic analyses of gene and retroelement sequences using computer programs. It was already noted or proposed in these earlier 1999 studies that (1) the ancestor of HLA-P(90)/-T(16)/-W(80) represented an old lineage separate from the other HLA class I genes in the alpha block, (2) HLA-U(21) is a duplicated fragment of HLA-A, (3) HLA-F and HLA-V(75) are among the earliest (progenitor) genes or outgroups within the alpha block, and (4) distinct Alu and L1 retroelement sequences adjoining the HLA-L(30) and HLA-N genomic segments (duplicons) in the kappa block are closely related to those in the HLA-B and HLA-C segments of the beta block, suggesting an inverted duplication and transposition of the HLA genes and retroelements between the beta and kappa regions. None of these prior human studies were referenced by Fortier and Pritchard in their paper. How do the features of their human MHC class I gene duplication model (Fig. 6), such as gene duplication numbers and turnovers, differ from those previously proposed and described by Kulski et al (1997 JME 45,599), (1999 JME 49,84), (2000 JME 50,510), Dawkins et al (1999 Immunol Rev 167,275), and Gaudieri et al (1999 GR 9,541)? Is this a case of reinventing the wheel?

      Figures 6 and 7 are intended to synthesize and reconcile past findings and our own trees, so they do not strictly adhere to the findings of any particular study and cannot fully match all studies. In the supplement, Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1 duly credit all of the past work that went into making these trees. Most previous papers focus on just one aspect of these trees, such as haplotypes within a species, a specific gene or allelic lineage relationship, or the branching pattern of particular gene groups. We believe it was necessary to bring all of these pieces of evidence together. Even among papers with the same focus (understanding the block duplications that generated the current physical layout of the MHC), results differ. For example, Geraghty (1992), Hughes (1995), Kulski (2004)/Kulski (2005), and Shiina (1999) all disagree on the exact branching order of the genes MHC-W, -P, and -T, and of MHC-G, -J, and -K. While the Kulski studies you pointed out were very thorough for their era, they still relied on data from only three species and one haplotype per species. Our work is not intended to replace or discredit these past works, but simply to build upon them with a larger set of species and sequences. We hope the hypotheses we propose in Figures 6 and 7 can help unify existing research and provide a more easily accessible jumping-off point for future work.

      Results. The results are presented as new findings, whereas most if not all of the results' significance and importance already have been discussed in various other publications. Therefore, the authors might do better to combine the results and discussion into a single section with appropriate citations to previously published findings presented among their results for comparison. Do the trees and subsets differ from previous publications, albeit that they might have fewer comparative examples and samples than the present preprint? Alternatively, the results and discussion could be combined and presented as a review of the field, which would make more sense and be more honest than the current format of essentially rehashing old data.

      In starting this project, we found that a large barrier to entry to this field of study is the immense amount of literature published over 30+ years. It is both time-consuming and confusing to read up on the many nuances of the MHC genes, their changing names, and their evolution, making it difficult to start new, innovative projects. We acknowledge that our results are not entirely novel; the main advantage of our work is that it provides a thorough, comprehensive starting point for others to learn about the MHC quickly and dive into new research. We feel that we have appropriately cited past literature in the main text, appendices, and supplement, so that readers may dive into a particular area with ease.

      Minor corrections:

      (1) Abstract, line 19: 'modern methods'. Too general. What modern methods?

      To keep the abstract brief, the methods are introduced in the main text when each becomes relevant as well as in the methods section.

      (2) Abstract, line 25: 'look into [primate] MHC evolution.' The analysis is on the primate MHC genes, not on the entire vertebrate MHC evolution with a gene collection from sharks to humans. The non-primate MHC genes are often differently organised and structurally evolved in comparison to primate MHC.

      Thank you! We have added the word “primate” to the abstract (line 25).

      (3) Introduction, line 113. 'In a companion paper (Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (4) Figures 1 and 2. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. 'Asterisks "within symbols" indicate new information.'

      Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (5) Figures. A variety of colours have been applied for visualisation. However, some coloured texts are so light in colour that they are difficult to read against a white background. Could darker colours or black be used for all or most texts?

      With such a large number of genes and species to handle in this work, it was nearly impossible to choose a set of colors that were distinct enough from each other. We decided to prioritize consistency (across this paper, its supplement, and our companion paper) as well as at-a-glance grouping of similar sequences. Unfortunately, this means we had to sacrifice readability on a white background, but readers may turn to the supplement if they need to access specific sequence names.

      (6) Results, line 135. '(Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      Repeat of (3). This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (7) Results, lines 152 to 153, 164, 165, etc. 'Points with an asterisk'. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. A point is a small dot such as those used in data points for plotting graphs .... The figures are so small that the asterisks in the circles, squares, triangles, etc, look like points (dots) and the points/asterisks terminology that is used is very confusing visually.

      Repeat of (4). Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (8) Line 178 (BEA, 2024) is not listed alphabetically in the References.

      Thank you for catching this! This reference maps to the first bibliography entry, “SUMMARIZING POSTERIOR TREES.” We are unsure how to cite a webpage that has no explicit author within the eLife Overleaf template, so we will consult with the editor.

      (9) Lines 188-190. 'NWM MHC-G does not group with ape/OWM MHC-G, instead falling outside of the clade containing ape/OWM MHC-A, -G, -J and -K.' This is not surprising given that MHC-A, -G, -J, and -K are paralogs of each other and that some of them, especially in NWM have diverged over time from the paralogs and/or orthologs and might be closer to one paralog than another and not be an actual ortholog of OWM, apes or humans.

      We included this sentence to clarify the relationships between genes and to help describe what is happening in Figure 6. Figure 6 - figure supplement 1 includes all of the references that go into such a statement and Appendix 3 details our reasoning for this and other statements.

      (10) Line 249. Gene conversion: This is recombination between two different genes in which a portion of the genes is exchanged, so that different portions of a gene can group within one or the other of the two gene clades. Alternatively, the gene has been annotated incorrectly if it does not group within either of the two alternative clades. Another possibility is that one or two nucleotide mutations have occurred without recombination, resulting in the mistaken interpretation of a recombination event. How many MHC gene conversion (recombination) events have occurred according to the authors' estimates? What measures are taken to avoid false-positive conclusions?

      All of these possibilities are certainly valid. We used the program GENECONV to infer gene conversion events, but there is considerable uncertainty owing to the ages of the genes and the inevitable point mutations that have occurred post-event. Gene conversion was not the focus of our paper, so we did our best to acknowledge it (and the resulting differences between trees from different exons) without spending too much time diving into it. A list of inferred gene conversion events can be found in Figure 3 - source data 1 and Figure 4 - source data 1.

      (11) Lines 284-286. 'The Class I MHC region is further divided into three polymorphic blocks-alpha, beta, and kappa blocks-that each contains MHC genes but are separated by well-conserved non-MHC genes.' The MHC class I region was first designated into conserved polymorphic duplication blocks, alpha and beta by Dawkins et al (1999 Immunol Rev 167,275), and kappa by Kulski et al (2002 Immunol Rev 190,95), and should be acknowledged (cited) accordingly.

      Thank you for catching this! We have added these citations (lines 302-303)!

      (12) Lines 285-286. 'The majority of the Class I genes are located in the alpha-block, which in humans includes 12 MHC genes and pseudogenes.' This is not strictly correct for many other species, because the majority of class I genes might be in the beta block of New World and Old World monkeys, and the authors haven't provided respective counts of duplication numbers to show otherwise. The alpha block in some non-primate mammalian species such as pigs, rats, and mice has no MHC class I genes or only a few. Most MHC class I genes in non-primate mammalian species are found in other regions. For example, see Ando et al (2005 Immunogenetics 57,864) for the pig alpha, beta, and kappa regions in the MHC class I region. There are no pig MHC genes in the alpha block.

      Yes, which is exactly why we use the phrase “in humans” in that particular sentence. The arrangement of the MHC in several other primate reference genomes is shown in Figure 1 - figure supplement 2.

      (13) Line 297 to 299. 'The alpha-block also contains a large number of repetitive elements and gene fragments belonging to other gene families, and their specific repeating pattern in humans led to the conclusion that the region was formed by successive block duplications (Shiina et al., 1999).' There are different models for successive block duplications in the alpha block and some are more parsimonious based on imperfect multigenic segmental duplications (Kulski et al 1999, 2000) than others (Shiina et al., 1999). In this regard, Kulski et al (1999, 2000) also used duplicated repetitive elements neighbouring MHC genes to support their phylogenetic analyses and multigenic segmental duplication models. For comparison, can the authors indicate how many duplications and deletions they have in their models for each species?

      We have added citations to this sentence to show that there are different published models to describe the successive block duplications (line 307). Our models in Figure 6 and Figure 7 are meant to aggregate past work and integrate our own, and thus they were not built strictly by parsimony. References can be found in Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1.

      (14) Lines 315-315. 'Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment.' This sentence should be deleted. Other researchers had already inferred that MHC-U is actually an MHC-A-related gene fragment more than 25 years ago (Kulski et al 1999, 2000) when the MHC-U was originally named MHC-21.

      While these works certainly describe MHC-U/MHC-21 as a fragment in the 𝛼-block, any relation to MHC-A was by association only and very few species/haplotypes were examined. So although the idea is not wholly novel, we provide convincing evidence that not only is MHC-U related to MHC-A by sequence, but also that it is a very recent partial duplicate of MHC-A. We show this with Bayesian phylogenetic trees as well as an analysis of haplotypes across many more species than were included in those papers.  

      (15) Lines 361-362. 'Notably, our work has revealed that MHC-V is an old fragment.' This is not a new finding or hypothesis. Previous phylogenetic analysis and gene duplication modelling had already inferred HLA-V (formerly HLA-75) to be an old fragment (Kulski et al 1999, 2000).

      By “old,” we mean older than previous hypotheses suggest. Previous work has proposed that MHC-V and -P were duplicated together, with MHC-V deriving from an MHC-A/H/V ancestral gene and MHC-P deriving from an MHC-W/T/P ancestral gene (Kulski (2005), Shiina (1999)). However, our analysis (Figure 5A) shows that MHC-V sequences form a monophyletic clade outside of the MHC-W/P/T group of genes as well as outside of the MHC-A/B/C/E/F/G/J/K/L group of genes, which is not consistent with MHC-A and -V being closely related. Thus, we conclude that MHC-V split off earlier than the differentiation of these other gene groups and is thus older than previously thought. We explain this in the text as well (lines 317-327) and in Appendix 3.  

      (16) Line 431-433. 'the Class II genes have been largely stable across the mammals, although we do see some lineage-specific expansions and contractions (Figure 2 and Figure 2-gure Supplement 2).' Please provide one or two references to support this statement. Is 'gure' a typo?

      We corrected this typo, thank you! This conclusion is simply drawn from the data presented in Figure 2 and Figure 2 - figure supplement 2. The data itself comes from a variety of sources, which are already included in the supplement as Figure 2 - source data 1.

      (17) Line 437. 'We discovered far more "specific" events in Class I, while "broad-scale" events were predominant in Class II.' Please define the difference between 'specific' and 'broad-scale'.

      These terms are defined in the previous sentence (lines 466-469).

      (18) Lines 450-451. 'This shows that classical genes experience more turnover and are more often affected by long-term balancing selection or convergent evolution.' Is balancing selection a form of divergent evolution that is different from convergent evolution? Please explain in more detail how and why balancing selection or convergent evolution affects classical and nonclassical genes differently.

      Balancing selection acts to keep alleles at moderate frequencies, preventing any from fixing in the population. In contrast, convergent evolution describes sequences or traits becoming similar over time even though they are not similar by descent. While we cannot know exactly what selective forces have acted in the past, we observe different patterns in the trees for each type of gene. In Figures 1 and 2, viewers can see at first glance that the nonclassical genes (which are named throughout the text and thoroughly described in Appendix 3) appear to be longer-lived than the classical genes. In addition, lines 204-222 and 475-488 describe topological differences in the BEAST2 trees of these two types of genes. However, we acknowledge that it could be helpful to have additional, complementary information about the classical vs. non-classical genes. Thus, we have added a sentence and reference to our companion paper (Fortier and Pritchard, 2025), which focuses on long-term balancing selection and draws further contrast between classical and non-classical genes. In lines 481-484, we added “We further explore the differences between classical and non-classical genes in our companion paper, finding ancient trans-species polymorphism at the classical genes but not at the non-classical genes \citep{Fortier2025b}.”

      References

      Some references in the supplementary materials such as Alvarez (1997), Daza-Vamenta (2004), Rojo (2005), Aarnink (2014), Kulski (2022), and others are missing from the Reference list. Please check that all the references in the text and the supplementary materials are listed correctly and alphabetically.

      We will make sure that these all show up properly in the proof.

      Reviewer #3 (Public review):

      Summary:

      The article provides the most comprehensive overview of primate MHC class I and class II genes to date, combining published data with an exploration of the available genome assemblies in a coherent phylogenetic framework and formulating new hypotheses about the evolution of the primate MHC genomic region.

      Strengths:

      I think this is a solid piece of work that will be the reference for years to come, at least until population-scale haplotype-resolved whole-genome resequencing of any mammalian species becomes standard. The work is timely because there is an obvious need to move beyond short amplicon-based polymorphism surveys and classical comparative genomic studies. The paper is data-rich and the approach taken by the authors, i.e. an integrative phylogeny of all MHC genes within a given class across species and the inclusion of often ignored pseudogenes, makes a lot of sense. The focus on primates is a good idea because of the wealth of genomic and, in some cases, functional data, and the relatively densely populated phylogenetic tree facilitates the reconstruction of rapid evolutionary events, providing insights into the mechanisms of MHC evolution. Appendices 1-2 may seem unusual at first glance, but I found them helpful in distilling the information that the authors consider essential, thus reducing the need for the reader to wade through a vast amount of literature. Appendix 3 is an extremely valuable companion in navigating the maze of primate MHC genes and associated terminology.

      Weaknesses:

      I have not identified major weaknesses and my comments are mostly requests for clarification and justification of some methodological choices.

      Thank you so much for your kind and supportive review!

      Reviewer #1 (Recommendations for the authors):

      (1) Line 151: How is 'extensively studied' defined?

      Extensively studied is not a strict definition, but a few organisms clearly stand apart from the rest in terms of how thoroughly their MHC regions have been studied. For example, the macaque is a model organism, and individuals from many different species and populations have had their MHC regions fully sequenced. This is in contrast to the gibbon, for example, in which there is some experimental evidence for the presence of certain genes, but no MHC region has been fully sequenced from these animals.

      (2) Can you clarify how 'classical' and 'non-classical' MHC genes are being determined in your analysis?

      Classical genes are those whose protein products perform antigen presentation to T cells and are directly involved in adaptive immunity, while non-classical genes are those whose protein products do not do this. For example, these non-classical genes might code for proteins that interact with receptors on Natural Killer cells and influence innate immunity. The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (3) I find the overall tone of the paper to be very descriptive, and at times meandering and repetitive, with a lot of similar kinds of statements being repeated about gene gain/loss. This is perhaps inevitable because a single question is being asked of each of many subsets of MHC gene types, and even exons within gene types, so there is a lot of repetition in content with a slightly different focus each time. This does not help the reader stay focused or keep track. I found myself wishing for a clearly defined question or hypothesis, or some rate parameter in need of estimation. I would encourage the authors to tighten up their phrasing, or consider streamlining the results with some better signposting to organize ideas within the results.

      We totally understand your critique, as we talk about a wide range of specific genes and gene groups in this paper. To improve readability, we have added many more signposting phrases and sentences:

      “Aside from MHC-DRB, …” (line 173)

      “Now that we had a better picture of the landscape of MHC genes present in different primates, we wanted to understand the genes’ relationships. Treating Class I, Class IIA, and Class IIB separately, ...” (line 179-180)

      “We focus first on the Class I genes.” (line 191)

      “... for visualization purposes…” (line 195)

      “We find that sequences do not always assort by locus, as would be expected for a typical gene.” (lines 196-197)

      “... rather than being directly orthologous to the ape/OWM MHC-G genes.” (lines 201-202)

      “Appendix 3 explains each of these genes in detail, including previous work and findings from this study.“ (lines 202-203)

      “... (but not with NWM) …” (line 208)

      “While genes such as MHC-F have trees which closely match the overall species tree, other genes show markedly different patterns, …” (lines 212-213)

      “Thus, while some MHC-G duplications appear to have occurred prior to speciation events within the NWM, others are species-specific.” (lines 218-219)

      “... indicating rapid evolution of many of the Class I genes” (lines 220-221)

      “Now turning to the Class II genes, …“ (line 223)

      “(see Appendix 2 for details on allele nomenclature) “ (line 238)

      “(e.g. MHC-DRB1 or -DRB2)” (line 254)

      “...  meaning their names reflect previously-observed functional similarity more than evolutionary relatedness.” (lines 257-258)

      “(see Appendix 3 for more detail)” (line 311)

      “(a 5'-end fragment)” (line 324)

      “Therefore, we support past work that has deemed MHC-V an old fragment.” (lines 326-327)

      “We next focus on MHC-U, a previously-uncharacterized fragment pseudogene containing only exon 3.” (lines 328-329)

      “However, it is present on both chimpanzee haplotypes and nearly all human haplotypes, and we know that these haplotypes diverged earlier---in the ancestor of human and gorilla. Therefore, ...” (lines 331-333)

      “Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment and that it likely originated in the human-gorilla ancestor.” (lines 334-336)  

      “These pieces of evidence suggest that MHC-K and -KL duplicated in the ancestor of the apes.” (lines 341-342)

      “Another large group of related pseudogenes in the Class I $\alpha$-block includes MHC-W, -P, and -T (see Appendix 3 for more detail).” (lines 349-350)

      “...to form the current physical arrangement” (line 354)

      “Thus, we next focus on the behavior of this subgroup in the trees.” (line 358)

      “(see Appendix 3 for further explanation).” (line 369)

      “Thus, for the first time we show that there must have been three distinct MHC-W-like genes in the ape/OWM ancestor.” (lines 369-371)

      “... and thus not included in the previous analysis. ” (lines 376-377)

      “MHC-Y has also been identified in gorillas (Gogo-Y) (Hans et al., 2017), so we anticipate that Gogo-OLI will soon be confirmed. This evidence suggests that the MHC-Y and -OLI-containing haplotype is at least as old as the human-gorilla split. Our study is the first to place MHC-OLI in the overall story of MHC haplotype evolution“ (lines 381-384)

      “Appendix 3 explains the pieces of evidence leading to all of these conclusions (and more!) in more detail.” (lines 395-396)

      “However, looking at this exon alone does not give us a complete picture.” (lines 410-411)

      “...instead of with other ape/OWM sequences, …” (lines 413-414)

      “Figure 7 shows plausible steps that might have generated the current haplotypes and patterns of variation that we see in present-day primates. However, some species are poorly represented in the data, so the relationships between their genes and haplotypes are somewhat unclear.” (lines 427-429)

      “(and more-diverged)” (line 473)

      “(of both classes)” (line 476)

      “..., although the classes differ in their rate of evolution.” (lines 487-488)

      “Including these pseudogenes in our trees helped us construct a new model of $\alpha$-block haplotype evolution. “ (lines 517-518)

      (4) Lines 480-482: "Notably...." Why is this notable? Don't merely state that something is notable; explain what makes it especially worth drawing the reader's attention to: in what way is it particularly significant or surprising?

      We have changed the text from “Notably” to “In particular” (line 390) so that readers are expecting us to list some specific findings. Similarly, we changed “Notably” to “Specifically” (line 515).

      (5) The end of the discussion is weak: "provide context" is too vague and not a strong statement of something that we learned that we didn't know before, or its importance. This is followed by "This work will provide a jumping-off point for further exploration..." such as? What questions does this paper raise that merit further work?

      We have made this paragraph more specific and added some possible future research directions. It now reads “By treating the MHC genes as a gene family and including more data than ever before, this work enhances our understanding of the evolutionary history of this remarkable region. Our extensive set of trees incorporating classical genes, non-classical genes, pseudogenes, gene fragments, and alleles of medical interest across a wide range of species will provide context for future evolutionary, genomic, disease, and immunologic studies. For example, this work provides a jumping-off-point for further exploration of the evolutionary processes affecting different subsets of the gene family and the nuances of immune system function in different species. This study also provides a necessary framework for understanding the evolution of particular allelic lineages within specific MHC genes, which we explore further in our companion paper \citep{Fortier2025b}. Both studies shed light on MHC gene family evolutionary dynamics and bring us closer to understanding the evolutionary tradeoffs involved in MHC disease associations.” (lines 576-586)

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 et seq. Classifying genes as having 'classical', 'non-classical' and 'dual' properties is notoriously difficult in non-model organisms due to the lack of relevant information. As you have characterised a number of genes for the first time in this paper and could not rely entirely on published classifications, please indicate the criteria you used for classification.

      The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (2) Line 61 It's important to mention that classical MHC molecules present antigenic peptides to T cells with variable alphabeta T cell receptors, as non-classical MHC molecules may interact with other T cell subsets/types.

      Thank you for pointing this out; we have updated the text to make this clearer (lines 63-65). We changed “‘Classical’ MHC molecules perform antigen presentation to T cells---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.” to “‘Classical’ MHC molecules perform antigen presentation to T cells with variable alphabeta TCRs---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.”

      (3) Perhaps it's worth mentioning in the introduction that you are deliberately excluding highly divergent non-classical MHC molecules such as CD1.

      Thank you, it’s worth clarifying exactly what molecules we are discussing. We have added a sentence to the introduction (lines 38-43): “Having originated in the jawed vertebrates, this group of genes is now involved in diverse functions including lipid metabolism, iron uptake regulation, and immune system function (proteins such as zinc-𝛼2-glycoprotein (ZAG), human hemochromatosis protein (HFE), MHC class I chain–related proteins (MICA, MICB), and the CD1 family) \citep{Hansen2007,Kupfermann1999,Kaufman2022,Adams2013}. However, here we focus on…”

      (4) Lines 94-105. This material presents results; it could be moved to the results section, as it currently disrupts the flow somewhat.

      We feel it is important to include a “teaser” of the results in the introduction, which can be slightly more detailed than that in the abstract.

      (5) Lines 118-131: This opening section of the results sets the stage for the whole presentation and contains important information that I feel needs to be expanded to include an overview and justification of your methodological choices. As the M&M section is at the end of the MS (and contains limited justification), some information on two aspects is needed here for the benefit of the reader. First, as far as I understand, all phylogenetic inferences were based entirely on DNA sequences of individual (in some cases concatenated) exons. It would be useful to explain to the reader why you chose to rely on DNA rather than protein sequences, even though some of the genes you include in the phylogenetic analysis are highly divergent. Second, a reader might wonder how the "maximum clade credibility tree" from the Bayesian analysis compares to commonly seen trees with bootstrap support or posterior probability values assigned to particular clades. Personally, I think that the authors' approach to identifying and presenting representative trees is reasonable (although one might wonder why "Maximum clade credibility tree" and not "Maximum credibility tree" https://www.beast2.org/summarizing-posterior-trees/), since they are working with a large number of short, sometimes divergent and sometimes rather similar sequences; in such cases, a requirement for strict clade support could result in trees composed largely of polytomies. However, I feel it's necessary to be explicit about this and to acknowledge that the relationships represented by fully resolved bifurcating representative trees and interpreted in the study may not actually be highly supported in the sense that many readers might expect. In other words, the reader should be aware from the outset of what the phylogenies that are so central to the paper represent.

      We chose to rely on DNA rather than protein sequences because convergent evolution is likely to happen in regions that code for extremely important functions such as adaptive and innate immunity. Convergent evolution acts upon proteins while trans-species polymorphism retains ancient nucleotide variation, so studying the DNA sequence can help tease apart convergent evolution from trans-species polymorphism.
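      The reasoning about DNA versus protein sequences can be illustrated with a toy example. The Python sketch below (an illustration only, not the pipeline used in this study) translates two hypothetical alleles and tallies their differing codons as synonymous or nonsynonymous. The alleles encode the same protein, so a protein alignment would treat them as identical, while the nucleotide sequences still differ at three sites. The allele sequences and function names are hypothetical, and the tally is not a dN/dS estimate.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    A crude tally only: unlike a dN/dS estimate, it does not normalize by the
    number of synonymous and nonsynonymous sites or correct for multiple hits.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("expected two aligned, in-frame sequences of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Two hypothetical alleles that differ only at third codon positions.
allele_1 = "ATGGCTCTGAAA"
allele_2 = "ATGGCCCTTAAG"
print(translate(allele_1), translate(allele_2))   # MALK MALK
print(codon_differences(allele_1, allele_2))       # (3, 0)
```

      Third-position changes like these are often silent at the protein level, which is why the nucleotide sequence can preserve lineage-specific variation that a protein alignment collapses.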

      As for the “maximum clade credibility tree”, this is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum clade credibility tree”, while the tree with the maximum sum of posterior clade probabilities is called the “maximum credibility tree”. The “maximum credibility tree” (referring to the sum) appears to have been named this way only in the first version of TreeAnnotator. The version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”, so the context suggests that the “maximum clade credibility tree” option maximizes the product. This is the setting I used for this project (TreeAnnotator version 2.6.3).
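      To make the distinction concrete, here is a minimal Python sketch that assumes a hypothetical representation in which each sampled tree is a set of clades. It scores each sampled tree by the product of its posterior clade probabilities (the criterion TreeAnnotator's “maximum clade credibility tree” option appears to use) and by their sum. Hand-picked, hypothetical probabilities then show that the two criteria can select different trees. This is not TreeAnnotator's implementation.

```python
import math
from collections import Counter

# Hypothetical representation: a sampled tree is a frozenset of clades, and a
# clade is a frozenset of tip labels. The root clade can be omitted: its
# probability is 1, so it changes neither score's ranking.


def tree_of(*clades):
    return frozenset(frozenset(c) for c in clades)


def clade_probabilities(trees):
    """Posterior clade probability: fraction of sampled trees containing each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def log_clade_credibility(tree, probs):
    """Log of the product of clade probabilities (the MCC criterion)."""
    return sum(math.log(probs[clade]) for clade in tree)


def sum_clade_credibility(tree, probs):
    """Sum of clade probabilities (the 'maximum sum' criterion)."""
    return sum(probs[clade] for clade in tree)


def best_tree(trees, score):
    """Index of the sampled tree with the highest score (the first one wins ties)."""
    probs = clade_probabilities(trees)
    return max(range(len(trees)), key=lambda i: score(trees[i], probs))


# Toy posterior sample on taxa A-D.
balanced = tree_of("AB", "CD")   # ((A,B),(C,D))
ladder = tree_of("AB", "ABC")    # (((A,B),C),D)
sample = [balanced, balanced, balanced, ladder]
print(best_tree(sample, log_clade_credibility))  # 0

# Hand-picked, hypothetical clade probabilities for which the two criteria disagree.
probs = {frozenset("AB"): 0.9, frozenset("ABC"): 0.2,
         frozenset("CD"): 0.5, frozenset("BCD"): 0.5}
tree_x = tree_of("AB", "ABC")    # (((A,B),C),D): product 0.18, sum 1.1
tree_y = tree_of("CD", "BCD")    # (((C,D),B),A): product 0.25, sum 1.0
print(log_clade_credibility(tree_y, probs) > log_clade_credibility(tree_x, probs))  # True
print(sum_clade_credibility(tree_x, probs) > sum_clade_credibility(tree_y, probs))  # True
```

      Working with log probabilities avoids numerical underflow when a tree has many clades, and it leaves the ranking by product unchanged.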

      We agree that readers may not fully grasp what the collapsed trees represent on first reading. We have added a sentence to the beginning of the Results (lines 188-190) to make this more explicit.

      (6) Line 224, you're referring to the DPB1*09 lineage, not the DRB1*09 lineage.

      Indeed! We have corrected these typos.

      (7) Line 409, why "Differences between MHC subfamilies" and not "Differences between MHC classes"?

      We chose the word “subfamilies” because we discuss the difference between classical and non-classical genes in addition to differences between Class I and Class II genes.

      (8) Line 529-544 This might work better as a table.

      We agree! This information is now presented as Table 1.

      (9) Line 547 MHC-DRB9 appears out of the blue here - please say why you are singling it out.

      Great point! We added a paragraph (lines 614-623) to explain why this was necessary.

      (10) Lines 550-551: Even though you've screened the hits manually, it would be helpful to outline your criteria for this search.

      Thank you! We’ve added a couple of sentences to explain how we did this (lines 607-610).

      (11) Lines 556-580: Please provide nucleotide alignments as supplementary data so that the reader can get an idea of the actual divergence of the sequences that have been aligned together.

      Thank you! We’ve added nucleotide alignments as supplementary files.

      (12) Lines 651-652: Why "Maximum clade credibility tree" and not "Maximum credibility tree"?

      This repeats point (5); please see our response there. In brief, following the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the “maximum clade credibility tree” is the tree with the maximum product of posterior clade probabilities, whereas the “maximum credibility tree”, a name that appears to have been used only in the first version of TreeAnnotator, maximizes their sum. The version of TreeAnnotator I used (2.6.3) lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”, and I used the former for this project.

      (13) In the appendices, links to references do not work as expected.

      We will make sure these work properly when we receive the proofs.

    1. eLife Assessment

      This important study demonstrates that some degree of spatial tuning (e.g., place cells) and ability to decode spatial location emerges in sufficiently complex systems trained to process visual information. This intriguing observation challenges existing approaches and findings used in the study of spatial navigation. However, the strength of evidence regarding the nature and quality of spatial tuning, its compatibility with experimental data, and the overall interpretation of the study remains incomplete. This work will be of interest to the research community of spatial navigation.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated spatial representations in deep feedforward neural network models (DNNs) that are often used to solve vision tasks. The authors created a three-dimensional virtual environment and let a simulated agent randomly forage in a smaller two-dimensional square area. The agent "sees" images of the room within its field of view from different locations and heading directions. These images were processed by DNNs. Analyzing model neurons in DNNs, the authors found response properties similar to those of place cells, border cells, and head direction cells in various layers of deep nets. A linear readout of network activity can decode key spatial variables. In addition, after removing neurons with strong place/border/head direction selectivity, one can still decode these spatial variables from the remaining neurons in the DNNs. Based on these results, the authors argue that the notion of functional cell types in spatial cognition is misleading.

      Comments on the revision:

      In the revision, the authors proposed that their model should be interpreted as a null model rather than as an actual model of the brain's spatial navigation system. They also argued that the criteria used in the place cell literature are arbitrary. However, the strength of the present work still depends on how well the null model can explain the experimental findings, and at present the null model seems to fail to explain important aspects of the response properties of different functional cell types in the hippocampus.

      Strengths:

      This paper contains interesting and original ideas, and I enjoyed reading it. Most previous studies (e.g., Banino, Nature, 2018; Cueva & Wei, ICLR, 2018; Whittington et al, Cell, 2020) using deep network models to investigate spatial cognition mainly relied on velocity/head rotation inputs, rather than vision (but see Franzius, Sprekeler, Wiskott, PLoS Computational Biology, 2007). Here, the authors find that, under certain settings, visual inputs alone may contain enough information about the agent's location, head direction and distance to the boundary, and such information can be extracted by DNNs. This is an interesting observation from these models.

      Weaknesses:

      While the findings reported here are interesting, it is unclear whether they are the consequence of the specific model setting and how well they would generalize. Furthermore, I feel the results are over-interpreted. There are major gaps between the results actually shown and the claim about the "superfluousness of cell types in spatial cognition". Evidence directly supporting the overall conclusion seems to be weak at the moment.

      Comments on the revision:

      The authors showed that the results generalized to different types of networks; they were generally robust across deep network architectures. This partially addressed my concern. It remains unclear whether the findings would generalize across different types of environments. Regarding this point, the authors argued that the way they constructed the environment was consistent with the typical experimental setting for studying the spatial navigation system in rodents. After the revision, it remains unclear what the implications of the work are for the spatial navigation system in the brain, given that the null model neurons failed to reproduce certain key properties of place cells (although I agree with the authors that examining such null models is useful and would encourage one to rethink the approach used to study these neural systems).

      Major concerns:

      (1) The authors reported that, in their model setting, most neurons throughout the different layers of CNNs show strong spatial selectivity. This is interesting and perhaps also surprising. It would be useful to test/assess this prediction directly based on existing experimental results. It is possible that the particular 2-d virtual environment used is special. The results will be strengthened if similar results hold for other testing environments.

      In particular, examining the pictures shown in Fig. 1A, it seems that local walls of the 'box' contain strong oriented features that are distinct across different views. Perhaps the response of oriented visual filters can leverage these features to uniquely determine the spatial variable. This is concerning because this is a very specific setting that is unlikely to generalize.

      [Updated after revision]: This concern is partially addressed in the revision. The authors argued that the way they constructed the environment is consistent with the typical experimental setting for studying the spatial navigation system in rodents.

      (2) Previous experimental results suggest that the various functional cell types discovered in rodent navigation circuits persist in dark environments. If we take the modeling framework presented in this paper literally, the prediction would be that place cells/head direction cells should disappear in darkness. This implies that key aspects of functional cell types in spatial cognition are missing from the current modeling framework. This limitation needs to be addressed or explicitly discussed.

      [Updated after revision]: The authors proposed that their model should be treated as a null model, instead of a candidate model for the brain's spatial navigation system. This clarification helps to better position this work. I would like to thank the authors for making this point explicit. However, this doesn't fully address the issues raised. The significance of the reported results still depends on how well the null model can explain the experimental findings. If the null model fails to explain important aspects of the firing properties of functional cell types, that would speak in favor of the usefulness of the concept of functional cell types.

      (3) Place cells, border cells, and head direction cells are mostly studied in the rodent brain. For rodents, it is not clear whether standard DNNs would be good models of their visual systems. It is likely that the rodent visual system is not as powerful in processing visual inputs as the DNNs used in this study.

      [Updated after revision]: The authors didn't specifically address this. But clarifying their work as a null model partially addresses this concern.

      (4) The overall claim that the functional cell types defined in spatial cognition are superfluous seems too strong based on the results reported here. The paper only studied a particular class of models, and, arguably, there is a major gap between the properties of these models and those of real brains. Even if, in the DNN models simulated in this particular virtual environment, (i) most model neurons have strong spatial selectivity and (ii) removing model neurons with the strongest spatial selectivity still retains substantial spatial information, why is this relevant to the brain? The neural circuits may operate in a very different regime. Perhaps a more reasonable interpretation of the results would be: these results raise the possibility that the strongly selective neurons observed in the brain may not be essential for encoding certain features, as something like this is observed in certain models. It is difficult to draw definitive conclusions about the brain based on the results reported.

      [Updated after revision]: The authors clarified that their model should be interpreted as a null model. This partially addresses the concern raised here. However, some concerns remain: it is still unclear what new insights the current work offers for understanding the spatial navigation system. This work seems to be more about the approach to studying neural systems. Perhaps this point could be made even clearer.

    3. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors demonstrate the inevitability of the emergence of spatial information in sufficiently complex systems, even those that are only trained on object recognition (i.e. not a "spatial" system). As such, they present an important null hypothesis that should be taken into consideration for experimental design and data analysis of spatial tuning and its relevance for behavior.

      Strengths:

      The paper's strengths include the use of a large multi-layer network trained in a detailed visual environment. This illustrates an important message for the field: that spatial tuning can be a result of sensory processing. While this is a historically recognized and often-studied fact in experimental neuroscience, it is made more concrete with the use of a complex sensory network. Indeed, the manuscript is a cautionary tale for experimentalists and computational researchers alike against blindly applying and interpreting metrics without adequate controls. The addition of the deep network, i.e. the argument that sufficient processing increases the likelihood of such a confound, is a novel and important contribution.

      Weaknesses:

      However, the work has a number of significant weaknesses. Most notably: the spatial tuning that emerges is precisely what we would expect from visually-tuned neurons, and the authors do not engage with literature that controls for these confounds or compare the quality or degree of spatial tuning with neural data; the ability to linearly decode position from a large number of units is not a strong test of spatial cognition; and the authors make strong but unjustified claims as to the implications of their results in opposition to, as opposed to contributing to, work being done in the field.

      The first weakness is that the degree and quality of spatial tuning that emerges in the network is not analyzed to the standards of evidence that have been used in well-controlled studies of spatial tuning in the brain. Specifically, the authors identify place cells, head direction cells, and border cells in their network, and their conjunctive combinations. However, these forms of tuning are the most easily confounded by visual responses, and it's unclear if their results will extend to observed forms of spatial tuning that are not.

      For example, consider the head direction cells in Figure 3C. In addition to increased activity in some directions, these cells also have a high degree of spatial nonuniformity, suggesting they are responding to specific visual features of the environment. In contrast, the majority of HD cells in the brain are only very weakly spatially selective, if at all, once an animal's spatial occupancy is accounted for (Taube et al 1990, JNeurosci). While the preferred orientation of these cells is anchored to prominent visual cues, when they rotate with changing visual cues the entire head direction system rotates together (cells' relative orientation relationships are maintained, including those that encode directions facing AWAY from the moved cue), and thus these responses cannot simply be independent sensory-tuned cells responding to the sensory change (Taube et al 1990 JNeurosci, Zugaro et al 2003 JNeurosci, Ajbi et al 2023).
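      Occupancy normalization, which the reviewer refers to, is straightforward to sketch. The hypothetical Python example below bins head direction, divides summed activity by the number of samples at each heading, and summarizes the resulting tuning curve by its mean resultant vector length, one common head-direction metric. The bin count and the synthetic units are assumptions for illustration. The positional (spatial) selectivity the reviewer also mentions would need an analogous occupancy-normalized ratemap over location.

```python
import numpy as np


def hd_tuning(headings, activity, n_bins=36):
    """Occupancy-normalized head-direction tuning curve.

    headings: heading per time sample, in radians within [0, 2*pi).
    activity: non-negative unit activity (e.g. spike count or activation) per sample.
    Returns bin centres and mean activity per bin (NaN for unvisited bins).
    """
    edges = np.linspace(0.0, 2 * np.pi, n_bins + 1)
    idx = np.clip(np.digitize(headings, edges) - 1, 0, n_bins - 1)
    occupancy = np.bincount(idx, minlength=n_bins)                 # samples per direction bin
    summed = np.bincount(idx, weights=activity, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        rate = summed / occupancy                                   # divide out time spent at each heading
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, rate


def mean_vector_length(centres, rate):
    """Mean resultant length of a tuning curve: near 0 if flat, 1 if confined to one bin."""
    visited = ~np.isnan(rate)
    r, theta = rate[visited], centres[visited]
    return float(np.abs(np.sum(r * np.exp(1j * theta))) / np.sum(r))


rng = np.random.default_rng(0)
headings = rng.uniform(0, 2 * np.pi, 10_000)
tuned = np.exp(4.0 * np.cos(headings - np.pi / 2))  # hypothetical unit preferring one direction
flat = np.ones_like(headings)                         # hypothetical untuned unit
print(mean_vector_length(*hd_tuning(headings, tuned)))  # high: sharply tuned
print(mean_vector_length(*hd_tuning(headings, flat)))   # near 0: no directional tuning
```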

      As another example, the joint selectivity of detected border cells with head direction in Figure 3D suggests that they are "view of a wall from a specific angle" cells. In contrast, experimental work on border cells in the brain has demonstrated that these are robust to changes in the sensory input from the wall (e.g. van Wijngaarden et al 2020), or that many of them are not directionally selective (Solstad et al 2008).

      The most convincing evidence of "spurious" spatial tuning would be the emergence of HD-independent place cells in the network. However, these cells are a very small minority (in contrast to hippocampal data, Thompson and Best 1984 JNeurosci, Rich et al 2014 Science), and the examples provided in Figure 3 are significantly more weakly tuned than those observed in the brain.

      Indeed, the vast majority of tuned cells in the network are conjunctively selective for HD (Figure 3A). While this conjunctive tuning has been reported, many units in the hippocampus/entorhinal system are not strongly HD-selective (Muller et al 1994 JNeurosci, Sargolini et al 2006 Science, Carpenter et al 2023 bioRxiv). Further, many studies have tested and characterized the nature of sensory influence (e.g. Acharya et al 2016 Cell), and these cells tend to have a complex relationship with a variety of sensory cues, which cannot readily be explained by straightforward sensory processing (rev: Poucet et al 2000 Rev Neurosci, Plitt and Giocomo 2021 Nat Neuro). For example, while some place cells are reported to be directionally selective, this directional selectivity depends on behavioral context (Markus et al 1995, JNeurosci) and emerges over time with familiarity to the environment (Navratilova et al 2012 Front. Neural Circuits). Thus, the question is not whether spatially tuned cells are influenced by sensory information, but whether feed-forward sensory processing alone is sufficient to account for their observed tuning properties and responses to sensory manipulations.

      These issues indicate a more significant underlying issue of scientific methodology relating to the interpretation of their result and its impact on neuroscientific research. Specifically, in order to make strong claims about experimental data, it is not enough to show that a control (i.e. a null hypothesis) exists, one needs to demonstrate that experimental observations are quantitatively no better than that control.

      Where the authors state that "In summary, complex networks that are not spatial systems, coupled with environmental input, appear sufficient to decode spatial information." what they have really shown is that it is possible to decode some degree of spatial information. This is a null hypothesis (that observations of spatial tuning do not reflect a "spatial system"), and the comparison must be made to experimental data to test if the so-called "spatial" networks in the brain have more cells with more reliable spatial info than a complex-visual control.

      Further, the authors state that "Consistent with our view, we found no clear relationship between cell type distribution and spatial information in each layer. This raises the possibility that "spatial cells" do not play a pivotal role in spatial tasks as is broadly assumed." Indeed, this would raise such a possibility if 1) the observations of their network were quantitatively similar to the brain, and 2) the presence of these cells in the brain were the only evidence for their role in spatial tasks. However, 1) the authors have not shown this result in neural data; they have only noticed it in a network and mentioned the POSSIBILITY of a similar thing in the brain, and 2) the "assumption" of the role of spatially tuned cells in spatial tasks does not rest only on the observation of a few spatially tuned cells, but on many other experiments, including causal manipulations (e.g. Robinson et al 2020 Cell, de Lavilléon et al 2015 Nat Neuro), which the authors conveniently ignore. Thus, I do not find their argument, as strongly stated as it is, to be well-supported.

      An additional weakness is that linear decoding of position is not a measure of spatial cognition. The ability to decode position from a large number of weakly tuned cells is not surprising. However, based on this ability to decode, the authors claim that "'spatial' cells do not play a privileged role in spatial cognition". To justify this claim, the authors would need to use the network to perform e.g. spatial navigation tasks, then investigate the networks' ability to perform these tasks when tuned cells were lesioned.

      Finally, I find a major weakness of the paper to be the framing of the results in opposition to, as opposed to contributing to, the study of spatially tuned cells. For example, the authors state that "If a perception system devoid of a spatial component demonstrates classically spatially-tuned unit representations, such as place, head-direction, and border cells, can "spatial cells" truly be regarded as 'spatial'?" Setting aside the issue of whether the perception system in question does indeed demonstrate spatially-tuned unit representations comparable to those in the brain, I ask "Why not?" This seems to be a semantic game of reading more into a name than is necessarily there. The names (place cells, grid cells, border cells, etc.) describe an observation (that cells are observed to fire in certain areas of an animal's environment). They need not be a mechanistic claim (that space "causes" these cells to fire) or even, necessarily, a normative one (these cells are "for" spatial computation). This is evidenced by the fact that even within, e.g., the place cell community, there is debate as to these cells' mechanisms and function (e.g., memory, navigation), or whether they can even be said to serve only a single function. However, they are still referred to as place cells, not as a statement of their function but as a history-dependent label that refers to their observed correlates with experimental variables. Thus, the observation that spatially tuned cells are "inevitable derivatives of any complex system" is itself an interesting finding which contributes to, rather than contradicts, the study of these cells. It seems that the authors have a specific definition in mind when they say that a cell is "truly" "spatial" or that a biological or artificial neural network is a "spatial system", but this definition is not stated, and it is not clear that the terminology used in the field presupposes their definition.

      In sum, the authors have demonstrated the existence of a control/null hypothesis for observations of spatially-tuned cells. However, 1) It is not enough to show that a control (null hypothesis) exists, one needs to test if experimental observations are no better than control, in order to make strong claims about experimental data, 2) the authors do not acknowledge the work that has been done in many cases specifically to control for this null hypothesis in experimental work or to test the sensory influences on these cells, and 3) the authors do not rigorously test the degree or source of spatial tuning of their units.

      Comments on revisions:

      While I'm happy to admit that standards of spatial tuning are not unified or consistent across the field, I do not believe the authors have addressed my primary concern: they have pointed out a null model, and then have constructed a strong opinion around that null model without actually testing if it's sufficient to account for neural data. I've slightly modified my review to that effect.

      I do think it would be good for the authors to state in the manuscript what they mean when they say that a cell is "truly" "spatial" or that a biological or artificial neural network is a "spatial system". This is implied throughout, but I was unable to find what would distinguish a "truly" spatial system from a "superfluous" one.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      but see Franzius, Sprekeler, Wiskott, PLoS Computational Biology, 2007

      We have discussed the differences with this work in the response to Editor recommendations above.

      While the findings reported here are interesting, it is unclear whether they are the consequence of the specific model setting, and how well they would generalize.

      We have considered deep vision models across different architectures in our paper, which include traditional feedforward convolutional neural networks (VGG-16), convolutional neural networks with skip connections (ResNet-50) and the Vision Transformer (ViT), which employs self-attention instead of convolution as its core information processing unit.

      In particular, examining the pictures shown in Fig. 1A, it seems that local walls of the ’box’ contain strong oriented features that are distinct across different views. Perhaps the response of oriented visual filters can leverage these features to uniquely determine the spatial variable. This is concerning because this is a very specific setting that is unlikely to generalize.

      The experimental setup is based on experimental studies of spatial cognition in rodents, which typically forage in square or circular environments. Indeed, square environments have more borders and corners that provide information about the spatial environment, which is true in both empirical studies and our simulations. In any navigation task, and especially in more realistic environments, visual information such as borders or landmarks likely plays a major role in the spatial information available to the agent. In fact, studies that do not consider sensory information as a contributor to spatial information are likely missing a major part of how animals navigate.

      The prediction would be that place cells/head direction cells should go away in darkness. This implies that key aspects of functional cell types in the spatial cognition are missing in the current modeling framework.

      We addressed this comment in our response to the editor’s highlight. To briefly recap, we do not intend to propose a comprehensive model of the brain that captures all spatial phenomena, as we would not expect this from an object recognition network. Instead, we show that such a simple and nonspatial model can reproduce key signatures of spatial cells, raising important questions about how we interpret spatial cell types that dominate current research.

      Reviewer #2 (Public Review):

      The network used in the paper is still guided by a spatial error signal [...] one could say that the authors are in some way hacking this architecture and turning it into a spatial navigation one through learning.

      To be clear, the base networks we use do not undergo spatial error training. They have either been pre-trained on image classification tasks or are untrained. We used a standard neuroscience approach: training linear decoders on representations to assess the spatial information present in the network layers. The higher decoding errors in early layer representations (Fig. 2A) indicate that spatial information differs across layers—an effect that cannot be attributed to the linear decoder alone.

      My question is whether the paper is fighting an already won battle.

      Intuitive cell-type discoveries are still being celebrated, and concentrating on this kind of discovery has broader implications that could be deleterious to the future of science. Note that this issue depends on the area or subfield of neuroscience. In some subfields, papers that claim to find cell types with strong claims of specific functions are relatively rare, and population coding is common (e.g., cognitive control in primate prefrontal cortex, neural dynamics of motor control). Although rodent neuroscience as a field is increasingly adopting population approaches, influential researchers and labs are still publishing “cell types” in top journals; examples from 2017-2024 include goal cells (Sarel et al., 2017), object-vector cells (Høydal et al., 2019), 3D place cells (Grieves et al., 2020), lap cells (Sun et al., 2020), goal-vector cells (Ormond and O’Keefe, 2022), and predictive grid cells (Ouchi and Fujisawa, 2024).

      In some cases, identification of cell types is only part of the story, and there are analyses of behavior, neural populations, and inactivation-based studies. However, our view (which we suggest is shared by most researchers) is that a major reason these papers are reviewed and accepted in top journals is that they have a simple, intuitive “cell type” discovery headline, even if it is not the key finding or analysis that supports the insightful aspects of the work. This is unnecessary and misleading to students of neuroscience, related fields, and the public, and it affects private and public funding priorities and, in turn, the future of science. Worse, it could lead the field down the wrong path or, at the least, divert attention and resources from methods and papers that could provide deeper insights. Consistent with the central message of our work, we believe the field should prioritize theoretical and functional insights over the discovery of new “cell types”.

      Reviewer #3 (Public Review):

      The ability to linearly decode position from a large number of units is not a strong test of spatial information, nor is it a measure of spatial cognition

      Using a linear decoder to test what information is contained in a population of neurons and available to downstream areas is a common technique in neuroscience (Tong and Pratte, 2012; DiCarlo et al., 2012), including for spatial cells (e.g., Diehl et al. 2017; Horrocks et al. 2024). A linear decoder is used because it is a direct mapping from neurons to potential output behavior: it only needs to learn a mapping that links one set of neurons to another set that can “read out” the information. As such, it measures the information contained in the population, and it gives a lower bound on that information, as both biological and artificial neurons can perform more complex nonlinear operations (the activation function is nonlinear).
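      The decoding logic described above can be sketched as follows. This is a minimal Python example on synthetic data, assuming an ordinary least-squares decoder with a bias term and a random 80/20 train/test split; it is not the decoder configuration used in the paper. It also mimics, in hypothetical form, removing the most sharply tuned units before decoding and compares both results against features with no spatial signal.

```python
import numpy as np


def with_bias(X):
    """Append a constant column so the decoder can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def heldout_decoding_error(X, Y, train_frac=0.8, seed=0):
    """Fit a least-squares linear decoder on a random training split and return
    the mean Euclidean error on the held-out samples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    train, test = order[:n_train], order[n_train:]
    W, *_ = np.linalg.lstsq(with_bias(X[train]), Y[train], rcond=None)
    prediction = with_bias(X[test]) @ W
    return float(np.linalg.norm(prediction - Y[test], axis=1).mean())


# Synthetic stand-in for one layer's activations (hypothetical, not the paper's data).
rng = np.random.default_rng(1)
n_samples, n_units = 2000, 200
pos = rng.uniform(0, 1, size=(n_samples, 2))        # agent (x, y) positions
centres = rng.uniform(0, 1, size=(n_units, 2))      # each unit's preferred location
widths = rng.uniform(0.1, 1.0, size=n_units)        # small width = sharp, "place-like" field
sq_dist = ((pos[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
acts = np.exp(-sq_dist / (2 * widths**2)) + 0.05 * rng.standard_normal((n_samples, n_units))

full = heldout_decoding_error(acts, pos)
broad_only = heldout_decoding_error(acts[:, widths > 0.3], pos)  # "ablate" sharply tuned units
random_feats = heldout_decoding_error(rng.standard_normal((n_samples, n_units)), pos)
print(f"all units {full:.3f}, broadly tuned only {broad_only:.3f}, random {random_feats:.3f}")
```

      Evaluating on held-out samples matters here: with hundreds of units, a decoder fit and scored on the same samples would partly reflect overfitting rather than information in the population.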

      We recognize that the reviewer is likely familiar with this concept, but we explain it here to justify our position and for the completeness of this public review.

      For example, consider the head direction cells in Figure 3C. In addition to increased activity in some directions, these cells also have a high degree of spatial nonuniformity, suggesting they are responding to specific visual features of the environment. In contrast, the majority of HD cells in the brain are only very weakly spatially selective, if at all, once an animal’s spatial occupancy is accounted for (Taube et al 1990, JNeurosci). While the preferred orientations of these cells are anchored to prominent visual cues, when the visual cues are rotated the entire head direction system rotates together (cells’ relative orientation relationships are maintained, including those that encode directions facing AWAY from the moved cue), and thus these responses cannot simply reflect independent sensory-tuned cells responding to the sensory change (Taube et al 1990 JNeurosci, Zugaro et al 2003 JNeurosci, Ajbi et al 2023).

      As we have noted in our response to the editor, one of the main issues is that the criteria for assessing the property of interest are created in a subjective, biased, and circular way: researchers see spatial-like responses, develop criteria to determine what counts as a spatial response, and then select a threshold.

      All the examples the reviewer provides concentrate on strict criteria developed after finding such cells. What is the purpose of these cells for function, for behavior? Just finding a cell that looks like it is tuned to something does not explain its function. Neuroscience began with tuning curves in part due to methodological constraints, which was a promising start, but we propose that this is not the way forward.

      The metrics used by the authors to quantify place cell tuning are not clearly defined in the methods, but do not seem to be as stringent as those commonly used in real data (e.g. spatial information, Skaggs et al 1992 NeurIPS).

      We identified place cells following the definition from Tanni et al. (2022), by one of the leading labs in the field. Since neurons in DNNs lack spikes, we adapted their criteria by focusing on the number of spatial bins in the ratemap rather than spike-based measures. However, our central argument is that the very act of defining spatial cells is problematic. Researchers set out to find place cells to study spatial representations, find spatially selective cells with subjective, qualitative criteria (sometimes combined with prior quantitative criteria, also subjectively defined), and then fine-tune these into more “stringent” criteria, depending on the experimental data at hand. It is not uncommon to see methodological sections that rely on qualitative judgments, such as: “To avoid bias ... we applied a loose criteria for place cells” Tanaka et al. (2018), which reflects the lack of clarity and the subjectivity of place cell selection criteria.

      A simple literature survey reveals inconsistent criteria across studies. For place field selection, Dombeck et al. (2010) required mean firing rates exceeding 25% of peak rate, while Tanaka et al. (2018) used a 20% threshold. Speed thresholds also vary dramatically: Dombeck et al. (2010) calculated firing rates only when mice moved faster than 8.3 cm/s, whereas Tanaka et al. (2018) used 2 cm/s. Additional criteria differ further: Tanaka et al. (2018) required firing rates between 1 and 10 Hz and excluded cells with place fields larger than 1/3 of the area, while Dombeck et al. (2010) selected fields above 1.5 Hz, and Tanni et al. (2022) required field sizes between 10 spatial bins and 1/2 of the area. As Dombeck et al. (2010) noted, differences in recording methods and place field definitions lead to varying numbers of identified place cells. Moreover, Grijseels et al. (2021) demonstrated that different detection methods produce vastly different place cell counts with minimal overlap between identified populations.
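      To illustrate how threshold choices alone can change classification, the sketch below applies two hypothetical rule sets to one simulated rate map. The numbers echo values mentioned above: 20% vs 25% of peak, a minimum of 10 bins, and a maximum of one third or one half of the arena. However, each rule set is a simplified composite and not any lab's actual pipeline, and the Gaussian rate map is invented for illustration.

```python
import numpy as np

def field_bins(ratemap, peak_fraction):
    """Number of spatial bins whose rate exceeds peak_fraction * peak rate."""
    return int(np.sum(ratemap > peak_fraction * ratemap.max()))

def is_place_cell(ratemap, peak_fraction, min_bins, max_area_fraction):
    """Simplified, hypothetical rule: field size must lie in [min_bins, max_area_fraction * total bins]."""
    n = field_bins(ratemap, peak_fraction)
    return min_bins <= n <= max_area_fraction * ratemap.size

# One hypothetical unit: a Gaussian rate map on a 20 x 20 grid (sigma = 4 bins), peak at a bin center.
centers = np.arange(20) + 0.5
xx, yy = np.meshgrid(centers, centers)
ratemap = np.exp(-((xx - 9.5) ** 2 + (yy - 9.5) ** 2) / (2 * 4.0 ** 2))

# Two hypothetical rule sets mixing thresholds quoted in the text.
criteria_a = dict(peak_fraction=0.25, min_bins=10, max_area_fraction=1 / 2)
criteria_b = dict(peak_fraction=0.20, min_bins=10, max_area_fraction=1 / 3)

print(field_bins(ratemap, 0.25), is_place_cell(ratemap, **criteria_a))  # 137 True
print(field_bins(ratemap, 0.20), is_place_cell(ratemap, **criteria_b))  # 161 False
```

      The same unit is a "place cell" under one rule set and not under the other. Nothing about its response has changed; only the analyst's choices differ.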

      This reflects a deeper issue. Unlike structurally and genetically defined cell types (e.g., pyramidal neurons, interneurons, dopaminergic neurons, cFos-expressing neurons), spatial cells lack such clarity in terms of structural or functional specialization, and it is unclear whether such “cell types” should be considered cell types in the same way. While scientific progress requires standardized definitions, the question remains whether defining spatial cells through myriad different criteria advances our understanding of spatial cognition. Are researchers finding the same cells? Could they be targeting different populations? Are they excluding cells crucial for spatial cognition because of the criteria used? We think this is likely. The inconsistency matters because different criteria may capture genuinely different neural populations or computational processes.

      Variability in definitions and criteria is an issue in any field. However, as we have stated, the deeper issue is whether we should be defining and selecting these cells at all before commencing analysis. By defining and restricting to spatial “cell types”, we risk comparing fundamentally different phenomena across studies, and worse, missing the fundamental unit of spatial cognition (e.g., the population).

      We have added a paragraph in Discussion (lines 357-366) noting the inconsistency in place cell selection criteria in the literature and the consequences of using varying criteria.

      We have also added a sentence (lines 354-356) raising the comparison of functionally defined spatial cell types with structurally and genetically defined cell types in the Discussion.

      Thus, the question is not whether spatially tuned cells are influenced by sensory information, but whether feed-forward sensory processing alone is sufficient to account for their observed tuning properties and responses to sensory manipulations.

      These issues indicate a more significant underlying issue of scientific methodology relating to the interpretation of their result and its impact on neuroscientific research. Specifically, in order to make strong claims about experimental data, it is not enough to show that a control (i.e. a null hypothesis) exists, one needs to demonstrate that experimental observations are quantitatively no better than that control.

      Where the authors state that “In summary, complex networks that are not spatial systems, coupled with environmental input, appear sufficient to decode spatial information.”, what they have really shown is that it is possible to decode *some degree* of spatial information. This is a null hypothesis (that observations of spatial tuning do not reflect a “spatial system”), and the comparison must be made to experimental data to test whether the so-called “spatial” networks in the brain have more cells with more reliable spatial information than a complex-visual control.

      We agree that good null hypotheses with quantitative comparisons are important. However, it is not clear that researchers in the field have been using a null hypothesis; rather, they assume that these cell types exist and function in the way they expect. We provide one null hypothesis. The field can and should develop more and stronger null hypotheses.

      In our work, we mainly focus on the criteria for finding spatial cells, and argue that simply doing this is misleading. Researchers develop criteria and find such cells, but often do not go further to assess whether they are real cell “types”. This is especially problematic when the criteria exclude other cells, which is misleading if those cells also play a role in the function of interest.

      But from many other experiments including causal manipulations (e.g. Robinson et al 2020 Cell, de Lavilléon et al 2015 Nat Neuro), which the authors conveniently ignore. Thus, I do not find their argument, as strongly stated as it is, to be well-supported.

      We acknowledge that several studies have performed inactivation experiments suggesting a strong role for place cells in spatial behavior. However, most studies do not conduct comprehensive analyses to confirm that their place cells are in fact crucial for the behavior at hand.

      One question is how the criteria were determined. Did the researchers base their criteria on what “worked”, so that they did not exclude cells relevant to the behavior? Had their criteria been different, the argument could have been that non-place cells also contribute to behavior.

      Another question is whether, given the varied criteria, these are the same kinds of cells across studies and animals. As most studies do not follow the same procedures, it is unclear whether these results generalize across cells and, indeed, across tasks and spatial environments.

      Finally, does the fact that the place cells – the strongly selective cells with a place field – have a strong role in navigation provide any insight into the mechanism? Identifying cells by itself does not contribute to our understanding of how they work. Consistent with our main message, we argue that performing analyses and building computational models that uncover how the function of interest works is more valuable than simply naming cells.

      Finally, I find a major weakness of the paper to be the framing of the results in opposition to, as opposed to contributing to, the study of spatially tuned cells. For example, the authors state that “If a perception system devoid of a spatial component demonstrates classically spatially-tuned unit representations, such as place, head-direction, and border cells, can “spatial cells” truly be regarded as ‘spatial’?” Setting aside the issue of whether the perception system in question does indeed demonstrate spatially tuned unit representations comparable to those in the brain, I ask “Why not?” This seems to be a semantic game of reading more into a name than is necessarily there. The names (place cells, grid cells, border cells, etc.) describe an observation (that cells are observed to fire in certain areas of an animal’s environment). They need not be a mechanistic claim... This is evidenced by the fact that even within e.g. the place cell community, there is debate about these cells’ mechanisms and function (e.g. memory, navigation), or whether they can even be said to serve only a single function. However, they are still referred to as place cells, not as a statement of their function but as a history-dependent label that refers to their observed correlates with experimental variables. Thus, the observation that spatially tuned cells are “inevitable derivatives of any complex system” is itself an interesting finding which *contributes to*, rather than contradicts, the study of these cells. It seems that the authors have a specific definition in mind when they say that a cell is “truly” “spatial” or that a biological or artificial neural network is a “spatial system”, but this definition is not stated, and it is not clear that the terminology used in the field presupposes their definition.

      We have to agree to disagree with the reviewer on this point. Although researchers may reflect on their work and discuss the mechanistic role of these cells, cell type discovery is widely perceived as important by journals and funders because of its intuitive appeal and easy-to-understand impact, even when there is no finding of interest to be reported. As noted in the comment above, papers claiming cell type discovery continue to be published in top journals and continue to be funded.

      Our argument is that perhaps “cell type” discovery research should not be celebrated in the way it is, and in fact that such cells should not be declared discoveries of cell types when they are not genuine cell types in the way structural or genetic cell types are. Using this term makes them appear to be something they are not, which is misleading. They may be important cells, but a name like “place” cell also suggests that other cells are not encoding space, which is very unlikely to be true.

      In sum, our view is that finding and naming cells through a flawed theoretical lens, when they may not actually function as their names suggest, can lead us down the wrong path and be detrimental to science.

      Reviewer #1 (Recommendations For The Authors):

      The novelty of the current study relative to the work by Franzius, Sprekeler, Wiskott (PLoS Computational Biology, 2007) needs to be carefully addressed. That study also modeled the spatial correlates based on visual inputs.

      Our work differs from Franzius et al. (2007) on both theoretical and experimental fronts. While both studies challenge the mechanisms underlying spatial cell formation, our theoretical contributions diverge. Franzius et al. (2007) assume spatial cells are inherently important for spatial cognition and propose a sensory-driven computational mechanism as an alternative to mainstream path integration frameworks for how spatial cells arise and support spatial cognition. In contrast, we challenge the notion that spatial cells are special at all. Using a model with no spatial grounding, we demonstrate that 1) spatial cells naturally emerge from complex non-linear processing and 2) they are not particularly useful for spatial decoding tasks, suggesting they are not crucial for spatial cognition.

      Our approach employs null models with fixed weights (either pretrained on classification tasks or entirely random) that process visual information non-sequentially. These models serve as general-purpose information processors without spatial grounding. In contrast, the model of Franzius et al. (2007) learns directly from environmental visual information, and the emergence of spatial cells (place or head-direction cells) in their framework depends on input statistics, such as rotation and translation speeds. Notably, their model does not simultaneously generate both place and head-direction cells; the outcome varies with the relative speed of rotation versus translation. Their sensory-driven model indirectly incorporates motion information through learning, exhibiting a time-dependence influenced by slow-feature analysis.

      Conversely, our model simultaneously produces units with place and head-direction cell profiles by processing visual inputs sampled randomly across locations and angles, independent of temporal or motion-related factors. This positions our model as a more general and fundamental null hypothesis, ideal for challenging prevailing theories on spatial cells due to its complete lack of spatial or motion grounding.

      Finally, unlike Franzius et al. (2007), who do not evaluate the functional utility of their spatial representations, we test whether the emergent spatial cells are useful for spatial decoding. We find that not only do spatial cells emerge in our non-spatial model, but they also fail to significantly aid in location or head-direction decoding. This is the central contribution of our work: spatial cells can arise without spatial or sensory grounding, and their functional relevance is limited. We have updated the manuscript to clarify the novelty of the current contribution to previous work (lines 324-335).

      In Fig. 2, it may be useful to plot the error in absolute units, rather than the normalized error. The direction decoding can be quantified in degrees. Also, it would be helpful to compare the accuracy of spatial localization to that of the actual place cells in rodents.

      We argue that normalizing the error, by dividing it by the maximal error possible in each task, makes more sense and puts the comparison in perspective. For transparency, we plot the errors in the absolute physical units used by the Unity game engine in the updated Appendix (Fig. 1).
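      A minimal sketch of this kind of normalization follows. It assumes a square arena of side `arena_side`, so the largest possible location error is the arena diagonal. It also assumes angular error wrapped to [0°, 180°], so the largest possible direction error is 180°. These bounds are illustrative assumptions, and the manuscript's exact normalization constants may differ.

```python
import numpy as np

def normalized_location_error(pred_xy, true_xy, arena_side):
    """Euclidean error divided by the largest possible error in a square arena (its diagonal)."""
    err = np.linalg.norm(np.asarray(pred_xy, dtype=float) - np.asarray(true_xy, dtype=float), axis=-1)
    return err / (arena_side * np.sqrt(2.0))

def normalized_direction_error(pred_deg, true_deg):
    """Wrapped angular error in [0, 180] degrees divided by the largest possible error (180 degrees)."""
    diff = (np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float) + 180.0) % 360.0 - 180.0
    return np.abs(diff) / 180.0

print(normalized_location_error([0.0, 0.0], [10.0, 10.0], arena_side=10.0))  # approximately 1.0 (opposite corners)
print(normalized_direction_error(350.0, 10.0))  # 20 degrees apart, approximately 0.111
```

      Both quantities lie in [0, 1], which is what allows location and direction decoding to be placed on a common scale.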

      Reviewer #2 (Recommendations For The Authors):

      Regarding the involvement of ‘classified cells’ in decoding, I think a useful way to present the results would be to show the relationship between ‘placeness’, ‘directionness’ and ‘borderness’ and the strength of the decoder weights, either as a correlation or as a full scatter plot.

      We appreciate your suggestion to visualize the relationship between units’ spatial properties and their corresponding decoder weights. We believe it would be an important addition to our existing results. Based on the exclusion analyses, we anticipated the correlation to be low, and the additional results support this expectation.

      As an example, we present unit plots below for VGG-16 (pre-trained and untrained, at its penultimate layer with a sampling rate of 0.3; Author response images 1 and 2). Additional plots for various layers and across models are included in the supplementary materials (Fig. S12-S28). Consistently across conditions, we observed no significant correlations between units’ spatial properties (e.g., placeness) and their decoding weight strengths. These results further corroborate the conclusions drawn from our exclusion analyses.
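      The suggested analysis can be sketched in Python with NumPy. The sketch assumes a decoder weight matrix of shape units × outputs and takes each unit's weight strength as the L2 norm of its row. It then reports Pearson and rank (Spearman) correlations with a per-unit score such as placeness. The random data below are hypothetical placeholders, not the manuscript's results. The simple ranking ignores ties, which is adequate only for continuous scores.

```python
import numpy as np

def weight_strength(decoder_weights):
    """Per-unit readout strength: L2 norm of each unit's row in a (units x outputs) weight matrix."""
    return np.linalg.norm(decoder_weights, axis=1)

def ranks(x):
    """Ranks 0..n-1; ties are broken by position, so use only with continuous scores."""
    x = np.asarray(x)
    r = np.empty(x.size, dtype=float)
    r[np.argsort(x)] = np.arange(x.size)
    return r

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

# Hypothetical placeholder data: 1000 units, a score in [0, 1], and a 2-output (x, y) decoder.
rng = np.random.default_rng(1)
placeness = rng.uniform(size=1000)
decoder_weights = rng.normal(size=(1000, 2))
strength = weight_strength(decoder_weights)
print(f"Pearson r = {pearson(placeness, strength):.3f}, Spearman rho = {spearman(placeness, strength):.3f}")
```

      The same `placeness` and `strength` arrays can be passed directly to a scatter plot, which is the alternative presentation the reviewer suggested.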

      Reviewer #3 (Recommendations For The Authors):

      My main suggestions are that the authors:
      - perform manipulations to the sensory environment similar to those done in experimental work, and report whether their tuned cells respond in similar ways;
      - quantitatively compare the degree of spatial tuning in their networks to that seen in publicly available data;
      - re-frame the discussion of their results to critically engage with and contribute to the field and its past work on sensory influences on these cells.

      As we noted in our opening section, our model is not intended as a model of the brain. It is a non-spatial null model, and we present the surprising finding that even such a model contains spatial cell-like units if identified using criteria typically used in the field. This raises the question whether simply finding cells that show spatial properties is sufficient to grant the special status of “cell type” that is involved in the brain function of interest.

      Author response image 1.

      VGG-16 (pre-trained), penultimate layer units, show no apparent relationship between spatial properties and their decoder weight strengths.

      Author response image 2.

      VGG-16 (untrained), penultimate layer units, show no apparent relationship between spatial properties and their decoder weight strengths.

      Furthermore, our main simulations were designed to be compared to experimental work where rodents foraged around square environments in the lab. We did not do an extensive set of simulations as the purpose of our study is not to show that we capture exactly every single experimental finding, but rather raise the issues with the functional cell type definition and identification approach for progressing neuroscientific knowledge.

      Finally, as we note in more detail below, different labs use different criteria for identifying spatial cells, which depend both on the lab and the experimental design. Our point is that we can identify such cells using criteria set by neuroscientists, and that such cell types may not reflect any special status in spatial processing. Additional simulations that show less alignment with certain datasets will not provide support for or against our general message.

      References

      Banino A, Barry C, Uria B, Blundell C, Lillicrap T, Mirowski P, Pritzel A, Chadwick MJ, Degris T, Modayil J, Wayne G, Soyer H, Viola F, Zhang B, Goroshin R, Rabinowitz N, Pascanu R, Beattie C, Petersen S, Sadik A, Gaffney S, King H, Kavukcuoglu K, Hassabis D, Hadsell R, Kumaran D (2018) Vector-based navigation using grid-like representations in artificial agents. Nature 557(7705):429–433, DOI 10.1038/s41586-018-0102-6, URL http://www.nature.com/articles/s41586-018-0102-6

      DiCarlo JJ, Zoccolan D, Rust NC (2012) How Does the Brain Solve Visual Object Recognition? Neuron 73(3):415–434, DOI 10.1016/J.NEURON.2012.01.010, URL https://www.cell.com/neuron/fulltext/S0896-6273(12)00092-X

      Diehl GW, Hon OJ, Leutgeb S, Leutgeb JK (2017) Grid and Nongrid Cells in Medial Entorhinal Cortex Represent Spatial Location and Environmental Features with Complementary Coding Schemes. Neuron 94(1):83–92.e6, DOI 10.1016/j.neuron.2017.03.004, URL https://linkinghub.elsevier.com/retrieve/pii/S0896627317301873

      Dombeck DA, Harvey CD, Tian L, Looger LL, Tank DW (2010) Functional imaging of hippocampal place cells at cellular resolution during virtual navigation. Nature Neuroscience 13(11):1433–1440, DOI 10.1038/nn.2648, URL https://www.nature.com/articles/nn.2648

      Ebitz RB, Hayden BY (2021) The population doctrine in cognitive neuroscience. Neuron 109(19):3055–3068, DOI 10.1016/j.neuron.2021.07.011, URL https://linkinghub.elsevier.com/retrieve/pii/S0896627321005213

      Grieves RM, Jedidi-Ayoub S, Mishchanchuk K, Liu A, Renaudineau S, Jeffery KJ (2020) The place-cell representation of volumetric space in rats. Nature Communications 11(1):789, DOI 10.1038/s41467-020-14611-7, URL https://www.nature.com/articles/s41467-020-14611-7

      Grijseels DM, Shaw K, Barry C, Hall CN (2021) Choice of method of place cell classification determines the population of cells identified. PLOS Computational Biology 17(7):e1008835, DOI 10.1371/journal.pcbi.1008835, URL https://dx.plos.org/10.1371/journal.pcbi.1008835

      Horrocks EAB, Rodrigues FR, Saleem AB (2024) Flexible neural population dynamics govern the speed and stability of sensory encoding in mouse visual cortex. Nature Communications 15(1):6415, DOI 10.1038/s41467-024-50563-y, URL https://www.nature.com/articles/s41467-024-50563-y

      Høydal ØA, Skytøen ER, Andersson SO, Moser MB, Moser EI (2019) Object-vector coding in the medial entorhinal cortex. Nature 568(7752):400–404, DOI 10.1038/s41586-019-1077-7, URL https://www.nature.com/articles/s41586-019-1077-7

      Ormond J, O’Keefe J (2022) Hippocampal place cells have goal-oriented vector fields during navigation. Nature 607(7920):741–746, DOI 10.1038/s41586-022-04913-9, URL https://www.nature.com/articles/s41586-022-04913-9

      Ouchi A, Fujisawa S (2024) Predictive grid coding in the medial entorhinal cortex. Science 385(6710):776–784, DOI 10.1126/science.ado4166, URL https://www.science.org/doi/10.1126/science.ado4166

      Sarel A, Finkelstein A, Las L, Ulanovsky N (2017) Vectorial representation of spatial goals in the hippocampus of bats. Science 355(6321):176–180, DOI 10.1126/science.aak9589, URL https://www.science.org/doi/10.1126/science.aak9589

      Sun C, Yang W, Martin J, Tonegawa S (2020) Hippocampal neurons represent events as transferable units of experience. Nature Neuroscience 23(5):651–663, DOI 10.1038/s41593-020-0614-x, URL https://www.nature.com/articles/s41593-020-0614-x

      Tanaka KZ, He H, Tomar A, Niisato K, Huang AJY, McHugh TJ (2018) The hippocampal engram maps experience but not place. Science 361(6400):392–397, DOI 10.1126/science.aat5397, URL https://www.science.org/doi/10.1126/science.aat5397

      Tanni S, De Cothi W, Barry C (2022) State transitions in the statistically stable place cell population correspond to rate of perceptual change. Current Biology 32(16):3505–3514.e7, DOI 10.1016/j.cub.2022.06.046, URL https://linkinghub.elsevier.com/retrieve/pii/S0960982222010089

      Tong F, Pratte MS (2012) Decoding Patterns of Human Brain Activity. Annual Review of Psychology 63(1):483–509, DOI 10.1146/annurev-psych-120710-100412, URL https://www.annualreviews.org/doi/10.1146/annurev-psych-120710-100412

    1. eLife Assessment

      This important study uses an original method to address the longstanding question of why reaching movements are often biased. The combination of a wide range of experimental conditions and computational modeling is a strength. Solid evidence is presented in support of the main claim that most of the biases in 2-D movement planning originate in misalignment between visuo-proprioceptive reference frames.

    2. Reviewer #1 (Public review):

      Wang et al. studied an old, still unresolved problem: Why are reaching movements often biased? Using data from a set of new experiments and from earlier studies, they identified how the bias in reach direction varies with movement direction and movement extent, and how this depends on factors such as the hand used, the presence of visual feedback, the size and location of the workspace, the visibility of the start position and implicit sensorimotor adaptation. They then examined whether a target bias, a proprioceptive bias, a bias in the transformation from visual to proprioceptive coordinates and/or biomechanical factors could explain the observed patterns of biases. The authors conclude that biases are best explained by a combination of transformation and target biases.

      A strength of this study is that it used a wide range of experimental conditions with a high resolution of movement directions and large numbers of participants, which produced a much more complete picture of the factors determining movement biases than previous studies did. The study used an original, powerful and elegant method to distinguish between the various possible origins of motor bias, based on the number of peaks in the motor bias plotted as a function of movement direction. The biomechanical explanation of motor biases could not be tested in this way, but this explanation was excluded in a different way using data on implicit sensorimotor adaptation. This was also an elegant method as it allowed the authors to test biomechanical explanations without the need to commit to a certain biomechanical cost function.

      Overall, the authors have done a good job mapping out reaching biases in a wide range of conditions, revealing new patterns in one of the most basic tasks, and the evidence for the proposed origins is convincing. The study will likely have substantial impact on the field, as the approach taken is easily applicable to other experimental conditions. As such, the study can spark future research on the origin of reaching biases.

    3. Reviewer #2 (Public review):

      Summary:

      This work examines an important question in the planning and control of reaching movements: where do biases in our reaching movements arise, and what might this tell us about the planning process? They compare several different computational models to explain the results from a range of experiments, including those within the literature. Overall, they highlight that motor biases are primarily caused by errors in the transformation between eye and hand reference frames. One strength of the paper is the large numbers of participants studied across many experiments. However, one weakness is that most of the experiments follow a very similar planar reaching design, with slicing movements through targets rather than stopping within a target. This is partially addressed with Exp 4. This work provides a valuable insight into the biases that govern reaching movements. While the evidence is solid for planar reaching movements, further support from 3D reaching movements would help strengthen the findings.

      Strengths:

      The work uses a large number of participants, both in laboratory studies, which can be well controlled, and in online studies, which reach a huge number of participants. In addition, they use a large number of reaching directions, allowing careful comparison across models. Together these allow a clear comparison between models which is much stronger than would usually be performed.

    4. Reviewer #3 (Public review):

      This study makes excellent use of a uniquely large dataset of reaching movements collected over several decades to evaluate the origins of systematic motor biases. The analyses convincingly demonstrate that these biases are not explained by errors in sensed hand position or by biomechanical constraints, but instead arise from a misalignment between eye-centric and body-centric representations of position. By testing multiple computational models across diverse contexts (including different effectors and visible versus occluded start positions), the authors provide strong evidence for their transformation model. My earlier concerns have been addressed, and I find the work to be a significant and timely contribution that will be of broad interest to researchers studying visuomotor control, perception, and sensorimotor integration.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Wang et al. studied an old, still unresolved problem: Why are reaching movements often biased? Using data from a set of new experiments and from earlier studies, they identified how the bias in reach direction varies with movement direction, and how this depends on factors such as the hand used, the presence of visual feedback, the size and location of the workspace, the visibility of the start position and implicit sensorimotor adaptation. They then examined whether a visual bias, a proprioceptive bias, a bias in the transformation from visual to proprioceptive coordinates and/or biomechanical factors could explain the observed patterns of biases. The authors conclude that biases are best explained by a combination of transformation and visual biases.

      A strength of this study is that it used a wide range of experimental conditions with a high resolution of movement directions and large numbers of participants, which produced a much more complete picture of the factors determining movement biases than previous studies did. The study used an original, powerful, and elegant method to distinguish between the various possible origins of motor bias, based on the number of peaks in the motor bias plotted as a function of movement direction. The biomechanical explanation of motor biases could not be tested in this way, but this explanation was excluded in a different way using data on implicit sensorimotor adaptation. This was also an elegant method as it allowed the authors to test biomechanical explanations without the need to commit to a certain biomechanical cost function.

      We thank the reviewer for their enthusiastic comments.

      (1) The main weakness of the study is that it rests on the assumption that the number of peaks in the bias function is indicative of the origin of the bias. Specifically, it is assumed that a proprioceptive bias leads to a single peak, a transformation bias to two peaks, and a visual bias to four peaks, but these assumptions are not well substantiated. Especially the assumption that a transformation bias leads to two peaks is questionable. It is motivated by the fact that biases found when participants matched the position of their unseen hand with a visual target are consistent with this pattern. However, it is unclear why that task would measure only the effect of transformation biases, and not also the effects of visual and proprioceptive biases in the sensed target and hand locations. Moreover, it is not explained why a transformation bias would lead to this specific bias pattern in the first place.
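      The peak-count logic can be illustrated with a small sketch. Purely for illustration, it assumes that each candidate source produces a bias that varies as a single sinusoidal harmonic of movement direction: the first harmonic for a proprioceptive bias, the second for a transformation bias, and the fourth for a visual bias. It then counts strict local maxima on a circularly sampled curve. The fitted models in the manuscript are not pure sinusoids.

```python
import numpy as np

def count_circular_peaks(values):
    """Count strict local maxima of a curve sampled uniformly around a full circle."""
    v = np.asarray(values, dtype=float)
    return int(np.sum((v > np.roll(v, 1)) & (v > np.roll(v, -1))))

# 72 movement directions, 5 degrees apart.
directions = np.deg2rad(np.arange(0, 360, 5))

# Hypothetical single-harmonic bias profiles; the 0.3 rad phase offset is arbitrary.
peak_counts = {k: count_circular_peaks(np.sin(k * directions + 0.3)) for k in (1, 2, 4)}
print(peak_counts)  # {1: 1, 2: 2, 4: 4}
```

      This shows only that peak count identifies the dominant harmonic. Whether each bias source actually produces that harmonic is the substantive question the reviewer raises.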

      We would like to clarify two things.

      First, the measurements of the transformation bias are not entirely independent of proprioceptive and visual biases. Specifically, we define transformation bias as the misalignment between the internal representation of a visual target and the corresponding hand position. By this definition, the transformation error entails both visual and proprioceptive biases (see Author response image 1). Transformation biases have been empirically quantified in numerous studies using matching tasks, where participants either aligned their unseen hand to a visual target (Wang et al., 2021) or aligned a visual target to their unseen hand (Wilson et al., 2010). Indeed, those tasks are always considered to measure proprioceptive biases, on the assumption that visual bias is small given the minimal visual uncertainty.

      Author response image 1.

      Second, the critical difference between models is in how these biases influence motor planning rather than how those biases are measured. In the Proprioceptive bias model, a movement is planned in visual space. The system perceives the starting hand position in proprioceptive space and transforms this into visual space (Vindras & Viviani, 1998; Vindras et al., 2005). As such, bias only affects the perceived starting position; there is no influence on the perceived target location (no visual bias).

      In contrast, the Transformation bias model proposes that while both the starting and target positions are perceived in visual space, movement is planned in proprioceptive space. Consequently, both positions must be transformed from visual space to proprioceptive coordinates before movement planning (i.e., where is my sensed hand and where do I want it to be). Under this framework, biases can emerge from both the start and target positions. This is how the transformation model leads to different predictions compared to the perceptual models, even if the bias is based on the same measurements.

      We now highlight the differences between the Transformation bias model and the Proprioceptive bias model explicitly in the Results section (Lines 192-200):

      “Note that the Proprioceptive Bias model and the Transformation Bias model tap into the same visuo-proprioceptive error map. The key difference between the two models arises in how this error influences motor planning. For the Proprioceptive Bias model, planning is assumed to occur in visual space. As such, the perceived position of the hand (based on proprioception) is transformed into the visual space. This will introduce a bias in the representation of the start position. In contrast, the Transformation Bias model assumes that the visually-based representations of the start and target positions need to be transformed into proprioceptive space for motor planning. As such, both positions are biased in the transformation process. In addition to differing in terms of their representation of the target, the error introduced at the start position is in opposite directions due to the direction of the transformation (see fig 1g-h).”
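      The distinction drawn in the quoted passage can be written as a short derivation. This is a sketch under our own assumptions rather than the manuscript's formal model: $e(\mathbf{x})$ denotes a hypothetical visuo-proprioceptive error map, such that a hand placed at $\mathbf{x} + e(\mathbf{x})$ feels aligned with the visual location $\mathbf{x}$; $\mathbf{s}$ and $\mathbf{t}$ are the visual start and target positions; the hand is assumed to rest physically at $\mathbf{s}$; and the inverse map is linearized as $\hat{\mathbf{s}} \approx \mathbf{s} - e(\mathbf{s})$, which holds only if $e$ is small and smooth.

```latex
\begin{aligned}
\text{Proprioceptive Bias:}\quad
  \mathbf{v}_P &= \mathbf{t} - \hat{\mathbf{s}}
  \approx (\mathbf{t} - \mathbf{s}) + e(\mathbf{s}), \\
\text{Transformation Bias:}\quad
  \mathbf{v}_T &= \bigl(\mathbf{t} + e(\mathbf{t})\bigr) - \bigl(\mathbf{s} + e(\mathbf{s})\bigr)
  = (\mathbf{t} - \mathbf{s}) + e(\mathbf{t}) - e(\mathbf{s}).
\end{aligned}
```

      Executed from the true start, the planned vector lands at $\mathbf{t} + e(\mathbf{s})$ in the first case and at $\mathbf{t} + e(\mathbf{t}) - e(\mathbf{s})$ in the second. The start-position term therefore enters with opposite signs, and only the Transformation Bias model depends on the error map at the target; with a single start position, $e(\mathbf{s})$ is one constant vector for all targets.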

      In terms of the motor bias function across the workspace, the peaks are quantitatively derived from the model simulations. The number of peaks depends on how we formalize each model. Importantly, this is a stable feature of each model, regardless of how the model is parameterized. Thus, the number of peaks provides a useful criterion to evaluate different models.

      Figure 1 g-h illustrates the intuition of how the models generate distinct peak patterns. We edited the figure caption and reference this figure when we introduce the bias function for each model.
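      To make the peak-counting intuition concrete, the sketch below simulates direction errors for a single start position, taking the endpoint error to be e(s) under the Proprioceptive Bias model and e(t) - e(s) under the Transformation Bias model. The affine error map (a constant leftward/downward offset plus a mild axis-dependent gain), the 10 cm target distance, and the 1° target spacing are all hypothetical choices for illustration, not the map fitted in the manuscript.

```python
import numpy as np

# Hypothetical visuo-proprioceptive error map e(x) = A @ x + c, in cm.
# A (assumed) stretches x by 10% and compresses y by 10%; c is a constant
# leftward/downward offset. Neither is the map fitted in the manuscript.
A = np.array([[0.1, 0.0], [0.0, -0.1]])
c = np.array([-1.0, -1.0])


def error_map(x):
    """Apply e(x) row-wise to an (N, 2) array of positions."""
    return x @ A.T + c


def direction_error_deg(start, targets, endpoint_error):
    """Signed angle (deg) from the ideal start->target vector to the executed one."""
    ideal = targets - start
    executed = ideal + endpoint_error          # endpoint_error broadcasts over rows
    diff = (np.arctan2(executed[:, 1], executed[:, 0])
            - np.arctan2(ideal[:, 1], ideal[:, 0]))
    return np.degrees((diff + np.pi) % (2 * np.pi) - np.pi)


def count_peaks(y):
    """Count local maxima of a circular sequence (a two-sample plateau counts once)."""
    return int(np.sum((y > np.roll(y, 1)) & (y >= np.roll(y, -1))))


theta = np.radians(np.arange(0.0, 360.0, 1.0))    # target directions, 1 deg apart
start = np.zeros((1, 2))                           # single start position
targets = 10.0 * np.column_stack([np.cos(theta), np.sin(theta)])

# Proprioceptive Bias: endpoint error e(s), the same for every target.
prop_err = direction_error_deg(start, targets, error_map(start))
# Transformation Bias: endpoint error e(t) - e(s); the constant offset c cancels.
trans_err = direction_error_deg(start, targets, error_map(targets) - error_map(start))

print(count_peaks(prop_err), count_peaks(trans_err))   # expected: 1 2
```

      The constant start error produces a single-peaked function, whereas only the spatially varying part of the map survives the subtraction in the transformation case, yielding two peaks.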

      (2) Also, the assumption that a visual bias leads to four peaks is not well substantiated as one of the papers on which the assumption was based (Yousif et al., 2023) found a similar pattern in a purely proprioceptive task.

      What we referred to in the original submission as “visual bias” is not an eye-centric bias, nor is it restricted to the visual system. Rather, it may reflect a domain-general distortion in the representation of position within polar space. We called it a visual bias as it was associated with the perceived location of the visual target in the current task. To avoid confusion, we have opted to move to a more general term and now refer to this as “target bias.”

      We clarify the nature of this bias when introducing the model in the Results section (Lines 164-169):

      “Since the task permits free viewing without enforced fixation, we assume that participants shift their gaze to the visual target; as such, an eye-centric bias is unlikely. Nonetheless, prior studies have shown a general spatial distortion that biases perceived target locations toward the diagonal axes (Huttenlocher et al., 2004; Kosovicheva & Whitney, 2017). Interestingly, this bias appears to be domain-general, emerging not only for visual targets but also for proprioceptive ones (Yousif et al., 2023). We incorporated this diagonal-axis spatial distortion into a Target Bias model. This model predicts a four-peaked motor bias pattern (Fig 1f).”
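      One minimal way to express a diagonal-axis distortion is to perturb the perceived target angle by a term proportional to sin 4θ, which pulls each direction toward the nearest diagonal and, over a full circle, yields four peaks. The functional form and the 3° amplitude are assumptions for illustration, not the fitted Target Bias model.

```python
import numpy as np

AMPLITUDE_DEG = 3.0  # hypothetical peak size of the diagonal attraction


def perceived_target_angle(theta_deg):
    """Perceived target direction under an assumed sin(4*theta) diagonal attraction."""
    theta = np.asarray(theta_deg, dtype=float)
    return theta + AMPLITUDE_DEG * np.sin(np.radians(4.0 * theta))


def count_peaks(y):
    """Count local maxima of a circular sequence (a two-sample plateau counts once)."""
    return int(np.sum((y > np.roll(y, 1)) & (y >= np.roll(y, -1))))


theta = np.arange(0.0, 360.0, 1.0)
# Motor bias if the start is perceived veridically and only the target is distorted.
target_bias = perceived_target_angle(theta) - theta

print(count_peaks(target_bias))              # expected: 4
print(perceived_target_angle([10.0, 80.0]))  # 10 moves up and 80 moves down, both toward 45
```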

      We also added a paragraph in the Discussion to further elaborate on this model (Lines 502-511):

      “What might be the source of the visual bias in the perceived location of the target? In the perception literature, a prominent account attributes it to visual working memory, based on the observation that in delayed response tasks, participants exhibit a bias towards the diagonals when recalling the location of visual stimuli (Huttenlocher et al., 2004; Sheehan & Serences, 2023). Underscoring that the effect is not motoric, this bias is manifest regardless of whether the response is made by an eye movement, pointing movement, or keypress (Kosovicheva & Whitney, 2017). However, this bias is unlikely to depend on visual input, as a similar diagonal bias is observed when the target is specified proprioceptively via the passive displacement of an unseen hand (Yousif et al., 2023). Moreover, as shown in the present study, a diagonal bias is observed even when the target is continuously visible. Thus, we hypothesize that the bias to perceive the target towards the diagonals reflects a more general distortion in spatial representation rather than being a product of visual working memory.”

      (3) Another weakness is that the study looked at biases in movement direction only, not at biases in movement extent. The models also predict biases in movement extent, so it is a missed opportunity to take these into account to distinguish between the models.

      We thank the reviewer for this suggestion. We have now conducted a new experiment to assess angular and extent biases simultaneously (Figure 4a; Exp. 4; N = 30). Using our KINARM system, participants were instructed to make center-out movements that would terminate (rather than shoot past) at the visual target. No visual feedback was provided throughout the experiment.

      The Transformation Bias model predicts a two-peaked error function in both the angular and extent dimensions (Figure 4c). Strikingly, when we fit the data from the new experiment to both dimensions simultaneously, this model captures the results qualitatively and quantitatively (Figure 4e). In terms of model comparison, it outperformed alternative models (Figure 4g), particularly when augmented with a visual bias component. Together, these results provide strong evidence that a mismatch between visual and proprioceptive space is a key source of motor bias.
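      Each endpoint in a task like Exp 4 can be decomposed into an angular error and an extent (radial) error relative to its target. The sketch below shows one way to do this and, using a hypothetical anisotropic gain matrix as a stand-in for a transformation-like distortion of the movement vector, why such a distortion produces two peaks in both dimensions. The gain values and the 10 cm target distance are assumptions, not values from the experiment.

```python
import numpy as np


def decompose_error(start, targets, endpoints):
    """Return (angular error in deg, extent error in cm) of endpoints relative to targets."""
    ideal = targets - start
    actual = endpoints - start
    ang = (np.arctan2(actual[:, 1], actual[:, 0])
           - np.arctan2(ideal[:, 1], ideal[:, 0]))
    ang_deg = np.degrees((ang + np.pi) % (2 * np.pi) - np.pi)
    extent = np.linalg.norm(actual, axis=1) - np.linalg.norm(ideal, axis=1)
    return ang_deg, extent


def count_peaks(y):
    """Count local maxima of a circular sequence (a two-sample plateau counts once)."""
    return int(np.sum((y > np.roll(y, 1)) & (y >= np.roll(y, -1))))


# Hypothetical distortion: 10% overshoot along x, 10% undershoot along y.
G = np.array([[1.1, 0.0], [0.0, 0.9]])
theta = np.radians(np.arange(0.0, 360.0, 1.0))
start = np.zeros(2)
targets = 10.0 * np.column_stack([np.cos(theta), np.sin(theta)])
endpoints = start + (targets - start) @ G.T

ang_err, ext_err = decompose_error(start, targets, endpoints)
print(count_peaks(ang_err), count_peaks(ext_err))   # expected: 2 2
```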

      This experiment is now reported within the revised manuscript (Lines 280-301).

      Overall, the authors have done a good job mapping out reaching biases in a wide range of conditions, revealing new patterns in one of the most basic tasks, but unambiguously determining the origin of these biases remains difficult, and the evidence for the proposed origins is incomplete. Nevertheless, the study will likely have a substantial impact on the field, as the approach taken is easily applicable to other experimental conditions. As such, the study can spark future research on the origin of reaching biases.

      We thank the reviewer for these summary comments. We believe that the new experiments and analyses do a better job of identifying the origins of motor biases.

      Reviewer #2 (Public Review):

      Summary:

      This work examines an important question in the planning and control of reaching movements - where do biases in our reaching movements arise and what might this tell us about the planning process? They compare several different computational models to explain the results from a range of experiments including those within the literature. Overall, they highlight that motor biases are primarily caused by errors in the transformation between eye and hand reference frames. One strength of the paper is the large number of participants studied across many experiments. However, one weakness is that most of the experiments follow a very similar planar reaching design - with slicing movements through targets rather than stopping within a target. Moreover, there are concerns with the models and the model fitting. This work provides valuable insight into the biases that govern reaching movements, but the current support is incomplete.

      Strengths:

      The work uses a large number of participants both with studies in the laboratory which can be controlled well and a huge number of participants via online studies. In addition, they use a large number of reaching directions allowing careful comparison across models. Together these allow a clear comparison between models which is much stronger than would usually be performed.

      We thank the reviewer for their encouraging comments.

      Weaknesses:

      Although the topic of the paper is very interesting and potentially important, there are several key issues that currently limit the support for the conclusions. In particular I highlight:

      (1) Almost all studies within the paper use the same basic design: slicing movements through a target with the hand moving on a flat planar surface. First, this means that the authors cannot compare the second component of a bias - the error in the extent of a reach, which is often much larger than the error in reaching direction.

      Reviewer 1 made a similar point, noting that we had missed an opportunity to provide a more thorough assessment of reaching biases. As described above, we conducted a new experiment in which participants made pointing movements, instructed to terminate the movements at the target. These data allow us to analyze errors in both angular and extent dimensions. The transformation bias model successfully predicted angular and extent biases and outperformed the other models at both group and individual levels. We have now included this result as Exp 4 in the manuscript. Please see our response to Reviewer 1, Comment 3, for details.

      Second, there are several studies that have examined biases in three-dimensional reaching movements showing important differences to two-dimensional reaching movements (e.g. Soechting and Flanders 1989). It is unclear how well the authors' computational models could explain the biases that are present in these much more common reaching movements.

      This is an interesting issue to consider. We expect the mechanisms identified in our 2D work will generalize to 3D.

      Soechting and Flanders (1989) quantified 3D biases by measuring errors across multiple 2D planes at varying heights (see Author response image 2 for an example from their paper). When their 3D bias data are projected onto a horizontal 2D plane, the direction of the bias looks relatively consistent across different heights, even though the absolute value of the bias varies (Author response image 2). For example, the matched hand position is generally leftward and downward of the target. Therefore, the models we have developed and tested in a specific 2D plane are likely to generalize to other 2D planes at different heights.

      Author response image 2.

      However, we think the biases reported by Soechting and Flanders likely reflect transformation biases rather than motor biases. First, the movements in their study were performed very slowly (3–5 seconds), more similar to our proprioceptive matching tasks and much slower than natural reaching movements (<500ms). Given the slow speed, we suspect that motor planning in Soechting and Flanders was likely done in a stepwise, incremental manner (closed loop to some degree). Second, the bias pattern reported in Soechting and Flanders, when projected into 2D space, closely mirrors the leftward transformation errors observed in previous visuo-proprioceptive matching tasks (e.g., Wang et al., 2021).

      In terms of the current manuscript, we think that our new experiment (Exp 4, where we measure angular and radial error) provides strong evidence that the transformation bias model generalizes to more naturalistic pointing movements. As such, we expect these principles will generalize were we to examine movements in three dimensions, an extension we plan to test in future work.

      (2) The model fitting section is under-explained and under-detailed currently. This makes it difficult to accurately assess the current model fitting and its strength to support the conclusions. If my understanding of the methods is correct, then I have several concerns. For example, the manuscript states that the transformation bias model is based on studies mapping out the errors that might arise across the whole workspace in 2D. In contrast, the visual bias model appears to be based on a study that presented targets within a circle (but not tested across the whole workspace). If the visual bias had been measured across the workspace (similar to the transformation bias model), would the model and therefore the conclusions be different?

      We have substantially expanded the Methods section to clarify the modeling procedures (detailed below in section “Recommendations for the Authors”). We also provide annotated code to enable others to easily simulate the models.

      Here we address three points relevant to the reviewer’s concern about whether the models were tested on equal footing, and in particular, concern that the transformation bias model was more informed by prior literature than the visual bias model.

      First, our center-out reaching task used target locations that have been employed in both visual and proprioceptive bias studies, offering reasonably comprehensive coverage of the workspace. For example, for a target to the left of the body’s midline, visual biases tend to be directed diagonally (Kosovicheva & Whitney, 2017), while transformation biases are typically leftward and downward (Wang et al, 2021). In this sense, the models were similarly constrained by prior findings.

      Second, while the qualitative shape of each model was guided by prior empirical findings, no previous data were directly used to quantitatively constrain the models. As such, we believe the models were evaluated on equal footing. No model had more information or, as best we can tell, an inherent advantage over the others.

      Third, reassuringly, the fitted transformation bias closely matches empirically observed bias maps reported in prior studies (Fig 2h). This strong correspondence provides convergent validity and supports a putative causal link from transformation biases to motor biases.

      (3) There should be other visual bias models theoretically possible that might fit the experimental data better than this one possible model. Such possibilities also exist for the other models.

      Our initial hypothesis, grounded in prior literature, was that motor biases arise from a combination of proprioceptive and visual biases. This led us to thoroughly explore a range of visual models. We now describe these alternatives below, noting that in the paper, we chose to focus on models that seemed the most viable candidates. (Please also see our response to Reviewer 3, Point 2, on another possible source of visual bias, the oblique effect.)

      Quite a few models have described visual biases in perceiving motion direction or object orientation (e.g., Wei & Stocker, 2015; Patten, Mannion & Clifford, 2017). In these models, orientation perception is biased towards the Cartesian axes, generating a four-peaked function. However, these models failed to account for the motor biases observed in our experiments. This is not surprising given that these models were not designed to capture biases related to a static location.

      We also considered a class of eye-centric models where biases for peripheral locations are measured under fixation. A prominent finding here is that the bias is along the radial axis, with participants overshooting targets when they fixate on the start position during the movement (Beurze et al., 2006; Van Pelt & Medendorp, 2008). Again, this is not consistent with the observed motor biases. For example, participants undershot rightward targets when we measured the distance bias in Exp 4. Importantly, since most of our tasks involved free viewing in natural settings with no fixation requirements, we considered it unlikely that biases arising from peripheral viewing play a major role.

      We note, though, that in our new experiment (Exp 4), participants observed the visual stimuli from a fixed angle in the KINARM setup (see Figure 4a). This setup has been shown to induce depth-related visual biases (Figure 4b, e.g., Volcic et al., 2013; Hibbard & Bradshaw, 2003). For this reason, we implemented a model incorporating this depth bias as part of our analyses of these data. While this model performed significantly worse than the transformation bias model alone, a mixed model that combined the depth bias and transformation bias provided the best overall fit. We now include this result in the main text (Lines 286-294).

      We also note that the “visual bias” we referred to in the original submission is not restricted to the visual system. A similar bias pattern has been observed when the target is presented visually or proprioceptively (Kosovicheva & Whitney, 2017; Yousif, Forrence, & McDougle, 2023). As such, it may reflect a domain-general distortion in the representation of position within polar space. Accordingly, in the revision, we now refer to this in a more general way, using the term “target bias.” We justify this nomenclature when introducing the model in the Results section (Lines 164-169). Please also see Reviewer 1 comment 2.

      We recognize that future work may uncover a better visual model or provide a more fine-grained account of visual biases (or biases from other sources). With our open-source simulation code, such biases can be readily incorporated—either to test them against existing models or to combine them with our current framework to assess their contribution to motor biases. Given our explorations, we expect our core finding will hold: Namely, that a combination of transformation and target biases offers the most parsimonious account, with the bias associated with the transformation process explaining the majority of the observed motor bias in visually guided movements.
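      Because the candidate sources predict bias functions with different numbers of peaks, a combined transformation + target model can be approximated, to first order, as the sum of two component functions whose contributions are estimated by regression. The sketch below uses pure second- and fourth-harmonic basis functions as simplified stand-ins for the two components (an assumption; the actual model shapes are not pure sinusoids) and recovers known weights from synthetic, noiseless data.

```python
import numpy as np


def harmonic_basis(theta, k):
    """Columns sin(k*theta), cos(k*theta): a stand-in for a k-peaked bias component."""
    return np.column_stack([np.sin(k * theta), np.cos(k * theta)])


theta = np.radians(np.arange(0.0, 360.0, 15.0))   # 24 hypothetical target directions
X = np.hstack([harmonic_basis(theta, 2),          # transformation-like (two peaks)
               harmonic_basis(theta, 4)])         # target-like (four peaks)

true_w = np.array([4.0, 1.0, 1.5, 0.0])           # synthetic weights, in degrees
bias = X @ true_w                                 # noiseless synthetic motor bias

w_hat, *_ = np.linalg.lstsq(X, bias, rcond=None)  # recovers true_w up to rounding error
amp_transformation = float(np.hypot(*w_hat[:2]))  # amplitude of the two-peaked part
amp_target = float(np.hypot(*w_hat[2:]))          # amplitude of the four-peaked part
print(amp_transformation, amp_target)
```

      Comparing the recovered amplitudes gives one rough index of each component's relative contribution to the overall bias.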

      Given the comments from the reviewer, we expanded the Discussion section to address the issue of alternative models of visual bias (lines 522-529):

      “Other forms of visual bias may influence movement. Depth perception biases could contribute to biases in movement extent (Beurze et al., 2006; Van Pelt & Medendorp, 2008). Visual biases towards the principal axes have been reported when participants are asked to report the direction of moving targets or the orientation of an object (Patten et al., 2017; Wei & Stocker, 2015). However, the predicted patterns of reach biases do not match the observed biases in the current experiments. We also considered a class of eye-centric models in which participants overestimate the radial distance to a target while maintaining central fixation (Beurze et al., 2006; Van Pelt & Medendorp, 2008). At odds with this hypothesis, participants undershot rightward targets when we measured the radial bias in Exp 4. The absence of these other distortions of visual space may be accounted for by the fact that we allowed free viewing during the task.”

      (4) Although the authors do mention that the evidence against biomechanical contributions to the bias is fairly weak in the current manuscript, this needs to be further supported. Importantly both proprioceptive models of the bias are purely kinematic and appear to ignore the dynamics completely. One imagines that there is a perceived vector error in Cartesian space whereas the other imagines an error in joint coordinates. These simply result in identical movements which are offset either with a vector or an angle. However, we know that the motor plan is converted into muscle activation patterns which are sent to the muscles, that is, the motor plan is converted into an approximation of joint torques. Joint torques sent to the muscles from a different starting location would not produce an offset in the trajectory as detailed in Figure S1, instead, the movements would curve in complex patterns away from the original plan due to the non-linearity of the musculoskeletal system. In theory, this could also bias some of the other predictions as well. The authors should consider how the biomechanical plant would influence the measured biases.

      We thank the reviewer for encouraging us to pursue this topic and to formalize a biomechanical model. In response, we have implemented a state-of-the-art biomechanical framework, MotorNet (https://elifesciences.org/articles/88591), which simulates a six-muscle, two-joint planar arm model using recurrent neural networks (RNNs) to generate control policies (see Figure 6a). This model captures key predictions about movement curvature arising from biomechanical constraints. We view it as a strong candidate for illustrating how motor bias patterns could be shaped by the mechanical properties of the upper limb.

      Interestingly, the biomechanical model did not qualitatively or quantitatively reproduce the pattern of motor biases observed in our data. Specifically, we trained 50 independent agents (RNNs) to perform random point-to-point reaching movements across the workspace used in our task. We used a loss function that minimized the distance between the fingertip and the target over the entire trajectory. When tested on a center-out reaching task, the model produced a four-peaked motor bias pattern (Figure 6b), in contrast to the two-peaked function observed empirically. These results suggest that upper limb biomechanical constraints are unlikely to be a primary driver of motor biases in reaching. This holds true even though the reported bias is read out at 60% of the reaching distance, where biomechanical influences on the curvature of movement are maximal. We have added this analysis to the results (lines 367-373).
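      The readout point mentioned above can be made explicit. The sketch below finds, by linear interpolation, where the hand first covers 60% of the start-to-target distance and returns the angle of that hand position relative to the target direction. The function name, the interpolation scheme, and the reading of "reaching distance" as the start-to-target distance are illustrative assumptions, not the manuscript's analysis code.

```python
import numpy as np


def heading_error_at_fraction(traj, start, target, fraction=0.6):
    """Angle (deg) between the hand position at `fraction` of the target distance
    and the start->target direction. `traj` is an (N, 2) array of hand positions."""
    rel = traj - start
    dist = np.linalg.norm(rel, axis=1)
    goal = fraction * np.linalg.norm(target - start)
    idx = int(np.argmax(dist >= goal))        # first sample at or beyond the criterion
    if dist[idx] < goal:
        raise ValueError("trajectory never reaches the criterion distance")
    if idx == 0:
        point = rel[0]
    else:
        # Linearly interpolate between the two samples bracketing the criterion.
        d0, d1 = dist[idx - 1], dist[idx]
        w = (goal - d0) / (d1 - d0)
        point = rel[idx - 1] + w * (rel[idx] - rel[idx - 1])
    ideal = target - start
    ang = np.arctan2(point[1], point[0]) - np.arctan2(ideal[1], ideal[0])
    return float(np.degrees((ang + np.pi) % (2 * np.pi) - np.pi))


# Usage: a straight reach rotated 5 deg counterclockwise from a rightward target.
start = np.zeros(2)
target = np.array([10.0, 0.0])
s = np.linspace(0.0, 1.0, 101)[:, None]
direction = np.array([np.cos(np.radians(5.0)), np.sin(np.radians(5.0))])
traj = start + s * 10.0 * direction
print(heading_error_at_fraction(traj, start, target))   # approximately 5.0
```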

      It may seem counterintuitive that biomechanics plays a limited role in motor planning. This could be due to several factors. First, task demands (such as the need to grasp objects) may lead the biomechanical system to be inherently organized to minimize endpoint errors (Hu et al., 2012; Trumbower et al., 2009). Second, through development and experience, the nervous system may have adapted to these biomechanical influences, detecting and compensating for them over time (Chiel et al., 2009).

      That said, biomechanical constraints may make a larger contribution in other contexts; for example, when movements involve more extreme angles or span larger distances, or in individuals with certain musculoskeletal impairments (e.g., osteoarthritis) where physical limitations are more likely to come into play. We address this issue in the revised discussion.

      “Nonetheless, the current study does not rule out the possibility that biomechanical factors may influence motor biases in other contexts. Biomechanical constraints may have had limited influence in our experiments due to the relatively modest movement amplitudes used and minimal interaction torques involved. Moreover, while we have focused on biases that manifest at the movement endpoint, biomechanical constraints might introduce biases that are manifest in the movement trajectories (Alexander, 1997; Nishii & Taniai, 2009). Future studies are needed to examine the influence of context on reaching biases.”

      Reviewer #3 (Public review):

      The authors make use of a large dataset of reaches from several studies run in their lab to try to identify the source of direction-dependent radial reaching errors. While this has been investigated by numerous labs in the past, this is the first study where the sample is large enough to reliably characterize anisotropies associated with these radial reaches to identify possible sources of errors.

      (1) The sample size is impressive, but the authors should include confidence intervals and, ideally, the distribution of responses across individuals along with average performance across targets. It is unclear whether the observed “averaged function” is consistently found across individuals, or if it is mainly driven by a subset of participants exhibiting large deviations for diagonal movements. Providing individual-level data or response distributions would be valuable for assessing the ubiquity of the observed bias patterns and ruling out the possibility that different subgroups are driving the peaks and troughs. It is possible that the Transformation or some other model (see below) could explain the bias function for a substantial portion of participants, while other participants may have different patterns of biases that can be attributable to alternative sources of error.

      We thank the reviewer for encouraging a closer examination of the individual-level data. We did include standard error when we reported the motor bias function. Given that the error distribution is relatively Gaussian, we opted to not show confidence intervals since they would not provide additional information.
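      For comparison with the standard errors, a 95% confidence interval for each target's mean bias is the mean ± t(0.975, n − 1) × SEM, which for n = 56 participants is only about 2 SEM on each side. The sketch below illustrates this relationship on synthetic data, assuming a participants × targets array layout; it is not the analysis code from the manuscript.

```python
import numpy as np
from scipy import stats


def mean_sem_ci(bias, confidence=0.95):
    """Per-target mean, SEM, and t-based CI half-width for a (participants, targets) array."""
    n = bias.shape[0]
    mean = bias.mean(axis=0)
    sem = bias.std(axis=0, ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    return mean, sem, t_crit * sem


rng = np.random.default_rng(0)
fake_bias = rng.normal(loc=2.0, scale=3.0, size=(56, 24))   # synthetic data only
mean, sem, half_width = mean_sem_ci(fake_bias)
print(np.round(half_width / sem, 3))   # the same ratio (~2.0) for every target
```

      Because the half-width is a fixed multiple of the SEM for a given sample size, the two summaries carry the same information about the group mean.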

      To examine individual differences, we now report a best-fit model frequency analysis. For Exp 1, we fit each model at the individual level and counted the number of participants best predicted by each model. Among the four single-source models (Figure 3a), the vast majority of participants are best explained by the transformation bias model (48/56). When mixture models are included, the combined transformation + target bias model emerged as the best fit for almost all participants (50/56). A similar pattern holds for Exp 3b, although the best-fit frequencies are more widely distributed, likely due to the added noise that comes with online studies.

      We report this new analysis in the Results (see Fig 3 and Fig S2). Note that we opted to show some representative individual fits, selecting individuals whose data were best predicted by different models (Fig S2). Given that the number of peaks characterizes each model (independent of the specific parameter values), the two-peaked function exhibited by most participants indicates that the Transformation bias model holds at the individual level and not just at the group level.
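      The best-fit frequency analysis can be sketched as: fit every candidate model to each participant's bias function, score each fit with an information criterion, and tally the winners. In the sketch below the candidate models are simplified to pure first-, second-, and fourth-harmonic functions (standing in for the Proprioceptive, Transformation, and Target Bias models), BIC is used as the criterion, and the participants are synthetic; all three choices are assumptions for illustration, not the manuscript's fitting procedure.

```python
from collections import Counter

import numpy as np


def bic_linear_fit(X, y):
    """Least-squares fit scored with the Gaussian-error BIC: n*ln(RSS/n) + k*ln(n)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ w) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + k * np.log(n)


def harmonic(theta, k):
    """Columns sin(k*theta), cos(k*theta)."""
    return np.column_stack([np.sin(k * theta), np.cos(k * theta)])


theta = np.radians(np.arange(0.0, 360.0, 15.0))   # 24 hypothetical target directions
MODELS = {                                        # simplified stand-ins (assumption)
    "proprioceptive": harmonic(theta, 1),
    "transformation": harmonic(theta, 2),
    "target": harmonic(theta, 4),
}


def best_model(y):
    """Name of the candidate model with the lowest BIC for one participant."""
    scores = {name: bic_linear_fit(X, y) for name, X in MODELS.items()}
    return min(scores, key=scores.get)


rng = np.random.default_rng(1)
participants = [3.0 * np.sin(2 * theta + rng.uniform(0, np.pi))
                + rng.normal(0.0, 0.5, theta.size)
                for _ in range(20)]               # synthetic two-peaked participants
tally = Counter(best_model(y) for y in participants)
print(tally)   # all 20 synthetic participants should be assigned to "transformation"
```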

      (2) The different datasets across different experimental settings/target sets consistently show that people show fewer deviations when making cardinal-directed movements compared to movements made along the diagonal when the start position is visible. This reminds me of a phenomenon referred to as the oblique effect: people show greater accuracy for vertical and horizontal stimuli compared to diagonal ones. While the oblique effect has been shown in visual and haptic perceptual tasks (both in the horizontal and vertical planes), there is some evidence that it applies to movement direction. These systematic reach deviations in the current study thus may reflect this epiphenomenon that applies across modalities. That is, estimating the direction of a visual target from a visual start position may be less accurate, and may be more biased toward the horizontal axis, than for targets that are strictly above, below, left, or right of the visual start position. Other movement biases may stem from poorer estimation of diagonal directions and thus reflect more of a perceptual error than a motor one. This would explain why the bias function appears in both the in-lab and on-line studies although the visual targets are very different locations (different planes, different distances) since the oblique effects arise independent of plane, distance, or size of the stimuli. When the start position is not visible like in the Vindras study, it is possible that this oblique effect is less pronounced; masked by other sources of error that dominate when looking at 2D reach endpoint made from two separate start positions, rather than only directional errors from a single start position. Or perhaps the participants in the Vindras study are too variable and too few (only 10) to detect this rather small direction-dependent bias.

      The potential link between the oblique effect and the observed motor bias is an intriguing idea, one that we had not considered. However, after giving this some thought, we see several arguments against the idea that the oblique effect accounts for the pattern of motor biases.

      First, by the oblique effect, perceptual variability is greater along the diagonal axes compared to the cardinal axes. These differences in perceptual variability have been used to explain biases in visual perception through a Bayesian model under the assumption that the visual system has an expectation that stimuli are more likely to be oriented along the cardinal axes (Wei & Stocker, 2015). Importantly, the model predicts low biases at targets with peak perceptual variability. As such, even though those studies observed that participants showed large variability for stimuli at diagonal orientations, the bias for these stimuli was close to zero. Given we observed a large bias for targets at locations along the diagonal axes, we do not think this visual effect can explain the motor bias function.
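      The symmetry behind this argument can be illustrated with a simple Bayesian observer whose prior favors the cardinal axes. This is a simplified stand-in, not the efficient-coding model of Wei & Stocker (2015): the prior proportional to exp(κ_prior · cos 4θ), the von Mises likelihood, and both concentration values are assumptions. Because such a prior is symmetric about each diagonal, the estimate at 45° is unbiased however noisy it is, while stimuli between a cardinal and a diagonal are pulled toward the cardinal.

```python
import numpy as np

# Candidate directions, 0.1 deg apart and symmetric about every cardinal and diagonal.
GRID = np.linspace(-np.pi, np.pi, 3600, endpoint=False)


def posterior_mean_bias(stim_deg, kappa_like=20.0, kappa_prior=0.3):
    """Bias (deg) of the posterior-mean estimate when the measurement equals the stimulus.
    The prior favours cardinal directions; the likelihood is von Mises around the measurement."""
    m = np.radians(stim_deg)
    log_post = kappa_prior * np.cos(4 * GRID) + kappa_like * np.cos(GRID - m)
    post = np.exp(log_post - log_post.max())        # unnormalised, numerically stable
    est = np.arctan2(np.sum(post * np.sin(GRID)), np.sum(post * np.cos(GRID)))
    err = est - m
    return float(np.degrees((err + np.pi) % (2 * np.pi) - np.pi))


for s in (0.0, 20.0, 45.0, 70.0):
    # ~0 at 0 and 45 deg, negative at 20 deg, positive at 70 deg
    print(s, round(posterior_mean_bias(s), 3))
```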

      Second, the reviewer suggested that the observed motor bias might be largely explained by visual biases (or what we now refer to as target biases). If this hypothesis is correct, we would anticipate observing a similar bias pattern in tasks that use a similar layout for visual stimuli but do not involve movement. However, this prediction is not supported. For example, Kosovicheva & Whitney (2017) used a position reproduction/judgment task with keypress responses (no reaching). The stimuli were presented in a similar workspace as in our task. Their results showed a four-peaked bias function, whereas our results showed a two-peaked function.

      In summary, we don’t think oblique biases make a significant contribution to our results.

      A bias in estimating visual direction or a visual movement vector is a more realistic and relevant source of error than the proposed visual bias model. The Visual Bias model is based on data from a study by Huttenlocher et al where participants “point” to indicate the remembered location of a small target presented on a large circle. The resulting patterns of errors could therefore be due to localizing a remembered visual target, or due to relative or allocentric cues from the clear contour of the display within which the target was presented, or even movements used to indicate the target. This may explain the observed 4-peak bias function or zig-zag pattern of “averaged” errors, although this pattern may not even exist at the individual level, especially given the small sample size. The visual bias source argument does not seem well-supported, as the data used to derive this pattern likely reflects a combination of other sources of errors or factors that may not be applicable to the current study, where the target is continuously visible and relatively large. Also, any visual bias should be defined in eye-centred coordinates and should vary as a function of the location of visual targets relative to the eyes. Where the visual targets are located relative to the eyes (or at least the head) is not reported.

      Thank you for this question. A few key points to note:

      The visual bias model has also been discussed in studies using a similar setup to our study. Kosovicheva & Whitney (2017) observed a four-peaked function in experiments in which participants report a remembered target position on a circle by either making saccades or using key presses to adjust the position of a dot. However, we agree that this bias may be attenuated in our experiment given that the target is continuously visible. Indeed, the model fitting results suggest the peak of this bias is smaller in our task (~3°) compared to previous work (~10°, Kosovicheva & Whitney, 2017; Yousif, Forrence, & McDougle, 2023).

      We also agree with the reviewer that this “visual bias” is not an eye-centric bias, nor is it restricted to the visual system. A similar bias pattern is observed even if the target is presented proprioceptively (Yousif, Forrence, & McDougle, 2023). As such, this bias may reflect a domain-general distortion in the representation of position within polar space. Accordingly, in the revision, we now refer to this in a more general way, using the term “target bias”, rather than visual bias. We justify this nomenclature when introducing the model in the Results section (Lines 164-169). Please also see Reviewer 1 comment 2 for details.

      Motivated by Reviewer 2, we also examined multiple alternative visual bias models (please refer to our response to Reviewer 2, Point 3).

      The Proprioceptive Bias Model is supposed to reflect errors in the perceived start position. However, in the current study, there is only a single, visible start position, which is not the best design for studying its contribution. In fact, my paradigms also use a single, visual start position to minimize the contribution of proprioceptive biases, or at least to remove one source of systematic bias. The Vindras study aimed to quantify the effect of start position by using two sets of radial targets from two different, unseen start positions on either side of the body midline. When fitting the 2D reach errors at both the group and individual levels (which showed substantial variability across individuals), the start position predicted most of the 2D errors at the individual level, and substantially more than the target direction did. While the authors re-plotted the data to illustrate only angular deviations, they showed only averaged data without confidence intervals across participants. Given the large variability across their 10 individuals and between the two target sets, it would be more appropriate to plot the performance separately for the two target sets and show confidence intervals (or individual data). Likewise, even the VT model predictions should differ across the two target sets, since the visual-proprioceptive matching errors from the Wang et al. study on which the model is based are larger for targets on the left side of the body.

      To be clear, in the Transformation bias model, the vector bias at the start position is also an important source of error. The critical difference between the proprioceptive and transformation models is how bias influences motor planning. In the Proprioceptive bias model, movement is planned in visual space. The system perceives the starting hand position in proprioceptive space and transforms this into visual space (Vindras & Viviani, 1998; Vindras et al., 2005). As such, the bias is only relevant in terms of the perceived start position; it does not influence the perceived target location. In contrast, the transformation bias model proposes that while both the starting and target positions are perceived in visual space, movements are planned in proprioceptive space. Consequently, when the start and target positions are visible, both positions must be transformed from visual space to proprioceptive coordinates before movement planning. Thus, bias will influence both the start and target positions. We also note that to set the transformation bias for the start and target positions, we drew on studies in which this bias is usually described as a proprioceptive error. As such, changing the start position has, in principle, a similar impact on the Transformation and Proprioceptive Bias models and would not provide a stronger test to separate them.

      We now highlight the differences between the models in the Results section, making clear that the bias at the start position influences both the Proprioceptive bias and Transformation bias models (Lines 192-200).

      “Note that the Proprioceptive Bias model and the Transformation Bias model tap into the same visuo-proprioceptive error map. The key difference between the two models arises in how this error influences motor planning. For the Proprioceptive Bias model, planning is assumed to occur in visual space. As such, the perceived position of the hand (based on proprioception) is transformed into visual space. This will introduce a bias in the representation of the start position. In contrast, the Transformation Bias model assumes that the visually-based representations of the start and target positions need to be transformed into proprioceptive space for motor planning. As such, both positions are biased in the transformation process. In addition to differing in terms of their representation of the target, the error introduced at the start position is in opposite directions due to the direction of the transformation (see fig 1g-h).”
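      To make the contrast between the two models concrete, here is a minimal Python sketch; it is not the implementation used in the study. It assumes a hypothetical error map e(p), a fixed leftward-and-downward vector whose length grows with distance from an illustrative reference point. The names `error_vector`, `REF`, `GAIN` and `DIRECTION` and all numeric values are assumptions. It also assumes that the proprioceptive-to-visual mapping of the sensed hand can be approximated to first order by p − e(p), so the start-position error enters the two models with opposite signs.

```python
import math

# Hypothetical error-map parameters (illustrative only, not fitted values).
REF = (0.3, -0.2)                # reference point, e.g. near the right shoulder (m)
GAIN = 0.05                      # error magnitude per metre of distance from REF
DIRECTION = math.radians(225.0)  # leftward and downward


def error_vector(p):
    """Hypothetical visuo-proprioceptive error e(p) at visual location p."""
    d = math.dist(p, REF)
    return (GAIN * d * math.cos(DIRECTION), GAIN * d * math.sin(DIRECTION))


def angle_deg(v):
    return math.degrees(math.atan2(v[1], v[0]))


def wrap(a):
    """Wrap an angle in degrees to [-180, 180)."""
    return (a + 180.0) % 360.0 - 180.0


def transformation_bias(start, target):
    """Start and target are both mapped from visual to proprioceptive space
    (p -> p + e(p)) before planning, so only e(target) - e(start) matters."""
    es, et = error_vector(start), error_vector(target)
    planned = (target[0] + et[0] - start[0] - es[0],
               target[1] + et[1] - start[1] - es[1])
    ideal = (target[0] - start[0], target[1] - start[1])
    return wrap(angle_deg(planned) - angle_deg(ideal))


def proprioceptive_bias(start, target):
    """Only the sensed hand is mapped into visual space, approximated here as
    p -> p - e(p); the visible target is used as seen."""
    es = error_vector(start)
    perceived_start = (start[0] - es[0], start[1] - es[1])
    planned = (target[0] - perceived_start[0], target[1] - perceived_start[1])
    ideal = (target[0] - start[0], target[1] - start[1])
    return wrap(angle_deg(planned) - angle_deg(ideal))


# Angular bias (deg) for 8 hypothetical targets 10 cm from a start at the origin.
start = (0.0, 0.0)
for theta in range(0, 360, 45):
    t = (0.1 * math.cos(math.radians(theta)), 0.1 * math.sin(math.radians(theta)))
    print(f"{theta:3d} deg  TR: {transformation_bias(start, t):+6.2f}  "
          f"Prop: {proprioceptive_bias(start, t):+6.2f}")
```

      In this sketch, a spatially uniform error map (e(start) equal to e(target)) would yield no angular bias under the transformation model. This matches the point that the relative difference between the start and target errors matters more than their absolute size.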

      In terms of fitting individual data, we have conducted a new experiment, reported as Exp 4 in the revised manuscript (details in our response to Reviewer 1, comment 3). The experiment has a larger sample size (n=30) and importantly, examined error for both movement angle and movement distance. We chose to examine the individual differences in 2-D biases using this sample rather than Vindras’ data as our experiment has greater spatial resolution and more participants. At both the group and individual level, the Transformation bias model is the best single source model, and the Transformation + Target Bias model is the best combined model. These results strongly support the idea that the transformation bias is the main source of the motor bias.

      As for the different initial positions in Vindras et al (2005), the two target sets have very similar patterns of motor biases. As such, we opted to average them to decrease noise. Notably, the transformation model also predicts that altering the start location should have limited impact on motor bias patterns: What matters for the model is the relative difference between the transformation biases at the start and target positions rather than the absolute bias.

      Author response image 3.

      I am also having trouble fully understanding the V-T model and its associated equations, and whether visual-proprioception matching data is a suitable proxy for estimating the visuomotor transformation. I would be interested to first see the individual distributions of errors and a response to my concerns about the Proprioceptive Bias and Visual Bias models.

      We apologize for the lack of clarity on this model. To generate the T+V (now Transformation + Target bias, or TR+TG) model, we assume the system misperceives the target position (Target bias, see Fig S5a) and then transforms the start and misperceived target positions into proprioceptive space (Fig S5b). The system then generates a motor plan in proprioceptive space; this plan will result in the observed motor bias (Fig. S5c). We now include this figure as Fig S5 and hope that it makes the model features salient.

      Regarding whether the visuo-proprioceptive matching task is a valid proxy for transformation bias, we refer the reviewer to the comments made by Public Reviewer 1, comment 1. We define the transformation bias as the discrepancy between corresponding positions in visual and proprioceptive space. This can be measured using matching tasks in which participants either align their unseen hand to a visual target (Wang et al., 2021) or align a visual target to their unseen hand (Wilson et al., 2010).

      Nonetheless, when fitting the model to the motor bias data, we did not directly impose the visual-proprioceptive matching data. Instead, we used the shape of the transformation biases as a constraint, while allowing the exact magnitude and direction to be free parameters (e.g., a leftward and downward bias scaled by distance from the right shoulder). Reassuringly, the fitted transformation biases closely matched the magnitudes reported in prior studies (Fig. 2h, 1e), providing strong quantitative support for the hypothesized causal link between transformation and motor biases.

      Recommendations for the authors:

      Overall, the reviewers agreed this is an interesting study with an original and strong approach. Nonetheless, they identified three main weaknesses. The first is the focus on bias in reach direction rather than reach extent. Second, the models were fit to average data rather than individual data. Lastly, and most importantly, the model development and assumptions are not well substantiated. Addressing these points would help improve the eLife assessment.

      Reviewer #1 (Recommendations for the authors):

      It is mentioned that the main difference between Experiments 1 and 3 is that in Experiment 3, the workspace was smaller and closer to the shoulder. Was the location of the laptop relative to the participant in Experiment 3 known by the authors? If so, variations in this location across participants can be used to test whether the Transformation bias was indeed larger for participants who had the laptop further from the shoulder.

      Another difference between Experiments 1 and 3 is that in Experiment 1, the display was oriented horizontally, whereas it was vertical in Experiment 3. To what extent can that have led to the different results in these experiments?

      This is an interesting point that we had not considered. Unfortunately, for the online work we do not record the participants’ posture.

      Regarding the influence of display orientation (horizontal vs. vertical), Author response image 4 presents three relevant data points: (1) Vandevoorde and Orban de Xivry (2019), who measured motor biases in-person across nine target positions using a tablet and vertical screen; (2) our Experiment 1b, conducted online with a vertical setup; (3) our in-person Experiment 3b, using a horizontal monitor. For consistency, we focus on the baseline conditions with feedback, the only condition reported in Vandevoorde. Motor biases from the two in-person studies were similar despite differing monitor orientations: both exhibited two-peaked functions with comparable peak locations. We note that the bias attenuation in Vandevoorde may be due to their inclusion of reward-based error signals in addition to cursor feedback. In contrast, compared to the in-person studies, the online study showed reduced bias magnitude with what appears to be a four-peaked function. While more data are needed, these results suggest that the difference in the workspace (more restricted in our online study) may be more relevant than monitor orientation.

      Author response image 4.

      For the joint-based proprioceptive model, the equations used are for an arm moving in a horizontal plane at shoulder height, but the figures suggest the upper arm was more vertical than horizontal. How does that affect the predictions for this model?

      Please also see our response to your public comment 1. When the upper arm (or the forearm) is not horizontal, this affects the projection of the limb onto the 2-D workspace. Effectively, in the joint-based proprioceptive model, this changes the ratio between L1 and L2 (see Author response image 5b below). However, adding a parameter to vary the L1/L2 ratio would not change the set of motor bias functions that the model can produce. Importantly, it still generates a one-peaked function. We simulated 50 motor bias functions across the possible parameter space. As shown in Author response image 5c-d, the peak and magnitude of the motor bias functions are very similar with and without the L1/L2 term. We characterize each bias function by its peak position and peak-to-valley distance. Based on these two measures, the distributions of motor bias functions are very similar (Author response image 5e-f). Moreover, the L1/L2 ratio parameter is not recoverable by model fitting (Author response image 5c), suggesting that it is redundant with other parameters. As such, we include only the basic version of the joint-based proprioceptive model in our model comparisons.
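      A minimal sketch of these two summary statistics follows. It assumes the bias function is sampled at a set of target directions, takes "peak position" to be the target direction with the largest bias, and takes "peak-to-valley distance" to be the maximum minus the minimum bias. The definitions used for Author response image 5 may differ. `summarize_bias` is a hypothetical helper name, and the example function is illustrative only.

```python
import math


def summarize_bias(target_angles, biases):
    """Return (peak_angle, peak_to_valley) for a sampled motor bias function.

    target_angles: target directions in degrees
    biases: angular motor bias (degrees) at each target direction
    """
    if not biases or len(target_angles) != len(biases):
        raise ValueError("target_angles and biases must be non-empty and equal length")
    i_peak = max(range(len(biases)), key=lambda i: biases[i])
    return target_angles[i_peak], max(biases) - min(biases)


# Hypothetical one-peaked bias function sampled every 15 degrees.
angles = list(range(0, 360, 15))
biases = [3.0 * math.cos(math.radians(a - 120)) for a in angles]
print(summarize_bias(angles, biases))
```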

      Author response image 5.

      It was unclear how the models were fit and how the BIC was computed. It is mentioned that the models were fit to average data across participants, but the BIC values were based on all trials for all participants, which does not seem consistent. And the models are deterministic, so how can a log-likelihood be determined? Since there were inter-individual differences, fitting to average data is not desirable. Take for instance the hypothetical case that some participants have a single peak at 90 deg, and others have a single peak at 270 deg. Averaging their data will then lead to a pattern with two peaks, which would be consistent with an entirely different model.

      We thank the reviewer for raising these issues.

      Given the reviewers’ comments, we now report fits at both the group and individual level (see response to reviewer 3 public comment 1). The group-level fitting is for illustration purposes. Model comparison is now based on the individual-level analyses, which show that the results are best explained by the transformation model when comparing single-source models and by the T+V (now TR+TG) model when considering all models. These new results strongly support the transformation model.
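      The reviewer's hypothetical about averaging can be reproduced in a few lines. This sketch rests on purely illustrative assumptions: von Mises-shaped single-peaked bias functions, two equally sized groups, and hypothetical helper names `bump` and `count_peaks`. It shows that averaging a group whose function peaks at 90° with a group whose function peaks at 270° yields a two-peaked group mean.

```python
import math


def bump(theta_deg, center_deg, amplitude=5.0, kappa=4.0):
    """Single-peaked (von Mises-shaped) bias function in degrees; peaks at center_deg."""
    d = math.radians(theta_deg - center_deg)
    return amplitude * math.exp(kappa * (math.cos(d) - 1.0))


def count_peaks(values):
    """Count strict local maxima of a function sampled around the full circle."""
    n = len(values)
    return sum(1 for i in range(n)
               if values[i] > values[i - 1] and values[i] > values[(i + 1) % n])


angles = list(range(0, 360, 15))
group_a = [bump(a, 90) for a in angles]    # hypothetical participants peaking at 90 deg
group_b = [bump(a, 270) for a in angles]   # hypothetical participants peaking at 270 deg
group_mean = [(x + y) / 2 for x, y in zip(group_a, group_b)]

print(count_peaks(group_a), count_peaks(group_b), count_peaks(group_mean))  # prints: 1 1 2
```

      This is the concern that motivates comparing models on individual-level fits rather than on the group average.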

      Log-likelihoods were computed assuming normally distributed motor noise around the motor biases predicted by each model.

      We updated the Methods section as follows (lines 841-853):

      “We used the fminsearchbnd function in MATLAB to minimize the negative log-likelihood (−LL) summed across all trials for each participant. LL was computed assuming normally distributed noise around each participant’s motor biases:

      [11] $LL = \sum_{i} \log \mathcal{N}(x_i \mid b_i, c)$

      where x is the empirical reaching angle, b is the motor bias predicted by the model, and c is motor noise, calculated as the standard deviation of (x − b). For model comparison, we calculated the BIC as follows:

      [12] BIC = -2LL+k∗ln(n)

      where k is the number of free parameters in the model and n is the number of trials. Smaller BIC values correspond to better fits. We report the sum of ΔBIC, obtained by subtracting the BIC value of the TR+TG model from that of each other model.

      For illustrative purposes, we fit each model at the group level, pooling data across all participants to predict the group-averaged bias function.”
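      For concreteness, here is a minimal Python sketch of Eqs. 11 and 12; it is not the MATLAB code used in the study. It assumes that c is the population standard deviation of the residuals x − b and that n is the participant's number of trials. The helper names and toy data are hypothetical.

```python
import math
import statistics


def gaussian_log_likelihood(observed, predicted):
    """Summed log N(x_i | b_i, c) over trials, with c = SD of the residuals x - b."""
    residuals = [x - b for x, b in zip(observed, predicted)]
    c = statistics.pstdev(residuals)
    if c == 0:
        raise ValueError("residual SD is zero; log-likelihood is undefined")
    return sum(-0.5 * math.log(2 * math.pi * c * c) - r * r / (2 * c * c)
               for r in residuals)


def bic(log_likelihood, n_params, n_trials):
    """BIC = -2 LL + k ln(n); smaller values indicate a better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_trials)


# Hypothetical reach angles (deg) and the biases predicted by two hypothetical models.
observed = [2.1, -0.5, 3.8, 1.2, -1.9, 0.4]
model_a = [1.8, -0.2, 3.5, 1.0, -1.5, 0.3]   # e.g. a 3-parameter model
model_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]     # e.g. a 1-parameter model

bic_a = bic(gaussian_log_likelihood(observed, model_a), 3, len(observed))
bic_b = bic(gaussian_log_likelihood(observed, model_b), 1, len(observed))
print("Delta BIC (B - A):", bic_b - bic_a)
```

      Computing c from the residuals makes the noise term depend on each model's fit, so a model that tracks the data more closely is rewarded both through smaller residuals and a tighter likelihood.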

      What was the delay of the visual feedback in Experiment 1?

      The visual delay in our setup was ~30 ms, with the procedure used to estimate this described in detail in Wang et al (2024, Curr. Bio.). We note that in calculating motor biases, we primarily relied on the data from the no-feedback block.

      Minor corrections

      In several places it is mentioned that movements were performed with proximal and distal effectors, but it's unclear where that refers to because all movements were performed with a hand (distal effector).

      By 'proximal and distal effectors,' we were referring to the fact that in the online setup, “reaching movements” are primarily made by finger and/or wrist movements across a trackpad, whereas in the in-person setup, the participants had to use their whole arm to reach about the workspace. To avoid confusion, we now refer to these simply as 'finger' versus 'hand' movements.

      In many figures, Bias is misspelled as Bais.

      Fixed.

      In Figure 3, what is meant by deltaBIC (*1000) etc? Literally, it would mean that the bars show 1,000 times the deltaBIC value, suggesting tiny deltaBIC values, but that's probably not what's meant.

      '×1000' in the original figure indicates the unit scaling, with ΔBIC values ranging from approximately 1000 to 4000. However, given that we now fit the models at the individual level, we have replaced this figure with a new one (Figure 3e) showing the distribution of individual BIC values.

      Reviewer #2 (Recommendations for the authors):

      I have concerns that the authors only examine slicing movements through the target and not movements that stop in the target. Biases create two major errors - errors in direction and errors in magnitude and here the authors have only looked at one of these. Previous work has shown that both can be used to understand the planning processes underlying movement. I assume that all models should also make predictions about the magnitude biases which would also help support or rule out specific models.

      Please see our response to Reviewer 1 public review 3.

      As discussed above, three-dimensional reaching movements also have biases and are not studied in the current manuscript. In such studies, biomechanical factors may play a much larger role.

      Please see our response to your public review.

      It may be that I am unclear on what exactly is done, as the methods and model fitting barely explain the details, but on my reading of the methods I have several major concerns.

      First, it feels that the visual bias model is not as well mapped across space if it only results from one study which is then extrapolated across the workspace. In contrast, the transformation model is actually measured throughout the space to develop the model. I have some concerns about whether this is a fair comparison. There are potentially many other visual bias models that might fit the current experimental results better than the chosen visual bias model.

      Please refer to our response to your public review.

      It is completely unclear to me why a joint-based proprioceptive model would predict curved planned movements and not straight movements (Figure S1). Changes in the shoulder and elbow joint angles could still be controlled to produce a straight movement. On the other hand, as mentioned above, the actual movement is likely much more complex if the physical starting position is offset from the perceived hand.

      Natural movements are often curved, reflecting a drive to minimize energy expenditure or biomechanical constraints (e.g., joint and muscle configuration). This is especially the case when the task emphasizes endpoint precision (Codol et al., 2024), as ours does. Trajectory curvature was also observed in a recent simulation study in which a neural network was trained to control a biomechanical model (2-limb, 6-muscle) with the cost function specified to minimize trajectory error (reach to a target with as straight a movement as possible). Even under these constraints, the movements showed some curvature. To examine whether the endpoint reaching bias reflects this curvature (or bias during reaching), we included the predictions of this new biomechanical model in the paper and show that it does not explain the motor bias we observed.

      To be clear, while we implemented several models (the joint-based proprioceptive model and the new biomechanical model) to examine whether motor biases can be explained by movement curvature, our goal in this paper was to identify the source of the endpoint bias. Our modeling results reveal that a previously underappreciated source of motor bias, a transformation error between visual and proprioceptive space, plays a dominant role in shaping motor bias patterns across a wide range of experiments, including naturalistic reaching contexts in which vision and hand are aligned at the start position. Movement curvature might be influenced by selectively manipulating factors that introduce a mismatch between the visual start position and the actual hand position (e.g., Sober and Sabes, 2003); we leave this question for future work.

      The model fitting section is barely described. It is unclear how the data is fit or almost any other aspects of the process. How do the authors ensure that they have found the minimum? How many times was the process repeated for each model fit? How were starting parameters randomized? The main output of the model fitting is BIC comparisons across all subjects. However, there are many other ways to compare the models which should be considered in parallel. For example, how well do the models fit individual subjects using BIC comparisons? Or how often are specific models chosen for individual participants? While across all subjects one model may fit best, it might be that individual subjects show much more variability in which model fits their data. Many details are missing from the methods section. Further support beyond the mean BIC should be provided.

      We fit each model 150 times, and for each iteration the initial value of each parameter was randomly selected from a uniform distribution. The range for each parameter was hand-tuned for each model to ensure that the values covered a reasonable range. Please see our response to your first minor comment below for the range of all parameters and for how we decided the number of iterations for each model.
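      The multi-start procedure can be sketched as follows. This is a standard-library Python illustration, not the study's MATLAB code. A simple bounded compass (pattern) search stands in for fminsearchbnd. The double-well objective, the bounds and the helper names are hypothetical. The sketch only illustrates that each start is drawn uniformly over the parameter range and that the best of the resulting local optima is kept.

```python
import random


def compass_search(f, x0, lower, upper, step=0.5, tol=1e-6, max_iter=10_000):
    """Minimize f inside box bounds with a simple compass (pattern) search."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] = min(max(cand[i] + delta, lower[i]), upper[i])  # respect bounds
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0  # shrink the pattern when no neighbour improves
    return x, fx


def multistart_fit(f, lower, upper, n_starts=150, seed=0):
    """Run a local search from n_starts uniform random starts; keep the best result."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        x, fx = compass_search(f, x0, lower, upper)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def double_well(x):
    """Hypothetical objective with a shallow local minimum near +0.96
    and a deeper global minimum near -1.04."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]


best_x, best_f = multistart_fit(double_well, [-2.0], [2.0])
print(best_x, best_f)
```

      A single start at x = 1.5 settles in the shallower minimum near +0.96. Repeating the search from many random starts is what makes finding the deeper minimum reliable.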

      Given the reviewers’ comments on individual differences, we now fit the models at the individual level and report a frequency analysis describing the best-fitting model for each participant. In brief, the data for the vast majority of participants were best explained by the transformation model when comparing single-source models and by the T+V (TR+TG) model when considering all models. Please see our response to Reviewer 3, public comment 1, for the updated results.

      We updated the Methods section, which now reads as follows (lines 841-853):

      “We used the fminsearchbnd function in MATLAB to minimize the negative log-likelihood (−LL) summed across all trials for each participant. LL was computed assuming normally distributed noise around each participant’s motor biases:

      [11] $LL = \sum_{i} \log \mathcal{N}(x_i \mid b_i, c)$

      where x is the empirical reaching angle, b is the motor bias predicted by the model, and c is motor noise, calculated as the standard deviation of (x − b).

      For model comparison, we calculated the BIC as follows:

      [12] BIC = -2LL+k∗ln(n)

      where k is the number of free parameters in the model and n is the number of trials. Smaller BIC values correspond to better fits. We report the sum of ΔBIC, obtained by subtracting the BIC value of the TR+TG model from that of each other model.”

      Line 305-307. The authors state that biomechanical issues would not predict qualitative changes in the motor bias function in response to visual manipulation of the start position. However, I question this statement. If the start position is offset visually then any integration of the proprioceptive and visual information to determine the start position would contain a difference from the real hand position. A calculation of the required joint torques from such a position sent through the mechanics of the limb would produce biases. These would occur purely because of the combination of the visual bias and the inherent biomechanical dynamics of the limb.

      We thank the reviewer for this comment. We have removed the statement regarding inferences about the biomechanical model based on visual manipulations of the start position. Additionally, we have incorporated a recently proposed biomechanical model into our model comparisons to expand our exploration of sources of bias. Please refer to our response to your public review for details.

      Measurements are made while the participants hold a stylus in their hand. How can the authors be certain that the biases are due to the movement and not to small changes in the posture of the hand holding the stylus during movements in the workspace? It would be better if the stylus were fixed to the hand without being held.

      Below, we have included an image of the device used in Exp 1 for reference. The digital pen was fixed in a vertical orientation. At the start of the experiment, the experimenter ensured that the participant had the proper grip alignment and held the pen at the red-marked region. With these constraints, we see minimal change in posture during the task.

      Author response image 6.

      Minor Comments

      Best fit model parameters are not presented. Estimates of the accuracy of these measures would also be useful.

      In the original submission, we included a Table S1 that presented the best-fit parameters for the TR+TG (Previously T+V) model. Table S1 now shows the parameters for the other models (Exp 1b and 3b, only). We note the parameter values from these non-optimal models are hard to interpret given that core predictions are inconsistent with the data (e.g., number of peaks).

      We assume that by "accuracy of these measures," the reviewers are referring to the reliability of the model fits. To assess this, we conducted a parameter recovery analysis in which we simulated a range of model parameters for each model and then attempted to recover them through fitting. Each model was simulated 50 times, with the parameters randomly sampled from distributions used to define the initial fitting parameters. Here, we only present the results for the combined models (TR+TG, PropV+V, and PropJ+V), as the nested models would be even easier to fit.

      As shown in Fig. S4, all parameters were recovered with high accuracy, indicating strong reliability in parameter estimation. Additionally, we examined the log-likelihood as a function of fitting iterations (Fig. S4d). Based on this curve, we determined that 150 iterations were sufficient given that the log-likelihood values were asymptotic at this point. Moreover, in most cases, the model fitting can recover the simulated model, with minimal confusion across the three models (Fig. S4e).
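      A parameter-recovery check of this kind can be sketched on a toy model. The example below is an illustrative assumption, not one of the models in the paper. It uses a one-peaked bias function alpha·sin θ + beta·cos θ, which is linear in its parameters and can therefore be fit by ordinary least squares instead of an iterative optimizer. Parameters are drawn uniformly and data are simulated with Gaussian noise. The model is then refit, and recovery is summarized as the Pearson correlation between true and recovered values. All helper names and settings are hypothetical.

```python
import math
import random


def predict(alpha, beta, angles_deg):
    """Toy one-peaked bias function: alpha*sin(theta) + beta*cos(theta)."""
    return [alpha * math.sin(math.radians(t)) + beta * math.cos(math.radians(t))
            for t in angles_deg]


def fit(angles_deg, biases):
    """Ordinary least-squares estimate of (alpha, beta) via the 2x2 normal equations."""
    s = [math.sin(math.radians(t)) for t in angles_deg]
    c = [math.cos(math.radians(t)) for t in angles_deg]
    sss = sum(v * v for v in s)
    scc = sum(v * v for v in c)
    ssc = sum(u * v for u, v in zip(s, c))
    sy = sum(u * y for u, y in zip(s, biases))
    cy = sum(v * y for v, y in zip(c, biases))
    det = sss * scc - ssc * ssc
    return (scc * sy - ssc * cy) / det, (sss * cy - ssc * sy) / det


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def parameter_recovery(n_sims=50, n_reps=10, noise_sd=2.0, seed=1):
    """Simulate n_sims parameter sets, refit each, return (r_alpha, r_beta)."""
    rng = random.Random(seed)
    angles = list(range(0, 360, 15)) * n_reps
    true_a, true_b, est_a, est_b = [], [], [], []
    for _ in range(n_sims):
        alpha, beta = rng.uniform(-5, 5), rng.uniform(-5, 5)
        data = [m + rng.gauss(0.0, noise_sd) for m in predict(alpha, beta, angles)]
        a_hat, b_hat = fit(angles, data)
        true_a.append(alpha)
        true_b.append(beta)
        est_a.append(a_hat)
        est_b.append(b_hat)
    return pearson(true_a, est_a), pearson(true_b, est_b)


print(parameter_recovery())
```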

      What are the (*1000) and (*100) in the Change in BIC y-labels? I assume they indicate that the values should be multiplied by these numbers. If these indicate that the BIC is in the hundreds or thousands, it would be better to label the axes clearly, as the interpretation is very different (e.g. a BIC difference of 3 is not significant).

      '×1000' in the original figure indicates the unit scaling, with ΔBIC values ranging from approximately 1000 to 4000. However, given that we now fit the models at the individual level, we have replaced this figure with a new one showing the distribution of individual BIC values.

      Lines 249, 312, and 315, and maybe elsewhere - the degree symbol does not display properly.

      Corrected.

      Line 326. The authors mention that participants are unaware of their change in hand angle in response to clamped feedback. However, there may be a difference between sensing for perception and sensing for action. If the participants are unaware in terms of reporting but aware in terms of acting would this cause problems with the interpretation?

      This is an interesting distinction, one that has been widely discussed in the literature. However, it is not clear how to address it in the present context. We have examined awareness in different ways in prior work with clamped feedback. In general, even when the hand direction has deviated by >20°, participants report their perceived hand position after the movement as near the target (Tsay et al, 2020). We have also used post-experiment questionnaires to probe whether participants thought their movement direction had changed over the course of the experiment (volitionally or otherwise). Again, participants generally insist they moved straight to the target throughout the experiment. So it seems that they were unaware of any change in action or perception.

      Reaction time data provide additional support that participants are unaware of any change in behavior. The RT function remains flat after the introduction of the clamp, unlike the increases typically observed when participants engage in explicit strategy use (Tsay et al, 2024).

      Figure 1h: The caption suggests this is from the Wang 2021 paper. However, the text (lines 180-182) suggests this might be the map from the current results. Can the authors clarify?

      Fig 1e shows the data from Wang et al., 2021. We formalized an abstract map based on the spatial constraints observed in Fig 1e and simulated the error at the start and target positions based on this abstraction (Fig 1h). We have revised the text to now read (Lines 182-190):

      “Motor biases may thus arise from a transformation error between these coordinate systems. Studies in which participants match a visual stimulus to their unseen hand or vice-versa provide one way to estimate this error (Jones et al., 2009; Rincon-Gonzalez et al., 2011; van Beers et al., 1998; Wang et al., 2020). Two key features stand out in these data: First, the direction of the visuo-proprioceptive mismatch is similar across the workspace: For right-handers using their dominant limb, the hand is positioned leftward and downward from each target. Second, the magnitude increases with distance from the body (Fig 1d). Using these two empirical constraints, we simulated a visual-proprioceptive error map (Fig. 1h) by applying a leftward and downward error vector whose magnitude scaled with the distance from each location to a reference point.”
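      Once the reference point is fixed, a map constrained in this way has only two free quantities: a gain and a direction. The sketch below illustrates this and is not the study's code. The reference point, gain, direction and helper names (`error_map`, `estimate_map`) are hypothetical. It generates such a map and shows that, given the reference point, the gain and direction can be recovered in closed form by least squares. This works because every error vector is the same common vector scaled by that location's distance from the reference point.

```python
import math


def error_map(points, gain, direction_deg, ref):
    """Constrained error map: one fixed direction, magnitude proportional to
    the distance of each point from the reference point `ref`."""
    ux = math.cos(math.radians(direction_deg))
    uy = math.sin(math.radians(direction_deg))
    vectors = []
    for p in points:
        d = math.dist(p, ref)
        vectors.append((gain * d * ux, gain * d * uy))
    return vectors


def estimate_map(points, errors, ref):
    """Closed-form least-squares estimate of (gain, direction_deg) given `ref`.

    Each error is modelled as d_i * v, so minimizing sum |e_i - d_i v|^2 gives
    v = sum(d_i e_i) / sum(d_i^2); gain = |v| and direction = angle of v.
    """
    dists = [math.dist(p, ref) for p in points]
    denom = sum(d * d for d in dists)
    if denom == 0:
        raise ValueError("all points coincide with the reference point")
    vx = sum(d * e[0] for d, e in zip(dists, errors)) / denom
    vy = sum(d * e[1] for d, e in zip(dists, errors)) / denom
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx)) % 360.0


# Hypothetical workspace grid (m) and reference point near the right shoulder.
ref = (0.3, -0.2)
grid = [(0.1 * i, 0.1 * j) for i in range(-3, 4) for j in range(0, 5)]
errors = error_map(grid, gain=0.05, direction_deg=225.0, ref=ref)
print(estimate_map(grid, errors, ref))
```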

      Reviewer #3 (Recommendations for the authors):

      The central idea behind the research seems quite promising, and I applaud the efforts put forth. However, I'm not fully convinced that the current model formulations are plausible explanations. While the dataset is impressively large, it does not appear to be optimally designed to address the complex questions the authors aim to tackle. Moreover, the datasets used to formulate the 3 different model predictions are SMALL, exhibit substantial variability across individuals, and are based on average (and thus "smoothed") data.

      We hope to have addressed these concerns with two major changes to the revised manuscript: 1) the new experiment in which we examine biases in both angle and extent and 2) the inclusion in the analyses of fits based on individual data sets.

    1. eLife Assessment

      This study provides solid evidence that odor fear conditioning biases olfactory sensory neuron receptor choice in mice and that this bias is detectable in the next generation. The authors use rigorous histological and behavioral analyses, including unsupervised behavioral quantification, to support the conclusion that odor-specific sensory representations can be shaped by experience and partially transmitted across generations. While the behavioral effects in offspring are modest and the mechanistic basis of inheritance remains unresolved, the study offers an important and carefully executed contribution to understanding experience-dependent sensory plasticity and its intergenerational consequences.

    2. Reviewer #1 (Public review):

      Summary

      The revised manuscript by Liff et al. represents a substantial improvement over the original version. The authors have carefully addressed the key concerns raised in the initial review, most notably by expanding their behavioral analyses and incorporating additional experiments that strengthen the mechanistic links between olfactory sensory neuron (OSN) changes and behavioral outcomes. Their integration of unsupervised Keypoint-MoSeq analysis, extended behavioral metrics (distance travelled, mean speed, freezing time), and the inclusion of behavioral results in the main figures significantly enhance the clarity and impact of the work. The revised discussion also better contextualizes the findings in relation to previous literature, including the discrepancies with Dias & Ressler (2014), and provides more transparency regarding experimental choices.

      Overall Evaluation

      The revised version has substantially strengthened the manuscript. By addressing the initial concerns with new data, improved analyses, and clearer discussion, the authors provide a much more compelling and rigorous account of how odor-shock conditioning biases OSN fate and influences offspring. Although some questions remain open for future exploration, the present study now makes a clear, well-supported contribution to understanding intergenerational sensory inheritance. I commend the authors for their thoughtful and thorough revisions.

      Strengths

      Expanded behavioral analysis: The addition of multiple quantitative metrics, inclusion of freezing behavior, and use of Keypoint-MoSeq provide a much richer characterization of behavioral phenotypes in both F0 and F1 generations. These data convincingly demonstrate nuanced odor-specific effects that were not captured in the earlier version.

      Improved presentation: Behavioral data, previously relegated to supplementary materials, are now appropriately included in the main figures, supported by supplementary statistical tables. This makes the results more transparent and accessible.

      Potential Limitations

      Some behavioral effects in the F1 generation remain subtle; the discussion addresses this, but a cautious interpretation of behavioral inheritance would be appropriate.

      The MoSeq analysis is a valuable addition, though clarifying what "syllables" represent and how they relate to traditional behavioral measures could aid reader interpretation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors examined inherited changes to the olfactory epithelium produced by odor-shock pairings. The manuscript demonstrates that odor fear conditioning biases olfactory epithelium neurogenesis toward greater production of the olfactory sensory neurons engaged by the odor-shock pairing. Further, the manuscript reveals that this bias persists in first-generation male and female progeny of trained parents. Surprisingly, there was a disconnect between the increased number of OSNs for the conditioned odor in the olfactory epithelium and the behavioral response to odor presentation. Based on previous literature and the morphological results, the expectation was that F1 progeny would also show an aversion to the odor stimulus. However, the authors found that F1 progeny were not more sensitive to the odor than littermate controls.

      Strengths:

      The manuscript includes conceptual innovation and some technical innovation. The results validate previous findings that were deemed controversial in the field, which is a major strength of the work. Moreover, these studies were conducted using a combination of genetically modified animals and state-of-the-art imaging techniques, highlighting the rigorous nature of the research. Lastly, the authors provide novel mechanistic details regarding the remodeling of the olfactory epithelium, demonstrating that biased neurogenesis, as opposed to changes in survival rates, accounts for the increase in specific OSN subtypes after training.

      Weaknesses:

      The main weakness is the disconnect between the morphological changes reported and the lack of change in aversion to the odorant in F1 progeny. The authors also do not address the mechanisms underlying the inheritance of the phenotype, which may lie outside of the scope of the present study.

    4. Reviewer #3 (Public review):

      Liff et al. have made considerable effort to improve their manuscript. In their revised manuscript, the authors have substantiated their claims of intergenerationally inherited changes in the olfactory system in response to odor-dependent fear conditioning. Several new experiments and analyses now strengthen this study.

      I still find that the statement that the study provides "insight into the heritability of acquired phenotypes" is somewhat misleading. In their response to this initially raised point the authors correctly point out that their "results provide basic knowledge that will accelerate our ability to uncover the mechanisms driving heritable changes." That said, current "insights" are not mechanistic in nature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Discrepancies with previous findings need clarification, especially regarding the absence of similar behavioral effects in F1. Lack of discussion on the decision to modify paradigms instead of using the same model. Presentation of behavioral data in supplementary materials, with a recommendation to include behavioral quantification in main figures. Absence of quantification for freezing behavior, a crucial measure in fear conditioning.

      We agree, thank you. One of the major revisions we have made to this version of the manuscript is the addition of a much more thorough analysis of our F1 behavior. While not captured by the (relatively gross) approach-avoid index, further analysis has highlighted interesting, odor-specific differences between the F1 offspring of unpaired and paired fathers. As these analyses have given rise to many new results and conclusions, we have attempted to adjust the manuscript to reflect the major change that we do, in fact, find effects in F1, if subtle.

      Classical odor-shock pairing was used in both Dias & Ressler’s study and ours to directly expand upon the finding of an increase in cell number. This enabled our discovery of the biasing of newborn OSNs. For our behavioral readouts, we chose to focus on the ethological behavior of avoidance. From our extensive behavioral analysis (Figures 5 & 6), we identified several behavioral differences in the F1 offspring that had not previously been described.

      Reviewer #2 (Public Review):

      (1) The main weakness is the disconnect between the morphological changes reported and the lack of change in aversion to the odorant in F1 progeny. The authors also do not address the mechanisms underlying the inheritance of the phenotype, which may lie outside of the scope of the present study.

      Thank you for your comments. Our revised manuscript includes both new experiments and new analyses that probe the relationship between a change in cell number and a change in avoidance behavior, and we have revised the manuscript text to address this point directly. In short, we find, both in the F0 generation (at extended time points) and in the F1, that an increase in cell number does not always correlate with avoidance behavior. However, we do find nuanced behavioral differences between the offspring of unpaired and paired fathers. Whether the increase in cell number in offspring is necessary to observe the behavioral changes is outside the scope of the current study, but it is certainly a question we are interested in answering in future work.

      Reviewer #3 (Public Review):

      (1) In the abstract / summary, the authors raise expectations that are not supported by the data. For example, it is claimed that "increases in F0 were due to biased stem cell receptor choice." Although olfactory receptor gene choice has been an active field of study with remarkable progress in the past decade, it remains unresolved, particularly with respect to its timing. Here, Liff et al. do not pinpoint at what stage during differentiation the "biased choice" is made.

      EdU is taken up only by stem cells in S phase, and differences in EdU-labeled M71 or MOR23 OSNs across fear conditioning groups indicate a biasing of subtype identity. We do not make claims regarding the exact stage of OSN maturation at which biasing may occur; rather, we demonstrate that the stem cells that were dividing during EdU administration are more likely to mature into M71 OSNs if a mouse receives paired acetophenone conditioning than if it receives unpaired or no conditioning (and similarly for MOR23 and lyral). This phenomenon must involve receptor choice, as that is the mechanism by which OSN subtypes form.

      (2) Similarly, the concluding statement that the study provides "insight into the heritability of acquired phenotypes" is somewhat misleading. The experiments do not address the mechanisms underlying heritability. 

      We do not claim to provide direct insight into the mechanisms underlying heritability. Our experiments do provide insight into the heritability of acquired phenotypes, as we corroborate previous findings that this olfactory fear conditioning paradigm induces heritable changes in the nose and in behavior. We also demonstrate odor-specific behavioral differences in the offspring of conditioned fathers, suggesting that the mechanisms underlying the specific behavioral phenotypes may be unique to the conditioning odorant, rather than one universal mechanism. These results provide basic knowledge that will accelerate our ability to uncover the mechanisms driving heritable changes.

      (3) The statement that "the percentage of newborn M71 cells is 4-5 times that of MOR23 may simply reflect differences in the birth rates of the two cell populations" should, if true, result in similar differences in the occurrence of mature OSNs with either receptor identity. According to Fig. 1H & J, however, this is not the case. 

      We have removed that statement from the manuscript, as subtype-specific differences in proliferation rates are not the focus of this study and we do not wish to make claims about them based on our EdU experiments. We do not compare our iDISCO cell density counts to either EdU co-labeling counts or ratio counts, as differences between M71 and MOR23 quantification in cleared tissue versus EdU uptake may simply reflect inherent differences between the methodologies. Our claims are made solely within M71 cohorts and within MOR23 cohorts.

      (4) An important result is that Liff et al., in contrast to results from other studies, "do not observe the inheritance of odor-evoked aversion to the conditioned odor in the F1 generation." This discrepancy needs to be discussed. 

      This is discussed in the manuscript, and we report behavioral differences revealed by additional analyses. 

      (5) The authors speculate that "the increase in neurons responsive to the conditioned odor could enhance the sensitivity to, or the discrimination of, the paired odor in F0 and F1. This would enable the F1 population to learn that odor predicts shock with fewer training cycles or less odorant when trained with the conditioned odor." This is a fascinating idea that, in fact, could have been readily tested by Liff and coworkers. If this hypothesis were found true, this would substantially enhance the impact of the study for the field.

      We agree that additional F1 behavioral paradigms are a major next step toward understanding the functional behavioral differences that may emerge from an increase in a specific OSN subtype. Because of the nontrivial time and effort required to generate F1 offspring (on the order of many months), and because we do not test individual offspring in multiple behavioral assays (so that they remain naïve to their father’s conditioning odor), these experiments are outside the scope of the current study.

      Reviewer #1 (Recommendations For The Authors):

      (1) Considering that the authors are expanding upon the previous findings of Dias and Ressler (2014), it is crucial to clarify the discrepancies between the results of the two works in the discussion. While I acknowledge the authors' use of a different experimental design, if the premise assumes there is a universal mechanism for transgenerational acquired modification, it prompts the question: why don't we observe similar behavioral effects in F1 in the present model? This issue needs extensive discussion in the manuscript to advance the field's understanding of this topic. Additionally, I am curious about the authors' decision to modify the paradigms instead of using exactly the same model to further extend their findings on stem cells, for example. Could you please comment on this choice and elaborate on this aspect in the discussion?

      We agree, thank you. One of the major revisions we have made to this version of the manuscript is the addition of a much more thorough analysis of our F1 behavior. While not captured by the (relatively gross) approach-avoid index, further analysis has highlighted interesting, odor-specific differences between the F1 offspring of unpaired and paired fathers. As these analyses have given rise to many new results and conclusions, we have attempted to adjust the manuscript to reflect the major change that we do, in fact, find effects in F1, if subtle.

      Classical odor-shock pairing was used in both Dias & Ressler’s study and ours to directly expand upon the finding of an increase in cell number. This enabled our discovery of the biasing of newborn OSNs. For our behavioral readouts, we chose to focus on the ethological behavior of avoidance. From our extensive behavioral analysis (Figures 5 & 6), we identified several behavioral differences in the F1 offspring that had not previously been described. We have revised the discussion section to elaborate on these decisions.

      We incorporated the behavioral data into the main figures and added a freezing metric to Figure 5 (F, J, & N). We also analyzed time spent freezing in the control vs. conditioned chamber, but because the F0 paired mice spend so little time in the conditioned odor chamber, most of their freezing also occurs in the control odor chamber. Thus, we felt it was better to show the overall time spent freezing during the trial.

      (2) It is unclear why the authors chose to relegate all behavioral data to the supplementary materials. I strongly recommend not only incorporating the behavioral data into the main figures but also expanding the behavioral quantification. It appears that the authors dismissed the potential effects on F1 without a thorough exploration of the animals' behaviors. The task contains valuable information that could be further investigated, potentially altering the findings or even the conclusions of the study. Notably, the absence of quantification of freezing behavior is difficult to understand. Freezing is a crucial measure in fear conditioning, and it is surprising that the authors did not mention it anywhere in the manuscript. I encourage the authors to include freezing data in the analysis, along with other behavioral quantification, as follows: a) freezing during odor presentation and ITI on conditioning days; b) freezing during the odor preference test in all compartments; c) the design of the odor preference test is not very clear: is the odor presented in a discrete manner, or is it constantly presented in the compartment? Could the authors quantify the latency to avoid after entering the compartment? d) in the video, the animals clearly engage in a lot of risk assessment; this could also be analyzed and included as a fear measure.

      Thanks for the suggestion. We incorporated the behavioral data into the main figures and added a freezing metric to Figure 5 (F, J, & N). We also analyzed time spent freezing in the control vs. conditioned chamber, but because the F0 paired mice spend so little time in the conditioned odor chamber, most of their freezing also occurs in the control odor chamber. Thus, we felt it was better to show the overall time spent freezing during the trial. The methods section describes that the odor is continuously bubbled into the chamber throughout the trial, and we have now clarified this in the main text as well. As for further behavioral metrics such as latencies and risk assessment, initial analyses have not revealed anything in the F1 data that we wished to report here. Future work from the lab will investigate this further.

      (3) Compared with the Dias and Ressler paper, there is a crucial difference between the models that could explain the absence of transgenerational effects on F1. In their study, the unconditioned stimulus (US) is present across all generations in the startle task. I am curious whether, in the present study, the authors considered pairing the F1 with a US in a protocol that does not induce fear conditioning (e.g., lower shock intensity or fewer pairings). Could this potentially lead to an increased response in the offspring of paired parents? Did the authors consider this approach? I understand how extensive this experiment can be, so I am not directly requesting it, although it would be a fantastic achievement if the results are positive. Please consider discussing this fundamental difference in the manuscript.

      To clarify, the F1 generation is presented with the unconditioned stimulus, just never conditioned with it. In these experiments, we were primarily interested in the F1’s naïve reaction to their father’s conditioning odorant, and whether the presentation of that odor in the absence of a stressor would lead to any fear-like behavioral responses.

      We have considered the experiments you have suggested and have ongoing projects in the lab further investigating F1 effects and whether their father’s experiences affect their ability to learn in conditioning tasks. Because of the amount of time and effort it requires to generate F1 offspring, and because we do not wish to test individual offspring in multiple assays, we do not present any of these experiments in the current manuscript. Ongoing work is looking into whether 1-day (vs. 3-day) conditioning is sufficient in the offspring of paired mice, and we appreciate the suggestion of subthreshold shock intensity. We will also clarify in the discussion that future work will try to answer these questions. 

      (4) If the videos were combined, it would be easier to appreciate the behavioral differences between paired and unpaired mice.

      Thank you for the suggestion, fixed. Video S1 is now a combination of unpaired and paired example videos. 

      (5) Figure 3E: is there an outlier in the paired group that is driving the difference? Please run an outlier test on the data if this has not been done. If it has already been done, please report the statistics.

      We ran an outlier test using the ROUT method (Q=1%) and did not find any outliers to be removed. We also ran the same test on all other data and removed one mouse from the Acetophenone F1 Paired group in Figure 5 (also described in the Methods section). 

      (6) I understand that using the term "olfactory" twice in the title may seem redundant. However, the authors specifically demonstrate the effects of olfactory fear conditioning. I suggest including "odor-induced" before "fear conditioning" in the title for greater specificity and accuracy. This modification would better reflect the study's focus on olfactory fear conditioning, especially given that the authors did not explore fear conditioning broadly (e.g., contextual and auditory aspects were not examined).

      Thank you for your feedback. We found using “olfactory” twice cumbersome. We have changed the title to “Fear conditioning biases olfactory sensory neuron expression across generations” to more accurately highlight the importance of olfactory sensory neuron expression intergenerationally.

      (7) The last page of the manuscript has a list of videos (8 videos), but only two were presented.

      We have made sure to include all 7 videos (videos 1 and 2 were combined) in this version.  

      Reviewer #2 (Recommendations For The Authors):

      (1) The analyses mentioned on lines 210-220 should be presented. 

      Thank you for the suggestion. We have removed this part of the manuscript as we do not have a large enough n to draw conclusions about cell longevity in this paper. Future studies in the lab will incorporate this analysis.

      Reviewer #3 (Recommendations For The Authors):

      (1) The manuscript contains several supplementary figures and movies that are not referred to in the main text. 

      All supplementary figures and movies are now referred to in the manuscript text.

      (2) In the abstract, the authors state that they "investigated changes in the morphology of the olfactory epithelium." I think that is (technically) not what they did. In fact, the authors do not show any morphometry of the epithelium (e.g., thickness, layers, etc.), but count the density of OSNs that share a specific receptor identity. Along the same lines, the authors state in the abstract that recent work has shown that conditioning is "resulting in increases in olfactory receptor frequencies." However, recent studies did not show increased "receptor frequencies", but changes in cell count. Whether (or not) receptor expression per OSN is also changed remains unknown (would be interesting though). 

      Yes, agreed. We changed “morphology” to “cellular composition.” We also changed any references to “receptor frequencies” to “olfactory sensory neuron frequencies.”

      (3) Reference 20 needs to be updated. 

      Thank you, updated.

      (4) l.52: the distribution of OSNs into (four) zones is a somewhat outdated concept, as zonal boundaries are rather blurry. Generally, of course, dorsoventral differences are real.

      Yes, we agree and changed the verbiage to “region” as opposed to “zone.” We mainly bring this up because it later becomes relevant that both M71 and MOR23 are expressed in the same (antero-dorsal) region and thus can be quantified with the same methodology.

      (5) Fig. 3B & C: the EdU background staining is quite peculiar. Any reason why the epithelium is mostly (with the sustentacular nuclei being a noticeable exception) devoid of background? 

      We use the ThermoFisher Click-iT Plus EdU kit (Invitrogen, C10638), and it has consistently produced a very good signal-to-noise ratio.

      Responses to Editor’s note

      We thank the editor for their constructive suggestions. 

      (1) Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05. 

      Thank you for the suggestion. We created two supplementary tables with statistical reporting: Table S1 for the main figure statistics, and Table S2 for the supplementary figure statistics.

    1. eLife Assessment

      This useful study characterizes the evolution of medial prefrontal cortex activity during learning of an odor-based choice task. The evidence is solid, including quantification of functional classes of cells over the course of learning using longitudinal calcium recordings in prefrontal cortex, and quantification of prefrontal sequences. However, the experimental design appears to provide limited evidence to support strong conclusions regarding pre-existing representations or the functional relevance of neural sequences. The study will be of interest to neuroscientists investigating learning and decision-making processes.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use longitudinal in vivo 1-photon calcium recordings in mouse prefrontal cortex throughout the learning of an odor-guided spatial memory task, with the goal of examining the development of task-related prefrontal representations over the course of learning in different task stages and during sleep sessions. They report a replication of their previous results (Muysers et al. 2025) that task representations in prefrontal cortex arise de novo after learning, comprising goal-selective cells that fire selectively for left or right goals during the spatial working memory component of the task, and generalized task-phase selective cells that fire equivalently in the same place irrespective of goal, together forming the task-informative cells. The number of task-informative cells increases over learning, and the covariance structure changes, resulting in increased sequential activation in the learned condition, but with limited functional relevance to task representation. Finally, the authors report that, similar to hippocampal trajectory replay, prefrontal sequences are replayed at reward locations.

      Strengths:

      The major strength of the study is the use of longitudinal recordings, allowing identification of task-related activity in the prefrontal cortex that emerges de novo after learning, and identification of sub-second sequences at reward wells.

      Comments on revisions:

      The authors have added analyses and clarifications that increase the strength of evidence, especially quantification of functional classes of cells using longitudinal calcium recordings in prefrontal cortex during learning of an odor-cue-guided task, and quantification of prefrontal sequences.

      There are a few remaining issues:

      (1) The manuscript quantifies changes over learning in prefrontal goal-selective cells (equated to "splitter" place cells in hippocampus) and task-phase selective cells (similar to non-splitter place cells that are not goal modulated). A subset of these task cells remains stable throughout learning, and these cells are equated to schema representations in the study. In the memory literature, schemas are generally described as relational networks of abstract, generalized information that enable adaptation to novel contexts and inference by supporting retrieval of related information from previous contexts. The task-phase selective cells that stay stable throughout learning clearly have a role in organizing task representations, but to this reviewer, describing them as forming a schema is an unwarranted interpretation. By this definition, hippocampal non-splitter place cells that emerge early in learning and are stable over days would also form a schema. Therefore, a schema designation cannot be based on stability alone; it requires further evidence of abstraction, such as cross-condition generalization.

      (2) The quantification of prefrontal replay sequences during reward is useful, but I remain unconvinced that the difference between sequences in the odor sampling phase and the reward phase is not trivially expected from prior literature. This is an odor-guided task, not a spatial exploration task with no cues, and it is very well established (as noted in citations in the previous review) that during odor sampling, animals sniff exploratively, resulting in strong beta and respiratory rhythms in prefrontal cortex. The absence of LFP recordings in this task does not preclude considering prior literature that clearly shows that odor sampling produces a unique internal network state, when animals are retrieving the odor-associated goal, vastly different from the reward sampling phase. The authors argue that this is not trivial since they see some sequences during sampling, although, in response to a question from Reviewer 2 about shuffling controls for sequences, they also argue the opposite: that 'not' seeing these sequences in the sampling phase is an internal control. The bigger issue here is equating these sequences during sampling to replay/preplay or reactivation sequences similar to those in the reward phase, since the prefrontal network dynamics are engaged in odor-driven retrieval of associated goals during sampling, as has been shown in previous studies.

    3. Reviewer #2 (Public review):

      Summary:

      The first part of the manuscript quantifies the proportion of goal-arm specific and task-phase specific cells during the learning and learned conditions and, similar to the group's previously published paper (Muysers et al., 2025), finds that task-phase coding cells (which Muysers et al. call path-equivalent cells) increase in the learned condition. However, unlike Muysers et al. (2025), this work also quantifies the proportion of cells that change coding type between the learning and learned conditions. The second part of the paper reports firing sequences detected with a sequence-similarity clustering method that the group developed previously and applied to hippocampal data.

      Strengths:

      Identifying sequences by a clustering method in which sequence patterns of individual events are compared is an interesting idea.

      Weaknesses:

      Further controls are needed to validate the results.

      Comments on revisions:

      Further changes are needed to improve the description of the methods, and the discussion needs to be extended to contrast the results with the group's previously published results. Some control figures would also be needed to quantitatively demonstrate, across the entire dataset, that sequence detection did not identify random events as sequences, even if the detection method was designed to exclude such sequences. For example, showing that sequences are not detected in randomized data with the current method would better convince readers of the method's validity.

      Although differences in the classification scheme relative to the Muysers et al. (2025) paper have been explained, the similarity (perhaps equivalence) of the results is not sufficiently acknowledged, e.g., at the beginning of the discussion.

      Although the control of spurious sequences may have been built into the method, this is not sufficiently explained in the Methods. It is also not clear what kind of randomization was performed. Importantly, I do not see a quantification showing that the detected sequences score significantly better on the sequence quality measure than randomized events, or that randomized data do not yield sequence clusters. It is also still not clear how the number of clusters was established. I understand that the previously published paper may have covered these questions, but they should be explained here as well. Finally, the sequence similarity description in the Methods is still confusing; please correct this sentence: "Only the l neurons active in both sequences of a pair were taken into account."
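      To make the requested randomization control concrete, the sketch below shows one way it could be done. It is illustrative only and is not the authors' method, which the review does not specify. Assumptions: each burst sequence is a dict mapping a hypothetical neuron ID to that neuron's center-of-mass activation time within the burst; the similarity of two sequences is the Spearman rank correlation computed only over the neurons active in both (the "l neurons" of the quoted Methods sentence), with at least 5 shared neurons required; and the null model keeps each burst's active neurons but randomly permutes their activation times. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined correlation; treated as "no similarity" (assumption)
    return sxy / (sxx * syy) ** 0.5


def sequence_similarity(seq_a, seq_b, min_shared=5):
    """Spearman correlation of activation order over neurons active in both bursts.

    Returns None when fewer than min_shared neurons are shared."""
    shared = sorted(set(seq_a) & set(seq_b))
    if len(shared) < min_shared:
        return None
    times_a = [seq_a[n] for n in shared]
    times_b = [seq_b[n] for n in shared]
    return pearson(average_ranks(times_a), average_ranks(times_b))


def mean_pairwise_similarity(seqs, min_shared=5):
    sims = [sequence_similarity(a, b, min_shared)
            for i, a in enumerate(seqs) for b in seqs[i + 1:]]
    sims = [s for s in sims if s is not None]
    return mean(sims) if sims else 0.0


def permute_times(seq, rng):
    """Null model: same active neurons, activation times randomly reassigned."""
    times = list(seq.values())
    rng.shuffle(times)
    return dict(zip(seq.keys(), times))


def shuffle_test(seqs, n_shuffles=200, seed=0):
    """Observed mean pairwise similarity and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(seqs)
    null = [mean_pairwise_similarity([permute_times(s, rng) for s in seqs])
            for _ in range(n_shuffles)]
    p = (1 + sum(v >= observed for v in null)) / (1 + n_shuffles)
    return observed, p
```

      Because the null permutes activation times within each burst, it preserves which neurons are active; any similarity above the null therefore reflects a consistent firing order rather than shared membership alone. Six exact repeats of one 8-neuron sequence, for example, give a mean similarity of 1.0 and a small p-value, whereas independently shuffled bursts should not.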

    4. Reviewer #3 (Public review):

      In this study, the authors performed longitudinal 1P calcium imaging of mouse mPFC across 8 weeks during learning of an olfactory-guided task, including habituation, training, and sleep periods. The authors' goal was to determine how the mPFC representation of the task changed and what aspects of activity emerged between the learning and the learned conditions of the task. The task had 3 arms. Odor was sampled at the end of the middle arm (the "Sample" period). The animal then needed to run to one of the two other arms (R or L) based on the odor. The entire period until the animal reached the end of one of the choice arms was the "Outward" period. The time at the reward end was the "Reward" period. They noted several changes from the learning condition to the learned condition:

      (1) They classified cells in a few ways. First, each cell was classified as SI (spatially informative) if it had significantly more spatial information than shuffled activity; ~50% of cells were SI cells. Then, among the SI cells, they classified a cell as a TC (task cell) if it had statistically similar activity maps for R versus L arms, and as a GC (goal arm cell) otherwise. Note that there are 4 kinds of these cells: outer arm TCs and GCs and middle arm TCs and GCs (with middle arm GCs essentially being like "splitter cells" since they are not similarly active in the middle arm for R versus L trials). There was an increase in TCs from the learning to the learned condition sessions. They also note the sources of these TCs (some came from GCs, others from non-SI cells).

      (2) They analyze activity sequences across cells. They extracted 500 ms duration bursts (defined as periods of activity > 0.5 standard deviations over what I assume is the mean, which is a permissive threshold encompassing a significant fraction of the activity in non-sleep, non-habituation periods). They first noted that the resulting "Burst rates were significantly larger during behavioral epochs than during sleep and during periods of habituation to the arena" and "Moreover, burst rates during correct trials were significantly lower than during error trials". For the sequence analysis they only considered bursts consisting of at least 5 active cells. A cell's activity within the burst was set to the center of mass of its calcium activity. Then they took all the sequences from all learned and learning sessions together and hierarchically clustered them based on Spearman's rank correlation between the order of activity in each pair of sequences (among the cells active in both). The iterative hierarchical clustering process produces groups (clusters) of sequences such that there are multiple repeats of sequences within a cluster. Different sequences are expressed across all the longitudinally recorded sessions. They noted "large differences of sequence activation between learning and learned condition, both in the spatial patterns (example animal in Fig. 4D) and the distribution of the sequences (Fig. 4D,E). Rastermap plots (Fig. 4D) also reveal little similarity of sequence expression between task and habituation or sleep condition." They also note that the difference in the sequences between learning and learned condition was larger than the difference between correct and error trials within each condition. They conclude that during task learning new representations are established, as measured by the burst sequence content. They do additional analyses of the sequence clusters by assessing the spatial informativeness (SI) of each sequence cluster. Over learning they find an increase in clusters that are spatially informative (clusters that tend to occur in specific locations). Finally, they analyzed the SI clusters in a similar manner as SI cells, classified them as task phase selective sequences (TSs) and goal arm selective sequences (GSs), and did some further analysis. However, they themselves conclude that the frequency of TSs and GSs is limited because most sequence clusters were non-SI. In the discussion they say "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior (not SI)".
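      The sequence-extraction and similarity steps summarized in (2) can be sketched as follows, assuming per-cell dF/F traces sampled within one 500 ms burst window. Each active cell is assigned the center of mass of its activity, bursts with fewer than 5 active cells are discarded, and two sequences are compared by Spearman's rank correlation over the cells active in both. The helper names and the minimum of 3 shared cells are hypothetical choices; the significance test and the hierarchical clustering itself (Chenani et al., 2019) are omitted, so this is an illustration rather than the authors' implementation.

```python
import math


def center_of_mass(trace, times):
    """Activity-weighted mean time of one cell's trace within a burst window."""
    total = sum(trace)
    if total <= 0:
        return None  # treat the cell as inactive in this burst
    return sum(a * t for a, t in zip(trace, times)) / total


def burst_sequence(traces, times, min_cells=5):
    """Map cell id -> center-of-mass time; None if too few cells are active."""
    com = {}
    for cell, trace in traces.items():
        c = center_of_mass(trace, times)
        if c is not None:
            com[cell] = c
    return com if len(com) >= min_cells else None


def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_common(seq_a, seq_b):
    """Spearman rank correlation over the cells active in both sequences."""
    common = sorted(set(seq_a) & set(seq_b))
    if len(common) < 3:  # assumed minimum number of shared cells
        return None
    ra = ranks([seq_a[c] for c in common])
    rb = ranks([seq_b[c] for c in common])
    n = len(common)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    if sa == 0 or sb == 0:
        return None
    return cov / (sa * sb)
```

      Because both events are ranked over the same set of shared cells, `spearman_common(a, b)` equals `spearman_common(b, a)`, so a similarity matrix built from it is symmetric.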

      (3) As an alternative to analyzing individual cells and sequences of individual cells, they then look for trajectory replay using Bayesian population decoding of location during bursts. They analyze TS bursts, GS bursts, and non-SI bursts. They say "we found correlations of decoded position with time bin (within a 500 ms burst) strongly exceeding chance level only during outward and reward phase, for both GSs and TSs (Fig 5H)." Fig 5H shows distributions indicating a statistically significant bias in the forward direction (using correlations of decoded location versus time bin across 10 bins of 50 ms each within each 500 ms burst). They find that the Outward trajectories appear to reflect the actual trajectory during running itself, and so are likely not replay. But the sequences at the Reward are replay, as they do not reflect the current location. Furthermore, replay at the Reward is in the forward direction (unlike the reverse replay at Reward seen in the hippocampus), and this replay is only seen in the learned and not the learning condition. At the same time, they find that replay is not seen during odor Sampling, from which they conclude there is no evidence of replay used for planning. Instead, they say the replay at the Reward could possibly serve evaluation during the Reward phase, though this would only be for the learned condition. They conclude "Together with our finding of strong changes in sequence expression after learning (Fig 4E) these findings suggest that a representation of task develops during learning".
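      A minimal sketch of the replay score described in (3), assuming the Bayesian decoder has already produced a posterior over linearized positions for each of the 10 time bins of 50 ms in a burst. It takes the most probable position per bin, computes a Pearson correlation with the time-bin index (positive for forward, negative for reverse trajectories), and compares the absolute correlation with a null that shuffles the order of time bins. The maximum-a-posteriori readout, the use of Pearson correlation, the shuffle, and the function names are assumptions for illustration; the review does not state which variants the authors used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def replay_score(posterior):
    """posterior[t][x]: decoded probability of linearized position x in time bin t.

    Returns the correlation between time-bin index and the most probable
    position, together with the decoded positions.
    """
    decoded = [max(range(len(p)), key=p.__getitem__) for p in posterior]
    return pearson(list(range(len(decoded))), decoded), decoded


def replay_p_value(posterior, n_shuffles=1000, seed=0):
    """Two-sided p-value against a null that shuffles the order of time bins."""
    rng = random.Random(seed)
    observed, decoded = replay_score(posterior)
    bins = list(range(len(decoded)))
    exceed = 0
    for _ in range(n_shuffles):
        shuffled = decoded[:]
        rng.shuffle(shuffled)
        if abs(pearson(bins, shuffled)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_shuffles + 1)
```

      A burst whose decoded position advances steadily along the track over the 10 bins yields a score near +1 and a small p-value; the same burst played backwards yields a score near -1.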

      This study provides valuable new information about the evolution of mPFC activity during the learning of an odor-based 2AFC T-maze-like task. They show convincing evidence of changes in single cell tuning, population sequences, and replay events. They also find novel forward replay at the Reward, and find that this is present only after the animal learned the task. In the discussion the authors note "the present study, to our knowledge, identified for the first time fast recurring neural sequence activity from 1-p calcium data, based on correlation analysis". Overall, they find a substantial amount of change among the features they analyzed and according to their methods, though they note a small amount of activity was preserved through learning.

      One comment is that the threshold for extracting burst events (0.5 standard deviations, presumably above the mean) seems lower than what one usually sees as a threshold for population burst detection, and the authors show (in Supplementary Fig 1) that this means bursts cover ~20-40% of the data. However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The study mainly replicates the authors' previously reported results about generalized and trajectory-specific coding of task structure by prefrontal neurons, and stable and changing representations over learning (Muysers et al., 2024, PMID: 38459033; Muysers et al., 2025, PMID: 40057953), although there are useful results about changes in goal-selective and task-phase selective cells over learning. There are basic shortcomings in the scientific premise of two new points in this manuscript, that of the contribution of pre-existing spatial representations, and the role of replay sequences in the prefrontal cortex, both of which cannot be adequately tested in this experimental design.

      We agree with the reviewer that we have not made sufficiently clear which aspects of our paper add to previous publications. We now explain the methodological differences more clearly.

      Also, we agree that our very general statements on pre-existing spatial representations in the introduction and abstract of the previous manuscript were not properly followed up in the Results section. In the revision, the respective statements are clarified, and we also added analysis of a further control condition (see response to A), which shows that a subset of task cells in particular maintains their firing fields from an early habituation period. This argues that, while the population representation of the task largely develops during learning, there exists a small but significant scaffold of cells that could be interpreted as a schema.

      We also further clarified our view on replay sequences in the prefrontal cortex (see response to B). In particular, we are grateful to the reviewer for the suggestion to also include other reactivation analyses, which led to new results presented in the new Figure 3.

      [A] The study denotes neurons that show precise spatial firing equivalently irrespective of goal, as generalized task representations, and uses this as a means to testing whether pre-existing spatial representations can contribute to task coding and learning. …. [I]n order to establish generalization for abstract task rules or cognitive flexibility, as motivated in the manuscript, there is a need to show that these neurons "generalize" not just to firing in the same position during learning of a given task… For an adequate test of pre-existing spatial structure, either a comparison task, as in the examples above, is needed, or at least a control task in which animals can run similar trajectories without the task contingencies. An unambiguous conclusion about pre-existing spatial structure is not possible without these controls.

      We thank the reviewer for this suggestion. We may, however, note that the previous manuscript did not make strong claims about pre-existing structures in the Results or Discussion. Also, schemas were only raised as a discussion point. We nevertheless agree with the reviewer that assessing the spatial prestructure requires further analysis. To address this point, we analyzed neuronal activity during the habituation phase before the start of task training, when the animals freely explored the same maze without any task contingency (animals explored mostly in the arms of the maze). We compared the place fields of neurons during this habituation period with their task-related activity. Consistent with the small overlap of firing rate maps between the learning and learned phases, this analysis also revealed a small number of cells with significant correlations (up to 20% for task cells; a significant fraction according to a binomial test). The results are shown as a new Figure supplement to Figure 2.
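      The binomial test mentioned above asks whether the number of individually significant cells exceeds what the per-cell significance level alone would produce. A minimal sketch, assuming that under the null each cell independently passes its test with probability α; `binomial_tail` is a hypothetical helper, and the counts in the usage note are invented for illustration, not taken from the study.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided binomial-test p-value.

    k: number of cells that individually passed the test,
    n: number of cells tested,
    p: per-cell chance rate (the significance level alpha under the null).
    """
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

      For example, if 20 of 100 cells (hypothetical numbers) passed a test at α = 0.05, `binomial_tail(20, 100, 0.05)` would be far below 0.05, whereas 5 of 100 cells would be fully consistent with chance.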

      [B] The scientific premise for the test of replay sequences is motivated using hippocampal activity in internally guided spatial working memory rule tasks [...] and applied here to prefrontal activity in a sensory-cue guided spatial memory task [...]. There are several issues with the conclusion in the manuscript that prefrontal replay sequences are involved in evaluating behavioral outcomes rather than planning future outcomes.

      We agree with the reviewer that preplay in the hippocampus and in the mPFC are distinct. We have further emphasized this distinction in the respective paragraph of the Discussion (see response to B1).

      [B. 1] First, odor sampling in odor-guided memory tasks is an active sensory processing state that leads to beta and other oscillations in olfactory regions, hippocampus, prefrontal cortex, and many other downstream networks [...]. This is an active sensory state, not conducive to internal replay sequences, unlike references used in this manuscript to motivate this analysis, which are hippocampal spatial memory studies with internally guided rather than sensory-cue guided decisions, where internal replay is seen during immobility at reward wells. These two states cannot be compared with the expectation of finding similar replay sequences, so it is trivially expected that internal replay sequences will not be seen during odor sampling.

      We agree with the reviewer that the sampling phase cannot be compared with the “preplay” state in the hippocampus. We have rewritten the Results and Discussion sections of the manuscript to clarify this. We do, however, disagree that the absence of replay sequences in the mPFC 1P calcium data is trivial, since we actually see many sequences during sampling (Fig 4E, Fig 4 suppl 2 A). These sequences are simply not related to task activity; they may, e.g., reflect activity related to sensing, but do not contain information about the goal arm.

      [B. 2] Second, sequence replay is not the only signature of reactivation. Many studies have quantified prefrontal replay using template matching and reactivation strength metrics that do not involve sequences [...].  Third, previous studies have explicitly shown that prefrontal activity can be decoded during odor sampling to predict future spatial choices - this uses sensory-driven ensemble activity in prefrontal cortex and not replay, as odor sampling leads to sensory driven processing and recall rather than a reactivation state [..].

      We thank the reviewer for the suggestion to also perform reactivation analysis (Peyrache et al., 2009, 2010). The results are summarized in the new Figure 3 and show that reactivation is indeed stronger during the sampling phase and is goal arm specific, arguing that sequence analysis extracts information (partly) complementary to rate covariance-based analysis.
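      For readers unfamiliar with the covariance-based reactivation measure, the following sketch follows the general PCA approach associated with Peyrache et al. (2009). It z-scores binned activity from a template epoch, keeps the principal components of the cell-by-cell correlation matrix whose eigenvalues exceed the Marchenko-Pastur bound, and computes a reactivation strength R(t) = z(t)^T P z(t) in a target epoch, with the diagonal of each projector P set to zero. It assumes NumPy and binned calcium activity arrays of shape (cells, time bins); the function names are hypothetical, and this is not the authors' code or their exact parameterization.

```python
import numpy as np


def zscore_rows(x):
    """Z-score each cell's (row's) activity across time bins."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for silent cells
    return (x - mu) / sd


def reactivation_strength(template, target, n_components=None):
    """PCA-based reactivation strength.

    template, target: arrays of shape (n_cells, n_time_bins).
    Returns an array of shape (n_components, n_target_bins).
    """
    z_tmp = zscore_rows(template)
    n_cells, n_bins = z_tmp.shape
    corr = z_tmp @ z_tmp.T / n_bins  # cell-by-cell correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is None:
        # keep components above the Marchenko-Pastur upper bound
        lam_max = (1 + np.sqrt(n_cells / n_bins)) ** 2
        n_components = int(np.sum(eigvals > lam_max))
    z_tgt = zscore_rows(target)
    strength = np.zeros((n_components, z_tgt.shape[1]))
    for k in range(n_components):
        proj = np.outer(eigvecs[:, k], eigvecs[:, k])
        np.fill_diagonal(proj, 0.0)  # ignore each cell's own variance
        # R_k(t) = z(t)^T P_k z(t), summed over pairs of distinct cells
        strength[k] = np.einsum("it,ij,jt->t", z_tgt, proj, z_tgt)
    return strength
```

      Because the diagonal is removed, R(t) is large only when cells that co-varied in the template epoch are co-active in the same target bin, which is why this measure captures rate covariance rather than firing order.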

      We hope to have convinced the reviewer that, together, the complementary results of the reactivation and sequence analyses, as well as the ability to follow these measures over an extended period of time, give unique insights far beyond the previous publications of these data sets. A consistent analysis of population representations, however, required some reanalysis of previous findings, since we could only focus on a limited number of animals and cells for which tracking was possible over such a long period of time.

      Reviewer #2 (Public review):

      Further controls are needed to validate the results.

      We thank the reviewer for their generally supportive statements. The revised manuscript contains a number of controls in several new figure supplements.

      Reviewer #3 (Public review):

      [They] conclude that the frequency of TSs and GSs is limited (I believe because most sequence clusters were non-SI - the authors can verify this and write it in the text?). In the discussion, they say, "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior".

      The reviewer is correct: most clusters were not SI (Fig 5A). We have added this information to the MS.

      [...] They conclude "Together with our finding of strong changes in sequence expression after learning (Figure 3E) these findings suggest that a representation of task develops during learning, however, it does not reflect previous network structure." I am not sure what is meant here by the second part of this sentence (after "however ..."). Is it the idea that the replay represents network structure, and the lack of Reward replay in the learning condition means that the network structure must have been changed to get to the learned condition? Please clarify.

      The reviewer is correct in their assertion. We rewrote the sentence to clarify: “Together with our finding of strong changes in sequence expression after learning (now Fig 4E) these findings suggest that a representation of task develops during learning, however, it does not reflect sequence structure during learning and habituation”.

      (1) There are some statements that are not clear, such as at the end of the introduction, where the authors write, "Both findings suggest that the mPFC task code is locally established during learning." What is the reasoning behind the "locally established" statement? Couldn't the learning be happening in other areas and be inherited by the mPFC? Or are the authors assuming that newly appearing sequences within a 500-ms burst period must be due to local plasticity?

      We agree that the wording “local” can be misleading, we rephrased the corresponding sentences.

      (2) The threshold for extracting burst events (0.5 standard deviations, presumably above the mean, but the authors should verify this) seems lower than what one usually sees as a threshold for population burst detection. What fraction of all data is covered by 500 ms periods around each such burst? However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

      Since we work with a slow calcium signal, we cannot use thresholds as strict as those usually employed in electrophysiology. In addition, our sequence detection approach adds a further level of strictness, such that we only consider bursts with recurring sequence structure. In response to this reviewer’s question, we have added a quantification of the fraction of all data covered by 500 ms periods in Figure Supplement 1, panels D and E. Indeed, we include a large fraction (20 to 40%) of the data (except during sleep and habituation), which is consistent with our interpretation that during the outward phase sequences mainly reflect task field firing.
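      To make the coverage measure concrete, a minimal sketch of threshold-based burst detection and of the fraction of samples covered by 500 ms windows. It assumes a single population activity trace, a threshold of mean + 0.5 SD, a window centered on each upward threshold crossing, and a 20 Hz sampling rate in the test example; all of these, and the function names, are illustrative assumptions rather than the authors' exact procedure.

```python
import statistics


def detect_bursts(trace, fs, k_sd=0.5, window_s=0.5):
    """Return (start, stop) sample windows centered on upward threshold crossings.

    trace: population activity samples; fs: sampling rate in Hz (assumed).
    """
    mu = statistics.fmean(trace)
    sd = statistics.pstdev(trace)
    threshold = mu + k_sd * sd
    half = int(round(window_s * fs / 2))
    windows = []
    above_prev = False
    for i, value in enumerate(trace):
        above = value > threshold
        if above and not above_prev:  # onset of a supra-threshold period
            windows.append((max(0, i - half), min(len(trace), i + half)))
        above_prev = above
    return windows


def coverage_fraction(windows, n_samples):
    """Fraction of samples inside at least one window (overlaps counted once)."""
    covered = [False] * n_samples
    for start, stop in windows:
        for i in range(start, stop):
            covered[i] = True
    return sum(covered) / n_samples
```

      Overlapping windows are merged when computing coverage, so the fraction never exceeds 1; a lower threshold yields more crossings and therefore a larger covered fraction.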

      Reviewer #1 (Recommendations for the authors):

      It is possible that 1-photon recordings do not have the temporal resolution and information about oscillatory activity to enable these kinds of analyses. Therefore, an unambiguous conclusion about the existence and role of prefrontal reactivation is not possible in this experimental and analytical design.

      We indeed cannot extract information encoded in LFP oscillations from the calcium signal; we now mention the relation between LFP oscillations and olfaction-guided behaviors in the Discussion (including the suggested references). However, our finding that sequence-based and covariance-based analyses yield partly complementary results argues that our approach does allow conclusions about the existence and role of prefrontal reactivation.

      Reviewer #2 (Recommendations for the authors):

      The results of the Muysers et al. (2025) paper need to be discussed in detail and explain why the cell categorization is different, three groups of spatial cells vs two groups here. Also, explain in what aspect the major findings in this work go beyond what was shown in Figure 4 in that paper.

      The main goal of this paper was to explore sequence/replay like activity, which is not at all captured in the Muysers et al. 2025 paper. Because of this focus on sequences, we excluded the inward runs (from reward to sampling point) for better interpretability and thus ended up with only two types of cells. Muysers et al. included backward runs and could thereby also assess whether the place field remains in the outward and inward runs. We added this clarification in the Results section.

      Regarding the reviewer’s question about Figure 4: our task cells would largely overlap with the “path-equivalent cells” of Muysers et al. 2025 (albeit without taking inward runs into account). In this sense, their finding that the share of path-equivalent cells increases with learning is consistent with our report of an increasing fraction of task cells in Figure 2C. Our Figure 2 adds that some task cells develop from previous goal cells with fields at the same location (generalizing). Moreover, we use spatial information as a criterion to identify TCs and GCs, showing that a large fraction of cells actually is, and remains, spatially unselective. In Muysers et al. 2025, a statistical criterion was applied not to spatial selectivity but to peak height, with fewer neurons failing this test. Moreover, we analyzed only those cells trackable over the whole period. Despite all these methodological differences, the finding that the number of task/path-equivalent cells increases over learning was consistent. The main reason for recategorizing the cells in the present manuscript was to be able to link them meaningfully to sequence activity (Fig. 5E, F).

      It is not clear from the description how the cell type transitions were quantified. Was the last learning day compared to the first learned day? Given that, particularly during learning, there are changes across days in the spatial representations according to Figure 2 of Muysers et al. (2025), this is the meaningful way to make the comparisons. Nevertheless, it is also not clear whether the daily variations within learning and learned conditions differ from the transition day, so without comparing these three conditions, it is hard to make a firm conclusion from examining only changes in the transition days.

      The analysis of cell type transitions was performed by pooling all learning sessions and comparing them with all learned sessions, without taking into account the chronological order of sessions within each category. This approach allowed us to identify broad changes associated with learning state. Figure supplement 1C shows the session intervals per animal. We argue that the large interval between learning and learned sessions justifies this analysis approach.

      Identifying sequences by a clustering method in which sequence patterns of individual events are compared is an interesting idea. Nevertheless, there is a danger, as with any clustering method, that data without clustering tendency could be artificially subdivided into clusters.

      In Figure 4C, we show three example sequence cluster templates (colored) obtained via hierarchical clustering, along with representative member sequences (black) sorted by cluster membership. In response to this reviewer’s comment, we now include a complete clustering result for one animal, comprising all sequence clusters and their member sequences, in Figure 4 supplement 1. This comprehensive visualization serves as an additional control, demonstrating that the clustering approach identifies consistent sequence patterns across the dataset.

      Furthermore, it is possible that some cells at the edge of the cluster boundary may show a more similar sequence tendency to events detected at the overlapping border region of another cluster. Was this controlled for? It would be essential to show that events clustered together all show higher similarity to each other than to events in any other clusters.

      By default, clusters are rejected if, in the adjacency matrix of the graph constructed from significant motif similarity, the number of within-cluster edges is smaller than the number of edges leaving the cluster. In subsequent cluster merges, the separation is increased, since only those clusters that show significant similarity are merged. As a visual control, we monitor plots such as those shown in Figure 4 supplement 1: sequence templates (colored dot clouds) are expected to show no serial correlation when ordered according to any template other than their own. We have added further clarification to the Methods, including a new Figure 6 illustrating the method.
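      One reading of the cluster-rejection rule described above, sketched for an undirected similarity graph given as a symmetric boolean adjacency matrix: a candidate cluster is kept only if its within-cluster edges are at least as numerous as the edges connecting its members to sequences outside it. The function name and the tie handling (equal counts are accepted) are assumptions; the full procedure is described in Chenani et al. (2019).

```python
def accept_cluster(adjacency, members):
    """Keep a cluster if within-cluster edges >= edges leaving the cluster.

    adjacency: symmetric n x n list of booleans (True = significant similarity).
    members: indices of the sequences assigned to the candidate cluster.
    """
    members = set(members)
    within = 0
    outside = 0
    for i in members:
        for j, linked in enumerate(adjacency[i]):
            if j == i or not linked:
                continue
            if j in members:
                within += 1
            else:
                outside += 1
    within //= 2  # each within-cluster edge was counted from both ends
    return within >= outside
```

      In a graph of two triangles joined by a single bridge, either triangle is accepted (3 internal edges versus 1 leaving edge), whereas a cluster made of one node from each triangle is rejected.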

      From the description, it was not clear how the sequence similarity was established between pairs of individual events. The only way I can see it is that the sequence (orders at which cells fire) is established with one event, and the rank order correlation is calculated with this order for the other event. However, in this case, distance A-B is not the same as distance B-A. Not sure how this is handled with the clustering procedure. Secondly, how the number of clusters is established in the hierarchical clustering procedure needs to be explained. Furthermore, from the method description, it is not clear how GS and TS sequences are identified. Can an event be classified as both a TS and GS event at the same time?

      The reviewer is correct in their assertion that we compute all pairwise rank-order correlations (which are then subject to a statistical test detailed in the original method publication, Chenani et al., 2019). Because the rank-order correlation is symmetric in its two arguments, the coefficients A-B and B-A are identical. This is now more carefully explained in the Methods.

      Several control analyses are needed to show that the sequences detected reflect not random patterns but those that repeat at a higher than random chance. This requires, at the first step, to establish to what degree sequences are consistent within a cluster and to what degree individual events show a sequential firing tendency. And at the next stage, these need to be compared with randomised events in which spike timing of cells is jittered or spike identity is randomised, and show that these events result in poorer sequence tendency and less consistent clusters.

      The controls requested by the reviewer are already implemented in our Method (see original publication of the Method in Chenani et al., 2019). This is now made clearer in the Methods section.

      Firing rate and place-related firing of cells alone could generate sequences even if cells otherwise fire independently from each other. In a similar manner, it was shown before that reactivation of waking cell assemblies could be seen in sleep, in which case firing rate differences across cells belonging to the same assembly could also generate sequential patterns without temporal coordination. Appropriate shuffling procedures need to be performed to exclude such scenarios.

      We are aware that the sequential firing in our data (particularly during the outward phase, when the animal is performing the task) most likely results from the correlations between rate maps and the animal's trajectory. During the reward phase, this is less likely. An intrinsic control is that we do not see these sequences during sampling. Given the nature of the calcium signal, a direct link to firing rate is not possible. However, we argue that our center-of-mass approach applied to the calcium trace effectively normalizes for firing rate effects. Shuffling dF/F amplitudes (as a proxy for firing rates) would thus have no effect on the center-of-mass sequences. We do, however, consider this to be an important methodological difference between sequence analysis with spikes and with calcium signals and have added a related comment to the Methods.
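      The argument that the center-of-mass approach is insensitive to amplitude differences can be checked with a small sketch: multiplying each cell's trace by its own positive gain leaves every center of mass, and hence the sequence order, unchanged. Note that the invariance holds for rescaling amplitudes, not for reshuffling samples in time, which would move the center of mass. The traces, gains, and function names are hypothetical; traces are assumed non-negative with positive total activity.

```python
import random


def center_of_mass(trace, times):
    """Activity-weighted mean time; assumes non-negative trace with positive sum."""
    total = sum(trace)
    return sum(a * t for a, t in zip(trace, times)) / total


def sequence_order(traces, times):
    """Cell ids ordered by the center of mass of their activity in the window."""
    return sorted(traces, key=lambda cell: center_of_mass(traces[cell], times))


def rescale(traces, low=0.2, high=5.0, seed=0):
    """Multiply every cell's trace by its own random positive gain."""
    rng = random.Random(seed)
    scaled = {}
    for cell, trace in traces.items():
        gain = rng.uniform(low, high)
        scaled[cell] = [gain * v for v in trace]
    return scaled
```

      The gain cancels between numerator and denominator of the center of mass, so `sequence_order(rescale(traces), times)` returns the same order as `sequence_order(traces, times)`.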

      The past literature describing mPFC reactivation, replay, and sequences needs to be described, and findings of this work need to be appropriately acknowledged, and those findings compared with this work (starting with this work from 2007 PMID: 18006749). In the current reading, a novice reader of this field might conclude that this is the first work that identified replay and sequences in the mPFC.

      We apologize that the manuscript evokes this impression. This was not our intention; in fact, we gave strong emphasis to the Kaefer et al. paper in the Discussion. We have now added early references on PFC replay based on electrophysiological recordings to the Discussion section.

      The analysis of Figure 4H is not sufficient to show that only forward sequences occur. If 50% are forward and 50% are reverse, the median is zero. Some of the presented histograms look like Gaussian distributions with SD=1, which would show that those events were not real sequences. It should be tested whether the distributions are significantly different from the expected Gaussian.

      We agree with the reviewer that we did not explicitly test the significance of individual replays, but only tested for a rightward shift of the median. We have now added these significance tests (p values in Figure 5) and could indeed show that the fraction of significant backward replays never exceeds the fraction expected by chance, whereas forward replay significantly exceeds chance levels only in the cases where the median had a significant rightward shift (except for non-SI clusters). We thank the reviewer for this suggestion, which we think makes the analysis stronger.

      Overall, the clarity of the text could be improved, and further examples of reactivated sequences should be shown, and the methods should be illustrated in the figures. At the current version, I fear that even readers in this field would give up on reading the current text given an insufficient level of clarity.

      We have included more examples of reactivated sequences (Figure 5 supplement 2) and made extensive additions to the Methods. In particular, we followed the reviewer’s request for a method illustration (new Figure 6).

      Reviewer #3 (Recommendations for the authors):

      My main comment here is for the authors to increase the clarity of the manuscript.[...] For instance, it was difficult to follow what was being done to determine TSs and GSs.

      We have made extensive additions to the Methods section including a new Figure 6 depicting the workflow of the sequence analysis in a schematic manner.

    1. eLife Assessment

      This study developed a novel continuous dot-motion decision-making task, in which participants can see another player's responses as well as their own, to measure perceptual performance and confidence judgments in a social context. The study is a valuable contribution to the study of social decision-making, primarily by introducing a new task and offering convincing evidence on how participants are impacted by others' decisions during continuous perceptual choices. The manuscript delivers clear evidence that participants' judgements are driven by metacognitive confidence rather than by simpler primary uncertainty.

    2. Reviewer #1 (Public review):

      Summary:

      This paper reports an interesting and clever task which allows the joint measurement of both perceptual judgments and confidence (or subjective motion strength) in real / continuous time. The task is used together with a social condition to identify the (incidental, task-irrelevant) impact of another player on decision-making and confidence. The paper is well-written and clear.

      Strengths:

      The innovation on the task alone is likely to be impactful for the field, extending recent continuous report (CPR) tasks to examine other aspects of perceptual decision-making and allowing more naturalistic readouts. One interesting and novel finding is the observation of dyadic convergence of confidence estimates even when the partner is incidental to the task performance, and that dyads tend to be more risk-seeking (indicating greater confidence) than when playing solo.

      One concern with the novel task is whether confidence is disambiguated from a tracking of stimulus strength or coherence. The subjects' task is to track motion direction and use the eccentricity of the joystick to control the arc of a catcher - thus implementing a real-time sensitivity to risk (peri-decision wagering). The variable-width catcher has been used to good effect in other confidence/uncertainty tasks involving learning of the spread of targets (the Nassar papers). But in the context of an RDK task, one simple strategy here is to map eccentricity directly to (subjective) motion coherence - such that the joystick position at any moment in time is a vector with motion direction and strength. The revised version of the paper now includes a comprehensive analysis of the extent to which the metacognitive aspect of the task (the joystick eccentricity) tracks stimulus features such as motion coherence. The finding of a lagged relationship between task accuracy and eccentricity in conjunction with a relative lack of instantaneous relationships with coherence fluctuations, convincingly strengthens the inference that this component of the joystick response is metacognitive in nature, and dynamically tracking changes in performance. This importantly rebuts a more deflationary framing of the metacognitive judgment, in which what the subjects might be doing is tracking two features of the world - instantaneous motion strength and direction.
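      To illustrate what a lagged relationship between accuracy and eccentricity means operationally, a minimal sketch that correlates one time series with a delayed copy of another over a range of lags. It assumes equally sampled series of task accuracy and joystick eccentricity; `lagged_correlation` is a hypothetical helper and not the analysis reported in the manuscript.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, max_lag):
    """Correlation between x(t) and y(t + lag) for lag = 0..max_lag.

    A peak at a positive lag means y follows x with that delay.
    """
    result = {}
    for lag in range(max_lag + 1):
        result[lag] = pearson(x[:len(x) - lag], y[lag:])
    return result
```

      A peak at a positive lag, as in a synthetic example where eccentricity simply copies accuracy three samples later, indicates that eccentricity follows accuracy rather than tracking it instantaneously.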

      The claim that the novel task is tracking confidence is also supported by new analyses showing classic statistical features of explicit confidence judgments (scaling with aggregate accuracy, and tracking psychometric function slope) are obtained with the joystick eccentricity measure.

    3. Reviewer #2 (Public review):

      Summary:

      Schneider et al examine perceptual decision-making in a continuous task setup when social information about another human (or algorithmic) partner is also provided. The authors track behaviour in a visual motion discrimination task and report accuracy, hit rate, wager, and reaction times, demonstrating that choice wager is affected by social information from the partner.

      Strengths:

      There are many things to like about this paper. The visual psychophysics has been undertaken with much expertise and care to detail. The reporting is meticulous and the coverage of the recent previous literature is reasonable. The research question is novel.

      Comments on revisions:

      The authors have addressed my suggestions adequately.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Strengths:

      The innovation on the task alone is likely to be impactful for the field, extending recent continuous report (CPR) tasks to examine other aspects of perceptual decision-making and allowing more naturalistic readouts. One interesting and novel finding is the observation of dyadic convergence of confidence estimates even when the partner is incidental to the task performance, and that dyads tend to be more risk-seeking (indicating greater confidence) than when playing solo. The paper is well-written and clear.

      We thank reviewer 1 for this encouraging evaluation. Below we address the identified weaknesses and recommendations.

      (1) Do we measure metacognitive confidence?

      One concern with the novel task is whether confidence is disambiguated from a tracking of stimulus strength or coherence. […] But in the context of an RDK task, one simple strategy here is to map eccentricity directly to (subjective) motion coherence - such that the joystick position at any moment in time is a vector with motion direction and strength. This would still be an interesting task - but could be solved without invoking metacognition or the need to estimate confidence in one's motion direction decision. […] what the subjects might be doing is tracking two features of the world - motion strength and direction. This possibility needs to be ruled out if the authors want to claim a mapping between eccentricity and decision confidence […].

      We thank reviewer 1 for pointing out that the joystick tilt responses of our subjects could potentially be driven by stimulus coherence instead of metacognitive decision confidence. Below, we present four arguments to address this point of concern:

      (1.1) Similar physical coherence between high and low confidence states

      Nominal motion coherence is a discrete value, but the random noisiness in the stimulus causes the actual frame-by-frame coherence to be distributed around this nominal value. Because of this, subjects might scale their joystick tilt report according to the coherence fluctuations around the nominal value. To check whether this was the case, we used a median split to separate stimulus states into states with large versus small joystick tilt, individually for each nominal coherence. For each stimulus state, we extracted the actual instantaneous (frame-to-frame) motion coherence, which is based on the individual movements of dots in the stimulus patch between two frames, recorded in our data files.

      First, we compared the motion coherence between stimulus states with large versus small joystick tilt. For each stimulus state, we calculated the average instantaneous motion coherence and computed the difference between the medians of the large-tilt and small-tilt distributions for each subject and each coherence level. The resulting histograms show the distribution of these differences across all 38 subjects for each nominal coherence. Except at 22% coherence, the differences are not significantly different from zero across subjects (Author response image 1). At 22% coherence, the difference amounts to 0.19%, a very small, imperceptible difference. Thus, we do not find systematic differences between the average motion coherence in states with high versus low joystick tilt.
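      For illustration, a minimal Python sketch of the median-split comparison for one subject and one nominal coherence is given below. It assumes that each stimulus state has already been reduced to one mean instantaneous coherence and one mean tilt value; the function name and input layout are hypothetical and this is not the authors' analysis code. States tied at the median are assigned to the small-tilt group, which is an arbitrary choice of this sketch.

```python
import numpy as np


def median_split_difference(mean_coherence, mean_tilt):
    """Difference of medians of state-averaged coherence between
    large-tilt and small-tilt stimulus states (one subject, one
    nominal coherence). Positive values mean higher average
    coherence in large-tilt states."""
    mean_coherence = np.asarray(mean_coherence, dtype=float)
    mean_tilt = np.asarray(mean_tilt, dtype=float)
    # Median split on tilt; states tied at the median go to "small".
    split = np.median(mean_tilt)
    large = mean_coherence[mean_tilt > split]
    small = mean_coherence[mean_tilt <= split]
    return float(np.median(large) - np.median(small))


# Hypothetical per-state values for one subject at one nominal coherence.
diff = median_split_difference([10, 20, 30, 40], [1, 2, 3, 4])
```

      Repeating this per subject yields one difference per subject and coherence, which can then be tested against zero across subjects, as in the population t-test described in the caption below.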

      Author response image 1.

      Histograms of the within-subject difference between the medians of the average coherence distributions with large and small joystick tilt, for all subjects. Coherence is color-coded (cyan, 0%; magenta, 98%). The first row of each panel title gives the number of significant differences (rank-sum test in each subject), without correction for multiple comparisons (see Author response table 1 below). The second row of the title shows the result of the population t-test against zero. Only 22% coherence shows a significant bias. Positive values indicate higher average coherence for large joystick tilt.

      Author response table 1.

      List of all individual significantly different coherence distributions between high and low tilt states, without correction for multiple comparisons. Median differences do not show a consistent bias (i.e. positive values) that would indicate higher average coherence for the large tilts.

      (1.2) Short-term stimulus fluctuations have no effect

      […] But to fully characterise the task behaviour it also seems important to ask how and whether fluctuations in motion energy (assuming that the RDK frames were recorded) during a steady state phase are affecting continuous reporting of direction and eccentricity, prior to asking how social information is incorporated into subjects' behaviour.

      In addition to the analysis of stimulus coherence and tilt averaged across each stimulus state (1.1), we analyzed the moment-to-moment relationship between instantaneous coherence and the ongoing reports of accuracy and tilt. Below, we provide evidence that short-term fluctuations in the instantaneous coherence (i.e., the motion energy of the stimulus) do not result in correlated changes in joystick responses, for either tilt or accuracy. For each continuous stimulus state, we calculated cross-correlation functions between the instantaneous coherence, tilt and accuracy, averaged the cross-correlations across all states of the same nominal coherence, and then averaged across subjects. The resulting average cross-correlation functions are essentially flat. This further supports our interpretation that the joystick reports do not reflect short-term fluctuations in motion energy.

      Author response image 2.

      Cross-correlation of the length of the resultant vector with joystick accuracy (left) and tilt (right). Coherence is color-coded. Shaded background illustrates 95% confidence intervals.

      (1.3) Joystick tilt changes over time despite stable average stimulus coherence

      If perceptual confidence is derived from evidence integration, we should see changes over time even when the stimulus is stable. We therefore analyzed the average slope of the joystick tilt as a function of time within each stimulus state, for each subject and each coherence, to verify whether our participants tilted their joystick more as evidence accumulated. This is illustrated with a violin plot below (Author response image 3). The linear slopes of the joystick tilt progression over the course of stimulus states differ between coherence levels. High coherence leads to more tilt over time, resulting in positive slopes for most subjects. In contrast, low or zero coherence results mostly in flat or negative slopes. This tilt progression indicates that low coherence results in lower confidence, as subjects do not increase their wager when evidence is weak. In contrast, high coherence leads subjects to exhibit more confidence, indicated by the positive slope of the joystick tilt.
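      The slope analysis can be sketched in Python as an ordinary least-squares line fit to the end of each stimulus state. The sketch assumes a sampling rate of 120 Hz, which is inferred from 200 samples corresponding to 1667 ms in the caption below, and one tilt trace per stimulus state; `tilt_slope` is a hypothetical helper, not the authors' code.

```python
import numpy as np

# Assumed sampling rate: 200 samples ~ 1667 ms implies ~120 Hz.
SAMPLE_RATE_HZ = 120


def tilt_slope(tilt_trace, n_samples=200, fs=SAMPLE_RATE_HZ):
    """Least-squares slope (tilt units per second) over the last
    n_samples of one stimulus state's tilt trace."""
    segment = np.asarray(tilt_trace, dtype=float)[-n_samples:]
    t = np.arange(segment.size) / fs  # time axis in seconds
    slope, _intercept = np.polyfit(t, segment, 1)
    return float(slope)


# Synthetic trace that ramps up at 0.5 tilt units per second.
trace = 0.5 * np.arange(300) / SAMPLE_RATE_HZ
slope = tilt_slope(trace)
```

      Averaging these per-state slopes within each subject and coherence would give one value per dot in a plot such as Author response image 3.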

      Author response image 3.

      Violin plots showing the fitted slopes of the joystick tilt time course in the last 200 samples (1667 ms) leading up to a next stimulus direction (cf. Figure 2D). Positive values signify an increase in joystick tilt over time. Each dot shows the average slope for one subject. Coherence is color-coded. The dashed line at zero indicates unchanged joystick tilt over the analyzed time window.

      (1.4) Cross-correlation between response accuracy and joystick tilt

      Similar to 1.2 above, we cross-correlated the frame-by-frame changes in joystick accuracy and tilt for each individual stimulus state and each subject. Across subjects, changes in tilt occur later than changes in accuracy, indicating that changes in the quality of the report are followed by changes in the size of the wager. Given that this process is not driven by short-term changes in the motion energy of the stimulus (see 1.2 above), we interpret this as additional evidence for a metacognitive assessment of the quality of the behavioral report (i.e., accuracy), reflected in the size of the wager (our measure of confidence; see Figure 2E).
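      The lag analysis for a single stimulus state can be sketched as follows. This is a minimal illustration, assuming z-scored first differences of the two joystick traces and a simple lagged product average; `peak_lag` and the maximum lag are hypothetical choices, and the authors additionally averaged cross-correlations across states and subjects.

```python
import numpy as np


def peak_lag(accuracy, tilt, max_lag=60):
    """Lag (in samples) at which frame-by-frame changes in tilt best
    correlate with changes in accuracy. A positive lag means that tilt
    changes follow accuracy changes."""
    da = np.diff(np.asarray(accuracy, dtype=float))
    dt = np.diff(np.asarray(tilt, dtype=float))
    da = (da - da.mean()) / da.std()
    dt = (dt - dt.mean()) / dt.std()
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:  # pair da[k] with dt[k + lag]
            a, b = da[:da.size - lag], dt[lag:]
        else:         # pair da[k - lag] with dt[k]
            a, b = da[-lag:], dt[:dt.size + lag]
        cc[i] = np.mean(a * b)
    return int(lags[np.argmax(cc)]), lags, cc


# Synthetic example: tilt reproduces accuracy with a 12-sample delay.
rng = np.random.default_rng(0)
acc = np.cumsum(rng.standard_normal(1000))
tilt = np.concatenate([np.zeros(12), acc[:-12]])
lag, lags, cc = peak_lag(acc, tilt)
```

      Differencing the traces before correlating focuses the analysis on changes in accuracy and tilt rather than on their slowly varying levels.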

      (2) Peri-decision wagering is different to post-decision wagering

      […] One route to doing this would be to ask whether the eccentricity reports show statistical signatures of confidence that have been established for more classical punctate tasks. Here a key move has been to identify qualitative patterns in the frame of reference of choice accuracy - with confidence scaling positively with stimulus strength for correct decisions, and negatively with stimulus strength for incorrect decisions (the so-called X-pattern, for instance Sanders et al. 2016 Neuron […].

      We thank reviewer 1 for the constructive feedback. Our behavioral data do not show signatures similar to the previously reported post-decision confidence expression (Desender et al., 2021; Sanders et al., 2016). The previously described patterns show, first of all, that confidence for incorrect type 1 decisions diverges from that for correct type 1 decisions, declining with stimulus strength (e.g., coherence) rather than increasing as it does for correct decisions. In our task, there is a graded accuracy and (putative) confidence expression, but there are no correct or incorrect decisions; instead, there are hits and misses of the reward targets presented at nominal directions. Instead of a decline for misses, we observe an equally positive scaling of confidence with coherence for both hits and misses (Author response image 4A). This is because in our peri-decision wagering task, the expression of confidence causally determines the binary hit or miss outcome. The outcome in our task is a function of the two-dimensional joystick response: higher tilt (confidence) requires a more accurate response to successfully hit a target. Thus, a subject can display a high (but not high enough) level of accuracy and confidence and still be unsuccessful. If we instead median-split the confidence reports by high and low accuracy (Author response image 4C), we observe a slight separation, especially for higher coherences, but still no clear difference in slopes.

      We do observe the other two dynamic signatures of confidence (Desender et al., 2021): signature 2 – monotonically increasing accuracy as a function of confidence (Author response image 4), and signature 3 – steeper type 1 psychometric performance (accuracy) for high versus low confidence (Author response image 4D).

      Author response image 4.

      Confidence (i.e., joystick tilt, left column) and accuracy reports (right column) for different stimulus coherence, sorted by discrete outcome (hit versus miss, upper row) and the complementary joystick dimension (lower row, based on median split).

      Author response image 5.

      Accuracy reports correlate positively with confidence reports. For each stimulus state, we averaged the joystick response in the time window between 500 ms (60 samples) after a direction change until the first reward target appearance. If there was no target, we took all samples until the next RDP direction change into account. This corresponds to data snippets averaged in Figure 2D. Thus, for each stimulus state, we extracted a single value for joystick accuracy and for tilt (confidence). Subsequently, we fitted a linear regression to the accuracy-confidence scatter within each subject and within each coherence level. The plot above shows the average linear regression between accuracy and confidence across all subjects (i.e., the slopes and intercepts were averaged across n=38 subjects). Coherence is color-coded.
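      The two steps described in this caption, per-state averaging and per-subject regression followed by averaging of the fit parameters, can be sketched as below. The sketch assumes 120 Hz sampling (500 ms = 60 samples, as stated above) and treats confidence as the predictor and accuracy as the outcome, which is our reading of "accuracy as a function of confidence"; the function names are hypothetical.

```python
import numpy as np


def state_average(trace, change_idx, end_idx, skip=60):
    """Mean of a joystick trace from `skip` samples (500 ms at an
    assumed 120 Hz) after a direction change up to end_idx (exclusive),
    i.e. the first target onset or the next direction change."""
    trace = np.asarray(trace, dtype=float)
    return float(np.mean(trace[change_idx + skip:end_idx]))


def mean_regression(per_subject):
    """per_subject: list of (confidence, accuracy) array pairs, one pair
    per subject with one value per stimulus state. Fits
    accuracy = slope * confidence + intercept within each subject and
    returns the slope and intercept averaged across subjects."""
    fits = np.array([np.polyfit(conf, acc, 1) for conf, acc in per_subject])
    slope, intercept = fits.mean(axis=0)
    return float(slope), float(intercept)


# Hypothetical data for two subjects at one coherence level.
slope, intercept = mean_regression([([0, 1, 2], [1, 3, 5]),
                                    ([0, 1, 2], [0, 1, 2])])
```

      Averaging fitted parameters across subjects, rather than pooling all states into one fit, keeps each subject's own scaling of accuracy and tilt from dominating the group estimate.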

      (3) Additional analyses regarding the continuous nature of our data

      I was surprised not to see more analysis of the continuous report data as a function of (lagged) task variables. […]

      Reviewer 1 requested more analyses regarding the continuous nature of our data. We agree that this is a useful addition to our paper, and thank reviewer 1 for this suggestion. To address this point, we revised main Figure 2 and provided additional panels. Panel D illustrates the continuous ramp-up of both accuracy and tilt (confidence) for high coherence levels, suggesting ongoing evidence integration and meta-cognitive assessment. Panel E shows the cross-correlation between frame-by-frame changes in accuracy and tilt (see 1.4 above). Here, we demonstrate that changes in the accuracy precede changes in joystick tilt, characterizing the continuous nature of the perceptual decision-making process.

      (4) Explicit motivation regarding continuous social experiments

      This paper is innovating on a lot of fronts at once - developing a new CPR task for metacognition, and asking exploratory questions about how a social setting influences performance on this novel task. However, the rationale for this combination was not made explicit. Is the social manipulation there to help validate the new task as a measure of confidence as dissociated from other perceptual variables? (see query 1 below). Or is the claim that the social influence can only be properly measured in the naturalistic CPR task, and not in a more established metacognition task?

      Our rationale for the combination of real-time decision making and social settings was twofold:

      i. Primates, including humans, are social species. Naturally, most behavior is centered around a social context and continuously unfolds in real-time. We wanted to showcase a paradigm in which distinct aspects of continuous perceptual decision-making could be assessed over time in individual and social environments.

      ii. Human behavior is susceptible to what others think and do. We wanted to demonstrate that the sheer presence of a co-acting social partner affects continuous decision-making, and quantify the extent and direction of social modulation.

      We agree that the motivation for combining the new task with this specific type of social co-action should be clearer. We have clarified this aspect in the Introduction, lines 92-109. In brief, the continuous, free-flowing nature of the CPR task and the real-time availability of social information made this design a very suitable paradigm for assessing unconstrained social influences. We see this study as the first step toward disentangling the neural basis of social modulation in primates. See also the response to reviewer 2, point 2, below.

      (5) Response to minor points

      (5.1) Clarification on behavioral modulation patterns

      Lines 295-298, isn't it guaranteed to observe these three behavioral patterns (both participants improving, both getting worse, only one improving while the other gets worse) even in random data?

      The reviewer is correct. We now simply illustrate these possibilities in Figure 4B and how these patterns could lead to divergence or convergence between the participants (see also line 282). Unlike random data, our results predominantly demonstrate convergence.

      (5.2) Clarification on AUC distributions

      Lines 703-707, it wasn't clear what the AUC values referred to here (also in Figure 3) - what are the distributions that are being compared? I think part of the confusion here comes from AUC being mentioned earlier in the paper as a measure of metacognitive sensitivity (correct vs. incorrect trial distributions), whereas my impression here is that here AUC is being used to investigate differences in variables (e.g., confidence) between experimental conditions.

      We apologize for the confusion. Indeed, the AUC analysis was used for two purposes:

      (i) To assess the metacognitive sensitivity (line 175, Supplementary Figure 2).

      (ii) To assess the social modulation of accuracy and confidence (starting at line 232, Figures 3-6). 

      In line 232, we now introduce the second AUC approach, used to assess social modulation, together with the underlying distributions of accuracy and confidence that are derived from each stimulus state, separately for each subject.
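      As an illustration of the second use, a minimal sketch of an AUC comparing per-state values (e.g., confidence) between the solo and dyadic conditions of one subject is shown below. It assumes the standard ROC area, computed as the probability that a randomly drawn dyadic value exceeds a randomly drawn solo value, with ties counted as one half; the function name is hypothetical.

```python
import numpy as np


def auc(solo, dyadic):
    """ROC area separating two distributions of per-state values.
    0.5 means no modulation; values above 0.5 mean higher values in
    the dyadic condition, values below 0.5 mean lower values."""
    solo = np.asarray(solo, dtype=float)
    dyadic = np.asarray(dyadic, dtype=float)
    # All pairwise differences (dyadic minus solo).
    diff = dyadic[:, None] - solo[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


# Hypothetical per-state confidence values for one subject.
value = auc(solo=[0.2, 0.4, 0.5], dyadic=[0.5, 0.6, 0.7])
```

      Because the AUC depends only on the ordering of values, it is insensitive to monotonic rescaling of accuracy or tilt and can be compared across subjects.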

      (5.3) Clarification of potential ceiling effects

      Could the findings of the worse solo player benefitting more than the better solo player (Figure 4c) be partly due to a compressive ceiling effect - e.g., there is less room to move up the psychometric function for the higher-scoring player?

      We thank the reviewer for this insight. First, even the better-performing participants were not at ceiling most of the time, even at the highest coherence (cf. Figure 2 and Supplementary Figure 3C). To test for a potential ceiling effect in the better solo players, we correlated their social modulation (expressed as AUC, as in Figure 4) with their solo performance. There was no significant negative correlation for accuracy (p > 0.063), but there was a negative correlation for confidence (r = -0.39, p = 0.0058), indicating that low-performing “better players in a dyad” indeed showed more positive social modulation. We note, however, that this correlation was driven mainly by a few such initially low-performing “better” players, who mostly belonged to dyads in which both participants improved in confidence (green dots, Figure 4B), and that even the highest solo average confidence was not at ceiling (<0.95). To conclude, the asymmetric social modulation we observe is mainly due to the better players declining (orange and red dots, Figure 4B), rather than to both players improving with the better player improving less (green dots, Figure 4B).

      Reviewer 2:

      Strengths:

      There are many things to like about this paper. The visual psychophysics has been undertaken with much expertise and care to detail. The reporting is meticulous and the coverage of the recent previous literature is reasonable. The research question is novel.

      We thank reviewer 2 for this positive evaluation. Below we address the identified weaknesses and recommendations.

      (1) Streamlining the text to make the paper easier to read

      The paper is difficult to read. It is very densely written, with little to distinguish between what is a key message and what is an auxiliary side note. The Figures are often packed with sometimes over 10 panels and very long captions that stick to the descriptive details but avoid clarity. There is much that could be shifted to supplementary material for the reader to get to the main points.

      We thank reviewer 2 for the honest assessment that our article was difficult to read and understand, and for providing specific examples of confusion. We substantially improved the clarity:

      We added a Glossary that defines key terms, including Accuracy and Hit rate. 

      We replaced the confusing term “eccentricity” with joystick “tilt”.

      We simplified Figures 3 and 5, moving some panels into supplementary figures.

      We substantially redesigned and simplified our main Figure 4, displaying the data in a more straightforward, less convoluted way, and removing several panels. This change was accompanied by corresponding changes in the text (section starting at line 277).

      More generally, we shortened the Introduction, substantially revised the Results and the figure legends, and streamlined the Discussion.

      (2) Dyadic co-action vs joint dyadic decision making

      A third and very important one is what the word "dyadic" refers to in the paper. The subjects do not make any joint decisions. However, the authors calculate some "dyadic score" to measure if the group has been able to do better than individuals. So the word dyadic sometimes refers to some "nominal" group. In other places, dyadic refers to the social experimental condition. For example, we see in Figure 3c that AUC is compared for solo vs dyadic conditions. This is confusing.

      […] my key criticism is that the paper makes strong points about collective decision-making and compares its own findings with many papers in that field when, in fact, the experiments do not involve any collective decision-making. The subjects are not incentivized to do better as a group either. […]

      The reviewer is correct to highlight these important aspects. We did not, in fact, investigate a situation in which two players had to reach a joint decision with interdependent payoffs, and there was no incentive to collaborate or even to incorporate the information provided by the other player. To make the meaning of “dyadic” in our context more explicit, we have clarified the nature of the co-action and the independent payoffs (e.g., lines 107, 211, 482, and 755 (Glossary)), and used the terms “nominal combined score” (line 224) and “nominal ‘average accuracy’ within a dyad” (line 439).

      Concerning the key point about embedding our findings in the literature on collective decision-making, we would like to clarify our motivation. Apart from the recent study by Pescetelli and Yeung (2022), we are not aware of any perceptual decision-making studies that investigated co-action without an explicit joint task. We were therefore naturally inspired by the literature on collective decisions and felt it appropriate to compare our findings to the principles derived from this exciting field. Besides the development of a peri-decision wagering CPR game that is continuous both in time and in “space” (direction), the social co-action context is the main novel contribution of our work. Although it is possible to formulate cooperative or competitive contexts for the CPR, we leveraged the free-flowing, continuous nature of the task, which makes it most readily amenable to studying spontaneously emerging social information integration.

      In the Introduction and Discussion, we now emphasize more explicitly that most prior work has used joint decision tasks, in contrast to the co-action we study here.

      (3) Addition of relevant literature to Discussion

      […] To see why this matters, look at Lorenz et al PNAS (https://www.pnas.org/doi/10.1073/pnas.1008636108) and the subsequent commentary that followed it from Farrell (https://www.pnas.org/doi/full/10.1073/pnas.1109947108). The original paper argued that social influence caused herding which impaired the wisdom of crowds. Farrell's reanalysis of the paper's own data showed that social influence and herding benefited the individuals at the expense of the crowd demonstrating a form of tradeoff between individual and joint payoff. It is naive to think that by exposing the subjects to social information, we should, naturally, expect them to strive to achieve better performance as a group.

      Another paper that is relevant to the relationship between the better and worse performing members of the dyad is Mahmoodi et al PNAS 2015 (https://www.pnas.org/doi/10.1073/pnas.1421692112). Here too the authors demonstrate that two people interacting with one another do not "bother" figuring out each others' competence and operate under "equality assumption". Thus, the lesser competent member turns out to be overconfident, and the more competent one is underconfident. The relevance of this paper is that it manages to explain patterns very similar to Schneider et al by making a much simpler "equality bias" assumption.

      We thank reviewer 2 for pointing out these highly relevant references, which we have now integrated into the Discussion (lines 430 and 467). The debate between Lorenz et al. and Farrell concerns a very different type of task (single-shot factual knowledge estimation), but it is very illuminating for understanding the differing perspectives on individual versus group benefit. We fully agree that it would be naïve to assume that, during independent co-action in our highly demanding task, participants would strive to achieve better performance as a group. If anything, we expected less normative and more informational, reliability-driven effects as a way to cope with task demands.

      Mahmoodi et al. is a particularly pertinent and elegant study, and the equality bias they demonstrate may indeed underlie the effects we see. We admit that we did not know this paper at the time of our initial writing, but it is encouraging to see the convergence [pun intended] despite task and analysis differences. As highlighted above (2), our novel contributions remain that we observe mutual alignment, or convergence, in real-time without explicitly formulated collective decision task and associated social pressure, and that we separate asymmetric social effects on accuracy and confidence.

      Other reviewer-independent changes:

      Additional information: Angular error in Figure 2

      In panel A of the main Figure 2, we have added the angular error of the solo reports (blue dashed line) to give readers an impression about the average deviation of subjects’ joystick direction from the nominal stimulus direction. We have pointed out that angular error is the basis for accuracy calculation.

      Data alignment

      In the previous version of the manuscript, we presented data with different alignments: accuracy values were aligned to the appearance of the first target in a stimulus state (target alignment) to avoid the predictive influence of target location within the remaining stimulus state, while the joystick tilt was extracted at the end of each stimulus state (state alignment) to allow subjects more time to make a deliberate, confidence-guided report (Methods). We realized that this is confusing, as it compares the social modulation of the two response dimensions at different points in time. In the revision, we use state-aligned data in most figures and analyses and clearly indicate which alignment type has been used. We kept the target alignment for the illustration of the angular error in the solo behavior (Figure 2). This change affected only the reported accuracy statistics. None of the results changed fundamentally, but the social modulation of accuracy became even stronger in state-aligned data.

      In summary, we hope that these revisions have resulted in an easier-to-understand and convincing article, with clear terminology and concise and important takeaway messages.

      We thank both reviewers and the editors again for their time and effort, and look forward to the reevaluation of our work.

      References

      Desender K, Donner TH, Verguts T. 2021. Dynamic expressions of confidence within an evidence accumulation framework. Cognition 207:104522. doi:10.1016/j.cognition.2020.104522

      Pescetelli N, Yeung N. 2022. Benefits of spontaneous confidence alignment between dyad members. Collective Intelligence 1. doi:10.1177/26339137221126915

      Sanders JI, Hangya B, Kepecs A. 2016. Signatures of a Statistical Computation in the Human Sense of Confidence. Neuron 90:499–506. doi:10.1016/j.neuron.2016.03.025

    1. eLife Assessment

      This study provides a valuable genome-centric characterization of microbial communities across deep sediment cores from a Spartina patens salt marsh. The study provides claims on the metabolic capabilities of the deep sediment microbiome as well as on a burial microbial assembly process and functional complementarity at depth. However, some of these claims remain incomplete and would benefit from further supporting evidence. Overall, this work will be of interest to microbial ecologists working on wetlands.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Vineis et al. examined the structure and functional potential of microbial communities along a vertical sediment profile of a salt marsh, using a genome-centric metagenomic approach. They attempted to test whether (1) the microbial communities within dynamic upper layers contain genomes with diverse functional potential, (2) the energy-limited deeper sediments contain microbial consortia assembled to metabolise complex carbon, and (3) microbial compositional changes in the low-energy sediments mirror the burial processes observed in marine environments with similar energetic limitations. Results revealed core microbial consortia that contain a collective metabolic potential for complex carbon and aromatics degradation, suggesting putative syntrophic interactions. In addition, the recovery of MAGs assembled independently from multiple depths in the same core and the consistent relative abundance structure of MAGs within co-occurrence network modules together suggest burial as a likely mechanism of microbial assembly.

      Strengths:

      (1) Two long sediment cores (down to 240 cm deep) were collected in this study, allowing investigation of the less well characterised subsurface microbiome in salt marsh.

      (2) A genome-centric metagenomic approach was employed here, which provides information on both the structure and functional potential of the salt marsh sediment microbiome, which is not possible in commonly performed 16S rRNA-based surveys.

      Weaknesses:

      (1) In both the abstract and conclusion, the authors claimed that results from this study provide a "mechanistic understanding" of the assembly and distribution of the microbial communities in salt marsh sediment (P2, L31 and P35, L645-649). However, both claims are speculative and not supported by solid evidence. Firstly, the genomic data presented in this study and supplementary physical properties of sediments in the broader area are not enough to make a solid claim (that appears in the title) on microbial assembly being governed by a burial process. Alternative explanations include residual bioturbation, slow porewater advection, etc. Therefore, this remains an interesting hypothesis unless additional evidence is provided to rule out the alternative explanations. Similarly, the claim on the detailed syntrophic interactions among members within a co-occurrence network module (e.g. P36, L649-652) is purely speculative and warrants functional validation experiments to prove.

      (2) A major aim of this work was to study complex carbon degradation. However, neither CAZymes, the first-line carbon degradation enzymes, nor peptidases, which can be important contributors to carbon degradation at depth, were examined here. METABOLIC, which the authors used for functional annotation of MAGs, generates peptidase outputs by default, and these could easily be integrated here.

      (3) No geochemical data is available to provide context for the genomic analysis here. Without such information, readers cannot even tell whether the surface sediment samples were oxic or anoxic. A reference to a PhD thesis is provided (P6, L126) but it would be most helpful to extract relevant data from there and provide as a supplementary table.

      (4) A single metagenomic binning tool, CONCOCT, was used in this study, which very likely has resulted in a limited number of MAGs recovered. More (high-quality) MAGs are expected with the use of additional binners and a bin consolidation procedure.

      (5) Several terms are misleading here. Firstly, the term "co-occurring" or "co-located" microbes or MAGs (e.g. P1, L19 and P31, L537) can be misleading as it could imply a close spatial relationship. However, co-occurrence networks rely on correlations of (relative) abundance and show statistical associations instead of direct spatial or physical relationships. I would suggest alternative names such as co-abundant or statistically associated microbes. Secondly, the term "persistent conversion of soil organic carbon" (P36, L654) in the conclusion is also misleading as it implies an active process, which cannot be tested without metatranscriptomics or metaproteomics data.

      (6) Based on a NMDS plot of KEGG IDs (Figure 4B), the authors claimed that the functional potential among MAGs in modules 1, 2 and 7 was very similar (P18, L346). However, the dispersions of modules 1 and 2 were just too large. A proper statistical test, such as PERMANOVA, should be used to support the claim.

      (7) Genome-scale metabolic networks were analysed using Metage2Metabo (M2M), and the results were discussed in detail (P26, L453-466). However, the source data should be provided in a supplementary table to show which metabolites are producible by which MAGs.

    3. Reviewer #2 (Public review):

      This work provides a detailed metabolic reconstruction of sediment microbiomes along a depth profile in a Spartina patens salt marsh in Massachusetts, USA. Using a combination of genome reconstruction, co-occurrence network analysis, and metabolic profiling, the authors describe the metabolic potential of co-occurring microbial consortia in understudied deep sediments.

      Major strengths of this study include the detailed metagenomic characterization of the understudied deep marsh sediments. The authors recovered genomes representing a substantial portion of the deep sediment microbiome (up to ~60%) and provided an initial explanation of pathways related to the potential for organic carbon decomposition in this environment. Of particular interest is the capability of the deep sediment microbiome to process aromatic organic compounds, highlighting the need for a collaborative consortium to carry out their decomposition. Improved understanding of the microbial transformation of deep sediment organic carbon in blue carbon ecosystems is vital to better understand the fate of this large carbon pool in the face of climate change.

      However, I have a few concerns in the interpretation of the results, and in the case of the surface sediments there is a lack of strong evidence in my opinion.

      (1) A stronger ecological interpretation is needed regarding the meaning of the co-occurrence network analysis. The authors correctly note that their analysis identifies groups of co-occurring genomes, which may indicate shared niche space, not necessarily interspecific ecological interactions (as the authors imply, for instance, in lines 423-425). When performing network analysis using samples from the entire sediment profile (0-240 cm), they identified consortia that co-vary in relative abundance along the depth gradient, most likely because of shared environmental filtering forces, such as changes in redox potential and sediment chemistry. Supplementary Figure S4, showing that different modules have distinct abundance distributions along the sediment profile, supports this idea. If that is the case, I would like the authors to define the ecological significance of the "connector hub". Is it merely a taxon that is prevalent across the whole sediment profile? Since the modules are physically separated (in different sediment depth layers), they are not really interacting with each other. As it stands, it is not clear why the authors decided to study connector hubs in greater detail, along with their subnetworks.

      (2) I question whether the lack of network modules in the surface sediment is really a consequence of non-significant interspecific ecological interactions and not the result of methodological biases. The low MAG recovery, and thus low read recruitment, in surface-level metagenomes may hinder the authors' ability to identify co-varying microorganisms in the surface sediment; the high diversity of the surface sediment prevents proper assembly of the surface microbiome. I would also argue that, as redox potential declines sharply in salt marsh sediments just below the root surface, the microbial community in the first few centimeters changes rapidly and is significantly different from the more stable deep sediment microbiome. Due to the sampling design, the study has less representation of the surface layer (only 0-30 cm, while the cores extend down to 240 cm). Grouping sediment microbiomes by depth based on similarity in their sequence space (e.g., Mash) or functional profile (e.g., KEGG annotation) before performing network analysis could help to better infer ecological relationships within the distinct ecological niches of the marsh sediment profile, rather than performing a single network analysis of all samples combined.

      (3) Normalizing the relative abundance of MAGs by dividing by the total reads mapping to a particular sample can be misleading due to differences in recruitment levels across samples (and depths). A better approach would be to normalize by metagenome library size, or preferably by genome equivalents (e.g., using MicrobeCensus) or a similar approach.
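
      A minimal numerical sketch of this concern, using made-up read counts (all numbers and function names are hypothetical): when two samples contain the same number of reads from a MAG but differ in overall recruitment to the MAG set, dividing by total mapped reads inflates the apparent abundance in the poorly recruited sample, whereas dividing by library size does not. The genome-equivalent variant assumes a per-sample genome-equivalent estimate has already been obtained (for example from a tool such as MicrobeCensus); no tool is called here.

```python
def per_mapped(reads_to_mag, total_mapped_reads):
    """Fraction of MAG-mapped reads (the normalization questioned above)."""
    return reads_to_mag / total_mapped_reads


def per_library(reads_to_mag, library_reads):
    """Fraction of all sequenced reads in the metagenome library."""
    return reads_to_mag / library_reads


def rpkg(reads_to_mag, genome_length_bp, genome_equivalents):
    """Reads per kilobase of MAG per genome equivalent in the sample."""
    return reads_to_mag / (genome_length_bp / 1_000) / genome_equivalents


# Hypothetical samples: same library size, same reads from one MAG,
# but very different overall recruitment to the MAG collection.
surface = {"library": 10_000_000, "mapped": 1_000_000, "mag": 100_000}
deep = {"library": 10_000_000, "mapped": 6_000_000, "mag": 100_000}

for name, s in (("surface", surface), ("deep", deep)):
    print(name,
          round(per_mapped(s["mag"], s["mapped"]), 4),    # surface 0.1, deep 0.0167
          round(per_library(s["mag"], s["library"]), 4))  # 0.01 in both samples
```

      The mapped-read normalization suggests a six-fold difference that the library-size normalization does not show; normalizing by genome equivalents additionally accounts for differences in average genome size between surface and deep communities.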

    1. eLife Assessment

      The authors used a Bayesian modeling framework to fit behavior and serotonin neuron activity to reward history across multiple timescales. A key goal was to distinguish value coding from other influences, particularly thirst, by comparing model fits across neurons. Although the question and approach are valuable, several limitations of the current manuscript mean that support for the conclusions is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to determine whether individual serotonin neurons encode a slowly evolving estimate of environmental value during a dynamic Pavlovian conditioning task. They used a Bayesian modeling framework to fit neural activity and behavior to reward history across multiple timescales. A key goal was to distinguish value coding from other influences, particularly thirst, by comparing model fits across neurons. Ultimately, they sought to quantify the prevalence and properties of value coding in single serotonin neurons and assess its relationship to behavior.

      Strengths:

      The authors employ a Bayesian modeling framework that allows for nuanced hypothesis testing on long timescales of reward history. This approach is well-suited to the complexity of single-neuron data, where noise and variability can obscure meaningful patterns. By fitting generative models to both neural activity and behavior, the authors move beyond descriptive statistics to infer latent variables such as value and thirst, and quantify their contributions to firing rate.

      The use of hierarchical Bayesian models enables partial pooling across neurons and sessions, improving parameter estimation while accounting for individual variability. The mixture modeling strategy further strengthens the analysis by explicitly testing whether neurons encode value, thirst, or neither - rather than assuming a single coding scheme. This avoids overfitting and provides a principled way to assess the prevalence and properties of value coding in the serotonergic population.

      The authors also validate their modeling choices through cross-validation and comparisons with null and trend models, demonstrating that their value model explains neural activity better than simpler alternatives. This lends credibility to their claim that serotonin neurons encode slowly evolving estimates of value.

      Weaknesses:

      The authors' decision to analyze neural activity during the ITI is methodologically sound in terms of maximizing spike counts and improving statistical power for single-unit modeling. Their generative model performs best when applied to ITI firing, and the longer duration and higher spike density of this period make it well-suited for capturing slow dynamics in serotonergic neurons.

      However, this strength simultaneously introduces a conceptual limitation. The behavioral readout (anticipatory licking) occurs during the cue periods, not the ITI. This creates a temporal disconnect between the neural and behavioral data streams. While the authors cite theoretical work suggesting that ITI value scales with trace period value, this assumption is not directly validated in the current dataset. As a result, it remains unclear whether ITI firing reflects behaviorally relevant value signals or merely captures slow fluctuations unrelated to immediate behavioral output. For example, after all of the analyses performed, the final point of the Results section reads: "Taken together, anticipatory licking is explained partially by value integration occurring at a faster time scale than seen in serotonergic cells and partially by value integration happening at a timescale that matches the serotonergic cells, but the part of the behaviour matching the timescale seen in serotonergic cells is better explained by a model of thirst than a model of value." This appears to negate much of the work of the prior analyses.

      The manuscript lacks sufficient population-level illustrations of behavior. Figure 1 presents a single-session example, which does not allow the reader to assess consistency across mice or neurons. Figure 2 improves on this by showing individual traces and means, but the data are already processed and smoothed, obscuring raw behavioral variability.

      Additionally, key behavioral metrics are not clearly defined. For instance, the calculation of "reward collection probability" is ambiguous. It is unclear whether this refers to licking during the cue, the outcome window, or some other period. The relationship between reward collection probability and anticipatory licking is also not explicitly described, making it difficult to interpret how these behavioral measures relate to the modeled value signals. The reader is also not shown what licking looks like during the ITI - the precise period the authors analyse and focus on.

      Thirst plays a central role in the manuscript, both as a behavioral driver and as a confounding variable in interpreting serotonergic activity. However, the method used to quantify thirst, a linear decrease from an initial value following each drinking event, is overly simplistic and potentially misleading. This approach assumes that thirst diminishes uniformly with each reward, without accounting for the physiological complexity of hydration and satiety regulation.
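
      To make the concern concrete, here is a minimal sketch of a thirst regressor of the kind described, assuming a fixed decrement per rewarded trial and a floor at zero (the parameters t0 and delta and the floor are illustrative assumptions, not the manuscript's implementation). Until the floor is reached, the regressor equals t0 minus delta times the number of rewards already collected, so it carries no information beyond reward history.

```python
from itertools import accumulate


def linear_thirst(rewards, t0=1.0, delta=0.01):
    """Thirst entering each trial; drops by delta after every rewarded trial."""
    level, trace = t0, []
    for r in rewards:
        trace.append(level)
        if r:
            level = max(0.0, level - delta)
    return trace


rewards = [1, 0, 1, 1, 0, 1]
thirst = linear_thirst(rewards, t0=1.0, delta=0.25)

# Rewards collected before each trial (0 for the first trial).
prior_rewards = [0] + list(accumulate(rewards))[:-1]
# Away from the floor, thirst is an exact affine function of reward history.
print(all(abs(t - (1.0 - 0.25 * n)) < 1e-12 for t, n in zip(thirst, prior_rewards)))  # → True
```

      A thirst model that also depended on, for example, time since the last drink or measured fluid intake could partly separate motivational state from reward count; which variant is physiologically appropriate is an empirical question.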

      In reality, thirst is influenced by multiple factors, including fluid balance, timing of intake, and individual variability. Modeling it as a monotonic function of reward consumption risks conflating motivational state with mere reward history. Given how prominently thirst features in the analysis and interpretation, a more nuanced or empirically validated measure would strengthen the manuscript's conclusions.

      Minor, but I did not find Panel A of Figure S1 to be helpful to the manuscript. The panel says height, while the caption says hairline. This manuscript is not about faculty, height, or hairline.

    3. Reviewer #2 (Public review):

      Summary:

      The authors recently published a seminal work (Nature 2025) in which they proposed that the activity of serotonin neurons encodes a "prospective code for value" (value with low-pass filtered negative feedback, roughly resulting in rate-of-change + (compressed) value), and validated this proposal by analyzing several data sets and showing that their theory provided a better fit than other existing theories. In the present work, the authors analyzed the activity of serotonin neurons and the licking behavior in reference to their theory, using data from mice performing a dynamic Pavlovian task in which the reward probability occasionally changed, without a cue, in a block-wise manner. While serotonin neuronal activity during task trials in the same data set was analyzed in their previous work, the present work focuses on activity during inter-trial intervals and on longer time-scale changes. The authors' analyses using Bayesian model fitting revealed that serotonin neurons' activity reflected reward history over long time scales (on average about 100 trials, or 10–20 minutes), and that the time scales of individual neurons varied considerably (30–300 trials, 5–60 minutes). Analysis of licking, on the other hand, revealed that licking frequency mainly reflected reward history over shorter time scales, and that the remaining long-time-scale components could be mostly explained by (gradually decreasing) thirst.

      Strengths:

      (1) The results supported and further elaborated the authors' prospective value coding theory of serotonin.

      (2) The results also raised a question about what then determines the frequency of licking behavior and how.

      Weaknesses:

      (1) A limitation of the current analyses is the lack of consideration of the effort cost of licking. Given that both the involvement of serotonin in effort cost computation (Meyniel et al., 2016 eLife 17282) and the existence/influence of an effort cost of licking (Hage et al., 2023 eLife 87238) have been suggested, it would be desirable to consider (ideally, to formally analyze) such an effect in the current data set. A simple way of incorporating effort cost would be to assume a small (free parameter) negative reward for every single lick (anticipatory and other) and combine these negative rewards with positive (liquid) rewards in the calculation of value. This may not drastically change the main claims of the present work, but could still provide insights into whether/how serotonin is involved in cost-benefit computation (or whether/how reward and cost are combined in the serotonin system).
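
      The suggestion can be sketched as a simple delta-rule (exponentially weighted) value estimate in which each lick contributes a small negative outcome. This is not the authors' Bayesian model; the learning rate alpha (roughly the inverse of the integration timescale in trials) and the per-lick cost lick_cost are hypothetical free parameters, and setting lick_cost=0 recovers a reward-only value.

```python
def value_trace(rewards, licks, alpha=0.01, lick_cost=0.0, v0=0.0):
    """Value entering each trial, updated by net outcome = reward - cost * licks.

    rewards   -- reward delivered on each trial (e.g. 1 or 0)
    licks     -- number of licks emitted on each trial
    alpha     -- learning rate; memory spans roughly 1/alpha trials for small alpha
    lick_cost -- hypothetical cost per lick (free parameter)
    """
    v, trace = v0, []
    for r, n in zip(rewards, licks):
        trace.append(v)
        net = r - lick_cost * n
        v += alpha * (net - v)  # exponential moving average of net outcome
    return trace


rewards = [1, 1, 0, 1]
licks = [4, 6, 2, 5]
print(value_trace(rewards, licks, alpha=0.5))                 # reward only
print(value_trace(rewards, licks, alpha=0.5, lick_cost=0.1))  # with effort cost
```

      Fitting lick_cost jointly with alpha would, under these assumptions, indicate whether a net (benefit minus cost) value explains serotonergic activity better than reward alone.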

      (2) Another possibility related to effort cost is that the accumulation of the effort cost of licking over a long time scale may cause fatigue. Since such fatigue is expected to increase gradually across the entire session, potentially with a time course similar to that of thirst (but with a positive rather than negative slope), it may be necessary to ask whether the suggested positive effect of thirst on licking (i.e., decrease of licking due to decrease of thirst) could be (partially) explained by a negative effect of fatigue (i.e., decrease of licking due to increase of fatigue).

      (3) Is it also possible that the decrease of licking (partially) reflects a decrease in the degree of exploration (over the selection between licking and no-licking) and/or meta-learning about the occasional sudden changes in the reward probability, such as the meta-learning observed in animals engaging in a repetitive reversal learning task (Hattori et al., 2023 Nat Neurosci)?

    4. Reviewer #3 (Public review):

      Summary:

      The authors are reanalyzing previously published data to test the hypothesis that serotonin neurons encode state value. Here, the authors focus on analyzing the firing rate of serotonin neurons during the inter-trial interval, in which no cues or outcomes are delivered. The goal is to quantify and find neurons whose activity is explained by value encoding, and for those that have this property, determine what the timescale of reward integration is (e.g., a few trials, tens of trials, or the entire session) in individual neurons.

      Strengths:

      The major strengths are the use of a Bayesian modelling approach to extract value and thirst coding features from individual neurons, and comparison of the time course of adaptation of serotonin neurons with a behavioral output, licking in this case. I also appreciate the use of a separate dataset to establish prior distributions for baseline firing rate to be used in the modelling done here, which is an attempt to deal with the main weakness of this study:

      Weaknesses:

      The weakness of this study is the small number of neurons available for analysis, resulting in a small number of neurons that unequivocally are modulated by value.

      The authors did achieve their aims, but the results show that it is hard to unequivocally separate value-coding neurons with long timescales from thirst-coding neurons, which is acknowledged by the authors.

      While the experimental results do not allow for a strong conclusion regarding the distinction of value versus thirst coding in serotonin neurons, the methods employed and the rationale for using them are of great utility to the community and for considerations of behavioral task design and data analysis in future studies. This is a point that the authors could discuss/develop more.

      Additional significance of the work:

      The comparison between time courses for behavior (anticipatory licking) and serotonin activity (as well as the reference to dopamine activity's time course from a previous study) is of great significance for any researcher studying behavioral control. Mounting evidence suggests that multiple brain circuits contribute to any given action selection. Therefore, expecting a perfect alignment between the time course of neuromodulator activity and behavioral output might be unreasonable. For future studies, modelling behavioral output as a combination of policies determined by multiple brain circuits or neuromodulators might be a promising approach.

    1. eLife Assessment

      This important study advances our understanding of vertebrate forelimb development, specifically the contribution of Hox genes to zebrafish pectoral fin formation. The authors have employed a robust and extensive genetic approach to tackle a key and unresolved question. The findings are overall convincing and will be of broad interest to developmental and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have used gene deletion approaches in zebrafish to investigate the function of genes of the hox clusters in pectoral fin "positioning" (but perhaps more accurately pectoral fin "formation").

      Strengths:

      The authors have employed a robust and extensive genetic approach to tackle an important and unresolved question.

      The results are largely very clearly presented.

      Weaknesses:

      The Abstract suggests that no genetic evidence exists in model organisms for a role of Hox genes in limb positioning. There are, however, several examples in mouse and other models (both classical genetic and other) providing evidence for a role of Hox genes in limb position, which is elaborated on in the Introduction.

      It would perhaps be more accurate to state that several lines of evidence in a range of model organisms (including the mouse) support a role for Hox genes in limb positioning. The authors' work is not weakened by a more inclusive introduction that cites the current literature more comprehensively.

      It would be helpful for the authors to make a clear distinction between "positioning" of the limb/fin and whether a limb/fin "forms" at all, independent of the relative position of this event along the body axis.

      Could the authors discuss why the zebrafish pectoral fin is sensitive to Hoxb loss, whereas mouse Hoxb mutants still form a limb?

      Is this down to exclusive expression of Hoxbs in the zebrafish pectoral fin forming region rather than a specific functional role of the protein? This is important as it has implications for the interpretation of results throughout the paper and could explain some apparently conflicting results.

      Why is hoxba more potent than hoxbb? Is this because hoxba has Hox4/5 present, while hoxbb has only hoxb5? The hoxba locus has retained many more hox genes in the cluster than hoxbb; therefore, one might expect to see greater redundancy in this locus.

      Deletion of either hoxa or hoxd in the background of the hoxba mutant does have some effect. Is this a reflection of protein function or of the expression dynamics of hoxa/hoxd genes?

      Can we really be confident there is a "transformation of pectoral fin progenitor cells into cardiac cells"?

      The failure to repress Nkx2.5 in the posterior (pelvic fin) domain is clear, but have these cells actually acquired cardiac identity? They would be expected to express Tbx5a (or b) as cardiac precursors, but this domain does not broaden. There is no apparent expansion of the heart (field)/domain or progenitors beyond the 16-somite stage. The claimed "migration" of heart precursors in the mutant is not clear. The heart/cardiac domain that does form in the mutant is not clearly expanded. The domain of cmlc2 looks abnormal in the mutant, but I am not convinced it is "enlarged" as claimed by the authors. The authors have not convincingly shown that "the cells that should form the pectoral fin instead differentiate into cardiac cells."

      The only clear conclusion is the loss of pectoral fin-forming cells rather than these fin-forming cells being "transformed" into a new identity. It would be interesting to know what has happened to the cells of the pectoral fin forming region in these double mutants.

      It is not clear what the authors mean by a "converse" relationship between forelimb/pectoral fin and heart formation. The embryological relationship between these two populations is distinct in amniotes.

      The authors show convincing data that RA cannot induce tbx5a in the absence of the hoxb clusters, but I am not convinced by the interpretation of this result. The results shown would still be consistent with RA acting directly upstream of tbx5a, merely in concert with hox genes to activate tbx5a. In the absence of one or the other, tbx5a would not be expressed. It is not necessary that RA and hoxbs act exclusively in a linear manner (i.e., RA regulates hoxb genes that in turn regulate tbx5a).

      The authors have carried out a functional test for the function of hoxb6 and hoxb8 in the hemizygous hoxb mutant background. What is lacking is any expression analysis to demonstrate whether hoxb6b or hoxb8b are even expressed in the appropriate pectoral fin territory to be able to contribute to pectoral fin development either in this assay or in normal pectoral fin development.

      (The term "compensate" used in this section is confusing/misleading.)

      The authors' confounding results described in Figures 6-7 are consistent with the challenges faced in other model organisms in trying to explore the function of genes in the hox cluster and the known redundancy that exists across paralogous groups and across individual clusters.

      Given the experimental challenges in deciphering the actual functions of individual or groups of hox genes, a discussion of the normal expression pattern of individual and groups of hox genes (and how this may change in different mutant backgrounds) could be helpful to make conclusions about likely normal function of these genes and compensation/redundancy in different mutant scenarios.

      Comments on revisions:

      No further issues to address.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this manuscript performed a fascinating set of zebrafish mutant analyses of hox cluster deletions and pinpointed the cause of pectoral fin loss in one combinatorial mutant lacking both the hoxba and hoxbb clusters. I support the publication of this manuscript.

      Strengths:

      The study is well designed and builds on a variety of existing experimental tools that previously enabled the authors to construct hox cluster mutants. The manuscript is well written and reports the authors' findings on the mechanism that positions the pectoral fin.

      Weaknesses:

      The study does not address hox clusters other than hoxba and hoxbb, and it is confined to zebrafish, with comparisons drawn only from existing reports of mouse experiments.

      Comments on revisions:

      The authors have sufficiently addressed the concerns raised in my previous review. The revised manuscript substantially strengthens the original work.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors have used gene deletion approaches in zebrafish to investigate the function of genes of the hox clusters in pectoral fin "positioning" (but perhaps more accurately pectoral fin "formation"). 

      Strengths: 

      The authors have employed a robust and extensive genetic approach to tackle an important and unresolved question. The results are largely presented in a very clear way. 

      We thank the reviewer for the positive summary and for recognizing the strengths of our genetic approach and presentation.

      Weaknesses: 

      The Abstract suggests that no genetic evidence exists in model organisms for a role of Hox genes in limb positioning. There are, however, several examples in mouse and other models (both classical genetic and other) providing evidence for a role of Hox genes in limb position, which is elaborated on in the Introduction.

      It would perhaps be more accurate to state that several lines of evidence in a range of model organisms (including the mouse) support a role for Hox genes in limb positioning. The authors' work is not weakened by a more inclusive introduction that cites the current literature more comprehensively.

      Thank you for this constructive comment. We agree that our Abstract implied an absence of genetic evidence across model organisms and could be misleading. We have revised the Abstract to acknowledge that multiple lines of evidence—including classical and molecular studies in mouse and other models—support a role for Hox genes in limb/fin positioning. We have also expanded the Introduction to cite this literature more comprehensively. These changes clarify the current state of knowledge while preserving the novelty of our zebrafish genetic findings.

      It would be helpful for the authors to make a clear distinction between "positioning" of the limb/fin and whether a limb/fin "forms" at all, independent of the relative position of this event along the body axis.

      We thank the reviewer for pointing this out. In the revised manuscript, we now make a distinction between these two aspects: we describe “positioning” as being specified by the expression domains of Hox genes along the anterior–posterior axis, while the “formation” of pectoral fins reflects the functional requirement of Hox genes to induce tbx5a expression and thereby initiate fin development. We have clarified this distinction in the text to better separate these related but distinct roles of Hox genes.

      Could the authors discuss why the zebrafish pectoral fin is sensitive to Hoxb loss, whereas mouse Hoxb mutants still form a limb?

      We thank the reviewer for this important comment. Our interpretation is that paired fins first appeared in vertebrates that already possessed four Hox clusters. It is likely that novel functions related to pectoral fin positioning emerged within the HoxB cluster at that time, contributing to the origin of pectoral fins. In zebrafish, we found that these functions remain largely restricted to the hoxba and hoxbb clusters, such that loss of both results in complete absence of pectoral fins. In contrast, mice exhibit a high degree of functional redundancy across Hox clusters. For example, deletion of all HoxB genes except Hoxb13 does not result in forelimb loss (Medina-Martinez et al., 2000), and forelimbs are still present in Hoxa5;Hoxb5;Hoxc5 triple knockouts (Xu et al., 2013). Thus, although we cannot fully explain why HoxB cluster deletions alone do not abolish forelimb formation in mice, it is plausible that overlapping functions from other Hox clusters compensate for the loss of HoxB genes, consistent with the general robustness of the mammalian Hox system. We have revised the Discussion to clarify this point.

      Is this down to exclusive expression of Hoxbs in the zebrafish pectoral fin forming region rather than a specific functional role of the protein? This is important as it has implications for the interpretation of results throughout the paper and could explain some apparently conflicting results.  

      We thank the reviewer for this insightful comment. To address this point, we newly analyzed the expression patterns of PG4–8 genes in the hoxba and hoxbb clusters. Our in situ hybridization results revealed that only hoxb4a, hoxb5a, and hoxb5b are detectably expressed in the pectoral fin buds (Figure 5C, 5E, Figure 7M-R). While we cannot completely exclude the possibility of functional differences among Hox proteins, our data strongly suggest that the loss of pectoral fins in hoxba;hoxbb cluster mutants is primarily due to the expression domains of these specific Hox genes in the fin-forming region, rather than to unique biochemical properties of the proteins. We have added these new data as a figure in the revised manuscript (Figure 7M-R) and clarified this point in the text (lines 312-316).

      Why is Hoxba more potent than Hoxbb? Is this because Hoxba has Hox4/5 present, while Hoxbb has only Hoxb5? The Hoxba locus has retained many more Hox genes in the cluster than hoxbb; therefore, one might expect to see greater redundancy in this locus.

      We thank the reviewer for raising this important point. At present, we do not know the precise reason why hoxba appears more potent than hoxbb. The possibility raised by the reviewer, that differences in retained gene content (e.g., Hox4/5 in hoxba versus only Hoxb5 in hoxbb) may underlie this discrepancy, is certainly plausible. However, our previous study on the formation of dorsal and anal fins showed a similar situation: although PG11–13 Hox genes are present in both the hoxca and hoxcb clusters, deletion of the hox genes in the hoxca cluster had a more pronounced effect on median fin development (Adachi et al., 2024). This suggests that, following the teleost-specific whole-genome duplication, duplicated Hox clusters are not functionally equivalent, and asymmetric retention or deployment of functions may occur. The mechanistic basis of such bias remains unclear and warrants further investigation.

      Deletion of either Hoxa or Hoxd in the background of the Hoxba mutant does have some effect. Is this a reflection of protein function or expression dynamics of Hoxa/Hoxd genes?  

      We appreciate the reviewer’s comment and the opportunity to clarify this point. In Figure 2, we compared several double mutants with the hoxba single mutant. Among them, only the hoxba;hoxbb mutant exhibited a complete loss of tbx5a expression, whereas other combinations did not differ substantially from the hoxba mutant alone. Therefore, we consider that additional deletions such as hoxaa, hoxab, and hoxda do not have a strong effect beyond the hoxba deletion itself, and it is unlikely that Hoxa or Hoxd proteins functionally compensate for Hoxba in regulating tbx5a expression. Consistent with this interpretation, in our previous study we did not detect abnormalities in tbx5a expression in the hoxaa;hoxab;hoxda triple mutant (Ishizaka et al., 2024). Taken together, these observations support our view that the hoxba and hoxbb clusters are specifically required for the induction of tbx5a in the pectoral fin field.

      Can we really be confident that there is a "transformation of pectoral fin progenitor cells into cardiac cells"? 

      The failure to repress Nkx2.5 in the posterior (pelvic fin) domain is clear, but have these cells actually acquired cardiac identity? They would be expected to express Tbx5a (or b) as cardiac precursors, but this domain does not broaden. There is no apparent expansion of the heart (field)/domain or progenitors beyond the 16 somite stage. The claimed "migration" of heart precursors in the mutant is not clear. The heart/cardiac domain that does form in the mutant is not clearly expanded in the mutant. The domain of cmlc2 looks abnormal in the mutant, but I am not convinced it is "enlarged" as claimed by the authors. The authors have not convincingly shown that "the cells that should form the pectoral fin instead differentiate into cardiac cells."  The only clear conclusion is the loss of pectoral fin-forming cells rather than these fin-forming cells being "transformed" into a new identity. It would be interesting to know what has happened to the cells of the pectoral fin-forming region in these double mutants. 

      We sincerely thank the reviewer for this important comment. We agree that our data do not yet allow us to conclude with certainty that the presumptive pectoral fin progenitor cells in hoxba;hoxbb cluster mutants are fully “transformed” into cardiac cells. Our intention was to describe the striking posterior expansion of nkx2.5 expression and the altered morphology of the cmlc2-positive cardiac field in the mutants, which suggested a shift in cell fate. However, as the reviewer correctly points out, we did not directly demonstrate that the missing fin progenitors acquire bona fide cardiac identity.

      To address this, we have revised the text to clarify that the most robust conclusion from our current dataset is the loss of pectoral fin-forming cells in hoxba;hoxbb cluster mutants. We have softened or removed the claim of “transformation” and instead emphasize that our observations are consistent with an expansion of cardiac marker expression domains into the region where fin progenitors normally arise. We also acknowledge that the cmlc2 domain is abnormal rather than unequivocally enlarged, and have adjusted our wording accordingly.

      It is not clear what the authors mean by a "converse" relationship between forelimb/pectoral fin and heart formation. The embryological relationship between these two populations is distinct in amniotes.  

      We thank the reviewer for pointing this out. Our intention was to highlight the reciprocal balance between pectoral fin and cardiac progenitors in zebrafish. In particular, Waxman et al. (2008) demonstrated that retinoic acid signaling promotes pectoral fin formation while restricting the expansion of cardiac progenitors, thereby illustrating this reciprocal relationship. To avoid confusion, we have revised the text to explicitly state that this applies to zebrafish.

      The authors show convincing data that RA cannot induce Tbx5a in the absence of the Hoxb clusters, but I am not convinced by the interpretation of this result. The results shown would still be consistent with RA acting directly upstream of tbx5a, merely in concert with hox genes to activate tbx5a. In the absence of one or the other, Tbx5a would not be expressed. It is not necessary that RA and hoxbs act exclusively in a linear manner (i.e., RA regulates hoxb that in turn regulates tbx5a).

      We appreciate the reviewer’s thoughtful comment. We agree that our original wording in the Results section implied a strictly linear model of RA→Hox→tbx5a. In response, we have revised the Results to state only the experimental observation, namely that RA-dependent induction of tbx5a does not occur in the absence of the hoxba and hoxbb clusters.

      We have moved the broader interpretation to the Discussion, where we now emphasize that our data are compatible with multiple models. One possibility is a linear pathway in which RA induces Hox expression that subsequently activates tbx5a. Alternatively, it is also plausible that RA induces Hox expression and that RA and Hox proteins act cooperatively to induce tbx5a. Our findings do not distinguish between these possibilities, and both models remain consistent with the data. We believe this restructuring addresses the reviewer’s concern by keeping the Results factual and limiting mechanistic interpretation to the Discussion.

      The authors have carried out a functional test for the function of hoxb6 and hoxb8 in the hemizygous hoxb mutant background. What is lacking is any expression analysis to demonstrate whether Hoxb6b or Hoxb8b are even expressed in the appropriate pectoral fin territory to be able to contribute to pectoral fin development, either in this assay or in normal pectoral fin development. 

      We thank the reviewer for emphasizing the importance of expression analyses. In response, we performed a comprehensive whole-mount in situ hybridization survey of all eight PG4–8 Hox genes from the hoxba and hoxbb clusters (hoxb4a, hoxb5a, hoxb5b, hoxb6a, hoxb6b, hoxb7a, hoxb8a, and hoxb8b) during pectoral fin development (18–30 hpf). Among these, only hoxb4a, hoxb5a, and hoxb5b displayed detectable expression in the developing pectoral fin buds. In contrast, hoxb6a, hoxb6b, hoxb7a, hoxb8a, and hoxb8b were not expressed in this territory. These new data have been incorporated into the revised manuscript (Fig. 7M-R). We believe that this dataset provides a more complete and systematic picture of which Hoxb genes are available to function in pectoral fin development, and we are grateful to the reviewer for this valuable suggestion, which significantly strengthened our study.

      (The term "compensate" used in this section is confusing/misleading.) 

      We thank the reviewer for this helpful remark. We agree that the term “compensate” was misleading in this context, as it could be confused with genetic compensation mechanisms such as transcriptional adaptation. To avoid this ambiguity, we have revised the wording.

      Specifically, we replaced “compensate for” with “mimic the effect of” or “phenocopy” depending on the context. We believe this revision improves clarity and prevents misunderstanding.

      The authors' confounding results described in Figures 6-7 are consistent with the challenges faced in other model organisms in trying to explore the function of genes in the hox cluster and the known redundancy that exists across paralogous groups and across individual clusters. Given the experimental challenges in deciphering the actual functions of individual or groups of hox genes, a discussion of the normal expression pattern of individual and groups of hox genes (and how this may change in different mutant backgrounds) could be helpful for drawing conclusions about the likely normal function of these genes and about compensation/redundancy in different mutant scenarios.  

      We appreciate the reviewer’s thoughtful comment. We agree that functional analyses of Hox genes are often complicated by redundancy within and across clusters. In this revision, we have included additional expression data of PG4–8 genes from the hoxba and hoxbb clusters, showing that only hoxb4a, hoxb5a, and hoxb5b are expressed in the fin buds. Although we did not analyze expression changes across mutant backgrounds in this study, we consider this an important direction for future experiments.

      Reviewer #2 (Public review): 

      Summary: 

      The authors of this manuscript performed a fascinating set of zebrafish mutant analyses on hox cluster deletion and pinpointed the cause of the pectoral fin loss in one combinatorial hox cluster mutant of Hoxba and Hoxbb. 

      Strengths: 

      The study is based on a variety of existing experimental tools that enabled the authors' past construction of hox cluster mutants, and is well-designed. The manuscript is well written to report the authors' findings on the mechanism that positions the pectoral fin. 

      Weaknesses: 

      The study does not examine hox clusters other than ba and bb, and is confined to the use of zebrafish, as well as the comparison with existing reports from mouse experiments.  

      We thank the reviewer for the thoughtful and encouraging evaluation of our manuscript. We are pleased that the strengths of our study design and clarity of writing were recognized. We also acknowledge the noted limitations, and while our focus here is on zebrafish hoxba and hoxbb clusters, we agree that future studies should expand to other hox clusters and additional models. Below, we provide individual responses to the specific points raised.

      Reviewer #1 (Recommendations for the authors): 

      (1) Some additional expression analyses of Hoxb6/b8, etc., could be carried out to address some issues raised in the main review.  

      We thank the reviewer for this suggestion. In response, we performed additional whole-mount in situ hybridization analyses of PG4–8 genes from the hoxba and hoxbb clusters, including hoxb6b and hoxb8b. These experiments showed that only hoxb4a, hoxb5a, and hoxb5b are expressed in the developing fin buds, whereas hoxb6a, hoxb6b, hoxb7a, hoxb8a, and hoxb8b are not. We have incorporated these new data into the revised manuscript (Figure 7M-R), which we believe clarify why functional tests of hoxb6b and hoxb8b did not uncover specific requirements in fin development.

      (2) The discussion section, particularly the more speculative section on evolutionary significance, could be reduced. Discussion of pelvic fin could be removed also, as this has not and could not be addressed with the current experimental design.  

      We thank the reviewer for this helpful suggestion. In line with the recommendation, we have reduced the speculative section on evolutionary significance in the Discussion to make it more concise and focused. We have also removed the discussion of pelvic fins, as these were not directly addressed by our current experimental design. We believe these changes improve the clarity and focus of the Discussion section.

      (3) The conclusions on transformation to cardiac identity could be reevaluated and presented differently.  

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we have toned down our interpretation regarding a transformation to cardiac identity. Instead, we now describe the findings more cautiously, emphasizing the clear loss of fin precursors rather than a definitive acquisition of cardiac fate. We believe this revision presents a more balanced interpretation of the data.

      (4) Minor typographical - I would suggest removing "Genetic Evidence:" from the title.  

      We appreciate the reviewer’s suggestion. In accordance with this comment, we have revised the title to: “HoxB-derived hoxba and hoxbb clusters are essential for the anterior-posterior positioning of zebrafish pectoral fins”.

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors mention the redundancy (between the a type and b type) of Hox clusters derived from an additional whole genome duplication in the teleost fish lineage. But, they do not refer to whether the zebrafish Tbx5 ortholog has an additional copy. This information helps the readers' interpretation of the data presented. First of all, tbx5a suddenly appears on line 143 without introducing its relationship with Tbx5, which needs to be explained in a revised manuscript.  

      We thank the reviewer for highlighting this important point. In zebrafish, there are indeed two Tbx5 orthologs, tbx5a and tbx5b. In the revised manuscript, we have modified the text around line 124 to introduce tbx5a in the context of its orthology to Tbx5, ensuring that its appearance in the Results is clear to the readers.

      (2) It was not immediately clear to me whether the limb/fin 'positioning' that the authors focus on in this study is 'anteroposterior' positioning and not anything else. If that is what is meant, the word 'anteroposterior' should simply be inserted at the first appearance of the word 'positioning'.  

      We thank the reviewer for pointing this out. Our study specifically addresses the anteroposterior positioning of paired appendages, that is, how the initial site of pectoral fin formation is defined along the anterior–posterior axis of the body. To clarify this, we have revised the text to insert the word “anteroposterior” at the first appearance of the term “positioning” in both the Abstract and Introduction (lines 26 and 53). We believe this change resolves the ambiguity and makes the focus of our study explicit.

      (3) Figure 5B also shows the remarkable reduction of hoxc1a expression, which the authors do not mention at all. I wonder how this is explained and how the authors justify no remark on this throughout the manuscript. 

      We thank the reviewer for this insightful comment. As correctly noted, we did observe a marked reduction of hoxc1a expression in Figure 5B. However, based on our genetic analyses, we consider that the causal genes underlying the phenotype are most likely located in hoxba and hoxbb clusters. Therefore, although the change in hoxc1a expression is indeed a notable phenomenon, we did not emphasize it in the manuscript in order to maintain focus on the primary clusters responsible for the observed phenotype (lines 240-241). We agree that this point should be acknowledged, and we have now added a brief note in the Results to clarify our findings.

      (4) Figure 1 consists of multiple panels (A-M) but lacks panel D.  

      We apologize for the oversight. We have corrected it.

      (5) Line 85 - precise role -> exact role.  

      We have corrected it (line 95).

      (6) Line 87 - the vertebrate class Actinopterygii & the class Sarcopterygii. 

      We thank the reviewer for pointing this out. We have corrected it (lines 98–99).

      (7) Line 90 - homologous -> orthologous. 

      We have corrected it (line 102).

      (8) Figure 5 - For interpretability of the data, I suggest writing 'Paralogous groups' on the top of the panels A and B, and 'Cluster' vertically on the left.  

      We thank the reviewer for this helpful suggestion. As recommended, we have added “Paralogous groups” at the top of panels A and B, and “Clusters” vertically on the left side of Figure 5 to facilitate interpretation of the data.

      (9) Some subheading titles are too long. They can be shortened into 'hoxb5a and -b5b expression in pectoral fin buds are RA-dependent' instead of 'Expression patterns of hoxb5a and hoxb5b in pectoral fin buds are dependent on RA', for example.  

      We appreciate the reviewer’s suggestion regarding the length of the subheading titles. In response, we have shortened the relevant subheadings in both the Results and Discussion sections to make them more concise while retaining their scientific meaning. For example, the subheading originally written as “Expression patterns of hoxb5a and hoxb5b in pectoral fin buds are dependent on RA” has been revised to “hoxb5a/b5b expression in pectoral fin buds is RA-dependent.” Similar adjustments have been made to other subheadings throughout these sections. We believe these changes improve readability and consistency without altering the intended content.

      (10) Line 408 - why tetrapods, instead of cartilaginous fishes, which are thought of as natural in this context? 

      We appreciate the reviewer’s careful reading and insightful comment. However, in response to Reviewer 1’s suggestion, we have substantially reduced the speculative section on evolutionary significance in the Discussion. As a result, this specific part of the text has now been deleted. We thank the reviewer for raising this point.

    1. eLife Assessment

      By leveraging optical coherence tomography, this study provides important insight into the deformation of human fingertip ridges when contacting raised features such as edges and contours. The study provides compelling evidence that such features tend to cause deformation and relative movement of what the authors term ridge flanks rather than bending of the ridges themselves.

    2. Reviewer #2 (Public review):

      Summary:

      The authors investigate sub-skin surface deformations to a number of different, relevant tactile stimuli, including pressure and moving stimuli. The results demonstrate and quantify the tension and compression applied from these types of touch to fingerprint ridges, where pressure flattens the ridges. Their study further revealed that on lateral movement, prominent vertical shearing occurred in ridge deformation, with somewhat inconsistent horizontal shear. This also shows how much the deeper skin layers are deformed in touch, meaning the activation of all cutaneous mechanoreceptors, as well as the possibility of other deeper non-cutaneous mechanoreceptors.

      Strengths:

      The paper has many strengths. As well as being impactful scientifically, the methods are sound and innovative, producing interesting and detailed results. The results reveal the intricate workings of the skin layers to pressure touch, as well as sliding touch over different conditions. This makes it applicable to many touch situations and provides insights into the differential movements of the skin, and thus the encoding of touch with regard to the function of fingerprints. The work is very clearly written and presented, including how their work relates to the literature and previous hypotheses about the function of fingerprint ridges. The figures are very well-presented and show individual and group data well. The additional supplementary information is informative and the video of the skin tracking demonstrates the experiments well.

      Weaknesses:

      There are very few weaknesses with the work; rather the authors detail well the limitations in the discussion. Therefore, this opens up lots of possibilities for future work.

      Impact/significance:

      Overall, the work will likely have a large impact on our understanding of the mechanics of the skin. The detail shown in the study goes beyond current understanding, to add profound insights into how the skin actually deforms and moves on contact and sliding over a surface, respectively. The method could be potentially applied in many other different settings (e.g. to investigate more complex textures, how skin deformation changes with factors like dryness and aging). This fundamental piece of work could therefore be applied to understand skin changes and how these impact on touch perception. It can further be applied to understand skin mechanoreceptor function better and model these. Finally, the importance of fingertip ridges is well-detailed, demonstrating how these play a role in directly shaping our touch perception and how they can shape the interactions we have with surfaces.

    3. Reviewer #3 (Public review):

      Summary:

      The publication presents unique in-vivo images of the upper layer of the epidermis of glabrous skin when a flat object compresses or slides on the fingertip. The images are captured using OCT and show the strain that fingerprints experience during mechanical stimulation.

      The most important finding is, in my opinion, that fingerprints undergo pure compression/tension without horizontal shear, suggesting that the shear stress caused by tangential load is transferred to the deeper tissues and ultimately to the mechanoreceptors (SA-I / RA-I).

      Strengths:

      Fascinating new insights into the mechanics of glabrous skin. To the best of my knowledge, this is the first experimental evidence of the mechanical deformation of fingerprints when subjected to dynamic mechanical stimulation. The OCT measurement allows unprecedented measurement of skin depth, whereas previous works were limited to tracking surface deformation.

      The robust data analysis reveals the continuum mechanics underlying the deformation of the fingerprint ridges.

      Weaknesses:

      I do not see any major weaknesses. The work is mainly experimental and is rigorously executed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript uses optical coherence tomography (OCT) to visualize tissue microstructures about 1-2 mm under the finger pad skin surface. Their geometric features are tracked and used to generate tissue strains upon skin surface indentation by a series of transparent stimuli both normal and tangential to the surface. Then movements of the stratum corneum and the upper portion of the viable epidermis are evaluated. Based upon these data, across a number of participants and ridges, around 300 in total, the findings report upon particular movements of these tissue microstructures in various loading states. A better understanding of the mechanics of the skin microstructures is important to understand how surface forces propagate toward the locations of mechanoreceptive end organs, which lie near the edge of the epidermis and dermis, and from which the tactile responses of at least two types of peripheral afferents originate. Indeed, the microstructures of the skin are likely to be important in shaping how neural afferents respond and enhance their sensitivity, receptive field characteristics, etc. 

      Strengths: 

      The use of OCT in the context of analyzing the movements of skin microstructures is novel. Also novel and powerful is the use of distinct loading cases, e.g., normal, tangential, and stimulus features, e.g., edges, and curves. I am unaware of other empirical visualization studies of this sort. They are state-of-the-art in this field.

      Moreover, in addition to the empirical imaging observations, strain vectors in the tissues are calculated over time. 

      Weaknesses: 

      The interpretation of the results and their framing relative to the overall hypotheses/questions and prior works could be articulated more clearly. In particular, the major findings of the manuscript are in newly describing a central concept regarding "ridge flanks," but such structures are neither anatomically nor mechanistically defined in a clear fashion. For example, "... it appears that the primary components of ridge deformation and, potentially, neural responses are deformations of the ridge flanks and their relative movement, rather than overall bending of the ridges themselves." From an anatomical perspective, I think what the authors mean by "ridge flanks" is a differential in strain from one lateral side of a papillary ridge to the other. But it is unclear what about the continuous layers of tissue would cause such behaviors. Perhaps a sweat duct or some other structure (not visible to OCT) would subdivide the "flanks" of a papillary ridge somehow? If not due to particular anatomy, then is the importance of the "ridge flank" due to a mechanistic phenomenon of some sort? Given that the findings of the manuscript center upon the introduction of this new concept, I think a greater effort should be made to define what exactly the "ridge flanks" are. It is clear from the results, especially the sliding case, that there is something important that the manuscript is getting at with this concept. 

      We apologize for the confusion around our use of ‘ridge flanks’. To recap the overall goal briefly, we wanted to measure the deformation of papillary ridges and their associated sub-surface structures in response to different tactile stimuli. Capturing these deformations and comparing them against different proposed ideas, for example bending (horizontal shear) of the entire ridge versus differential deformations of different sub-parts, constrains neural activation mechanisms and has implications both for how well tactile stimuli can be spatially resolved on the skin and for whether sub-surface deformations can be easily predicted from surface movements alone. Our mesh was dense enough to compare the stratum corneum (SC) and the viable epidermis (VE) directly, where we expected some differences due to their previously documented mechanical differences, as well as the ridge flanks, which refers to the two (proximal and distal) sides of a single papillary ridge and their associated structure in the SC and VE (as correctly surmised by the reviewer). Differential behaviour across ridge flanks might be expected, because various observations of the surface of the stratum corneum had suggested mechanical differences between the papillary ridges and the grooves dividing them, potentially leading to differential deformations of these two halves depending on the direction in which they faced tissue with different mechanical properties.

      We now provide a clearer definition of ridge flanks in Figure 1 and in the main text. Importantly, existing prior research is better connected to our own investigation in the Introduction and we now specifically explain why we investigate ridge flanks.

      The OCT used herein cannot image deeply enough to fully resolve what the manuscript refers to as a "ridge" (note that others have previously broken this concept apart into "papillary", "intermediate" and "limiting" ridges) near where the mechanoreceptive end organs lie at the epidermal-dermal border. Therefore, the OCT must make inferences about the movements of these deeper tissues, but cannot see them directly, and it is the movements of these deeper tissues that are likely driving the intricacies of neural firing. Note that the word "ridge" is used often in the manuscript's abstract, introduction, and discussion, but the definition in Fig. 1 and elsewhere differs in important ways from the prior works of Cauna (an expert in anatomy). Therefore, the manuscript should clarify whether "ridge" refers to the papillary ridge (visible at the exterior of the skin), the intermediate ridge (defined by Cauna as what the authors refer to as the primary ridge), or the limiting ridge (defined by Cauna as what the authors refer to as the secondary ridge). What the authors really mean (I think) is some combination of the papillary and intermediate ridge structures, but not the full intermediate ridge. The manuscript acknowledges this in the "Limitations and future work" section, stating that these ridges cannot be resolved. This is important because the manuscript is oriented toward tracking this structure. It sets up the narrative and hypotheses to evaluate the prior works of Cauna, Gerling, Swensson, and others who all directly addressed the movement of this anatomical feature, which is key to understanding ultimately how stresses at these locations might move the peripheral end organs (i.e., Merkel cells, Meissner corpuscles). 

      Thank you for these observations. Indeed, our terminology was not consistent. We have now switched to Cauna’s terminology and added additional labels in Figure 1, explaining all mentioned structures in the main text. We have also changed the language in many instances in the main text to make it clearer whether we are referring to individual anatomical ridges (papillary, limiting, etc.) or the whole structure. Additionally, it is now clearer from the start which features are tracked, and we specifically state that intermediate ridges are excluded from our tracking.

      Regarding the intermediate ridge, it indeed plays a big role in Cauna’s lever hypothesis. Given the intermediate ridge is excluded from our analysis, we can neither prove nor disprove this hypothesis in our current work. However, there are many mechanical mysteries to solve regarding the structures directly above, which are the main focus of this paper. We have rewritten the introduction to make these questions clearer. For example, Cauna observed pliability of the papillary ridges in surface experiments. Swensson found differential expression patterns of keratin in epidermis tissue in and above the intermediate ridges, but the direct mechanical consequences that are proposed in their paper concern the behaviour of papillary ridges, rather than relying on a mechanical role of intermediate ridges. Even Cauna’s lever idea implies specific deformation of the stratum corneum, which would be measurable in our study, as the upper handle of the ‘lever’ needs turning. We observed little movement in accordance with this idea, putting the lever mechanism into question. While this does not rule out a mechanical role of the intermediate ridge, these findings constrain its potential mechanisms.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors investigate sub-skin surface deformations to a number of different, relevant tactile stimuli, including pressure and moving stimuli. The results demonstrate and quantify the tension and compression applied from these types of touch to fingerprint ridges, where pressure flattens the ridges. Their study further revealed that on lateral movement, prominent vertical shearing occurred in ridge deformation, with somewhat inconsistent horizontal shear. This also shows how much the deeper skin layers are deformed in touch, meaning the activation of all cutaneous mechanoreceptors, as well as the possibility of other deeper non-cutaneous mechanoreceptors. 

      Strengths: 

      The paper has many strengths. As well as being impactful scientifically, the methods are sound and innovative, producing interesting and detailed results. The results reveal the intricate workings of the skin layers to pressure touch, as well as sliding touch over different conditions. This makes it applicable to many touch situations and provides insights into the differential movements of the skin, and thus the encoding of touch with regard to the function of fingerprints. The work is very clearly written and presented, including how their work relates to the literature and previous hypotheses about the function of fingerprint ridges. The figures are very well-presented and show individual and group data well. The additional supplementary information is informative and the video of the skin tracking demonstrates the experiments well. 

      Weaknesses: 

      There are very few weaknesses in the work; rather, the authors detail well the limitations in the discussion. Therefore, this opens up lots of possibilities for future work. 

      We thank the reviewer for these encouraging comments.

      Impact/significance: 

      Overall, the work will likely have a large impact on our understanding of the mechanics of the skin. The detail shown in the study goes beyond current understanding, to add profound insights into how the skin actually deforms and moves on contact and sliding over a surface, respectively. The method could be potentially applied in many other different settings (e.g. to investigate more complex textures, and how skin deformation changes with factors like dryness and aging). This fundamental piece of work could therefore be applied to understand skin changes and how these impact touch perception. It can further be applied to understand skin mechanoreceptor function better and model these. Finally, the importance of fingertip ridges is well-detailed, demonstrating how these play a role in directly shaping our touch perception and how they can shape the interactions we have with surfaces. 

      Reviewer #3 (Public Review): 

      Summary: 

      The publication presents unique in-vivo images of the upper layer of the epidermis of the glabrous skin when a flat object compresses or slides on the fingertip. The images are captured using OCT and are processed to recover the strain that fingerprints experience during mechanical stimulation. 

      The most important finding is, in my opinion, that fingerprints undergo pure compression/tension without horizontal shear, hinting at the fact that the shear stress caused by the tangential load is transferred to the deeper tissues and ultimately to the mechanoreceptors (SA-I / RA-I). 

      Strengths: 

      Fascinating new insights into the mechanics of glabrous skin. To the best of my knowledge, this is the first experimental evidence of the mechanical deformation of fingerprints when subjected to dynamic mechanical stimulation. The OCT measurement allows an unprecedented measurement of the depth of the skin, whereas previous works were limited to tracking the surface deformation.

      The robust data analysis reveals the continuum mechanics underlying the deformation of the fingerprint ridges. 

      Weaknesses: 

      I do not see any major weaknesses. The work is mainly experimental and is rigorously executed. Two points pique my curiosity, however: 

      (1) How do the results presented in this study compare with previous finite element analyses? I am curious to know whether the claim that the horizontal shear strain is transferred to the underlying layer is also captured by these models. The reason is that FEA models typically use homogeneous materials, and whether or not the behavior in silico and in vivo matches would offer an idea of the nature of the stratum corneum. 

      Very few modeling studies have examined combined normal and tangential loading of the fingertip. Additionally, results are often expressed in terms of Von Mises stresses, and not deformation [1,2], making direct comparison challenging. Nevertheless, one multilayered study [3] supports our finding that the largest deformations are found in deeper tissues.

      (1) Shao, F., Childs, T. H. C., Barnes, C. J. & Henson, B. Finite element simulations of static and sliding contact between a human fingertip and textured surfaces. Tribology International 43, 2308–2316 (2010).

      (2) Tang, W. et al. Investigation of mechanical responses to the tactile perception of surfaces with different textures using the finite element method. Advances in Mechanical Engineering 8, (2016).

      (3) Amaied, E., Vargiolu, R., Bergheau, J. M. & Zahouani, H. Aging effect on tactile perception: Experimental and modelling studies. Wear 332–333, 715–724 (2015). 

      (2) Was there a specific reason why the authors chose to track only one fingerprint ridge? From the methods section, it seems that nothing would have prevented tracking a denser point cloud and reconstructing the strain on a section of the skin rather than just one ridge. With such data, the authors could extend their analysis to multiple-ridge interactions and get a better sense of the behavior of the entire strip of skin. 

      We apologise for the confusion regarding this point. While in our illustration and the accompanying videos, we only show a single tracked ridge for clarity, we do indeed track all visible ridges in every frame. As imaging slices were 4 mm wide, often 8-9 ridges were visible concurrently. However, during the sliding experiments the skin was sometimes dragged along with the stimulus, causing some ridges to disappear from view for certain periods and then re-enter the frame. This would make it difficult to expand the analysis to multiple ridges, but in any case, we found neighbouring ridges to behave very consistently within a given trial, so that their mechanical behaviour (relative to the tactile feature, if any) could be averaged in the analysis.

      Reviewer #1 (Recommendations For The Authors): 

      Discussion, line 213, "Thus, the primary mechanism through which the ridge conforms to the object involves the relative movement and shearing of the ridge flanks, rather than relying on the grooves as articulated joints." I don't see this as definitively proven by the imaging and analysis. This could be a hypothesis arising from this work for further evaluation, but it is quite a strong statement that is not obviously supported by the evidence. 

      We have rephrased this statement as a proposal for further testing:

      “Therefore, we propose that the primary mechanism through which a ridge conforms to an object might involve the relative movement and shearing of the ridge flanks, rather than relying on the grooves as articulated joints.”

      Discussion, line 220, "Our findings strongly indicate that the majority of the surface movement of the skin was observed by deeper tissue rather than surface layers of the skin." But since there are no measurements of such tissues, or of collagen bundle tightening, etc., it is not obvious to me how this can be proven, as it is not directly observable and was not modeled. 

      We have reworded this paragraph to be more cautious and have included potential avenues for future testing of this idea:

      “It is possible that the majority of the surface movement of the skin was absorbed by deeper tissues rather than the surface layers of the skin imaged in the present study. If that is the case, recent modeling work has suggested that tissue deformations are highly dependent on the orientation of collagen fibers in these tissues (Duprez et al., 2024), which might be amenable to tracking in future OCT work to test this idea directly. Additionally, previous work investigating tactile afferent responses to tangential skin movements has reported strong activation of SA-2 receptors, thought to measure skin stretch mainly in deeper tissues (Saal et al., 2025), providing further indirect evidence.”

      Figure 1, A. As noted elsewhere, there are issues with the naming of the anatomy, and there is no definition of the concept of "ridge flanks." Also, it does not indicate the depth to which OCT can resolve structures.

      We have updated and expanded the labels in Figure 1A to clarify the anatomy (along with changes in the text described above). Figure 1C now includes a sentence about the resolvability of features below the mesh:

      “Detail view of a single OCT frame showing ridged skin structure and clear boundary between the stratum corneum and viable epidermis. A mesh covering the stratum corneum and the upper part of the viable epidermis (without the intermediate ridge) is overlaid spanning a single papillary ridge. The border between the viable epidermis and dermis is less clearly delineated, and some deeper features are resolved less well.”

      The concept of a ridge flank is now illustrated in Figure 1B(i) and Figure 1B(iv), and referred to in both the caption and main text. Updated figure caption text:

      “These deformations need not apply to the whole ridge structure but might affect different parts separately, e.g. via shearing in different directions across both ridge flanks, as shown on the far right (see darker shading to highlight a single ridge flank).”

      Updated text in the main manuscript:

      “Additionally, if there are indeed mechanical differences between papillary ridges and their neighbouring grooves at the level of the stratum corneum, this might result in differential movements of the two sides of each papillary ridge, here referred to as ridge flanks (see Figure 1B-iv, right, for a potential example).”

      Note that Figure 4B also includes an illustration of this concept.

      Figure 1, B. This mechanical representation does not capture the entirety of the papillary-intermediate ridge unit in question, as set up by the authors in the introduction. Also, in the caption it is not ridge deformation, but upper SC and VE deformation. And the OCT cannot resolve the whole ridge. 

      We have reworded the figure caption:

      “Potential deformations of the tracked ridge structure, including the stratum corneum and the bulk of the viable epidermis, during tactile interactions, with arrows indicating the directions of relative deformation. [...]”

      Importantly, the main manuscript text has been rewritten in the introduction section to clarify our research question and how much of the sub-surface ridge structure is tracked:

      “From a mechanical standpoint, these conflicting interpretations raise the question of how the outermost two skin layers typically deform at the resolution of single papillary ridges, whether by tension, compression, or shear (see examples in Figure 1B). Additionally, such deformations might apply to individual papillary ridges and all their sub-surface structures equally, for example horizontal shearing that bends the papillary ridge in a certain direction, while levering its sub-surface aspects in the opposite direction. Conversely, individual parts of the ridge structure might deform differently. For example, the viable epidermis might deform to a different extent or in different directions due to its lower stiffness and different morphology. Additionally, if there are indeed mechanical differences between papillary ridges and their neighbouring grooves at the level of the stratum corneum, this might result in differential movements of the two sides of each papillary ridge, here referred to as ridge flanks (see Figure 1B-iv, right, for a potential example). To empirically address these questions, we employed Optical Coherence Tomography (OCT) to precisely measure the sub-surface deformation of individual fingerprint ridges in response to a variety of mechanical events. Specifically, we focused on the stratum corneum and the bulk of the viable epidermis (excluding intermediate ridges), which could be robustly resolved and tracked by our setup.”

      Figure 1, C: While the caption states that the locations of the intermediate and limiting ridges, as well as the collagen bundles, are clearly visible, they are not clear to me, especially below the orange mesh. Because these structures are not labeled, their identification is left to my interpretation; from the picture, it seems that the secondary (limiting) ridge is larger than the primary (intermediate) ridge.

      We have reworded the caption as follows:

      “Detail view of a single OCT frame showing ridged skin structure and clear boundary between the stratum corneum and viable epidermis. A mesh covering the stratum corneum and the upper part of the viable epidermis (without the intermediate ridge) is overlaid spanning a single papillary ridge. The border between the viable epidermis and dermis is less clearly delineated.”

      Indeed, while the intermediate ridge was often visible in the OCT images, its size was rather inconsistent and it could appear larger or smaller than the limiting ridge, whereas in histological images it is generally shown as larger (although the available data are somewhat limited). This difference might be due to imaging artifacts, e.g. limited visibility into the deeper tissues, might reflect individual differences between participants, or could indicate that intermediate ridges are not of a consistent height in the (out-of-plane) direction along a given ridge. We have clarified this in the Limitations section of the Discussion:

      “[...] while we could confidently track landmarks associated with the stratum corneum, we could not reliably identify intermediate ridges in the viable epidermis, though they were visible in some of the frames, limiting the depth of the fitted mesh. We hypothesize that the additional depth of these ridges combined with their slender morphology might have degraded the signal. 3D OCT imaging (see below) might help to resolve these features in future work and settle open questions regarding their precise morphology.”

      Figure 1, D, and E: How do these measurements compare with the literature? They seem reasonable to me based on a cursory review, but there is a need to directly compare, especially since measurements in this context with the OCT are novel and could be valuable. 

      We have clarified this in the main text and added more references to the existing literature:

      “We measured an average ridge width of 0.47 mm across participants (Figure 1D), consistent with previous studies (Moore, 1989; Ohler and Cummins, 1942). Average skin layer thickness was 0.38 mm for the stratum corneum and 0.12 mm for the viable epidermis across our dataset (Figure 1E), again in agreement with previous studies using both in vivo imaging and ex vivo histology (Fruhstorfer et al., 2000; Lintzeri et al., 2022; Maiti et al., 2020).”

      The structure of the fourth sentence of the abstract suggests that hundreds of individual fingerprint ridges can be tracked at the same time. Perhaps it could be tweaked to clearly indicate that hundreds were tracked across trials and participants.

      We have changed the sentence to now read:

      “Here, we used optical coherence tomography to image and track sub-surface deformations of hundreds of individual fingerprint ridges across ten participants and four individual contact events at high spatial resolution in vivo.”

      Introduction, 1st sentence, the fingertip per se is not an organ, though the skin is an organ. 

      Changed the wording from “organ” to “structure”.

      Introduction, 1st sentence, "... that convert skin deformations ..." The word "skin" needs to be added for clarity.

      Done.

      Introduction, 3rd paragraph, "Alternately, the grooves may be stiffer or less ...". In this paragraph, and this sentence in particular, Cauna is cited and the words grooves and ridges are used, but this is not adequately explained. Cauna had distinct terminology, referring to papillary, intermediate, and limiting ridges, that exist in addition to ready ridges. This is important because the manuscript uses the word "ridges" in a non-specific way, not just here but throughout, and the distinction is central to the questions that can be addressed with OCT.

      Anatomy has been better defined and more extensively labelled in Figure 1A, including labels for ‘papillary ridges’ and ‘grooves’. We have reworded this paragraph to better explain the concepts and how they relate to the subsequent analyses in the paper:

      “Consequently, the mechanical response of the skin below its immediate surface remains largely unknown, leading to conflicting interpretations in the literature. For instance, it has been proposed that the papillary ridges are stiffer than the neighbouring grooves (Swensson et al., 1998), which might imply that normal loading of the skin does not affect the ridges’ profile appreciably. Conversely, other observations have suggested that the grooves are relatively stiff, allowing the papillary ridges to deform considerably (Cauna, 1954; Johansson and LaMotte, 1983). However, the sub-surface consequences of this putative pliability during object contact or stick-to-slip transitions (see e.g. Delhaye et al., 2016) are unclear: the whole ridge structure might bend as proposed in Cauna’s lever mechanism (Cauna, 1954), but this view has proved controversial (see e.g. Gerling and Thomas, 2008), with direct empirical evidence lacking.”

      Figure 1. Avoid red-green dots for colorblind accessibility. PMMA is not in the caption. 

      We have switched the colors of the mechanoreceptors in panel A to a colorblind-friendly scheme. We now also specify the material of the plates in the Figure 1 caption.

      Results, line 102. "... papillary ridge structure...." Is this the ridge being referred to?

      In conjunction with the updated labeling in Figure 1A, we have updated the terminology throughout the paper to be more consistent.

      Results, line 99. "We noted a small increase in the area of the stratum corneum, which was likely an artifact due to the fit of the mesh to the ridge's curvature ..." There is very little discussion of the finding in Fig. F of an increase in area in the SC and a decrease in the VE. It makes me question whether the finding in this panel is an artifact. With stiff tissue like the stratum corneum, how would the area increase?

      This finding could be a measurement artifact or it could be the result of skin from neighbouring regions pushing into the imaged space. We have reworded the brief description in the Results:

      “We noted a small increase in the area of the stratum corneum, which was possibly an artifact due to the imperfect fit of the mesh to the ridge's curvature (but see Discussion for an alternative explanation).”

      Additionally, we have added a short paragraph to the Limitations section of the Discussion:

      “Some of our tactile interactions might have caused skin deformations out-of-plane that were thus not measurable. For example, the slight increase in thickness of the stratum corneum under normal load might be explained as a measurement artifact due to the coarse nature of the mesh fitted, but could alternatively reflect tissue from out-of-plane regions pushing into the imaged space. Indeed, recent surface measurements of the skin's behaviour during initial object contact have reported compression of the skin in the plane parallel to its surface (Doumont et al., 2025), which would result in increasing thickness, assuming that the stratum corneum is incompressible. Future studies could consider creating three-dimensional reconstructions of the fingerprint structure to study such effects.”
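      As a rough worked illustration of the incompressibility argument (an editorial sketch, not an analysis from the study), assume the stratum corneum conserves volume and deforms uniformly, with in-plane strains \varepsilon_x along the scan line and \varepsilon_y out of the imaged plane (negative values denote compression). Then

$$L_x L_y h = L_{x,0} L_{y,0} h_0 \;\Rightarrow\; \frac{h}{h_0} = \frac{1}{(1+\varepsilon_x)(1+\varepsilon_y)}, \qquad \frac{A}{A_0} = \frac{L_x h}{L_{x,0} h_0} = \frac{1}{1+\varepsilon_y}$$

      where A is the cross-sectional area visible in a single OCT slice. Under these assumptions, out-of-plane compression alone (\varepsilon_x = 0, \varepsilon_y < 0) increases both the thickness and the imaged area; a hypothetical 5% out-of-plane compression (\varepsilon_y = -0.05) would give A/A_0 = 1/0.95, i.e. roughly a 5% increase in area.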

      Figure 3. The colors used in slip and stick are not colorblind accessible. 

      We have changed the background colors in Figure 3A,B,C to a colorblind accessible version.

      Results, line 151, "Thus, most of this shearing must be sustained by deeper tissues." But there are no direct observations as such. Also, in the next sentence, "collagen fiber bundles" are referred to in a non-specific way. This section is highly speculative with no systematic visualization of these structures, and should probably be moved to the discussion. 

      We have reworded this sentence to be more cautious. We have now also highlighted collagen fiber bundles visible in the figure. Systematic analysis of these is beyond the scope of the present study, as these were not tracked, but might be possible in future studies. The reworded sentence reads as follows:

      “Thus, it is possible that shearing is sustained by deeper tissues, an effect that could be tested in future studies by directly tracking the angle and orientation of collagen fiber bundles anchoring the epidermis to deeper tissues (see highlighted examples in Figure 3B).”

      Results, line 161, " Horizontal shear ..." do you mean surface shear, per the Fig. 1 definition? 

      For consistency, we have changed the labels to ‘Horizontal shear’ and ‘Vertical shear’ in Figure 1B(iii) and Figure 1B(iv), as these are the terms used throughout the paper.

      Discussion, line 198, "... flatten even at relatively low forces." This is an interesting point and it would be useful to note how low exactly. 

      We have reworded this sentence to better reflect the findings described earlier:

      “We found that individual ridges tended to flatten considerably at relatively low forces of 0.5 N, with higher forces increasing deformations only moderately.”

      Reviewer #2 (Recommendations For The Authors): 

      Minor comments that could improve the paper even further 

      In the abstract, it may be good to specify that the stimuli were all applied to the finger, this was not an active, self-generated tactile interaction, e.g. change 'in response to a variety of tactile stimuli' to 'in response to a variety of passively-applied tactile stimuli'. 

      Done.

      Comment on the grey/blue colours in the figures. I like the combination of blue/orange for different conditions, but sometimes the blue is very difficult to see against the grey background. Is there any way of making the grey background shading lighter and/or the blue darker/more vivid?

      We have changed the color of the SC mesh to a darker shade of blue, which is more easily distinguished from the grey background. This applies to Figures 2B/C, 3D, 4A/B/D/E, and all supplementary figures.

      Methods. Could you please add a little more detail about exactly where the images were taken, e.g. in the exact middle of the fingerpad, or at the fingertip? Did you line up the fingerprint ridges to be in a plane? This would help to better understand how the stimulus moved against the skin, which itself is rounded, and whether the images were taken at a point where the ridges were relatively linear or curved.

      We have added the following text in the “Experimental set-up” section of the Methods:

      “The participant's finger was secured in a finger holder, which was positioned in such a way that the flat part of the fingertip distal to the whorl made initial contact with the plate as it was lowered onto the fingertip. The scanner was positioned such that its scan path aligned with the distal-proximal axis of the plate, targeting the centre line of the fingerpad so that the fingerprint ridges were oriented orthogonally to the line scan.”

      and

      “For these experiments, imaging focused on the central flat part of the contact area, such that all fingerprint ridges visible in the imaged region were in contact with the plate throughout the trial.”

      Methods. There is no section about statistics, yet you do use them in the paper. It may be good to add a few details in the methods to outline the package you used to do the statistics, as well as why you chose the tests you carried out. 

      We have added a new Statistics section at the end of the Methods:

      “Statistical tests were run in Python using the scipy.stats package. As distributions were skewed, we used non-parametric analyses throughout the study. Bonferroni corrections were used when multiple comparisons were made.”
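      The Statistics section does not name the specific tests used, so the following is only a minimal sketch of the general approach it describes, assuming paired, per-participant comparisons of each condition against a baseline with the Wilcoxon signed-rank test (scipy.stats.wilcoxon) and a Bonferroni correction applied by hand. The function name, condition labels, and data below are hypothetical.

```python
import numpy as np
from scipy import stats


def bonferroni_wilcoxon(baseline, conditions, alpha=0.05):
    """Hypothetical helper: paired Wilcoxon signed-rank tests with Bonferroni correction.

    baseline   -- 1-D array of per-participant values (e.g. one value per participant).
    conditions -- dict mapping a condition name to a 1-D array paired with baseline.
    Returns a dict: name -> (statistic, Bonferroni-corrected p, significant at alpha).
    """
    n_tests = len(conditions)
    results = {}
    for name, values in conditions.items():
        # Non-parametric paired test, so no normality assumption for skewed data.
        stat, p = stats.wilcoxon(baseline, values)
        # Bonferroni: multiply each p-value by the number of comparisons, capped at 1.
        p_corrected = min(p * n_tests, 1.0)
        results[name] = (stat, p_corrected, p_corrected < alpha)
    return results


if __name__ == "__main__":
    # Made-up, right-skewed data for ten hypothetical participants.
    rng = np.random.default_rng(0)
    base = rng.lognormal(size=10)
    conds = {"condition_a": base * 2.0, "condition_b": base + rng.normal(scale=0.5, size=10)}
    for name, (stat, p, sig) in bonferroni_wilcoxon(base, conds).items():
        print(f"{name}: W={stat:.1f}, corrected p={p:.4f}, significant={sig}")
```

      Multiplying p-values by the number of comparisons is equivalent to testing each at alpha divided by that number; less conservative alternatives such as the Holm procedure exist, but Bonferroni is what the quoted section reports.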

      A very minor point. Discussion, line 210: 'In this study...' is vague, which study exactly? It is preferable to be more precise, e.g. 'In the present/current study...'. 

      Fixed.

      Discussion. One point you may want to add is the possibility of looking at other skin regions. For example, would this approach work on the palm, on border glabrous/hairy skin, on various hairy skin sites, and on the foot? The possibilities could be endless if it could be applied anywhere, but it may depend on the technical positioning and skin itself. However, it would be interesting to know. 

      We have added the following text at the end of the Discussion section:

      “Finally, while we focused on the fingertip only, many other skin regions present interesting mechanical challenges waiting to be explored. The general ridged structure observed on the fingertip is common to all glabrous skin, but the local ridge mechanics might still differ: glabrous skin on the foot sole exhibits some morphological differences in order to support large weights that might well influence its mechanical response (Boyle et al., 2019). For example, the morphology of transverse ridges (running orthogonal to and connecting limiting with intermediate ridges) differs across regions on the foot sole (Nagashima and Tsuchida, 2011) and very likely from the hand (Yamada et al., 1996). Our method should be directly applicable to study deformations of these ridges, though three-dimensional observations might be needed to resolve some of the open questions. Hairy skin, in contrast, differs from glabrous skin in that the stratum corneum is much thinner. It also lacks the clearly organised ridge structure, but exhibits more loosely oriented skin folds instead, which very likely also serve a mechanical function (Leyva-Mendivil et al., 2015) and in principle are amenable to study using OCT.”

      In the last lines of the Discussion, you mention the possible effects of skin moisturization. The Tomlinson et al. paper refers to the hydration of the skin with regard to water, which I would say is a slightly different factor. I think you can mention this paper and discuss the water level of the skin (hydration), but also add specifically that moisturization (i.e. by an emollient, humectant, or occlusive substance) is another factor to consider (e.g. effects found by Dione et al., 2023, Sci Rep). Overall, these two points relate to the dryness of the skin and the humidity of the surfaces being contacted, so you could expand on both.

      Thank you for the correction! We now mention both skin hydration and moisturization separately in this section.

    1. eLife Assessment

      The authors have performed a potentially valuable new kind of analysis in connectomics, mapping to an interesting developmental problem of synaptic input to sensory neurons. While the analysis itself is solid, the authors have drawn broader conclusions than are directly supported by the presented data. With more measured claims and greater clarity and explanations for the analysis, the study could potentially become stronger.