  1. Mar 2026
    1. eLife Assessment

      This study identifies the uncharacterised protein FAM53C as a novel, potential regulator of the G1/S cell cycle transition, linking its function to the DYRK1A kinase and the RB/p53 pathways. The work is valuable and of interest to the cell cycle field, leveraging a strong computational screen to identify a new candidate. The findings are solid, although confidence in the siRNA depletion phenotypes would have been higher with rescue experiments using an siRNA-resistant cDNA.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed comments raised in the previous round of review, shown below, through minor changes to the text without additional experiments.]

      Summary:

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle. They did so by screening publicly available data from the Cancer Dependency Map and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays, they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. They show in RPE1 cells that loss of FAM53C leads to a DYRK1A + P53-dependent cell cycle arrest. Combined inactivation of FAM53C and DYRK1A in a TP53-null background caused S-phase entry with subsequent apoptosis. Finally, the authors assess the effect of FAM53C deletion in a cortical organoid model and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      Reviewer #2 (Public review):

      The authors sought to identify new regulators of the G1/S transition by mining the Cancer Dependency Map (DepMap) co-dependency dataset. This analysis successfully identified FAM53C, a poorly characterized protein, as a candidate. The strength of the paper lies in this initial discovery and the subsequent biochemical work convincingly showing that FAM53C can directly interact with the kinase DYRK1A, a known cell cycle regulator.

      The authors then present evidence, primarily from acute siRNA knockdown in RPE-1 cells, that loss of FAM53C induces a strong G1 cell cycle arrest. Their follow-up investigation proposes a model where FAM53C normally inhibits DYRK1A, thereby protecting Cyclin D from degradation and preventing p53 activation, to allow for G1/S progression. The authors have commendably addressed some concerns from the initial review: they have now demonstrated the G1 arrest using two independent siRNAs (an improvement over the initial pool), shown the effect in several additional cancer cell lines (U2OS, A549, HCT-116), and developed a more nuanced model that incorporates p53 activation, which helps to explain some of the complex data.

    3. Reviewer #3 (Public review):

      In this study Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation-regulated Kinase 1A (DYRK1A) in the G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein, FAM53C, as a potential regulator of the G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of CDKN1A transcript and p21 protein that could explain the G1 arrest of FAM53C-depleted cells but, surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1A as one of the main interactors of FAM53C and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1A to phosphorylate cyclin D in vitro, although the activity of DYRK1A was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. The authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1A, but this was true only in cells lacking functional p53. This is quite confusing, as DYRK1A inhibition reduced the fraction of G1 cells in p53 wild-type cells as well as in p53 knock-outs, suggesting that FAM53C may not be required for regulation of DYRK1A function. Instead of focusing on the impact of FAM53C on cell cycle progression, the authors moved towards investigating its potential (and perhaps more complex) roles in differentiation of iPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids, but whether that reflects an increased activity of DYRK1A or is just an off-target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knock-out mice. The authors did not observe any significant changes in survival or in organ development, but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and organism.

      Comments on the previous version:

      In the revised version of the manuscript, the authors addressed most of the critical points. They now include new data with depletion of FAM53C using single siRNAs that show a small but significant enrichment of the G1 population. This G1 arrest is likely caused by the combined effects of induction of p21 expression and decreased levels of cyclin D1. The authors observed that inhibition of DYRK1A rescued cyclin D1 levels in FAM53C-depleted cells, suggesting that FAM53C may inhibit DYRK1A. This possibility is also supported by in vitro experiments. On the other hand, inhibition of DYRK1A did not rescue the G1 arrest upon depletion of FAM53C, suggesting that FAM53C may also have a DYRK1A-independent role in G1. Functional rescue experiments with cyclin D1 mutants and detection of DYRK1A activity in cells would be necessary to conclusively explain the function of FAM53C in progression through G1 phase, but unfortunately these experiments were technically not possible. Knockout of FAM53C in iPSCs and in mice suggests that FAM53C may have additional functions besides cell cycle control and/or that adaptation may have occurred in these model systems. Overall, the study implicated FAM53C in fine-tuning DYRK1A activity in cells, which may to some extent influence progression through G1 phase. In addition, FAM53C may also have DYRK1A- and cell cycle-independent functions that remain to be addressed by future studies.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle. They did so by screening publicly available data from the Cancer Dependency Map and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays, they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. They show in RPE1 cells that loss of FAM53C leads to a DYRK1A + P53-dependent cell cycle arrest. Combined inactivation of FAM53C and DYRK1A in a TP53-null background caused S-phase entry with subsequent apoptosis. Finally, the authors assess the effect of FAM53C deletion in a cortical organoid model and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      The authors have revised the manuscript, and I respond here point-by-point to indicate which parts of the revision I found compelling, and which parts were less convincing. So the numbering is consistent with the numbering in my first review report.

      (1) The p21 knockdowns are a valuable addition, and the claim that p53 targets other than p21 are involved in the FAM53C RNAi-mediated arrest is now much more solid. Minor detail: if S4D is a quantification of S4C, it is hard to believe that the quantification was done properly (at least for the DYRK1Ai conditions). Perhaps S4C is not the best representative example, or some error was made?

      We appreciate the concern from the Reviewer. As explained in the first round of revisions, we have mostly used an immunoassay based on capillary transfer (WES system), which is very quantitative (much more than classical immunoblot). As for the other WES assays, the panel in S4C is a representation from the signal in the capillary from one of the experiments we performed (in many ways, we should simply not show these representations but readers and reviewers expect them). We agree that this was not visually the most representative, likely because of the saturation of the signal, and we replaced it with another one.

      (2a) I appreciate the decision to remove the cyclin D1 phosphorylation data. A more nuanced model now emerges. It is not clear to me however why the Protein Simple immunoassay was used for experiments with RPE cells, and not the cortical organoids. Even though no direct claims are made based on the phospho-cyclin D data in Figure 5E+G, showing these data suggests that FAM53C deletion increases DYRK1A-mediated cyclin D1 phosphorylation. I find it tricky to show these data, while knowing now that this effect could not be shown in the RPE1 cells.

      The Reviewer raises a valid point. The data we had presented in the first version of the manuscript were strongly suggestive of changes in Cyclin D1 phosphorylation and protein stability but we followed the Reviewer’s advice to remove them from the revised manuscript because the effects were sometimes small. We decided to keep these data in the organoid model because we felt this is a question that many readers would have (how do changes in FAM53C affect Cyclin D levels?). As the Reviewer mentions, we did not draw conclusions about this but we felt and still feel it is important to connect the dots, even if imperfectly, between FAM53C and the cell cycle, and these data in Figure complement the data in Figure 3F. The experiments with RPE-1 cells were mostly performed in the Sage lab with the WES assay while the experiments with organoids were largely performed in the Pasca lab where more ‘classic’ immunoblots are routinely used. More generally, some antibodies work better with one method vs. the other and we often go back and forth between the two.

      (2b) The quantifications of the immunoassays are not convincing. In multiple experiments, the HSP90 levels vary wildly, which indicates big differences in protein loading if HSP90 is a proper loading control. This is, for example, problematic for the interpretation of Figures 3F and S3I. The cyclin D1 "bands" look extremely similar between siCtrl and siFAM53C (Fig S3I); in fact, the two series of 6 samples with different dosages of DYRK1Ai look like an identical repetition of each other. I did not have the option to overlay them, but it would be important to check whether a mistake was made here. The cyclin D1 signals aside, the change in cycD1/HSP90 ratios seems to be entirely caused by differences in HSP90 levels. Careful re-analysis of the raw data and more equal loading seem necessary. The same goes (to a lesser extent) for S3J+K.

      As mentioned above, the representation of the fluorescence signal may be important for readers who are used to seeing immunoblots (Western blots), but the quantification is performed on the values directly obtained from the WES system from ProteinSimple. In these experiments, we make sure that the numbers we obtain are in a validated range, allowing us to use the values, even if sometimes the loading is a bit different between lanes. The sensitivity of the WES assay allows for high accuracy in intra-well quantification, which in turn allows for accurate inter-well quantification once loading control normalization is completed.
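      The loading control normalization discussed in point (2b) and in this response can be illustrated with a minimal sketch. The signal values below are hypothetical and are not data from the study; the function name `normalize_to_loading_control` is likewise an assumption introduced only for illustration. The sketch shows the arithmetic behind the reviewer's concern: when the target signal is unchanged, any change in the normalized ratio comes entirely from the loading control.

```python
def normalize_to_loading_control(target, loading):
    """Divide each lane's target signal by its loading-control signal."""
    if len(target) != len(loading):
        raise ValueError("target and loading must have the same number of lanes")
    return [t / l for t, l in zip(target, loading)]


# Hypothetical signals in arbitrary units (NOT data from the study).
# Lane order: siCtrl, siFAM53C.
cyclin_d1 = [100.0, 100.0]  # identical target signal in both lanes
hsp90 = [200.0, 100.0]      # loading control differs two-fold

ratios = normalize_to_loading_control(cyclin_d1, hsp90)
fold_change = ratios[1] / ratios[0]

print("cyclin D1 / HSP90 per lane:", ratios)
print("apparent fold change (siFAM53C vs siCtrl):", fold_change)
```

      In this toy case the apparent two-fold change reflects only the difference in HSP90 signal. Whether that difference represents unequal loading, which normalization is designed to correct, or noise in the control itself is the point on which the reviewer and the authors differ.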

      (2c) The new model in Fig S4L: what do the arrows on the right, from FAM53C and p53, that merge and point straight towards S-phase mean? They suggest that p53 (and FAM53C) directly promote S-phase progression, but most likely this is not what the authors intended.

      Very good point. We were trying to be inclusive of various signaling pathways that may be implicated in the regulation of the cell cycle by this group of proteins. FAM53C does promote S-phase entry (more cycling when FAM53C is overexpressed) but we removed the arrow coming from p53, which is certainly not a positive regulator of cell cycle progression. Thank you for helping us correct this mistake.

      (3) Clear; nicely addressed.

      (4) Thank you for correcting.

      (5) I appreciate that the authors are now more careful to call the IMPC analysis data preliminary. This is acceptable to me, but nevertheless, I suggest that the authors seriously consider taking this part out entirely. The risk of chance findings and the extremely skewed group sizes (as reviewer #2 had pointed out) hamper the credibility of this statistical analysis.

      We appreciate this concern but feel that it is important for the community to be aware of these phenotypes so other investigators either study FAM53C in different genetic contexts or, for example, generate a conditional knockout allele to study more acute effects of FAM53C loss during development and in adult mice. We believe that the text is carefully written and acknowledge the caveats of small sample sizes in some statistical analyses.

      Reviewer #2 (Public review):

      The authors sought to identify new regulators of the G1/S transition by mining the Cancer Dependency Map (DepMap) co-dependency dataset. This analysis successfully identified FAM53C, a poorly characterized protein, as a candidate. The strength of the paper lies in this initial discovery and the subsequent biochemical work convincingly showing that FAM53C can directly interact with the kinase DYRK1A, a known cell cycle regulator.

      The authors then present evidence, primarily from acute siRNA knockdown in RPE-1 cells, that loss of FAM53C induces a strong G1 cell cycle arrest. Their follow-up investigation proposes a model where FAM53C normally inhibits DYRK1A, thereby protecting Cyclin D from degradation and preventing p53 activation, to allow for G1/S progression. The authors have commendably addressed some concerns from the initial review: they have now demonstrated the G1 arrest using two independent siRNAs (an improvement over the initial pool), shown the effect in several additional cancer cell lines (U2OS, A549, HCT-116), and developed a more nuanced model that incorporates p53 activation, which helps to explain some of the complex data.

      However, a central and critical weakness persists. The entire functional model is built upon the very strong G1 arrest phenotype observed in vitro following acute knockdown. This finding is in stark contrast to data from other contexts. As the authors note, the knockout of Fam53c in mice results in minimal phenotypes, and the DepMap data itself suggests the gene is largely non-essential in most cancer cell lines.

      This major discrepancy creates two competing interpretations:

      (1) As the authors suggest, FAM53C has a critical role in the cell cycle, but its loss is rapidly masked by compensatory mechanisms in long-term knockout models (like iPSCs and mice) or in established cancer cell lines.

      (2) The strong acute G1 arrest is an experimental artifact of the siRNA-mediated knockdown, and not a true reflection of FAM53C's primary function.

      The authors' new controls (using two individual siRNAs and showing the arrest is RB-dependent) make an off-target effect less likely, but they do not definitively rule it out. The gold-standard experiment to distinguish between these two possibilities, a rescue of the phenotype using an siRNA-resistant cDNA, has not been performed.

      Because this key control is missing, the foundation of the paper's functional claims is not as solid as it needs to be. While the study provides an interesting and valuable new candidate for the cell cycle field to investigate, readers should be cautious in accepting the strength of FAM53C's role in the G1/S transition until this central discrepancy is definitively resolved.

      We appreciate this concern from the Reviewer. Genetically, FAM53C is linked to a number of genes coding for known regulators of the G1/S transition and its loss of function would be predicted to lead to G1 arrest based on these genetic interactions. As the Reviewer nicely summarizes, we have data in several cell types, including non-cancerous immortalized cells (RPE-1) and several cancer cell lines, that FAM53C acute knock-down leads to a G1 arrest. Our data also indicate that this arrest is RB dependent and p53 independent. Furthermore, genetic knockout of FAM53C in iPSC-derived human cortical organoids results in decreased proliferation. All these elements point to a role for FAM53C in G1/S. We performed some pilot rescue experiments, as suggested by the Reviewer, but these preliminary assays could not identify the right “dose” of FAM53C. We agree that it will be important in future studies to develop better genetic systems in which FAM53C can be manipulated genetically. However, our overexpression experiments show increased proliferation, providing more support for a role of FAM53C at the G1/S transition of the cell cycle.

      Reviewer #3 (Public review):

      Summary:

      In this study Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation-regulated Kinase 1A (DYRK1A) in the G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein, FAM53C, as a potential regulator of the G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of CDKN1A transcript and p21 protein that could explain the G1 arrest of FAM53C-depleted cells but, surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1A as one of the main interactors of FAM53C and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1A to phosphorylate cyclin D in vitro, although the activity of DYRK1A was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. The authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1A, but this was true only in cells lacking functional p53. This is quite confusing, as DYRK1A inhibition reduced the fraction of G1 cells in p53 wild-type cells as well as in p53 knock-outs, suggesting that FAM53C may not be required for regulation of DYRK1A function. Instead of focusing on the impact of FAM53C on cell cycle progression, the authors moved towards investigating its potential (and perhaps more complex) roles in differentiation of iPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids, but whether that reflects an increased activity of DYRK1A or is just an off-target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knock-out mice. The authors did not observe any significant changes in survival or in organ development, but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and organism. Although some of the data might be of interest, in the current form the data are too preliminary to justify publication.

      Major comments:

      (1) Whole study is based on one siRNA to Fam53C and its specificity was not validated. Level of the knock down was shown only in the first figure and not in the other experiments. The observed phenotypes in the cell cycle progression may be affected by variable knock-down efficiency and/or potential off target effects.

      We fully acknowledge these limitations in our study. First, we agree that the efficiency of the knock-down can be variable across experiments; unfortunately, antibodies against FAM53C are currently still not optimal and immunoassays against this protein have not always been reliable in our hands. It will be important in the future to develop better antibodies for this poorly studied factor. Second, we also agree that the siRNA pool is perhaps not optimal (note that we used a pool, not a single siRNA). We provide data in the manuscript that single siRNAs (from the pool) also arrest cells in G1. Our data also show that this arrest is observed in several cell lines (cancerous and non-cancerous), in a p53-independent but RB-dependent way. We further note that we also provide data in cortical spheroids derived from CRISPR/Cas9 knockout iPSCs showing a similar inhibition of proliferation, validating our observations in a completely orthogonal system. Finally, overexpression studies support a role for FAM53C at the G1/S transition (i.e., FAM53C overexpression is sufficient to promote proliferation).

      (2) Experiments focusing on the cell cycle progression were done in a single cell line RPE1 that showed a strong sensitivity to FAM53C depletion. In contrast, phenotypes in IPSCs and in mice were only mild suggesting that there might be large differences across various cell types in the expression and function of FAM53C. Therefore, it is important to reproduce the observations in other cell types.

      As mentioned above, we have observed cell cycle arrest in several cancer cell lines (U2OS, A549, HCT-116) and in iPSC-derived organoids. We acknowledge that RPE-1 cells seem most sensitive to the knock-down and, currently, we do not understand why. In the future, it will be critical to gain a better understanding of the cellular/genetic contexts in which FAM53C plays more important roles in the G1/S transition; it will be also critical to understand what mechanisms may compensate for loss of FAM53C in cells, in culture and in vivo.

      (3) The authors state that FAM53C is a direct inhibitor of DYRK1A kinase activity (Line 203); however, this model is not supported by the data in Fig 4A. FAM53C seems to be a good substrate of DYRK1A even at high concentrations, when phosphorylation of cyclin D is reduced. This rather suggests that DYRK1A is not inhibited by FAM53C but that FAM53C perhaps competes with cyclin D. Further, the authors should address whether the phosphorylation of cyclin D is responsible for the observed cell cycle phenotype. Is this Cyclin D-Thr286 phosphorylation, or are there other sites involved?

      We completely agree with the Reviewer that the functional interactions between FAM53C and DYRK1A will need to be explored further. Our data (and other data from mass spectrometry experiments in other contexts) support a model in which FAM53C binds to DYRK1A. Genetic analyses indicate that FAM53C is antagonistic to DYRK1A function. Our phosphorylation assays show decreased DYRK1A activity when FAM53C is present. Because our data also show that DYRK1A phosphorylates FAM53C, there may be more than one level of functional interaction between the two proteins, including effects by DYRK1A on FAM53C through its phosphorylation activity. We state in the text that our data suggest “that FAM53C may be a competitive substrate and/or an inhibitor of DYRK1A”, and we agree that we cannot provide a stronger conclusion at this point.

      We believe that genetic data from DepMap and our data support a model in which Cyclin D is downstream of FAM53C in its regulation of the G1/S progression. As discussed with Reviewer #1, it has proven challenging to investigate how FAM53C may control the phosphorylation and degradation of Cyclin D. Thr286 is certainly a critical phosphorylation site, and this residue can be phosphorylated by DYRK1A, but whether FAM53C and DYRK1A engage with other residues or domains is not known and should be the focus of future studies.

      (4) In many places, information on statistical tests is missing and SDs are not shown in the plots. For instance, which statistical test was used in Fig 4C? The impact of FAM53C on cyclin D phosphorylation does not seem to be significant. In the same experiment, does the DYRK1A inhibitor prevent modification of cyclin D?

      We thank the Reviewer for this comment. We made sure in the revised version to mention all the statistical tests used.

      (5) Validation of the SM13797 compound in terms of specificity for DYRK1A was not performed.

      We provided tables in Figure S3 that summarize the biochemical characterization of this DYRK1A inhibitor (performed by Biosplice Therapeutics, where this compound was developed).

      (6) A fraction of cells in G1 is a very easy readout but it does not measure progression through the G1 phase. Extension of the S phase or G2 delay would indirectly also result in reduction of the G1 fraction. Instead, authors could measure the dynamics of entry to S phase in cells released from a G1 block or from mitotic shake off.

      This is an interesting point raised by the Reviewer. It is correct that we only performed a more in-depth characterization of cell cycle phenotypes in certain contexts (e.g., cell counting, EdU incorporation) (see Figures 1 and S1). It is possible that different cell types adapt differently to loss or overexpression of FAM53C, and assays to synchronize the cells, including by mitotic shake-off, may be useful in future experiments to further characterize the cell cycle of FAM53C mutant cells.
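      The reviewer's argument in point (6), that a lower G1 fraction does not by itself show faster G1 progression, can be made concrete with a small calculation. This is a sketch under a simplifying assumption that is not taken from the manuscript: in an asynchronous population, the fraction of cells in a phase is approximated by that phase's share of the total cycle time (the age distribution of an exponentially growing population is ignored). The phase durations are hypothetical.

```python
def g1_fraction(t_g1, t_s, t_g2m):
    """Approximate G1 fraction as the share of cycle time spent in G1.

    Simplifying assumption: phase fraction ~ phase duration / cycle length.
    """
    return t_g1 / (t_g1 + t_s + t_g2m)


# Hypothetical phase durations in hours (NOT measurements from the study).
baseline = g1_fraction(t_g1=10.0, t_s=8.0, t_g2m=4.0)
longer_s = g1_fraction(t_g1=10.0, t_s=12.0, t_g2m=4.0)  # only S phase is extended

print(f"baseline G1 fraction:      {baseline:.3f}")
print(f"G1 fraction with longer S: {longer_s:.3f}")
```

      Here G1 lasts 10 hours in both cases, yet the G1 fraction falls when S phase is lengthened. This is why the reviewer suggests measuring S-phase entry directly in cells released from a G1 block or collected by mitotic shake-off.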

      Comments to the revised manuscript:

      In the revised version of the manuscript, the authors addressed most of the critical points. They now include new data with depletion of FAM53C using single siRNAs that show a small but significant enrichment of the G1 population. This G1 arrest is likely caused by the combined effects of induction of p21 expression and decreased levels of cyclin D1. The authors observed that inhibition of DYRK1A rescued cyclin D1 levels in FAM53C-depleted cells, suggesting that FAM53C may inhibit DYRK1A. This possibility is also supported by in vitro experiments. On the other hand, inhibition of DYRK1A did not rescue the G1 arrest upon depletion of FAM53C, suggesting that FAM53C may also have a DYRK1A-independent role in G1. Functional rescue experiments with cyclin D1 mutants and detection of DYRK1A activity in cells would be necessary to conclusively explain the function of FAM53C in progression through G1 phase, but unfortunately these experiments were technically not possible. Knockout of FAM53C in iPSCs and in mice suggests that FAM53C may have additional functions besides cell cycle control and/or that adaptation may have occurred in these model systems. Overall, the study implicated FAM53C in fine-tuning DYRK1A activity in cells, which may to some extent influence progression through G1 phase. In addition, FAM53C may also have DYRK1A- and cell cycle-independent functions that remain to be addressed by future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      All my minor points (6-11) were addressed adequately. No further comments.

      Reviewer #2 (Recommendations for the authors):

      The paper's conclusions would be substantially strengthened and the primary concern about off-target effects could be definitively resolved by performing one of the following two experiments:

      (1) Perform a rescue experiment. This would involve transfecting RPE-1 cells with an expression vector for an siRNA-resistant FAM53C cDNA (alongside a control vector) and then treating the cells with the FAM53C siRNAs. If the G1 arrest is a true on-target effect, the cells expressing the resistant cDNA should be "rescued" and continue to proliferate, while the control cells arrest. This is the most direct and standard way to validate a phenotype derived from siRNA.

      (2) Use an acute gene deletion approach that bypasses siRNAs entirely. The authors could use a lentiviral gRNA/Cas9 system to induce acute knockout of FAM53C in RPE-1 cells and assess the cell cycle phenotype at an early time point (e.g., 48-72 hours post-infection). This would provide a direct comparison to the acute siRNA knockdown, and if it recapitulates the strong G1 arrest, it would confirm the phenotype is due to FAM53C loss and not an artifact of the RNAi machinery. The current knockout models (iPSC, mice) are stable and long-term, which allows for the compensatory mechanism argument; an acute knockout would be a much stronger control. The authors could then also follow the fate of the cells and determine the nature of the suspected compensatory mechanisms.

      Addressing this central point is critical for the credibility of the proposed G1/S control element.

      As discussed above, the observations of similar phenotypes in four cell lines (RPE-1 cells and three cancer cell lines) using a pool of siRNAs and in cortical organoids derived from iPSCs using a knockout approach strongly support our results. But we agree that our current study has limitations, including the lack of genetic re-introduction of FAM53C in knock-down or mutant cells. We also note that strong genetic evidence points to a role for FAM53C at the G1/S transition. We hope that some of the readers will be excited by FAM53C as an understudied factor with possible critical roles in fundamental cell biology and human diseases, and future studies will continue to investigate its function in cells using additional approaches.

    1. eLife Assessment

      This important study investigates how differences in heart anatomy and electrical activity relate to observed patterns in ECG signals, with potential implications for understanding sex‑ and disease‑related variation. The study has several compelling strengths, including the development of an open-source pipeline for reconstruction and analysis of heart/torso geometry from a large cohort. However, the strength of evidence remains incomplete, as the conclusions rely heavily on linear modeling approaches whose assumptions are not fully validated, and for which the impact of model error and non‑linear interactions has not been rigorously quantified. The work will be of interest to researchers studying cardiovascular physiology and data‑driven modeling, but the main claims require stronger analytical support. In particular, it would benefit from a more robust evaluation of model uncertainty, clearer presentation of the mathematical framework, and comparison to alternative regression strategies that can better address collinearity and non‑linearity.

    1. eLife Assessment

      This paper provides solid electrophysiological evidence that an individual's effort expenditure increases the subjective value of a subsequent reward when the beneficiary is the individual themselves, but decreases the subjective value of a reward when the beneficiary is someone else. These findings have valuable implications for our understanding of how effort investment shapes reward evaluation during prosocial behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test the hypothesis, using an effort-exertion task and an effort-based decision-making task while recording brain dynamics with EEG, that the brain processes reward outcomes for effort differently when rewards are earned for oneself versus for others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants report feeling less subjective effort, but also more disliking of tasks, when they were earning rewards for others versus self. The concern is that participants worked with less vigor during other-benefiting trials than during self-benefiting trials, and this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons), they did so more sluggishly when earning rewards for others. The authors argue that this reflects less motivation when working for others, which is a plausible explanation. The authors also try to rule out this diminished vigor as a confounding explanation by showing that the two-way interaction remains even when including reaction times (and also self-reported task liking) as covariates. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels, which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

    3. Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors use an effort-exertion task and an effort-based decision-making task, while recording brain dynamics with EEG, to test the hypothesis that the brain processes reward outcomes for effort differently when they are earned for oneself versus others.

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      We thank Reviewer #1 for the affirmative appraisal of our manuscript as well as the thoughtful and insightful comments, which have enabled us to significantly improve the manuscript.

      (1) Inferences rely heavily on the results of mixed effects models which may or may not be properly specified and are not supported by complementary analyses.

      We thank Reviewer #1 for raising this critical issue of model specification. We have re-fitted our mixed-effects models and performed complementary analyses to validate the robustness of our findings. Specifically, we adopted the maximal converging random-effects structure (including random slopes for Recipient, Effort, and Magnitude where feasible) while ensuring model stability (see Responses to Reviewer #1’s Recommendations point 2). Crucially, our primary findings, including the Recipient × Effort and Recipient × Effort × Magnitude interactions, remained robust. Furthermore, additional analyses confirmed that these results were not confounded by factors such as response speed and subjective effort rating (see Responses to Reviewer #1’s Recommendations point 5).

      (2) Also, not all results hang together in a sensible way. For example, participants report feeling less subjective effort, but also more disliking of tasks when they were earning rewards for others versus self. Given that participants took longer to complete tasks when earning effort for others, it is conceivable that participants might have been working less hard for others versus themselves, and this may complicate the interpretation of results.

      We thank Reviewer #1 for this insightful point (which also relates to Reviewer #3’s point 5). In our study, participants were asked to rate three specific dimensions: Effort (“How much effort did you exert to complete each effort condition when earning rewards for yourself [or the other person]?”), Difficulty (“How much difficulty did you perceive in each effort condition when earning rewards for yourself [or the other person]?”), and Liking (“How much did you like each effort condition when earning rewards for yourself [or the other person]?”).

      We acknowledge Reviewer #1’s concern that the lower subjective effort ratings for others seem contradictory to the higher disliking and longer completion times. We propose that in this paradigm, subjective effort ratings are susceptible to demand characteristics and likely captured motivational engagement (e.g., “how hard I tried” or “how willing I was”) rather than perceived task demands. To disentangle these factors, we included a measure of perceived task difficulty, which is anchored in task properties and is less prone to social desirability biases (Harmon-Jones et al., 2020; Wright et al., 1990). We found no differences in perceived difficulty between self- and other-benefiting trials (Figure 2D), suggesting that the task demands were perceived as equivalent across conditions. To examine this interpretation more directly, we analyzed correlations among participants’ ratings of difficulty, effort, and liking. As illustrated in Figure S1, we found no correlation between difficulty and effort ratings. Crucially, liking ratings were negatively correlated with difficulty ratings.

      More importantly, our performance data contradict the interpretation that participants “worked less hard” for others in terms of task completion. While participants took longer to complete tasks for others, they maintained comparable, near-ceiling success rates for self (97%) and other (96%) recipients (b = -0.46, p = 0.632; Supplementary Table S1). This dissociation suggests that although participants were less motivated (e.g., lower subjective ratings, longer completion times, and greater disliking) to work for others, they ultimately exerted the necessary physical effort to achieve successful outcomes. Thus, the results consistently point to a decrease in prosocial motivation (consistent with prosocial apathy) rather than a failure of effort exertion.

      Wright, R. A., Shaw, L. L., & Jones, C. R. (1990). Task demand and cardiovascular response magnitude: Further evidence of the mediating role of success importance. Journal of Personality and Social Psychology, 59(6), 1250-1260. https://doi.org/10.1037/0022-3514.59.6.1250

      Harmon-Jones, E., Willoughby, C., Paul, K., & Harmon-Jones, C. (2020). The effect of perceived effort and perceived control on reward valuation: Using the reward positivity to test a dissonance theory prediction. Biological Psychology, 107910. https://doi.org/10.1016/j.biopsycho.2020.107910

      Reviewer #2 (Public review):

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences the processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      An important strength of the study is that the amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We appreciate Reviewer #2’s positive appraisal of our manuscript. We are fortunate to receive your thoughtful and insightful suggestions and have revised the manuscript accordingly.

      (1) Although the obtained results are highly plausible, I am concerned whether the reward positivity (RewP) and P3 were adequately measured. The RewP and P3 were defined as the average voltage values in the time intervals 300-400 ms and 300-440 ms after feedback onset, respectively. So they largely overlapped in time. Although the RewP measure was based on frontocentral electrodes (FC3, FCz, and FC4) and the P3 on posterior electrodes (P3, Pz, and P4), the scalp topographies in Figure 3 show that the RewP effects were larger at the posterior electrodes used for the P3 than at frontocentral electrodes. So there is a concern that the RewP and P3 were not independently measured. This type of problem can often be resolved using a spatiotemporal principal component analysis. My faith in the conclusions drawn would be further strengthened if the researchers extracted separate principal components for the RewP and P3 and performed their statistical analyses on the corresponding factor scores.

      We thank Reviewer #2 for raising this issue. We would like to clarify that these two components were time-locked to different types of feedback and therefore reflect neural responses to distinct stages of the prosocial effort task. Specifically, the P3 was time-locked to performance feedback (the effort-completion cue; e.g., the tick shown in Figure 1B), whereas the RewP was time-locked to reward feedback (e.g., the display of “+0.6”). Thus, despite the numerical similarity in the post-stimulus windows, the components capture neural activity evoked by independent events separated in time, corresponding to the performance monitoring versus reward evaluation stages of the task. To avoid misunderstanding, we have made this distinction more explicit in the revised manuscript, which now reads, “Single-trial RewP amplitude was measured as mean voltage from 300 to 400 ms relative to reward feedback onset (i.e., reward delivery) over frontocentral channels (FC3, FCz, FC4). We also measured the parietal P3 (300–440 ms; averaged across P3, Pz, and P4) in response to performance feedback (i.e., effort completion), given its relationship with motivational salience (Bowyer et al., 2021; Ma et al., 2014)” (page 27, para. 1, lines 2–6).
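      As an illustration only (not the authors' code), the window-and-cluster measurement described above can be sketched as follows. The function name, the array layout, the sampling grid, and the random data are all assumptions made for this example; only the time windows, the channel clusters, and the two time-locking events come from the response.

```python
import numpy as np

def mean_amplitude(epoch, times_ms, ch_names, picks, tmin_ms, tmax_ms):
    """Mean voltage over a channel cluster and a post-event time window.

    epoch    : array (n_channels, n_times), one trial time-locked to one event
    times_ms : array (n_times,), time in ms relative to event onset
    picks    : channel names that form the cluster
    """
    ch_idx = [ch_names.index(ch) for ch in picks]
    in_window = (times_ms >= tmin_ms) & (times_ms <= tmax_ms)
    return float(epoch[ch_idx][:, in_window].mean())

# Hypothetical single-trial data: 6 channels, -200..798 ms in 2 ms steps.
ch_names = ["FC3", "FCz", "FC4", "P3", "Pz", "P4"]
times_ms = np.arange(-200, 800, 2)
rng = np.random.default_rng(0)
reward_epoch = rng.normal(size=(6, times_ms.size))       # locked to reward feedback
performance_epoch = rng.normal(size=(6, times_ms.size))  # locked to performance feedback

# RewP: frontocentral cluster, 300-400 ms after reward feedback.
rewp = mean_amplitude(reward_epoch, times_ms, ch_names, ["FC3", "FCz", "FC4"], 300, 400)
# P3: parietal cluster, 300-440 ms after performance feedback.
p3 = mean_amplitude(performance_epoch, times_ms, ch_names, ["P3", "Pz", "P4"], 300, 440)
```

      Because the two measures are computed from epochs locked to different events, the overlap of the two windows in post-stimulus time does not by itself imply that they index the same activity.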

      Reviewer #3 (Public review):

      This study investigates how effort influences reward evaluation during prosocial behaviour using EEG and experimental tasks manipulating effort and rewards for self and others. Results reveal a dissociable effect: for self-benefitting effort, rewards are evaluated more positively as effort increases, while for other-benefitting effort, rewards are evaluated less positively with higher effort. This dissociation, driven by reward system activation and independent of performance, provides new insights into the neural mechanisms of effort and reward in prosocial contexts.

      This work makes a valuable contribution to the prosocial behaviour literature by addressing areas that previous research has largely overlooked. It highlights the paradoxical effect of effort on reward evaluation and opens new avenues for investigating the mechanisms underlying this phenomenon. The study employs well-established tasks with robust replication in the literature and innovatively incorporates ERPs to examine effort-based prosocial decision-making - an area insufficiently explored in prior work. Moreover, the analyses are rigorous and grounded in established methodologies, further enhancing the study's credibility. These elements collectively underscore the study's significance in advancing our understanding of effort-based decision-making.

      We thank Reviewer #3 for the positive assessment. We are particularly encouraged by the reviewer’s recognition of our novel integration of ERPs to uncover the distinct effects of effort on reward evaluation for self versus others. We have carefully addressed the specific recommendations raised in the subsequent comments to further strengthen the rigor and clarity of the manuscript.

      (1) Incomplete EEG Reporting: The methods indicate that EEG activity was recorded for both tasks; however, the manuscript reports EEG results only for the first task, omitting the decision-making task. If the authors claim a paradoxical effect of effort on self versus other rewards, as revealed by the RewP component, this should also be confirmed with results from the decision-making task. Omitting these findings weakens the overall argument.

      We thank Reviewer #3 for giving us the opportunity to clarify the specific roles of our two tasks. The primary aim of our study is to elucidate the neural after-effects of effort exertion on subsequent reward evaluation during prosocial acts. The prosocial effort task was specifically designed for this purpose, as it involves actual effort expenditure followed by reward outcomes. Furthermore, this task uses preset effort-reward combinations, ensuring balanced trial counts and adequate signal-to-noise ratios across conditions, a critical requirement for robust ERP analysis. In contrast, the prosocial decision-making task was included specifically to quantify behavioral preference (i.e., prosocial effort discounting) rather than neural reward processing. Specifically, this task involves choices without immediate effort execution and reward feedback, making it impossible to examine the neural after-effects of effort exertion. However, the decision-making task remains indispensable for our study structure: it provides an independent behavioral measure of prosocial apathy, which allowed us to link individual differences in behavioral motivation to the neural dissociations observed in the prosocial effort task (as detailed in our Responses to Reviewer #3’s point 2). Thus, the two tasks provide complementary, rather than redundant, insights into the behavioral and neural mechanisms of prosocial effort.

      (2) Neural and Behavioural Integration: The neural results should be contrasted with behavioural data both within and between tasks. Specifically, the manuscript could examine whether neural responses predict performance within each task and whether neural and behavioural signals correlate across tasks. This integration would provide a more comprehensive understanding of the mechanisms at play.

      We thank Reviewer #3 for this insightful and helpful suggestion. We agree that linking neural signatures with behavioral patterns is crucial for establishing the functional significance of our ERP findings. Regarding within-task association, it is important to note that the prosocial effort task was designed to require participants to exert fixed, preset levels of physical effort to earn uncertain rewards. This experimental control was necessary to standardize effort exertion across self-benefiting and other-benefiting trials, thereby minimizing confounds such as differences in physical or perceived effort prior to the feedback phase. Indeed, the neural after-effects remained after controlling for these behavioral measures (i.e., response speed and self-reported effort; as detailed in our Responses to Reviewer #1’s Recommendations point 5). Furthermore, unlike the prosocial effort task, the decision-making task inherently precludes the examination of the neural after-effects of effort; therefore, within-task association in this task was not possible.

      Given these considerations, we focused on the cross-task association. We examined whether the neural after-effects of effort (indexed by the RewP) in the prosocial effort task were modulated by individual differences in effort discounting. We used the K value estimated from the prosocial decision-making task as the index of effort discounting. We entered the K value (log-transformed and z-scored) as a continuous predictor into the mixed-effects models of RewP amplitudes. The full regression estimates for the model are presented in Table S1 (left).
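      As a minimal sketch of the predictor preparation described above (not the authors' code), the transformation could look like this. The response does not state the log base or the SD convention, so the natural logarithm and the sample SD (ddof=1) are assumptions here, and the K values are invented for illustration.

```python
import numpy as np

def log_zscore(k_values):
    """Natural-log transform, then standardize to mean 0 and sample SD 1."""
    log_k = np.log(np.asarray(k_values, dtype=float))
    return (log_k - log_k.mean()) / log_k.std(ddof=1)

# Hypothetical per-participant discounting rates (must be positive for the log).
k = [0.02, 0.05, 0.10, 0.40, 1.20]
k_z = log_zscore(k)
```

      After this transformation, the "high" and "low" discounting levels used in a simple slopes analysis at ±1 SD correspond to predictor values of +1 and -1.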

      We observed a significant four-way interaction among recipient, effort, magnitude, and K value (b = 0.58, p = 0.013). To decompose this complex interaction, we performed simple slopes analyses separately for self- and other-benefiting trials at high and low levels of reward magnitude and discounting rate (±1 SD). As shown in Figure S2, for self-benefiting trials, the effort-enhancement effect on the RewP was significant only for participants with high discounting rates at low reward magnitude (b = 1.02, 95% CI = [0.22, 1.82], p = 0.012). In contrast, participants with low discounting rates exhibited no significant effort effect (b = -0.37, 95% CI = [-0.89, 0.15], p = 0.159). At high reward magnitude, simple slopes analyses detected no significant effort effects for either high (b = 0.35, 95% CI = [-0.44, 1.14], p = 0.383) or low (b = 0.45, 95% CI = [-0.07, 0.97], p = 0.093) discounting individuals. These findings strongly support the cognitive dissonance account (Aronson & Mills, 1959): those who find effort most aversive are most compelled to inflate the value of small rewards to justify their exertion. For these individuals, the completion of a costly action for a small reward may trigger a stronger internal justification effect, resulting in an amplified neural reward response.

      For other-benefiting trials, participants with low discounting rates exhibited a significant effort-discounting effect at high reward magnitude (b = -0.97, 95% CI = [-1.74, -0.20], p = 0.014). In contrast, no significant effort effects were observed for participants with high discounting rates at either high (b = -0.45, 95% CI = [-0.97, 0.08], p = 0.098) or low (b = -0.16, 95% CI = [-0.69, 0.38], p = 0.564) reward magnitudes, nor for participants with low discounting rates at low reward magnitude (b = 0.14, 95% CI = [-0.64, 0.92], p = 0.729). These results suggest that the justification mechanism observed for self-benefiting effort appears absent for other-benefiting effort. Instead, we observed a persistent effort discounting before, during, and after effort expenditure, which was most pronounced in individuals with low effort sensitivity (low K) when reward magnitude was high. This seemingly paradoxical pattern might be interpreted through the lens of disadvantageous inequity aversion (Fehr & Schmidt, 1999). Specifically, the combination of high personal effort and high monetary reward for another person creates a salient disparity between the participant’s incurred cost and the recipient’s gain. Although low-K individuals are behaviorally willing to tolerate this cost, their neural valuation system may nonetheless track the “unfairness” of this asymmetry, thereby attenuating the neural reward signal (Tricomi et al., 2010). These insights suggest that facilitating prosocial behavior may require not just lowering costs, but potentially framing outcomes to trigger the effort justification mechanisms that drive the effort paradox observed in self-benefiting acts (Inzlicht & Campbell, 2022).

      To confirm this four-way interaction, we also replaced the K value with the proportion of high-effort choices in the decision-making task and observed a similar four-way interaction among recipient, effort, magnitude, and high-effort choice proportion (b = -0.58, p = 0.014; see Table S1 for detailed regression estimates). Together, this cross-task analysis not only provides a more comprehensive understanding of the mechanisms at play but also justifies the inclusion of the prosocial decision-making task. We sincerely thank Reviewer #3 for this valuable suggestion, which has significantly strengthened our manuscript. We have included this analysis (page 16, para. 2; page 17, paras. 1–2) and discussed the results (page 20, para. 2, lines 10–15; page 20, para. 3; page 21, para. 1, lines 1–8) in the revised manuscript.

      Aronson, E., & Mills, J. (1959). The effect of severity of initiation on liking for a group. The Journal of Abnormal and Social Psychology, 59(2), 177-181. https://doi.org/10.1037/h0047195

      Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics, 114(3), 817-868. http://www.jstor.org/stable/2586885

      Tricomi, E., Rangel, A., Camerer, C. F., & O'Doherty, J. P. (2010). Neural evidence for inequality-averse social preferences. Nature, 463(7284), 1089-1091. https://doi.org/10.1038/nature08785

      (3) Success Rate and Model Structure: The manuscript does not clearly report the success rate in the prosocial effort task. If success rates are low, risk aversion could confound the results. Additionally, it is unclear whether the models accounted for successful versus unsuccessful trials or whether success was included as a covariate. If this information is present, it needs to be explicitly clarified. The exclusion criteria for unsuccessful trials in both tasks should also be detailed. Moreover, the decision to exclude electrodes as independent variables in the models warrants an explanation.

      We appreciate the opportunity to clarify these points. In the revised manuscript, we now explicitly report the descriptive statistics and the results of a mixed-effects logistic model on response success (page 8, para. 1, lines 2–4; Supplementary Table S1). Participants achieved similarly high success rates in both self (M = 97%) and other trials (M = 96%; Figure S3). As shown in Table S2, success rates decreased as effort increased (b = -4.77, p < 0.001). However, no other effects reached significance (ps > 0.245). These near-ceiling success rates indicate strong task engagement and effectively rule out risk aversion as a potential confound.

      Regarding model structure, we excluded unsuccessful trials from statistical analyses because they were rare and distributed equally across conditions. Given the near-ceiling performance, we did not include success rate as a covariate, as it offers limited variance.

      Finally, we did not include electrodes as an independent variable because our hypotheses focused on condition effects rather than topographic differences. Following established research (e.g., Krigolson, 2018; Proudfit, 2015), we averaged RewP amplitudes across a frontocentral cluster (FC3, FCz, and FC4) and P3 amplitudes across a parietal cluster (P3, Pz, and P4), where activity is typically maximal. Averaging across these theoretically grounded clusters improves the signal-to-noise ratio and provides more reliable estimates of the underlying components. We have explicitly included this rationale in the revised manuscript, which reads, “Data were averaged across the selected electrode clusters to improve signal-to-noise ratio and reliability” (page 27, para. 1, lines 9–10).

      Proudfit, G. H. (2015). The reward positivity: From basic research on reward to a biomarker for depression. Psychophysiology, 52(4), 449-459. https://doi.org/10.1111/psyp.12370

      Krigolson, O. E. (2018). Event-related brain potentials and the study of reward processing: Methodological considerations. Int J Psychophysiol, 132(Pt B), 175-183. https://doi.org/10.1016/j.ijpsycho.2017.11.007

      (4) Prosocial Decision Computational Modelling: The prosocial decision task largely replicates prior behavioural findings but misses the opportunity to directly test the hypotheses derived from neural data in the prosocial effort task. If the authors propose a paradoxical effect of effort on self-rewards and an inverse effect for prosocial effort, this could be formalised in a computational model. A model comparison could evaluate the proposed mechanism against alternative theories, incorporating the complex interplay of effort and reward for self and others. Furthermore, these parameters should be correlated with neural signals, adding a critical layer of evidence to the claims. As it is, the inclusion of the prosocial decision task seems irrelevant.

      We thank Reviewer #3 for this thoughtful suggestion regarding the value of computational modelling. We fully agree that formalizing mechanisms is crucial, but we would like to clarify why a computational model of decision-making cannot directly capture the paradoxical after-effects observed in our neural data. The paradoxical after-effect of effort exertion we report refers to experienced utility (i.e., how prior costs modulate the hedonic consumption of a reward), whereas the decision task measures decision utility (i.e., how prospective costs and benefits are integrated to guide choice). We included the prosocial decision task to establish a behavioral baseline and replicate the well-documented phenomenon of prosocial apathy. Consistent with prior work (e.g., Lockwood et al., 2017; Lockwood et al., 2022), our data show that at the decision stage (ex-ante), effort functions as a universal cost: participants discounted rewards for both self and others, differing only quantitatively (steeper discounting for others). It is only after effort is exerted (ex-post) that the pattern reverses: effort is valued for self but remains costly for others, representing a qualitative shift. Crucially, incorporating a "paradoxical valuation" parameter (i.e., effort as a reward) into our decision model would mathematically contradict the behavioral reality. Since participants actively avoided high-effort options, a model assuming effort adds value might fail to fit the choice data. The theoretical novelty of our study lies precisely in this temporal dissociation: whereas self-benefiting effort paradoxically enhances reward valuation, other-benefiting effort induces a persistent reward devaluation.
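      The argument that an "effort adds value" term would conflict with observed effort avoidance can be illustrated with a toy choice model. This is a sketch under stated assumptions, not the model fitted in the study: the parabolic discounting form, the softmax choice rule, the equal rewards for both options, and every parameter value below are hypothetical.

```python
import math

def subjective_value(reward, effort, k):
    """Assumed parabolic effort discounting: SV = reward - k * effort**2.

    k > 0 treats effort as a cost; k < 0 would treat effort as adding value.
    """
    return reward - k * effort ** 2

def p_choose_high_effort(sv_high, sv_low, beta=1.0):
    """Softmax (logistic) probability of picking the high-effort option."""
    return 1.0 / (1.0 + math.exp(-beta * (sv_high - sv_low)))

def high_effort_choice_prob(k, reward=1.0, low_effort=1, high_effort=4):
    """Choice probability when the two options differ only in required effort."""
    sv_low = subjective_value(reward, low_effort, k)
    sv_high = subjective_value(reward, high_effort, k)
    return p_choose_high_effort(sv_high, sv_low)

p_effort_as_cost = high_effort_choice_prob(k=0.1)     # below 0.5: effort is avoided
p_effort_as_value = high_effort_choice_prob(k=-0.1)   # above 0.5: effort is sought
```

      Under these assumptions, only the cost parameterization predicts the avoidance of high-effort options that the response describes at the decision stage.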

      To address the reviewer’s interest in bridging these two domains, we examined whether these distinct stages are linked at the level of individual differences. We hypothesized that an individual’s sensitivity to prospective effort cost (discounting rate K) might modulate their susceptibility to the retrospective neural after-effect. As detailed in our Responses to Reviewer #3’s point 2, we found that for self-benefiting trials, high-discounting individuals showed an effort-enhancement effect on the RewP at low reward magnitude, while for other-benefiting trials, low-discounting individuals exhibited effort-discounting effects at high reward magnitude. We sincerely thank Reviewer #3 for this valuable suggestion, which has allowed us to link the two tasks and has improved our understanding of the mechanisms at play.

      Lockwood, P. L., Hamonet, M., Zhang, S. H., Ratnavel, A., Salmony, F. U., Husain, M., & Apps, M. A. J. (2017). Prosocial apathy for helping others when effort is required. Nat Hum Behav, 1(7), 0131. https://doi.org/10.1038/s41562-017-0131

      Lockwood, P. L., Wittmann, M. K., Nili, H., Matsumoto-Ryan, M., Abdurahman, A., Cutler, J., Husain, M., & Apps, M. A. J. (2022). Distinct neural representations for prosocial and self-benefiting effort. Curr Biol, 32(19), 4172-4185 e4177. https://doi.org/10.1016/j.cub.2022.08.010

      (5) Contradiction Between Effort Perception and Neural Results: Participants reported effort as less effortful in the prosocial condition compared to the self condition, which seems contradictory to the neural findings and the authors' interpretation. If effort has a discounting effect on rewards for others, one might expect it to feel more effortful. How do the authors reconcile these results? Additionally, the relationship between behavioural data and neural responses should be examined to clarify these inconsistencies.

      This point aligns with the issues raised in Reviewer #1’s point 2. We acknowledge the apparent discrepancy between lower reported effort in the prosocial condition and the neural discounting effect. As detailed in our Responses to Reviewer #1’s point 2, we reconcile this by proposing that subjective effort ratings in this paradigm likely reflect motivational engagement (e.g., “how hard I tried” or “how willing I was”) rather than perceived task demands. Under this interpretation, the lower effort ratings for others reflect a withdrawal of engagement (consistent with prosocial apathy), which conceptually aligns with, rather than contradicts, the neural discounting effect. To validate this, we contrasted effort ratings with difficulty ratings (a more reliable index of objective demand). Our correlational analysis revealed no significant relationship between difficulty and effort ratings (r = -0.21, p = 0.196), suggesting that they capture distinct constructs. Furthermore, liking ratings were negatively correlated with difficulty ratings (r = -0.43, p = 0.011) but not with effort ratings (r = 0.32, p = 0.061), further dissociating the two measures. Crucially, as detailed in our Responses to Reviewer #1’s Recommendations point 5, our RewP effects remained significant even after controlling for individual effort ratings. This demonstrates that the neural effort-discounting effect for others is a physiological signature that operates independently of the subjective report bias.

      (6) Necessary Revisions to Manuscript: If the authors address the issues above, corresponding updates to the introduction and discussion sections could strengthen the narrative and align the manuscript with the additional analyses.

      We thank Reviewer #3 for the above insightful and helpful comments. We have carefully addressed the issues raised above and have updated the manuscript accordingly, including the abstract, introduction, results, and discussion sections.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) The two biggest concerns I have are

      - Whether the mixed-effect models are properly specified, and

      - Whether the main interaction between the Recipient and effort on the reward positivity (RewP) reflects different levels of effort exertion when working for self versus others.

      We thank Reviewer #1 for identifying these two critical issues. We have carefully considered these points and conducted additional analyses to address them. Below, we provide a detailed response to each concern, explaining how we have improved the model specification and ruled out alternative interpretations regarding effort exertion.

      (2) On the first point, I noticed that the authors selectively excluded random effects for Effort and Magnitude when regressing RewP on Effort, Magnitude, Recipient, and Valence. This is important because the key result in the paper is a fixed effect two-way interaction between Recipient and Effort and a three-way interaction between Recipient, Effort, and Magnitude. It is not clear that these results will remain significant when Effort and Magnitude are included as random effects in the model. Thus the authors should justify their exclusion as random effects, and/or show that the results don't depend on including those random effects in the model. The same logic applies to the specification of other mixed effects models (e.g. the effect of Magnitude in the model predicting RTs).

      We thank Reviewer #1 for raising this important methodological point. We fully agree that including random slopes wherever possible reduces Type 1 error rates and yields more conservative tests of fixed effects. In our analyses, we determined the random effects structure for each model using singular value decomposition (SVD). Specifically, we began with a maximal model that included by-participant random slopes for all main effects and interactions as well as a participant-level random intercept. When the model failed to converge or yielded a singular fit, we applied SVD to identify redundant dimensions (i.e., components explaining zero variance) and iteratively removed these terms until convergence was achieved. This procedure allowed us to retain the maximal converging random-effects structure while ensuring model stability. We have clarified this procedure in the revised manuscript as follows, “For each model, we fitted the maximal random-effects structure and, when the model was overparameterized, used singular value decomposition to simplify the random-effects structure until the model converged” (page 28, para. 1, lines 5–8).

      Regarding the RewP model, including all variables (i.e., Recipient, Effort, Magnitude, and Valence) in the random-effects structure resulted in a boundary (singular) fit. Examination of the variance-covariance structure of the random effects revealed that the random slopes for Valence and Magnitude were perfectly negatively correlated (r = -1.00), indicating severe overparameterization. In our original submission, we removed the random slopes for Effort and Magnitude because the SVD analysis indicated redundant dimensions in the model structure.
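
      To illustrate why a correlation of -1.00 between two random slopes signals a redundant dimension, the following is a minimal Python sketch. The standard deviations are hypothetical values chosen for illustration and are not estimates from the model; only the correlation of -1 comes from the text above. With a perfect correlation, the 2 × 2 random-effects covariance matrix has one eigenvalue equal to zero, that is, one component that explains no variance.

```python
import math


def cov_matrix_2x2(sd_a, sd_b, r):
    """Covariance matrix of two random slopes with SDs sd_a, sd_b and correlation r."""
    cov = r * sd_a * sd_b
    return [[sd_a ** 2, cov], [cov, sd_b ** 2]]


def eigenvalues_sym_2x2(m):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, largest first."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return half_trace + radius, half_trace - radius


def variance_shares(m):
    """Proportion of total random-effect variance carried by each component."""
    large, small = eigenvalues_sym_2x2(m)
    total = large + small
    return large / total, small / total


# Hypothetical SDs (1.5 and 0.8); r = -1.0 mirrors the singular fit described above.
shares_singular = variance_shares(cov_matrix_2x2(1.5, 0.8, -1.0))
# Same hypothetical SDs with a moderate correlation, for comparison.
shares_regular = variance_shares(cov_matrix_2x2(1.5, 0.8, -0.3))

print("r = -1.0:", shares_singular)  # second share is numerically zero
print("r = -0.3:", shares_regular)   # both components carry variance
```

      In the singular case the second component carries essentially no variance, which is the kind of dimension a simplification procedure would flag for removal.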

      However, we agree with the Reviewer that retaining slopes for variables involved in key interactions is crucial. Therefore, we re-evaluated the model strategy: instead of removing Effort and Magnitude, we removed the random slope for Valence (which was the primary source of the perfect correlation). This modification successfully resolved the singularity while allowing us to retain the random slopes for the critical variables (i.e., Effort and Magnitude).

      Critically, this updated model yielded the same pattern of results as our original submission: the two-way interaction between Recipient and Effort and the three-way interaction between Recipient, Effort, and Magnitude remained significant (see Table S3). As expected, including the random slopes for Effort and Magnitude yielded a more conservative test of the fixed effects. While the critical three-way interaction remained significant (p = 0.019), the simple slope for the Self condition at high reward magnitude shifted slightly from significant (p = 0.041) to marginally significant (p = 0.056). However, the effect size remained largely unchanged (b = 0.42 vs. original b = 0.43), and the dissociation pattern, where self-benefiting trials show a positive trend while other-benefiting trials show a significant negative slope, remains robust and is statistically supported by the significant interaction. We have adopted this updated model in the revised manuscript and updated the relevant sections accordingly. Finally, note that we have removed the RewP table from the Supplementary Materials because the RewP model results are now presented as a figure in the main text (as suggested by Reviewer #1’s Recommendations point 3).

      We have also carefully verified the random effects structures for other mixed-effects models, including the RT and Performance-P3 models in the prosocial effort task, as well as the decision time and decision choice models in the prosocial decision-making task. The updated information is detailed as follows:

      Regarding the RT model, we replaced it with a more reasonable model of response speed (button presses per second), as suggested by Reviewer #1 (see our responses to Reviewer #1’s Recommendations point 4 for details).

      Regarding the performance-P3 model, the random-effects structure could only support Effort, as in our original submission; thus, the results remain unchanged.

      Regarding the decision time model, we have updated our results to include the quadratic effort term, as suggested by Reviewer #1 (see our responses to Reviewer #1’s Recommendations point 6 for details).

      Regarding the decision choice model, we included Recipient, Effort, and Magnitude in the random-effects structure. As shown in Table S4, the results remain largely consistent with the original model, except for a newly significant interaction between effort and magnitude. Follow-up simple slopes analyses revealed that the discounting effect of effort was more pronounced at low reward magnitude (M − 1SD: b = -2.69, 95% CI = [-3.09, -2.29], p < 0.001) than at high reward magnitude (M + 1SD: b = -2.38, 95% CI = [-2.82, -1.94], p < 0.001).

      In summary, we have improved the model specification following Reviewer #1’s suggestion. Crucially, the results remain qualitatively consistent with our original findings. We have updated the Results section, figures (Figures 2, 4, and 5), and OSF documents (including a new R Markdown file and an HTML output file detailing the final results) to reflect these analyses. Additionally, we have explicitly stated the method used for calculating p-values in the mixed-effects models (page 28, para. 1, lines 8–10), which was omitted in the original submission.

      (3) Regarding the mixed models, it would also be good to show a graphical depiction summarizing key effects (e.g. the Recipient by Effort interaction on RewP) rather than just showing the predictions of the fitted mixed effects models.

      This point is well-taken. Please see Figure S4, which visualizes the key effects and has now been included in the revised manuscript as Figure 4A.

      (4) Finally, regarding the mixed effect models of RTs - given the common finding that RTs are not normally distributed, the Authors might be better off regressing 1/RT (interpreted as speed rather than latency) since 1/RT will often make distributions less asymmetric and heavy-tailed.

      We thank Reviewer #1 for this helpful suggestion regarding data distribution. In our original analysis, the dependent variable was “completion time” (i.e., the latency to complete the required button presses within the 6-s window). We agree that these raw latency data exhibited characteristic non-normality (see Figure S5, Left). Based on Reviewer #1’s suggestion, we adopted “response speed” (calculated as button presses per second) as the dependent variable. As expected, this transformation substantially improved the normality of the distribution (see Figure S5, Right). We have refitted the mixed-effects model using this speed metric. Critically, the results largely replicated the patterns observed in our original model, with the exception that the main effect of reward magnitude did not reach significance in the speed model (see Table 5). Given the superior distributional properties of the speed metric, we have replaced the original latency analysis with the response speed model in the revised manuscript. We have updated the Results section (page 8, para. 1, lines 4–9) and Figures 2B–C accordingly.
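
      As a minimal sketch of the speed metric described above, the following Python snippet converts completion times into button presses per second. It assumes that speed is the number of presses made on a trial divided by the completion time in seconds; the trial values are hypothetical and serve only to show the calculation.

```python
def response_speed(n_presses, completion_time_s):
    """Button presses per second for one trial."""
    if completion_time_s <= 0:
        raise ValueError("completion time must be positive")
    return n_presses / completion_time_s


# Hypothetical trials: (number of presses, completion time in seconds).
trials = [(12, 3.0), (12, 4.8), (20, 5.0)]
speeds = [response_speed(n, t) for n, t in trials]

for (n, t), s in zip(trials, speeds):
    print(f"{n} presses in {t:.1f} s -> {s:.2f} presses/s")
```

      Because the metric is a rate, a trial with the same press count but a longer completion time yields a lower speed, which is what allows it to replace latency as the dependent variable.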

      (5) Regarding the level of effort exerted, there are two reasons to suspect that participants exerted less for others versus themselves. The first is that they were slower to complete the button pressing for others versus themselves. The second is that they reported paradoxically less subjective effort for others versus self (paradoxical because they also reported liking the task less for others versus self). The explanation for both may be that they exerted less effort for others versus self and this has important implications for interpreting the main effects. If they exerted less effort for others, this may partly account for the key Recipient:Effort and Recipient:Effort:Magnitude interactions in the mixed effects regression of RewP. Do either median effort durations or self-reported effort predict the magnitude of the Recipient:Effort and Recipient:Effort:Magnitude interactions (if these were included as random effects)? If so, that would provide evidence supporting this story. Alternatively, if median durations or self-reported effort were included as covariates, do these interactions still obtain? In any case, the Authors should include caveats regarding this potential explanation of the self-versus-other interactions with effort and magnitude on the RewP (or explain why this cannot explain the interactions).

      We thank Reviewer #1 for raising this important interpretational issue. We acknowledge the concern that differences in physical exertion or perceived effort could potentially confound the neural findings. However, we argue that the observed RewP effects are not driven by these factors for several reasons.

      First, the prosocial effort task enforced fixed effort thresholds (10%–90% of their maximum effort level) across self-benefiting and other-benefiting trials. Importantly, participants achieved ceiling-level success rates that were highly comparable between self-benefiting (97%) and other-benefiting (96%) trials, indicating that they successfully exerted the required effort across conditions.

      Second, regarding the slower response speed for others (we used response speed instead of completion time, as the former is more suitable for statistical analysis; see details in Responses to Reviewer #1’s Recommendations point 4), we interpret this as a reduction in motivation rather than a reduction in the amount of effort exerted. Similarly, as detailed in our Responses to Reviewer #1’s point 2, subjective effort ratings in this paradigm appear to be influenced by demand characteristics and do not reliably track physical exertion. For instance, liking ratings were associated with difficulty ratings (r = -0.43, p = 0.011) rather than with effort ratings (r = 0.32, p = 0.061).

      To empirically rule out the possibility that these behavioral differences account for the neural effect, we followed the reviewer’s suggestion and re-ran the mixed-effects model predicting RewP amplitudes with trial-by-trial response speed and subjective effort rating included as covariates. These control analyses revealed that neither response speed (b = -0.07, p = 0.614) nor self-reported effort (b = 0.10, p = 0.186) significantly predicted RewP amplitudes (see Table S6). Most importantly, the key interactions of interest (Recipient × Effort and Recipient × Effort × Magnitude) remained significant and virtually unchanged. These findings suggest that the observed neural after-effects of prosocial effort are not driven by variations in motor execution or perceived effort.

      Minor comments:

      (6) In Figure 5A a quadratic effect (not a linear effect) seems fairly obvious in decision times as a function of effort level. This makes sense given that participants are close to indifference, on average, around the 50-70% effort level. I recommend fitting a model that has a quadratic predictor and not just a linear predictor when regressing decision times on effort levels.

      We thank Reviewer #1 for this insightful suggestion. We agree that decision times likely track decision conflict, which typically peaks near indifference points (e.g., moderate effort levels). Accordingly, we reanalyzed the decision time data using a mixed-effects model that included both linear and quadratic terms for effort. As detailed in Table S7, this analysis revealed a significant quadratic main effect of effort, which was further qualified by a significant interaction between the quadratic effort term and reward magnitude. Decomposition of this interaction (Figure S6) revealed that the quadratic effort effect was more pronounced at low reward magnitude (M − 1SD: b = -160.10, 95% CI = [-218.30, -101.90], p < 0.001) than at high reward magnitude (M + 1SD: b = -99.50, 95% CI = [-157.60, -41.40], p = 0.001). However, we found no significant interactions involving the quadratic effort term and recipient. We have updated the Results section (page 13, para. 2; page 14, para. 1) and Figures 5A–B (right panel) to reflect these findings.
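
      To make the role of the quadratic term concrete, the sketch below evaluates a decision-time curve of the form b0 + b1·effort + b2·effort² and locates its peak at -b1 / (2·b2). The coefficients and the standardized effort scale are hypothetical and are not the fitted estimates; the only property taken from the response above is that the quadratic coefficient is negative, which produces an inverted-U shape with the slowest decisions at intermediate effort.

```python
def predicted_decision_time(effort, b0, b1, b2):
    """Decision time predicted from linear and quadratic effort terms."""
    return b0 + b1 * effort + b2 * effort ** 2


def peak_effort(b1, b2):
    """Effort value at which an inverted-U curve reaches its maximum."""
    if b2 >= 0:
        raise ValueError("an inverted U requires a negative quadratic coefficient")
    return -b1 / (2.0 * b2)


# Hypothetical coefficients in ms, with effort on a standardized scale.
B0, B1, B2 = 1500.0, 60.0, -100.0

peak = peak_effort(B1, B2)
at_peak = predicted_decision_time(peak, B0, B1, B2)
below = predicted_decision_time(peak - 0.5, B0, B1, B2)
above = predicted_decision_time(peak + 0.5, B0, B1, B2)

print(f"peak effort (standardized): {peak:.2f}")
print(f"decision time at peak: {at_peak:.0f} ms; half a unit away: {below:.0f} / {above:.0f} ms")
```

      A more negative quadratic coefficient makes the curve steeper around its peak, which is one way to read the stronger quadratic effect reported at low reward magnitude.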

      (7) The distinction between the effort and decision-making tasks wasn't super clear from the main text. A sentence early on in the results section could be useful for readers' understanding.

      This point is well taken. In the revised manuscript, we have clarified this distinction at the beginning of the Results section (page 6, para. 2, lines 1–10). In addition, we have explicitly indicated the corresponding task within each subsection heading in the Results:

      “2.1 Investing effort for others is less motivating than for self in the prosocial effort task” (page 7)

      “2.2 Effort adds reward value for self but discounts reward value for others in the prosocial effort task” (page 9)

      “2.3 Reward is devalued by effort to a higher degree for others than for self in the prosocial decision-making task” (page 13)

      (8) To what does "three trials" refer on lines 143-144?

      Thank you for raising this point. Participants completed three trials in which they were asked to press a button as rapidly as possible with their non-dominant pinky finger for 6000 ms. The maximum effort level was operationalized as the average button-press count across the three trials. To improve clarity, we have also provided a more detailed description in the Results section, which reads: “The mean maximum effort level (i.e., the average button-press count across three 6000-ms trials; see Procedure for details) ….” (page 7, para. 1, lines 1–2).

      (9) It is unclear how the authors select their time windows for ERP analyses.

      We thank Reviewer #1 for this comment. Measurement parameters (i.e., time windows and channel sites) were determined based on the grand-averaged ERP waveforms and topographic maps collapsed across all conditions. This procedure is orthogonal to the conditions of interest and prevents bias in the selection of measurement windows and channels, consistent with the “orthogonal selection approach” (Luck & Gaspelin, 2017). We have clarified this point in the revised manuscript, which now reads, “Measurement parameters (time windows and channel sites) were determined from the grand-averaged ERP waveforms and topographic maps collapsed across all conditions, which was thus orthogonal to the conditions of interest (Luck & Gaspelin, 2017)” (page 27, para. 1, lines 6–9).

      Luck, S., & Gaspelin, N. (2017). How to get statistically significant effects in any ERP experiment (and why you shouldn't). Psychophysiology, 54(1), 146-157.

      (10) There are a few typos throughout. For example, Line 124 should read "other half benefitted...", Line 127 should read "interest at each effort level...", "following" on Line 369, and Supplemental table titles incorrectly spell the word "Results".

      We thank Reviewer #1 for catching these errors. We have corrected all the specific typos noted (page 6, para. 2, lines 11 and 15; page 22, para. 3, line 2; Supplementary Table S2). Furthermore, we have conducted a thorough proofreading of the entire text and supplementary materials to ensure linguistic accuracy and consistency throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Lines 84-86. "The RewP ... has its neural sources in the anterior cingulate cortex (Gehring & Willoughby, 2002) and ventral striatum (Foti et al., 2011)." This is a better reference for the ACC source: https://pubmed.ncbi.nlm.nih.gov/23973408/. And perhaps remove the reference to the ventral striatum; most people would agree that activity in the ventral striatum cannot be measured with scalp EEG.

      We thank Reviewer #2 for providing the updated reference, which has been cited in the revised manuscript. We agree that activity in the VS cannot be reliably measured with scalp EEG and thus have removed the reference to the VS. The revised sentence now reads, “… has its neural sources in the anterior cingulate cortex (Gehring & Willoughby, 2002; Hauser et al., 2014)” (page 4, para. 2, lines 12–13).

      (2) Lines 152-153. What exactly is shown in Figure 2A? How did the authors average across subjects?

      We thank Reviewer #2 for raising this issue. Figure 2A depicts the distribution of the maximum effort level, defined as the average button-press count across three 6000-ms trials completed before the prosocial effort task. In these trials, participants were instructed to press the button as rapidly as possible with their non-dominant pinky fingers. To improve clarity, we have revised the figure caption as: “(A) Distribution of the maximum effort level (i.e., the average button-press count across three 6000-ms trials) across participants” (Figure 2).

      (3) Lines 160-164. "As expected (Figure 2D), participants perceived increased effort as more difficult ... and more disliking (b = -0.62, p < 0.001) when the beneficiary was others than themselves." Does this sentence describe the main effect of the beneficiary or the interaction between beneficiary and effort level, as the start of the sentence ("increased effort") suggests?

      We thank Reviewer #2 for pointing out this ambiguity. The sentence describes the main effect of beneficiary rather than the interaction between beneficiary and effort level. In the revised manuscript, we have rephrased the sentence as: “They felt less effort (b = -0.32, p = 0.019) and more disliking (b = -0.62, p = 0.001) for other-benefiting trials compared to self-benefiting trials” (page 9, para. 1, lines 4–6).

      (4) Lines 195-196. "..., we conducted post-hoc simple slopes analyses at -1 SD ("Low") and + SD ("High") reward magnitude." I did not understand what the authors meant with these reward magnitudes, given that the actual potential rewards were ¥0.2, ¥0.4, ¥0.6, ¥0.8, and ¥1.0.

      In our analyses, the actual reward magnitudes (¥0.2, ¥0.4, ¥0.6, ¥0.8, and ¥1.0) were z-scored and entered as a continuous regressor in the mixed-effects models. Post-hoc simple slopes analyses were then conducted at ±1 SD from the mean of the z-scored reward magnitude. To clarify, we have revised the sentence as “… we conducted post-hoc simple slopes analyses at 1 standard deviation (SD) below (“Low”) and above (“High”) the mean reward magnitude” (page 11, para. 2, lines 8–9). This standard method for testing simple effects for continuous predictors is recommended by Aiken and West (1991).

      Aiken, L. S., West, S. G., & Reno, R. R. (1991). Multiple regression: Testing and interpreting interactions. Sage.
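
      The simple-slopes logic can be sketched as follows. The snippet z-scores the five reward magnitudes, maps ±1 SD back to yen, and evaluates the effort slope at those two points as b_effort + b_interaction × z. It assumes a balanced design (each magnitude equally frequent) and uses the population SD; the two coefficients are hypothetical placeholders, not the fitted estimates.

```python
import statistics

REWARDS_YEN = [0.2, 0.4, 0.6, 0.8, 1.0]


def z_scores(values):
    """Standardize values using their mean and population SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def magnitude_at(z, values):
    """Reward magnitude (in yen) that corresponds to a given z-score."""
    return statistics.mean(values) + z * statistics.pstdev(values)


def simple_slope(b_effort, b_effort_by_magnitude, z_magnitude):
    """Effort slope conditional on a z-scored reward magnitude."""
    return b_effort + b_effort_by_magnitude * z_magnitude


z = z_scores(REWARDS_YEN)
low_yen = magnitude_at(-1.0, REWARDS_YEN)
high_yen = magnitude_at(1.0, REWARDS_YEN)

# Hypothetical coefficients: effort main effect and Effort x Magnitude interaction.
B_EFFORT, B_INTERACTION = -0.5, 0.2
slope_low = simple_slope(B_EFFORT, B_INTERACTION, -1.0)
slope_high = simple_slope(B_EFFORT, B_INTERACTION, 1.0)

print("z-scored magnitudes:", [round(v, 3) for v in z])
print(f"'Low' = {low_yen:.3f} yen, 'High' = {high_yen:.3f} yen")
print(f"effort slope at Low: {slope_low:.2f}, at High: {slope_high:.2f}")
```

      Under these assumptions, "Low" and "High" fall between the actual reward levels rather than on them, which is why they do not coincide with any of the five yen amounts.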

      (5) Lines 253 and 275. I would not call this a computational model. The authors fit a curve to data, there is no model of the computations involved.

      This point is well taken. We have replaced “computational model” with “discounting” (Figure 5) and “parabolic discounting model” (page 15, para. 1, line 15).

      (6) Line 710. Figure S1 does not show topographic maps of the P3, as the figure caption suggests.

      We thank Reviewer #2 for identifying this oversight. We have now included topographic maps of the P3 in Figure S1.

      (7) Please check language in lines 33 (effect between), 38 (shape), 49 (highest cost form?), 74 (tunning), 90 (omit following), 127 (interest on at each effort level), 135 (press buttons >> rapidly press a button?), 142 (motivated), 219 (should low be high?), 265-266 (missing word), 275 (confirmed by following), 292 (an action can be effortful, a feeling cannot), 315 (when it comes into), 330-331 (data is plural; the aftereffect of prosocial effect), 387 (interest on at each effort level), 405 (should quickly be often?).

      We thank Reviewer #2 for the careful review and feedback about these language issues. We have revised all the phrasing you identified. The corrections are as follows:

      Line 33: “effect between” has been changed to “effects for” (page 2, para. 1, line 6).

      Line 38: “shape” has been updated to “shapes” (page 2, para. 1, line 13).

      Line 49: “highest cost form?” has been revised to “the most common cost type” (page 3, para. 1, lines 7–8).

      Line 74: “tunning” has been corrected to “tuning” (page 4, para. 2, line 1).

      Line 90: omit following. Done (page 5, para. 1, line 2).

      Line 127: “interest on at each effort level” has been corrected to “liking for each effort level” (page 6, para. 2, line 15).

      Line 135: “press buttons” has been updated to “rapidly press a button” (the caption of Figure 1).

      Line 142: “motivated” has been revised to “motivating” (page 7).

      Line 219: should low be high? Yes, we have corrected this (the caption of Figure 4).

      Lines 265–266: The missing word “with” has been inserted (page 15, para. 1, line 2).

      Line 275: “confirmed by following” has been revised to “corroborated by a parabolic …” (page 15, para. 1, line 15).

      Line 292: an action can be effortful, a feeling cannot. We have changed the word “effortful” to “effort” (page 18, para. 2, line 3).

      Line 315: “when it comes into” has been revised to “when it came to” (page 19, para. 1, line 10).

      Lines 330–331: These two expressions have been revised to “our data establish …” and “the after-effect of prosocial effort” (page 20, para. 1, lines 2–3).

      Line 387: “interest on at each effort level” has been corrected to “interest at each effort level” (page 23, para. 2, line 5).

      Line 405: should quickly be often? We agree that “quickly” might imply latency or speed of a single press, whereas the task required maximizing the frequency of presses within the time window. To capture this meaning accurately, we have revised the phrase to “pressed a button as rapidly as possible” (implying repetition rate) in the revised manuscript (page 24, para. 2, lines 3–4).

    1. eLife Assessment

      This fundamental study substantially advances our understanding of sibling chimerism in marmosets by demonstrating that chimerism is limited to hematopoietic cells. The evidence supporting these findings is compelling, demonstrated through comprehensive analyses, including single-cell RNA-seq data from multiple individuals and tissues. A few minor concerns were successfully addressed in a revision. The work will be of broad interest to many fields of biology.

    2. Reviewer #1 (Public review):

      Summary:

      Del Rosario et al characterized the extent and cell types of sibling chimerism in marmosets. To do so, they took advantage of the thousands of SNPs that are transcribed in single-nucleus RNA-seq (snRNA-seq) data to identify the sibling genotype of origin for all sequenced cells across 4 tissues (blood, liver, kidney, and brain) from many marmosets. They found that chimerism is prevalent and widespread across tissues in marmosets, which has previously been shown. However, their snRNA-seq approach allowed them to identify precisely which cells were of sibling origin, and which were not. In doing so they definitively show that sibling chimerism across tissues is limited to cells of myeloid and lymphoid lineages. The authors then focus on a large sample of microglia sequenced across many brain regions to quantify: (1) variation in chimerism across brain regions in the same individual, and (2) the relative importance of genetic vs. environmental context on microglia function/identity. (1) Much like across different tissues in the same individual, they found that the proportion of chimeric microglia varies across brain regions collected from the same individuals (as well as differing from the proportion of sibling cells found in blood of the same animals), suggesting that cells from different genetic backgrounds may differ in their recruitment and/or proliferation across regions and local tissue contexts, or that this may be linked to stochastic bottleneck effects during brain development. (2) Their (admittedly smaller sample size) analyses of host-sibling gene expression showed that the local environment dominates genotype. All told, this thoughtful and thorough manuscript accomplishes two important goals. First, it all but closes a previously open question on the extent and cell origins of sibling chimerism. Second, it sets the stage for using this unique model system to examine, in a natural context, how genetic variation in microglia may impact brain development, function, and disease.

      The conclusions of this paper are well supported by the data, and the authors exert appropriate care when extrapolating their results that come from smaller samples. However, there are a few concerns that should be addressed.

      The "modest correlation" mentioned in lines 170-172 does not take into account the uncertainty in estimates of each chimeric cell proportion (although the plot shows those estimates nicely). This is particularly important for the macrophages, which are far less abundant. Perhaps a more appropriate way to model this would be in a binomial framework (with a random effect for individual of origin). Here, you could model sibling identity of each macrophage as a function of the proportion of sibling-origin microglia and then directly estimate the percent variance explained.

      A similar (albeit more complicated because of the number of regions being compared) approach could be applied to more rigorously quantify the variation in chimerism across brain regions (L198-215; Fig 4). This would also help to answer the question of whether specific brain regions are more "amenable" to microglia chimerism than others.

      While the sample size is small, it would be exciting to see if any microglia eQTL are driven by sibling chimerism across the marmosets.

      L290-292: The authors should propose ways in which they could test the two different explanations proposed in this paragraph. For instance, a simulation-based modeling approach could potentially differentiate more stochastic bottleneck effects from recruitment-like effects.

      While intriguing, the gene expression comparison (Fig 5) is extremely underpowered. It would be helpful to clarify this and note the statistical thresholds used for identifying DEGs (the black points in the figure).

      Comments on revisions:

      The authors have thoroughly addressed all my suggestions.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports a novel and quite important study of chimerism among common marmosets. As the authors discuss, it has been known for years that marmosets display chimerism across a number of tissues. However, as the authors also recognize, the scope and details of this chimerism have been controversial. Some prior publications have suggested that the chimerism only involves cells derived from hematopoietic stem cells, while other publications have suggested more cell types can also be chimeric, including a wide range of cell types present in multiple organs. The present authors address this question and several other important issues by using snRNA-seq to track the expression of host and sibling-derived mRNAs across multiple tissues and cell types. The results are clear and provide convincing evidence that for the various organs analyzed, all chimeric cells are derived from hematopoietic cell lineages.

      This work will have an impact on studies using marmosets to investigate various biological questions, but will have the biggest impact on neuroscience and studies of cellular function within the brain. The demonstration that microglia and macrophages from different siblings from a single pregnancy, with different genomes expressing different transcriptomes, are commonly present within specific brain structures of a single individual opens a number of new opportunities to study microglia and macrophage function as well as interactions between microglia, macrophages and other cell types.

      Strengths:

      The paper has a number of important strengths. This analysis employs the first unambiguous approach providing a clear answer to the question of whether sibling-derived chimeric cells arise only from hematopoietic lineages or from a wider array of embryonic sources. That is a long-standing open question and these snRNA-seq data seem to provide a clear answer, at least for brain and liver and kidney. In addition, the present authors investigate quantitative variation in chimeric cell proportions across several dimensions, comparing the proportion of chimeric cells across individual marmosets, across organs within an individual and across brain regions within an individual. All these are significant questions, and the answers have important implications for multiple research areas. Marmosets are increasingly being used for a range of neuroscience studies, and a better understanding of the process that leads to chimerism of microglia and macrophages in the marmoset brain is a valuable and timely contribution. But this work also has implications for other lines of study such as defining embryological and developmental processes and the potential to track specific cell populations within genetically engineered marmosets. Third, the snRNA-seq data will be made available through the Brain Initiative NeMO portal and the software used to quantify host vs. sibling cell proportions in different biosamples will be available through GitHub.

      Comments on revisions:

      Several minor weaknesses have been addressed by the authors in a revision of the original manuscript. Each of my concerns and perceived weaknesses regarding the initial submission have been satisfactorily addressed in the revision.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Del Rosario et al characterized the extent and cell types of sibling chimerism in marmosets. To do so, they took advantage of the thousands of SNPs that are transcribed in single-nucleus RNA-seq (snRNA-seq) data to identify the sibling genotype of origin for all sequenced cells across 4 tissues (blood, liver, kidney, and brain) from many marmosets. They found that chimerism is prevalent and widespread across tissues in marmosets, which has previously been shown. However, their snRNA-seq approach allowed them to identify precisely which cells were of sibling origin, and which were not. In doing so they definitively show that sibling chimerism across tissues is limited to cells of myeloid and lymphoid lineages. The authors then focus on a large sample of microglia sequenced across many brain regions to quantify: (1) variation in chimerism across brain regions in the same individual, and (2) the relative importance of genetic vs. environmental context on microglia function/identity.

      (1) Much like across different tissues in the same individual, they found that the proportion of chimeric microglia varies across brain regions collected from the same individuals (as well as differing from the proportion of sibling cells found in the blood of the same animals), suggesting that cells from different genetic backgrounds may differ in their recruitment and/or proliferation across regions and local tissue contexts, or that this may be linked to stochastic bottleneck effects during brain development.

      (2) Their (admittedly smaller sample size) analyses of host-sibling gene expression showed that the local environment dominates genotype.

      All told, this thoughtful and thorough manuscript accomplishes two important goals. First, it all but closes a previously open question on the extent and cell origins of sibling chimerism. Second, it sets the stage for using this unique model system to examine, in a natural context, how genetic variation in microglia may impact brain development, function, and disease.

      The conclusions of this paper are well supported by the data, and the authors exert appropriate care when extrapolating their results that come from smaller samples. However, there are a few concerns that should be addressed.

      The "modest correlation" mentioned in lines 170-172 does not take into account the uncertainty in estimates of each chimeric cell proportion (although the plot shows those estimates nicely). This is particularly important for the macrophages, which are far less abundant. Perhaps a more appropriate way to model this would be in a binomial framework (with a random effect for individuals of origin). Here, you could model the sibling identity of each macrophage as a function of the proportion of sibling-origin microglia and then directly estimate the percent variance explained.

      We appreciate this good suggestion. We performed an analysis along these lines, and found that it supported the conclusion of a lack of strong relationship between microglial and macrophage chimerism. In particular (and as we now have added to the Methods):

      “To perform an analysis of Fig. 2D that takes into account the uncertainty in the estimate of the chimeric cell proportion, we performed a binomial generalized linear mixed-effects model analysis in R using the command glmer( y~(1|indiv) + chimerism_micro, family=binomial), where y is a vector (of length 1,333) containing the genomic identity of each macrophage (either host or twin), 1|indiv models a random effect for the identity of each animal, and chimerism_micro is the microglia chimerism of the animal’s brain. The fixed effects probability of chimerism_micro was 0.795, indicating that microglial chimerism fraction was not statistically significant as a predictor for macrophage chimerism fraction. The estimate for the intercept was -0.8115 and the estimate for chimerism_micro was 0.3106, which indicates that the probability of a cell is a macrophage given the microglia chimerism fraction was only 0.57 (plogis(-0.8115+0.3106)).”

      We have added the following in the main text:

      “We investigated further by performing a statistical test that takes into account the uncertainty in the estimates of the chimeric cell proportion using a binomial framework (Methods); in this analysis, microglia chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction (Methods). This suggests that in addition to the cell’s genome, other factors such as local host environment play a role in differential recruitment, proliferation or survival of the sibling cells. (We note that macrophages often transit the fluid-filled perivascular space, with a substantially different migration history and arrival dynamics than microglia.)”

      Given this new analysis, and our original observation that the Pearson correlation was only 0.31, we believe that other factors in addition to the cell’s genome play a role in differential recruitment or survival of sibling cells.
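      The conversion from the log-odds scale of the binomial model to a probability, which the quoted Methods text performs with R's plogis, can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function names are hypothetical, the coefficients are copied from the quoted text, and the per-animal random effect is set to zero. Which genomic identity (host or twin) is coded as the outcome is not stated in the quoted text, so the result is described only as the probability of the modeled outcome. Evaluated as written, plogis(-0.8115+0.3106) comes to roughly 0.38, so the quoted value of 0.57 may be worth rechecking against the fitted model.

```python
import math


def inverse_logit(x: float) -> float:
    """Logistic function; matches R's plogis() with default location and scale."""
    return 1.0 / (1.0 + math.exp(-x))


# Fixed-effect estimates copied from the quoted Methods text.
INTERCEPT = -0.8115
SLOPE_CHIMERISM_MICRO = 0.3106


def predicted_probability(chimerism_micro: float) -> float:
    """Fixed-effects prediction on the probability scale (random effect assumed zero)."""
    return inverse_logit(INTERCEPT + SLOPE_CHIMERISM_MICRO * chimerism_micro)


# Show how little the prediction moves across the full range of microglial chimerism.
for fraction in (0.0, 0.5, 1.0):
    print(f"chimerism_micro={fraction:.1f} -> p={predicted_probability(fraction):.3f}")
```

      The small change in predicted probability between a microglial chimerism fraction of 0 and 1 is consistent with the conclusion that microglial chimerism is a weak predictor of macrophage chimerism.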

      A similar (albeit more complicated because of the number of regions being compared) approach could be applied to more rigorously quantify the variation in chimerism across brain regions (L198-215; Figure 4). This would also help to answer the question of whether specific brain regions are more "amenable" to microglia chimerism than others.

      We performed the analysis along these lines and added the following in the Methods section:

      “We used the same framework to further analyze Fig. 4. We included brain region as a covariate in the binomial framework: glmer( y~(1|indiv) + brain_reg + assay, family=binomial), where, y is a vector (of length 48,439) containing the genomic identity of each microglia, and assay is either “Drop-seq” or “10X”. The brain regions assayed in Fig. 4 are the cortex, hippocampus, hypothalamus, striatum, thalamus, and basal forebrain. All these brain regions were statistically significant predictors for microglia chimerism fraction (all P-values < 2 × 10⁻¹⁶), supporting the conclusion that chimerism varies across brain regions. We also re-analyzed Supplementary Fig. 4 (Fig. 4B in original manuscript) using the same framework and found that 18 out of 27 brain substructures were statistically significant predictors for microglia chimerism fraction.”

      We have added the following sentences in the main text:

      “We used the binomial generalized linear mixed-model framework and found that all brain regions were statistically significant predictors for microglia chimerism fraction, supporting the conclusion that chimerism varies across brain regions (Methods).

      Analysis of finer brain substructures showed a similar result (Supplementary Fig. 4; the binomial generalized linear mixed-model framework determined that 18 out of 27 brain substructures were statistically significant as predictors for microglia chimerism fraction, Methods).”

      While the sample size is small, it would be exciting to see if any microglia eQTL are driven by sibling chimerism across the marmosets.

      We like this idea, but our study is underpowered for eQTL analysis since we only have 14 data points in the correlation analysis (eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses).

      L290-292: The authors should propose ways in which they could test the two different explanations proposed in this paragraph. For instance, a simulation-based modeling approach could potentially differentiate more stochastic bottleneck effects from recruitment-like effects.

      While intriguing, the gene expression comparison (Figure 5) is extremely underpowered. It would be helpful to clarify this and note the statistical thresholds used for identifying DEGs (the black points in the figure).

      We agree; to help clarify this for readers, we added the following sentence at the end of the paragraph discussing Fig. 5A-C.

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, were modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due e.g. to genetic differences between siblings. We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”

      And in the caption of Fig. 5A-C, we have included the statistical threshold for identifying DEGs:

      “In (A) to (C), each point represents a gene; its location on the plot represents the level of expression of that gene among microglia with two different genomes in the same animal. x- and y-axes: normalized gene expression levels (number of transcripts per 100,000 transcripts). FC: fold-change of gene expression, female/male for XIST. Fold-change and P-values were calculated using the binomTest method from the edgeR package (Robinson et al., 2010). Differentially expressed genes (black dots) were defined as: FDR Q-value<0.05 and fold-change>1.5 (in either direction) and the gene must be expressed in at least 10% of at least one of the two sets of microglia being compared.”
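      The three-part DEG criterion stated in the caption can be written out as a simple filter. The sketch below is a hypothetical Python illustration, not the authors' pipeline (which used edgeR's binomTest in R): the record type, field names, and example values are invented for demonstration, and the Q-values and fold-changes are assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    fdr_q: float        # FDR-adjusted Q-value
    fold_change: float  # expression ratio, group A over group B (non-negative)
    frac_expr_a: float  # fraction of group-A microglia expressing the gene
    frac_expr_b: float  # fraction of group-B microglia expressing the gene


def is_deg(r: GeneResult, q_max: float = 0.05, fc_min: float = 1.5,
           min_frac: float = 0.10) -> bool:
    """Apply the caption's thresholds: Q < 0.05, fold-change > 1.5 in either
    direction, and expression in at least 10% of at least one cell set."""
    large_change = r.fold_change > fc_min or r.fold_change < 1.0 / fc_min
    expressed = max(r.frac_expr_a, r.frac_expr_b) >= min_frac
    return r.fdr_q < q_max and large_change and expressed


# Invented example records, for illustration only.
examples = [
    GeneResult("GENE_A", 0.01, 2.0, 0.20, 0.05),  # passes all three criteria
    GeneResult("GENE_B", 0.01, 1.2, 0.50, 0.50),  # fold-change too small
    GeneResult("GENE_C", 0.01, 0.5, 0.05, 0.30),  # passes (down in group A)
    GeneResult("GENE_D", 0.01, 2.0, 0.05, 0.05),  # expressed in too few cells
]
for r in examples:
    print(r.gene, is_deg(r))
```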

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports a novel and quite important study of chimerism among common marmosets. As the authors discuss, it has been known for years that marmosets display chimerism across a number of tissues. However, as the authors also recognize, the scope and details of this chimerism have been controversial. Some prior publications have suggested that the chimerism only involves cells derived from hematopoietic stem cells, while other publications have suggested more cell types can also be chimeric, including a wide range of cell types present in multiple organs. The present authors address this question and several other important issues by using snRNA-seq to track the expression of host and sibling-derived mRNAs across multiple tissues and cell types. The results are clear and provide strong evidence that all chimeric cells are derived from hematopoietic cell lineages.

      This work will have an impact on studies using marmosets to investigate various biological questions but will have the biggest impact on neuroscience and studies of cellular function within the brain. The demonstration that microglia and macrophages from different siblings from a single pregnancy, with different genomes expressing different transcriptomes, are commonly present within specific brain structures of a single individual opens a number of new opportunities to study microglia and macrophage function as well as interactions between microglia, macrophages, and other cell types.

      Strengths:

      The paper has a number of important strengths. This analysis employs the first unambiguous approach providing a clear answer to the question of whether sibling-derived chimeric cells arise only from hematopoietic lineages or from a wider array of embryonic sources. That is a long-standing open question and these snRNA-seq data seem to provide a clear answer, at least for the brain, liver, and kidney. In addition, the present authors investigate quantitative variation in chimeric cell proportions across several dimensions, comparing the proportion of chimeric cells across individual marmosets, across organs within an individual, and across brain regions within an individual. All these are significant questions, and the answers have important implications for multiple research areas. Marmosets are increasingly being used for a range of neuroscience studies, and a better understanding of the process that leads to the chimerism of microglia and macrophages in the marmoset brain is a valuable and timely contribution. But this work also has implications for other lines of study. Third, the snRNA-seq data will be made available through the Brain Initiative NeMO portal and the software used to quantify host vs. sibling cell proportions in different biosamples will be available through GitHub.

      Weaknesses:

      I find no major weaknesses, but several minor ones. First, the main text of the manuscript provides no information about the specific animals used in this study, other than sex. Some basic information about the sources of animals and their ages at the time of study would be useful within the main paper, even though more information will be available in the supplementary material.

      We moved the table containing animal information (age at time of study, sex, source, tissues analyzed) from Supplementary Table 1 into the main text as Table 1. We also added the following sentences starting on line 140:

      “Brain snRNA-seq was performed on 11 animals (6 adults, 3 neonates and 1 six months old; Table 1). All were unrelated except for CJ006 and CJ007 which are birth siblings, and CJ025 and CJ026 which are (non-birth) siblings. All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and from a non-clinical contract research organization in Massachusetts. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single cell atlas of the marmoset brain. The three neonates had died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      Second, it is not clear why only 14 pairs of animals were used for estimating the correlation of chimerism levels in microglia and macrophages. Is this lower than the total number of pairwise comparisons possible in order to avoid using non-independent samples? Some explanation would be helpful.

      Only birth siblings (twins and triplets) can be meaningfully included in this analysis. The 14 pairs of animals we used to estimate the correlation of chimerism levels in microglia and macrophages included all pairs that we could use for this analysis: eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses.
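      The count of 14 follows from treating each sibling genome found in a host animal's brain as one host-sibling pair. A minimal Python sketch of that bookkeeping is given below; the function name is hypothetical, and only the counts (eight hosts with one sibling genome, three hosts with two) come from the response above.

```python
def count_host_sibling_pairs(siblings_per_host):
    """Each host contributes one pair per sibling genome detected among its microglia."""
    if any(n < 1 for n in siblings_per_host):
        raise ValueError("every host in the analysis must have at least one sibling genome")
    return sum(siblings_per_host)


# Eight hosts with a single sibling genome, three hosts with two sibling genomes.
siblings_per_host = [1] * 8 + [2] * 3
print(count_host_sibling_pairs(siblings_per_host))  # 8 + (2 * 3) = 14
```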

      Finally, I think more analysis of the consistency and variability of gene expression in microglia across different regions of the brain would be valuable. Are there genetic pathways expressed similarly in host and sibling microglia, regardless of region of the brain? Are there pathways that are consistently expressed differently in host vs sibling microglia regardless of brain region?

      For brain-region differences in microglial gene expression, we are under-powered and would only be scratching the surface of a question (interesting but beyond the focus and scope of this paper) that needs deeper experimental sampling.

      For the questions about sibling-sibling differences (regardless of which sibling is host) and recurring host-sibling differences, we can do a stronger analysis, because these analyses have similar power to each other. We describe this analysis in the revised manuscript as follows:

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, were modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due e.g. to genetic differences between siblings.”

      We also, as suggested, tried to get beyond single-gene analyses to expression of programs/pathways, by performing latent factor analysis on the single-cell gene expression measurements. 

      “Following the method described in (Ling et al., 2024), we performed latent factor analysis using the probabilistic estimation of expression residuals (PEER, Stegle et al., 2010) on the gene-by-donor matrix expression of microglia. We started by creating a gene-by-cell matrix of microglia gene expression from all animals, and we normalized the matrix using SCT transform version 2 (Choudhary and Satija, 2022) with 3000 variable features. We obtained the Pearson residuals from SCT normalization and summed up the residuals across cells with the same genome to obtain a gene-by-donor matrix of expression measurements of microglia. We used this matrix as input to PEER and ran the tool with a provided number of factors from 9 to 12. For each gene-expression latent factor, to evaluate whether host/sibling identity had a consistent effect on expression levels, we performed a linear regression with host/sibling identity using glm(peer_factor_k ~ host_or_twin). For all factors, the P-values for the effect of host_or_twin were all insignificant (greater than 0.1), indicating that no PEER factor associated with host-vs-twin identity. Thus, our results found no large-scale gene expression program that was consistently expressed differently between hosts and twins.”

      We have added the text above to the Methods section, and we added the following at the end of the section on Gene-expression comparisons of host- to sibling-derived microglia (lines 264-267):

      “We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”
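      The aggregation step described in the quoted Methods text, in which per-cell Pearson residuals are summed across cells sharing a genome to give a gene-by-donor matrix, might look like the following. This is a simplified Python sketch under stated assumptions: the data structures, function name, and toy values are hypothetical, and the SCT normalization and PEER steps themselves are not reproduced here.

```python
from collections import defaultdict


def sum_residuals_by_donor(residuals, cell_donor):
    """Collapse a gene-by-cell table into a gene-by-donor table.

    residuals:  dict mapping cell ID -> dict mapping gene -> residual value
    cell_donor: dict mapping cell ID -> donor genome label
    returns:    dict mapping gene -> dict mapping donor genome label -> summed residual
    """
    totals = defaultdict(lambda: defaultdict(float))
    for cell, gene_values in residuals.items():
        donor = cell_donor[cell]
        for gene, value in gene_values.items():
            totals[gene][donor] += value
    return {gene: dict(by_donor) for gene, by_donor in totals.items()}


# Toy input: three cells, two donor genomes (all values invented for illustration).
residuals = {
    "cell1": {"GENE_A": 0.5, "GENE_B": -1.0},
    "cell2": {"GENE_A": 1.5, "GENE_B": 2.0},
    "cell3": {"GENE_A": -1.0, "GENE_B": 0.5},
}
cell_donor = {"cell1": "host", "cell2": "host", "cell3": "sibling"}
print(sum_residuals_by_donor(residuals, cell_donor))
```

      Each column of the resulting table corresponds to one genome within one animal, which is the unit on which the host-versus-twin regression in the quoted text operates.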

      Gene-expression pathways/factors did (within some animals) show host-twin differences in expression levels, but without a consistent host-twin direction of effect that was shared across the many host-twin comparisons. In particular, we used the PEER analysis described above and calculated the host-sibling expression level difference for each latent factor. Many factors differed in expression in individual cases, though none did so in all cases nor in a consistent-sign manner:

      Author response image 1.

      Difference between host and sibling expression of gene-expression latent factors for each of the 12 factors computed (using PEER) from the single-cell dataset. For a given factor, the factor expression value of the sibling-genome cells is subtracted from that of the host-genome cells and the difference is divided by the maximum of the absolute value of all elements in that factor.
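      The scaling described in this caption can be made concrete with a short sketch. The Python below is illustrative only: the function name, labels, and values are hypothetical, and it assumes that "all elements in that factor" means the factor's values across every donor genome in the dataset.

```python
def normalized_host_sibling_difference(factor_values, host_label, sibling_label):
    """Host-minus-sibling difference for one latent factor, divided by the
    largest absolute value that the factor takes across all donor genomes.

    factor_values: dict mapping donor genome label -> value of this factor
    """
    scale = max(abs(v) for v in factor_values.values())
    if scale == 0:
        raise ValueError("factor is zero for every donor; difference is undefined")
    return (factor_values[host_label] - factor_values[sibling_label]) / scale


# Invented values for one factor across three donor genomes.
factor_values = {"animal1_host": 2.0, "animal1_sibling": -1.0, "animal2_host": 4.0}
print(normalized_host_sibling_difference(factor_values, "animal1_host", "animal1_sibling"))
```

      Because the divisor is the factor's maximum absolute value, each normalized difference lies between -2 and 2, which puts factors with different scales on a common axis.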

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the introduction (line 62), the authors mention that chimerism might have shaped behavior in marmosets (and perhaps been selected for). It would be helpful to see this revisited in the discussion. Is it possible that additional genetic variation in immune cells (resident and circulating) provides adaptive benefits and/or disease resistance? In the case of microglia, could the proportion of sibling cells be related (either positively or negatively) to local/regional pathology?

      We liked this suggestion and have added the following in the Discussion:

      “Chimerism could also enable interesting future analyses of whether there are adaptive benefits of chimerism in marmoset immune cells, among whom chimerism could in principle allow presentation of a wider variety of antigens for adaptive immunity. In a recent outbreak of yellow fever in Brazil in 2016-2018, marmosets were found to be less susceptible than other primates that lack immune system chimerism, including the howler monkeys (Alouatta), robust capuchins (Sapajus), and titi monkeys (Callicebus) (de Azevedo Fernandes et al., 2021). In studying future outbreaks in marmosets, one could use single-cell RNA-seq and the methods described here to study how genetically distinct immune cells (in the same animal) have differentially migrated to affected tissues and/or assumed "activated" immune cell states. Recent innovations in spatial transcriptomics with sequencing readouts (that detect SNP alleles) may also make it possible to identify any differential recruitment of genetically distinct immune cells to focal infection sites.”

      Minor comments:

      L300 delete "temporal."

      We have revised the text accordingly.

      L305: "more-restricted" should not be hyphenated.

      We have revised the text accordingly.

      L309: "from the non-cell" - delete "the."

      We have revised the text accordingly.

      L367: Louvain, not Louvaine.

      We have revised the text accordingly.

      Figure 2B can be removed - it does not add much information and takes up a lot of space.

      We have moved Figure 2B to panel J of Supplementary Fig. 1 (it is now displayed together with all other animals).

      The same can be said for Figure 4B, which is too tiny. There might be more effective ways to show this variation across animals.

      We have moved Figure 4B to Supplementary Fig. 4 and we have increased the font sizes to make the text in the figures more readable.

      Reviewer #2 (Recommendations for the authors):

      I would suggest providing some basic information about the sources of study animals within the main text. At a minimum, it would be useful to state which colonies are represented in the data, and if there is anything significant about the individual animal histories (e.g. prior exposure to surgical intervention or infectious disease). I believe this basic information should be in the main text, despite the inclusion of a broader range of information in the supplements.

      We appreciate this suggestion and revised lines 143 to 149 of the main text as follows:

      “All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and from a non-clinical contract research organization. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single-cell atlas of the marmoset brain (Krienen et al., 2020; Krienen et al., 2023). The three neonates died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      I would include the species name (Callithrix jacchus) in line 48.

      On lines 47-48, we now indicate the name of the genus: “Chimerism is common, however, in the Callitrichidae family that consists of the marmosets (Callithrix) and their close relatives the tamarins (Saguinus)...”

      Then on line 65, we now indicate the species name: “Here, we analyze chimerism in the common marmoset (Callithrix jacchus) brain, liver, kidney and blood,...”

      The word "organisms" in line 59 should be "organs."

      We have modified the text accordingly.

      Lines 100-101: I would suggest this would be clearer to readers if it read: "The relative likelihoods of the original source of each cell could be strongly...".

      We have modified the text accordingly.

    1. eLife Assessment

      This well-designed study offers important insights into the development of infants' responses to music based on the exploration of EEG neural auditory responses and video-based movement analysis. The compelling results revealed that evoked responses emerge between 3 and 12 months of age, but no age group demonstrated evidence of coordinated movements to music. This study will be of significant interest to developmental psychologists and neuroscientists, as well as researchers interested in music processing and in the translation of perception into action.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      EEG data provide evidence for enhanced neural responses to music compared to shuffled auditory sequences. These findings encourage further investigation of the proposed developmental trajectory of neural responses to music and their link to musical behavior in infants.

      Comments on revisions:

      I thank the authors for the considerable effort devoted to revising the manuscript and addressing the raised questions and comments. I particularly appreciate the additional analyses and the extended arguments included in the discussion. I believe that this paper represents a valuable contribution to the literature on music development.

      One remaining comment concerns the evoked response observed in the shuffled condition, which I still find intriguing. Considering that the auditory events in the shuffled condition display a clear rise time, particularly for those events that were selected based on being preceded and followed by longer periods of silence, one would expect to observe an evoked response emerging from baseline. However, this pattern is not evident in the presented curves. The authors may further examine and discuss the shape and characteristics of these response patterns.

    3. Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results filling an important gap about early infant sensitivity, detection, and differentiation of musical sounds. The addition of EEG recordings (specifically ERPs) in response to music presentations at 3 different infant ages in the first postnatal year is important, and the manipulation of the music stimuli into shuffled, high and low pitch to capture differences in brain response processing and spontaneous movements is interesting. Further, the movement analysis based on Quantity of Movements (QoM) and movement subdivision into 10 distinct Principal Movements (PMs) is novel and creative.

      Overall, results show that ERP responses to music occur earlier than QoM in early development, and that even at 12 months, motor responses to music remain coarse and not rhythmically aligned with the music tempo. This work increases our fundamental understanding of infants' early music perception in relation to auditory processing and motor response.

      Comments on revisions:

      The authors have addressed my questions in their revision. I have no other questions. Thanks again for the opportunity to read and evaluate this interesting work.

    4. Reviewer #3 (Public review):

      Summary

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6 and 12 months of age and adults showed enhanced auditory responses to music compared with shuffled music. Six-month-olds also exhibited an enhanced P1 response to high-pitch vs low-pitch stimuli, whereas the other groups did not. In addition, whole-body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.

      Strengths

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      Besides, the study uses state-of-the-art analyses. Detailed investigations were performed, as well as exploratory analyses in supplementary information. The discussion is rich in neurodevelopmental interpretations and comparisons with the literature. All steps are clearly detailed. The manuscript is very clear, well-written and pleasant to read. Figures are well-designed and informative. The authors' responses to previous reviews are also detailed and informative.

      Comments on revisions:

      The authors answered all my questions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      Weaknesses:

      The neural data analysis is currently limited to auditory evoked potentials aligned with beat timing. A more comprehensive approach is needed to robustly support the proposed developmental trajectory of neural responses to music.

      We thank the reviewer for this comment and would like to clarify that there has been a misunderstanding: our EEG analyses were time-locked to actual tone onsets, not to expected beat positions. For both music and shuffled conditions, ERPs were computed by epoching around all real auditory events present in each stimulus. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli. We have now clarified this further in the revised manuscript (p. 9).
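      Time-locking to actual tone onsets, as opposed to expected beat positions, amounts to cutting an epoch around every real auditory event and averaging the epochs. The Python sketch below illustrates that idea for a single channel. It is a deliberately simplified, hypothetical example: the function name and window parameters are assumptions, and filtering, baseline correction, and artifact rejection (which a real EEG pipeline would include) are omitted.

```python
def average_evoked_response(signal, onsets, pre, post):
    """Average fixed-length epochs cut around each tone onset.

    signal: list of samples from one EEG channel
    onsets: sample indices of actual tone onsets in the stimulus
    pre:    number of samples kept before each onset
    post:   number of samples kept from the onset onward
    """
    epochs = []
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > len(signal):
            continue  # skip events too close to the edges of the recording
        epochs.append(signal[start:stop])
    if not epochs:
        raise ValueError("no complete epochs available")
    n = len(epochs)
    # Average sample-by-sample across epochs.
    return [sum(samples) / n for samples in zip(*epochs)]


# Toy signal with a unit deflection at each of two onsets; the third onset is too late.
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(average_evoked_response(signal, onsets=[2, 6, 8], pre=1, post=2))
```

      Because the epochs are defined by events that exist in the audio, the same procedure applies unchanged to the music and shuffled conditions.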

      Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results that make a lot of sense to this reviewer. The additions of EEG recordings in response to music presentations at 3 different infant ages are interesting, and the manipulation of the music stimuli into shuffled, high, and low pitch to capture differences in brain response and spontaneous movements is good. I really enjoyed reading this work and the well-written manuscript.

      Weaknesses:

      I only have two comments. The first is a change to the title. Maybe the title should refer to the first "postnatal" year, rather than the first year of life. There are controversies about when life really starts; it could be in the womb, so using postnatal to refer to the period after birth resolves that debate.

      Thank you very much for your thoughtful suggestion regarding the title. To ensure clarity and to unambiguously indicate that our study focuses on the period after birth, we agree that specifying “first postnatal year” in the title is appropriate. We have revised the title accordingly.

      The other comment relates to the 10 Principal Movements (PMs) identified. I was wondering about the rationale for identifying these different PMs and to what extent many PMs entered in the analyses may hinder more general pattern differences. Infants' spontaneous movements are very variable and poorly differentiated in early development. Maybe, instead of starting with 10 distinct PMs, a first analysis could be run using the combined Quantity of Movements (QoM) without PM distinctions to capture an overall motor response to music. Maybe only 2 PMs could be entered in the analysis, for the arms and for the legs, regardless of the patterns generated. Maybe the authors have done such an analysis already, but describing an overall motor response, before going into specific patterns of motor activation, could be useful to describe the level of motor response. Again, infants provide extremely variable patterns of response, and such variability may potentially hinder an overall effect if the QoM were treated as a cumulated measure rather than one with differentiated patterns.

      We agree that due to the high variability and limited differentiation of infant motor responses at this age, it is important to consider an overall measure of movement in addition to specific PMs. To address exactly this, we had included an analysis in which we combined all 10 PMs into a single global QoM metric. This ‘All PMs’ measure reflects the overall motor response to the different auditory stimuli. For clarity, this result is presented in Figure 5, where we show the denoised global QoM signal and highlight the observed Condition × Age interaction (which averaged QoM for all PMs and is therefore equivalent to QoM without PM distinction). We now emphasize this analysis more clearly in the Results section (p. 16).

      Reviewer #3 (Public review):

      Summary:

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6, and 12 months of age and adults showed stronger auditory responses to music than to shuffled music. 6-month-olds also exhibited an enhanced P1 response to high-pitch vs low-pitch stimuli, but the other groups did not. Besides, whole-body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than to shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.

      Strengths:

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      Besides, the study uses state-of-the-art analyses. All steps are clearly detailed. The manuscript is very clear, well-written, and pleasant to read. Figures are well-designed and informative.

      Weaknesses:

      (1) Differences in neural responses to high-pitch vs low-pitch stimuli between 6-month-olds and other infants are difficult to interpret.

      We agree with the reviewer that the differences in neural responses to high-pitch versus low-pitch stimuli between 6-month-olds and other infants are difficult to interpret. We have offered several possible explanations for these findings, including developmental changes in auditory plasticity, social interaction effects, maturation of the auditory system, and arousal or exposure differences. If the reviewer has additional perspectives or alternative explanations, we would be very pleased to incorporate them into the revised manuscript.

      (2) Making some links between the neural and movement responses that are described in this manuscript could be expected, given the study goal. Although kinematic analyses suggested that movement responses are not phase-locked to the music stimuli, analyses of Granger causality between motion velocity and neural responses could be relevant.

      We appreciate the suggestion that exploring links between neural and movement responses would be valuable, especially given the study's goals. We were initially cautious about interpreting potential Granger-causal relations between neural and motor activity, as temporal scale differences between the two measures can easily bias directionality estimates. Neural responses typically occur on the scale of milliseconds, whereas movement unfolds over seconds. As a result, an apparent directional relation might emerge simply due to these intrinsic timescale differences rather than reflecting genuine causal influence.

      Nevertheless, we agree that this relationship warrants further investigation, and we have added the following analyses to the supplements (p. 9). We conducted additional exploratory analyses to examine whether ERP amplitudes correlated with movement measures. To this end, we computed correlations between neural and movement responses using participant-averaged data (not single trials). For neural measures, we extracted mean ERP amplitudes in the time window post-tone-onset encompassing the P1 component derived from cluster-based analyses. For movement measures, we used: (1) total movement quantity (mean velocity across the entire trial), and (2) Granger causality F-values reflecting music-to-movement coupling strength. These analyses included comparisons between music and shuffled music conditions, as well as between high- and low-pitch conditions. We ran two linear mixed-effects models, with ERP amplitudes as response variables and either QoM or Granger causality F-values as fixed effects. Infants were modelled as random intercepts. Our results showed no significant correlations between ERP amplitudes and movement quantity, whether irrespective of condition (p>.124), when comparing music vs shuffled music (p>.111), or when comparing high vs low pitch (p>.071), across all age groups. We also found no significant correlations between ERP amplitudes and Granger causality F-values, whether irrespective of condition (p>.164), when comparing music vs shuffled music (p>.494), or when comparing high vs low pitch (p>.175), across all age groups. The absence of robust correlations suggests that neural sensitivity to musical structure (as indexed by ERPs) and motor responsiveness to music (as indexed by movement quantity or coupling strength) develop somewhat independently during the first year of life. This dissociation aligns with broader developmental theories proposing that perceptual sensitivity often precedes and enables later motor coordination, rather than developing together.

      (3) The study considers groups of infants at different ages, but infants within each group might be at different stages of motor development. Was this assessed behaviorally? Would it be possible to explore or take into account this possible inter-individual variability?

      We agree this is important. Infants in each age group were within a quite narrow age range (3 months: M=113.04 days, SD=5.68 days, range=98-120 days; 6 months: M=195.88 days, SD=9.46 days, range=182-211 days; 12-13 months: M=380.44 days, SD=14.93 days, range=361-413 days), as detailed in the sample description on p. 37. Despite this, we asked parents to report on infants' major motor milestones, specifically their ability to sit and/or walk. At 6 months, 25% of infants were able to sit (N = 20), and at 12 months, 50% of infants were able to walk (N = 18). Given the relatively small group sizes for these milestones, we are concerned that conducting detailed analyses could yield unstable or misleading results that may not generalize beyond our sample. Therefore, we chose to focus on broader analyses that are more robust given our current dataset. We fully support your suggestion that future studies with larger samples and more comprehensive motor assessments will better clarify these developmental trajectories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the analysis and findings on auditory-evoked spontaneous movement are highly interesting, the results from the neural data raise questions about the genuine role of music in the observed evoked and induced responses.

      General comments on the findings related to neural data

      (1) The main neural finding is a larger response in the Music condition compared to the Shuffled Music condition. To address their hypothesis, the authors computed the AEP to tones at the beat position and compared responses between the Music and Shuffled Music conditions, aligning the onset to the expected beat position. However, given that inter-onset intervals were permuted in the Shuffled condition, an AEP time-locked to the expected beat position is not meaningful, as no tone is expected at that time. Therefore, it is expected to have a relatively flat AEP in response to the shuffled condition. Furthermore, given the reduced regularity in the Shuffled condition, the observed difference in ASSR at the beat frequency is expected. Similar results could be obtained using an isochronous sequence of pure tones and a shuffled version of the same sequence. Therefore, these two analyses do not strongly support the conclusion of infants' enhanced neural responses to music.

      The authors could consider comparing AEPs by aligning onsets in the Shuffled condition to the actual tone positions, potentially focusing only on tones with sufficiently long preceding and following IOIs to avoid confounds from short intervals. The two conditions could then be compared with correction for the number of tones. Potential differences in this case could have suggested an impact beyond the auditory evoked responses.

      We agree that ASSR analyses at the beat frequency are not sufficient to demonstrate enhanced neural responses to music. However, we would like to clarify that for the AEP analyses, the EEG data were epoched to all actual tone onsets rather than to the expected beat positions, so these analyses add to the ASSR analysis. Thus, for the shuffled music condition, the EEG was aligned with the real tone onsets present in that sequence, not with hypothetical beat positions derived from a regular rhythm. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli.

      We further clarify this in the Results section on p. 9:

      “Figure 2 shows the average ERPs to the bassline notes in the auditory stimuli, with EEG data time-locked to actual tone onsets (see Methods for details).”

      Finally, following the reviewer’s suggestion, we carried out three control analyses: 1) including only epochs corresponding to bassline tones whose prior inter-onset interval (IOI) exceeded the median IOI duration, 2) including only epochs corresponding to bassline tones whose subsequent IOI exceeded the median IOI duration, and 3) including only epochs corresponding to both melody and bassline tones whose prior and subsequent IOI exceeded the median IOI duration. These analyses yielded event-related potentials in the shuffled music condition that were highly similar to those obtained when all epochs were included (see Figure S1). Therefore, the greater neural response to music compared with shuffled music likely reflects an effect of predictability in the musical condition or, more generally, infants’ disengagement with the shuffled stimuli.
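      The IOI-based control analyses described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `select_onsets_by_ioi` and `epoch`, the epoch window (-0.1 to 0.5 s), and the single-channel signal are all hypothetical choices made for the example. It keeps only tone onsets whose prior and/or subsequent inter-onset interval exceeds the median IOI, then cuts epochs around those actual onsets.

```python
import numpy as np


def select_onsets_by_ioi(onsets_s, use_prior=True, use_subsequent=False):
    """Keep onsets whose neighbouring inter-onset intervals exceed the median IOI."""
    onsets = np.sort(np.asarray(onsets_s, dtype=float))
    ioi = np.diff(onsets)                 # ioi[i] = onsets[i + 1] - onsets[i]
    median_ioi = np.median(ioi)
    keep = np.ones(onsets.size, dtype=bool)
    if use_prior:
        # The first tone has no prior IOI, so it can never pass this filter.
        prior = np.concatenate(([-np.inf], ioi))
        keep &= prior > median_ioi
    if use_subsequent:
        # The last tone has no subsequent IOI, so it can never pass this filter.
        subsequent = np.concatenate((ioi, [-np.inf]))
        keep &= subsequent > median_ioi
    return onsets[keep]


def epoch(signal, sfreq, onsets_s, tmin=-0.1, tmax=0.5):
    """Cut fixed-length epochs around each onset (assumed window, in seconds)."""
    start = int(round(tmin * sfreq))
    stop = int(round(tmax * sfreq))
    epochs = []
    for t in onsets_s:
        i = int(round(t * sfreq))
        # Skip epochs that would run past either end of the recording.
        if i + start >= 0 and i + stop <= signal.size:
            epochs.append(signal[i + start:i + stop])
    return np.array(epochs)
```

      Averaging the rows returned by `epoch` would give an ERP restricted to the selected tones; setting both flags to `True` corresponds to the third control analysis, which requires long intervals on both sides of a tone.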

      It would also be helpful to see whether the authors explored other approaches for evaluating neural responses across conditions, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), and whether these yielded comparable results.

      Thank you for this question. We have not explored these approaches, but we agree that alternative methods for evaluating neural responses, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), could offer complementary insights. Given the scope and focus of the present work, and the already extensive set of neural and behavioral measures reported, we chose to prioritize analyses most directly relevant to our initial research questions. Incorporating further methods might risk complicating the narrative and obscuring the key findings. We appreciate the value of these additional methods and consider them promising avenues for future investigations.

      (2) Another important finding concerns the difference in AEPs between the High Pitch and Low Pitch conditions in 6-month-old infants, a pattern not observed in the younger (3-month) or older (12-month and adult) groups. The authors interpret this as heightened sensitivity to high-pitch sounds, typical of infant-directed speech. However, the absence of this effect at 12 months raises questions. It would be helpful to consider whether this pattern may be influenced by data quality differences across age groups. Additionally, the authors could discuss this observation in relation to studies showing stronger neural tracking of rhythms in infants, particularly for low-frequency sounds (e.g., Lenc et al., Developmental Science, 2022).

      This is an interesting consideration that we investigated further. Regarding data quality differences, we considered different measures and now report these in the methods section (p. 30) and supplements (p. 1).

      “We conducted two analyses to compare the EEG data quality across age groups. First, we compared the number of trials that were included in the final analysis per age group. The trial number did not differ significantly across age groups (p > .361). Second, we calculated the SNR by dividing the EEG power at the frequency of interest (i.e., 2.25 Hz, matching the musical beat) by the background noise in surrounding bins (3rd to 5th bin, see ASSR methodology for further details; c.f., Christodoulou et al., 2018; Cirelli et al., 2014). This division yields a signal-to-noise ratio that can be averaged across conditions and compared across age groups to assess variations in signal quality (especially when focusing on the pitch conditions with the same beat frequency). Here, we find that all three age groups show considerable SNR above 1 (3m: M = 2.569, SD = 1.104; 6m: M = 2.743, SD = 1.001; 12m: M = 1.907, SD = 0.749), with no statistically significant differences (three t-tests, FDR-corrected, p > .134). Importantly, our key comparison of High vs. Low Pitch was performed within each age group, thus controlling for any overall differences in signal quality across groups. Together, these two analyses indicate that signal quality was comparable across age groups.”
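      As an illustration of the SNR measure described in the quoted passage, the sketch below divides spectral power at the beat frequency by the mean power in neighbouring bins. It is a hedged example, not the authors' code: the function name `assr_snr`, the use of bins on both sides of the target, and the averaging of their power are assumptions; only the 2.25 Hz target and the 3rd-to-5th-bin range come from the text.

```python
import numpy as np


def assr_snr(signal, sfreq, target_hz=2.25, near=3, far=5):
    """Power at the target frequency divided by mean power in surrounding bins.

    Assumption: noise is estimated from bins `near`..`far` away on BOTH sides
    of the target bin; the source only states "3rd to 5th bin".
    """
    signal = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    k = int(np.argmin(np.abs(freqs - target_hz)))   # bin closest to the beat
    offsets = np.arange(near, far + 1)
    neighbours = np.concatenate((k - offsets, k + offsets))
    # Drop bins that fall outside the spectrum or on the DC component.
    neighbours = neighbours[(neighbours > 0) & (neighbours < power.size)]
    return power[k] / power[neighbours].mean()
```

      A value near 1 means the beat frequency does not stand out from the local noise floor, which is why the reported group means above 1 are read as evidence of a measurable response at the beat.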

      Overall, these control analyses seem to support the observed high-pitch sensitivity in the neural response of 6-month-olds specifically, in line with previous research investigating this age range (Trainor & Zacharias, 1998; Fernald & Kuhl, 1987). Moreover, particular changes towards the end of the first year might mark a widening of infants’ attention towards others (beyond their primary caregivers) and objects in their environment (Cooper et al., 1997; Newman & Hussain, 2006), as well as a decrease in exposure to face-to-face interactions with their primary caregivers (Jayaraman et al., 2015). Taken together, research shows that infants' preference for infant-directed speech decreases significantly between 4.5 and 9 months, coinciding with developmental changes in attentional systems and social interaction patterns. This might explain the absence of high-pitch sensitivity in 12-month-olds. However, further research is needed to determine if and in which contexts high-pitch sensitivity to music changes throughout infancy.

      We also edited the discussion in order to compare our results to those of Lenc et al., 2023, p. 23: “It should also be noted that our musical stimuli comprised polyphonic (two-voice) music, carrying sound frequencies falling within the typical range of infant-directed song (~200-400 Hz, Cirelli et al., 2020; Nguyen, Reisner, et al., 2023b; Trainor & Zacharias, 1998). As such, our results might specifically speak for infants’ ability to separate (and prioritize among) simultaneous communicative auditory streams (Marie & Trainor, 2013; Trainor, 2015). Indeed, other studies presenting one-voice pure tone sequences (single isochronous and isotonous tones) with high vs. low pitch - notably at frequencies outside our range (130 vs. 1237 Hz) - have reported stronger neural responses to relatively low frequencies (Lenc et al., 2023). Together, these contrasting observations suggest that pitch prioritization changes not only throughout development but also depends on the polyphonic complexity and spectral characteristics of the perceived stimuli. Further research might investigate this interesting issue further.”

      (3) It would also be helpful if the authors provided more detailed information on the stimuli, including both temporal/rhythmic and spectral content, for the original music, high-pitch and low-pitch variations, and shuffled versions.

      Absolutely. We agree that this is important to report. We have added a table to the Results (Table 1) and a Table S1 with M, SD and range of the envelope to further describe the temporal and spectral features of the stimuli.

      General comments on the findings related to body kinematics

      (4) Quantification of movement based on the PMs did not lead to any differences between the High Pitch and Low Pitch conditions. However, Granger causality showed high prediction strength for the High Pitch condition. In the discussion, the authors proposed that high-pitch music might have led to higher arousal. If this were the case, one might expect to observe increased movement in the High Pitch condition relative to the Low Pitch condition in the PM analyses. I propose that the authors revise the discussion to address the misalignment between different findings.

      We thank the reviewer for highlighting this important point and welcome the suggestion to clarify the relationship between movement quantification based on principal movements (PMs) and the Granger causality results. We agree that the apparent discrepancy between these measures merits further clarification. We note that the discrepancy suggests that Granger causality may capture subtler temporal coordination between movements and the music, rather than gross movement magnitude. We have incorporated this reasoning into the revised discussion paragraph (pages 23-24), which now reads as:

      “If increased arousal were to result in greater overall movement, we would expect higher movement levels in the high pitch condition; however, this was not observed. QoM analyses based on the PMs did not reveal significant differences between the high pitch and low pitch conditions. This discrepancy may arise because Granger causality captures subtler temporal coordination between movement and music rather than gross movement quantity. Thus, high-pitch music may modulate the timing and coordination of motor responses without necessarily increasing the overall amount of movement. In line with prior work (e.g., Bigand et al., 2024), this interpretation emphasizes that musical coordination often involves changes in coupling strength rather than movement quantity per se.”
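      The distinction drawn here between coupling strength and movement quantity can be made concrete with a small sketch. It is an illustrative bivariate Granger test under stated assumptions, not the authors' implementation: the function names `lagged` and `granger_f`, the model order of 2, and the ordinary least-squares fit are hypothetical choices. The F-value asks whether past values of the music envelope improve prediction of movement velocity beyond the movement's own past, which is independent of how much movement there is overall.

```python
import numpy as np


def lagged(x, order):
    """Design columns x[t-1], ..., x[t-order] for t = order .. len(x) - 1."""
    n = x.size
    return np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])


def granger_f(envelope, movement, order=2):
    """F-value for 'envelope Granger-causes movement' (assumed model order)."""
    x = np.asarray(envelope, dtype=float)
    y = np.asarray(movement, dtype=float)
    target = y[order:]
    ones = np.ones((target.size, 1))
    restricted = np.hstack((ones, lagged(y, order)))   # movement history only
    full = np.hstack((restricted, lagged(x, order)))   # plus envelope history

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = target.size - full.shape[1]
    # Improvement per added lag, relative to the full model's residual variance.
    return ((rss_r - rss_f) / order) / (rss_f / dof)
```

      Rescaling `movement` by a constant leaves the F-value unchanged, whereas a quantity-of-movement measure such as mean velocity scales with it. This is one way to see how the two measures can dissociate.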

      (5) The authors report a lack of periodicity and phase-locked movement in infants. Considering the developmental stage, I assume that spontaneous movements to music emerged over short periods during each exposure period. To further investigate movement periodicity, which has been suggested previously, the authors could first automatically extract periods of periodic movement and then evaluate the tempo/frequency and synchronization with the stimulus during these specific periods.

      We thank the reviewer for this thoughtful suggestion. We conducted similar analyses prior to submission, using methods comparable to previous studies (Fujii et al., 2014). These analyses did not yield additional insights beyond those already presented in the manuscript, so we opted not to include them initially. For completeness, we briefly mention these results on p. 19:

      “Robustness analyses based on thresholding of variation in the time series to identify movement burst epochs (similar to Fujii et al., 2014) yielded consistent results. No significant movement-to-music synchronization was found across age groups (all ps > .563).”

      It is important to clarify that while movement periodicity in infants listening to music has been previously suggested, the evidence for actual synchronization to musical beats remains limited and has been frequently misinterpreted in the literature. The seminal study by Zentner and Eerola (2010) is often cited as evidence for infant rhythmic entrainment, but their findings actually demonstrated tempo flexibility rather than synchronization, i.e., infants moved faster when the music was faster. Similarly, Fujii et al. (2014) found that while individual infants showed some movement-to-music coordination, this occurred in only 2 out of 11 tested infants (18%), and the authors emphasized that "movement-to-music synchronization is rare in infants and observed at an individual level".

      (6) A last general comment is that the authors try to explain the findings of the current study, providing hypotheses, for instance, on the origin of differences in the neural response to high and low pitch only at 6 months. It would be helpful if the authors also consider the misalignment of results with previous findings.

      We thank the reviewer for this comment and acknowledge the importance of placing our findings in the context of prior research on infant pitch perception, including some apparent inconsistencies such as those noted for Lenc et al. (2023), which we have addressed in our response to comment 2. We agree that results inevitably vary across studies due to differences in methods, stimuli, and participant samples—all factors that contribute to some variability in developmental trajectories observed in the literature.

      Importantly, our observation of a transient difference in neural responses to high versus low pitch emerging at 6 months aligns with existing evidence indicating significant neural reorganization occurring around this age (Carr et al., 2022) and continuing toward 12 months (Kuhl et al., 2014). This may reflect a sensitive developmental window during which infants show heightened sensitivity to prosodic features important for early social and communicative interactions. After this window, attentional and auditory processing priorities shift, which could explain the subsequent decline in pitch sensitivity.

      We emphasize that these interpretations are preliminary, and further systematic investigations—preferably longitudinal studies incorporating diverse pitch ranges and multimodal attentional and neural measures—are needed to delineate the developmental course of pitch sensitivity comprehensively.

      Reviewer #2 (Recommendations for the authors):

      Thank you for the opportunity to read this interesting work.

      Thank you for the constructive comments.

      Reviewer #3 (Recommendations for the authors):

      (1) I would suggest replacing "first year of life" with "first post-natal year".

      Thank you for the suggestion. In line with your and Reviewer #2’s comments, we have revised the title to “first postnatal year”.

      (2) Specifying the music paradigm and the nature/timing of the stimuli would be useful at the beginning of the Results section.

      We agree and have added two tables (Table 1 and Table S1 for continued information on the envelope) for further information about the paradigm and stimuli to the beginning of the results section (p.8).

      In addition, the stimuli are also shared on a repository: https://doi.org/10.48557/DCSCFO.

      (3) Since the infants moved during the experiment, EEG data might show movement artefacts. Was the approach used to correct these artefacts satisfactory, even in 12-month-olds who moved more?

      We appreciate the reviewer’s important question regarding artifact correction in infant EEG data, especially given increased movement in older infants. We recognize that movement-related artifacts are an inherent challenge in EEG recordings with infants, and complete elimination of such artifacts is technically difficult (if not impossible). However, several points support the robustness of our ERP findings despite spontaneous movement:

      First, we used a two-stage pipeline to maximize artifact removal without bias. In the first stage, Artifact Subspace Reconstruction (ASR) repaired brief, high-variance artifacts by reconstructing contaminated channels from clean data. In the second stage, Independent Component Analysis (ICA, as implemented in ICLabel) decomposed the ASR-cleaned EEG into independent components, allowing us to remove residual non-neural artifacts (e.g., eye movements) based on their spatial and spectral features. Both ASR and ICA operate automatically and agnostically to condition or age group, without subjective decisions, ensuring unbiased cleaning and reliable ERP comparisons.

      As noted in the response to R1 Comment (2), we also compared the EEG data quality across age groups and conditions. The trial number did not differ significantly across age groups (p > .361). We also calculated the SNR by dividing the EEG power at the frequency of interest by the background noise in surrounding bins, and found no statistically significant differences across age groups (three t-tests, FDR-corrected, p > .134). Together, these two analyses indicate that signal quality was comparable across age groups.

      Infant movements during the session were sporadic and, most importantly, not time-locked to tone onsets (see Fig. S2). Because artifact rejection (namely, Artifact Subspace Reconstruction and Independent Component Analysis) discarded only those epochs containing large, transient artifacts irrespective of condition, residual movement-related noise would not systematically inflate ERPs.

      (4) The timing of the P200 response peak could be specified in adults as for infants.

      The timing of the P200 in adults is mentioned on p. 9: “[…] a second positivity peaking at 158 ms post-stimulus (so-called “P200”, here reaching an amplitude of 0.85 µV).” The timing of the infant P2 is specified on pp. 10 and 11: “The P2 ranged between 307 and 325 ms post-stimulus and peaked at 316 ms, reaching an average amplitude of 1.026 µV.”

      (5) In infants, the evocation of "peaking at 212ms" is not completely clear: does this timing correspond to the P1 peak at 3 months of age or to the time when the response to music was enhanced compared to shuffled music?

      Thank you for highlighting the need for greater clarity regarding the timing of the P1 peak and its relation to the observed enhancement. We have revised the text to explicitly state that 212 ms corresponds to the P1 peak in 3-month-old infants within the window where the response to music was significantly enhanced compared to shuffled music.

      p.9: “Importantly, and in line with the adults’ data, all infant groups exhibited enhanced P1 amplitudes in response to music compared to shuffled music. Cluster-based permutation (nPerm=1000) testing revealed that 3-month-old infants’ P1 amplitude was enhanced between 177 and 305 ms post-stimulus (cluster-t=1111.90, p=.002). Within this window, the P1 peaked at 212 ms and reached an amplitude of 1.8 µV.”

      (6) It might be useful to put the results of this study into perspective with other studies of infant motor development (e.g., Hinnekens et al, eLife 2023).

      Thank you for pointing out this study. We have integrated the Hinnekens et al. (2023) findings into our discussion of infant motor development toward dance-like behaviors. p.22 “Taking a broader perspective on infants’ motor development, our findings align with research on locomotion across the first 14 months of life, which shows that as the number of motor primitives increases, their intrinsic variability decreases (Hinnekens et al., 2023). Viewed together, these patterns point toward a gradual refinement of motor control: the human motor system first develops the capacity to control individual muscles, and gradually to integrate them into motor synergies that support complex, coordinated behaviours, such as locomotion, musical synchronization, and dance.”

      (7) Regarding the progressive maturation of the auditory/linguistic pathways during infancy, the authors might also refer to (Dubois et al, Cerebral Cortex 2016).

      Thank you for the suggestion. We added the study to the discussion on page 22: “This developmental trajectory aligns with neuroimaging evidence showing that while the ventral linguistic pathway (connecting temporal and frontal regions via the extreme capsule) is well-established at birth, the dorsal pathway, particularly the arcuate fasciculus connecting temporal regions to inferior frontal areas, continues maturing throughout the first postnatal months, with different maturational timelines for dorsal versus ventral connections (Dubois et al., 2016).”

    1. eLife Assessment

      This manuscript makes a valuable contribution to the concept of fragility of meta-analyses via the so-called 'ellipse of insignificance for meta-analyses' (EOIMETA). The strength of evidence is convincing, supported primarily by an example of the fragility of meta-analyses in the association between Vitamin D supplementation and cancer mortality, but the approach could be applied in other meta-analytic contexts. The significance of the work could be enhanced with a more thorough assessment of the impact of between-study heterogeneity, additional case studies, and improved contextualization of the proposed approach in relation to other methods.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue, the fragility of meta-analytic findings, by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

    3. Reviewer #3 (Public review):

      Summary and strengths:

      In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting as metrics for fragility and robustness evaluation of meta-analysis. The author applies these metrics to three meta-analyses of Vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think the extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

      Specific comments:

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      (2) The author mentions that the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how the between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al.'s 2022 study. It would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would be helpful for preventing misinterpretation by future users. I am concerned that the audience may be confusing these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.

      Comments on revisions:

      I am unable to find the author's responses to my previous round comments (Reviewer #3) in the revision package, though replies to the other reviewers are present. I will provide my updated feedback once these responses are available for review.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue - the fragility of meta-analytic findings - by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong, though some clarifications would further enhance interpretability.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Weaknesses:

      (1) The rationale and mathematical details behind the proposed EOI and ROAR methods are insufficiently explained. Readers are asked to rely on external sources (Grimes, 2022; 2024b) without adequate exposition here. At a minimum, the definitions, intuition, and key formulas should be summarized in the manuscript to ensure comprehensibility.

      (2) EOIMETA is described as being applicable when heterogeneity is low, but guidance is missing on how to interpret results when heterogeneity is high (e.g., large I²). Clarification in the Results/Discussion is needed, and ideally, a simulation or illustrative example could be added.

      (3) The manuscript would benefit from side-by-side comparisons between the traditional FI at the trial level and EOIMETA at the meta-analytic level. This would contextualize the proposed approach and underscore the added value of EOIMETA.

      (4) Scope of FI: The statement that FI applies only to binary outcomes is inaccurate. While originally developed for dichotomous endpoints, extensions exist (e.g., Continuous Fragility Index, CFI). The manuscript should clarify that EOIMETA focuses on binary outcomes, but FI, as a concept, has been generalized.

      Reviewer #2 (Public review):

      Summary:

      The study expands existing analytical tools originally developed for randomized controlled trials with dichotomous outcomes to assess the potential impact of missing data, adapting them for meta-analytical contexts. These tools evaluate how missing data may influence meta-analyses where p-value distributions cluster around significance thresholds, often leading to conflicting meta-analyses addressing the same research question. The approach quantifies the number of recodings (adding events to the experimental group and/or removing events from the control group) required for a meta-analysis to lose or gain statistical significance. The author developed an R package to perform fragility and redaction analyses and to compare these methods with a previously established approach by Atal et al. (2019), also integrated into the package. Overall, the study provides valuable insights by applying existing analytical tools from randomized controlled trials to meta-analytical contexts.

      Strengths:

      The author's results support his claims. Analyzing the fragility of a given meta-analysis could be a valuable approach for identifying early signs of fragility within a specific topic or body of evidence. If fragility is detected alongside results that hover around the significance threshold, adjusting the significance cutoff as a function of sample size should be considered before making any binary decision regarding statistical significance for that body of evidence. Although the primary goal of meta-analysis is effect estimation, conclusions often still rely on threshold-based interpretations, which is understandable. In some of the examples presented by Atal et al. (2019), the event recoding required to shift a meta-analysis from significant to non-significant (or vice versa) produced only minimal changes in the effect size estimation. Therefore, in bodies of evidence where meta-analyses are fragile or where results cluster near the null, it may be appropriate to adjust the cutoff. Conducting such analyses - identifying fragility early and adapting thresholds accordingly - could help flag fragile bodies of evidence and prevent future conflicting meta-analyses on the same question, thereby reducing research waste and improving reproducibility.

      Weaknesses:

      It would be valuable to include additional bodies of conflicting literature in which meta-analyses have demonstrated fragility. This would allow for a more thorough assessment of the consistency of these analytical tools, their differences, and whether this particular body of literature favored one methodology over another. The method proposed by Atal et al. was applied to numerous meta-analyses and demonstrated consistent performance. I believe there is room for improvement, as both the EOI and ROAR appear to be very promising tools for identifying fragility in meta-analytical contexts.

      I believe the manuscript should be improved in terms of reporting, with clearer statements of the study's and methods' limitations, and by incorporating additional bodies of evidence to strengthen its claims.

      Reviewer #3 (Public review):

      Summary and strengths:

      In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting as metrics for fragility and robustness evaluation of meta-analysis. The author applies these metrics to three meta-analyses of Vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think the extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

      Specific comments:

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      (2) The author mentions that the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how the between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al.'s 2022 study. It would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would be helpful for preventing misinterpretation by future users. I am concerned that the audience may be confusing these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.

      I am very appreciative of the insightful comments you all shared, and in light of them have made several clarifications and revisions. Thank you again, I am grateful to have received such considered feedback and I hope I’ve addressed any outstanding issues. I have replied to each reviewer’s recommendations in this document sequentially for ease of scanning, and am most grateful for the summary strengths and weaknesses, which I have also incorporated into these replies. Thank you again!

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript makes the important argument that many meta-analyses are inherently fragile, which aligns with prior work (e.g., PMID: 40999337). Please add the reference to the statements.

      Excellent point, thank you – I’ve expanded the discussion of fragility analysis, and its application to meta-analysis, including this reference.

      (2) The rationale and mathematical underpinnings of the proposed EOI and ROAR methods are not sufficiently explained. While the authors cite Grimes (2022, 2024b), readers are expected to rely heavily on these external sources without adequate exposition in the current paper. This limits the ability to fully evaluate the reasonableness of the methods or to reproduce the approach. I strongly recommend expanding the description of EOI and ROAR within the manuscript.

      I agree fully – I was a little remiss here, as I was worried about overwhelming the reader. However, I was too sparse with detail and have now extended the text to describe the methods as intuitively as possible (see Discussion, subsection “Ellipse of Insignificance and Region of Attainable Redaction”).

      (3) In the Methods, the authors note that EOIMETA is applicable when between-study heterogeneity is low. However, the manuscript provides little guidance on how to interpret results when heterogeneity is high (e.g., larger I² values). I recommend clarifying this issue in the Results or Discussion sections, emphasizing the limitations of EOIMETA under high heterogeneity. Ideally, the authors could include either a small simulation study or an illustrative example to demonstrate the performance of the method in such settings.

      This is an excellent question, and I was remiss for not considering it better in the manuscript. Originally, the simple idea was to just pool the results for EOI, in which case heterogeneity would be an issue. However, I subsequently added inverse-variance weighting methods to account for situations with increased heterogeneity, so my initial comment was not strictly correct. I’ve changed the text in several places, notably in the methods and in the discussion (see reply point 5).
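
      For readers unfamiliar with the heterogeneity statistic discussed in this exchange, the following is a minimal Python sketch of Cochran's Q and I² under a fixed-effect, inverse-variance model. It is illustrative only: the function name `cochran_q_i2` and the toy inputs are assumptions introduced here, and nothing in it is taken from the author's R package.

```python
def cochran_q_i2(effects, variances):
    """Return (pooled effect, Cochran's Q, I^2 in percent).

    effects   -- per-study effect estimates (e.g. log risk ratios)
    variances -- per-study sampling variances of those estimates
    """
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I^2 = max(0, (Q - df) / Q), expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the reviewed manuscript.
    print(cochran_q_i2([0.0, 1.0], [0.01, 0.01]))
```

      A large I² indicates that much of the observed spread between studies is not explained by sampling error alone, which is the situation in which simple pooling is least defensible.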

      (4) While EOIMETA is introduced as a generalizable fragility metric for meta-analyses, the illustrative examples would benefit from clearer comparisons with the traditional Fragility Index (FI). Because FI is well established in the RCT literature and familiar to many readers, presenting side-by-side results (e.g., FI at the trial level versus EOIMETA at the meta-analytic level) would provide important context. Such comparisons would also highlight the added value of EOIMETA, underscoring that even when individual trials appear robust under FI, the pooled meta-analysis may remain fragile.

      This is an excellent idea! The new table is given below. Note that the traditional FI is not defined for non-significant results, and EOI is ambiguous for counts <2.
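
      As background to the trial-level comparison requested here, the traditional Fragility Index can be sketched as follows. This is a minimal Python illustration, assuming a two-sided Fisher's exact test and the common convention of converting non-events to events in the arm with fewer events; the function names and the example counts are hypothetical and are not drawn from the manuscript or its R code.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b -- events and non-events in arm 1
    c, d -- events and non-events in arm 2
    """
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)

    def prob(x):
        # Hypergeometric probability of x events in arm 1.
        return comb(n1, x) * comb(n2, k - x) / total

    p_obs = prob(a)
    lo, hi = max(0, k - n2), min(k, n1)
    # Sum all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def fragility_index(a, b, c, d, alpha=0.05):
    """Number of non-event-to-event flips needed to lose significance.

    Returns None when the starting table is not significant, since the
    traditional index is not defined in that case.
    """
    if fisher_two_sided(a, b, c, d) >= alpha:
        return None
    flip_arm1 = a <= c  # modify the arm with fewer events throughout
    flips = 0
    while fisher_two_sided(a, b, c, d) < alpha:
        if flip_arm1:
            if b == 0:
                break  # no non-events left to flip
            a, b = a + 1, b - 1
        else:
            if d == 0:
                break
            c, d = c + 1, d - 1
        flips += 1
    return flips


if __name__ == "__main__":
    # Hypothetical trial: 1/100 events versus 15/100 events.
    print(fragility_index(1, 99, 15, 85))
```

      The sketch makes the limitation noted in the reply concrete: the procedure has no starting point when the initial result is already non-significant.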

      (5) The Discussion currently states that the Fragility Index (FI) applies only to binary outcomes. This is not entirely accurate. While the original FI was indeed developed for dichotomous endpoints, subsequent methodological work has extended the concept to other data types, including continuous outcomes (continuous fragility index, CFI). The manuscript should acknowledge this distinction: EOIMETA presently focuses on binary outcomes at the meta-analytic level, but FI more broadly is not restricted to binary data. Adding this clarification, with appropriate citations, would improve accuracy and place EOIMETA more clearly within the broader fragility literature.

      Thank you for this catch – clarified now in the discussion:

      Reviewer #2 (Recommendations for the authors):

      (1) Typos/inconsistencies/writing clarifications: All table and figure legends and titles are missing a period at the end of each sentence. In the sentence "to be estimated by bootstrap methods. Initially, we ran...", there should be a space between "methods" and "Initially" (line 113).

      Apologies, these are now remedied.

      (2) In Table 2, the total number of patients in the meta-analysis of all 12 studies is reported as 133,262, whereas the text states 133,475 patients. Based on my calculations from Figure 2, the total appears to be 133,262. Could you please clarify this discrepancy?

      Certainly – your calculations are correct. The figure in the text was a typo from a very early draft in which the summation function was not run correctly and double-counted some cases. This was fixed for the figure but not the text. The text should now match, thank you for spotting this. There are some issues with figure 2, which I will address in the next few points.

      (3) Regarding this point, the meta-analysis by Zhang et al. (2019) shows some inconsistencies in the reported number of patients in the paper. According to the data provided on GitHub the total number of patients is 37671. However, Table 1 of the paper lists 38538 patients, and the main text states "5 RCTs involving 39168 patients." Similarly, for Guo et al. (2023), the main text reports that the meta-analysis included 11 RCTs with 112165 patients, whereas the table lists 111952, which appears consistent with the data available on GitHub. There is also a discrepancy in Zhang et al. (2022), which cites 61853 patients in the introduction but 61223 patients in Table 1. These inconsistencies should be clarified, as even small discrepancies in reported sample sizes can undermine the credibility of the analyses presented.

      Well-spotted – the incorrect figures are artefacts of an early draft with a double-counting summation function, and I should have spotted them and removed them prior to submission. To clarify, the correct figures from each study (which agree with github data) are given in the corrected table 1.

      Thus, there are 38,538 subjects in the Zhang et al. 2019 analysis, which matches the first sheet of the GitHub listing. The confusion comes from sheet 2, which was included only with this dataset and breaks the data down into events / non-events (hence the total non-events being 37,671) but keeps the old labels. This is needlessly confusing, and accordingly I have re-uploaded the data with correct headers for sheet 2. This summation problem was also apparent in the total of figure 2, which has been replaced with a correct version now. Thank you for spotting this!

      (4) In line 158, who does "He" refer to? Please clarify this in more detail.

      Apologies, this was a typo and should have read “the” – now corrected.

      (5) The discrepant results of the RCT by Scragg et al. (2018) between the meta-analysis by Zhang et al. and that by Guo et al. could be presented in a table. This could be included as supplementary material or, preferably, in the main text (Results section).

      To avoid confusion, I will add a version of this to the github files for interested users to explore.

      (6) In the legend of Figure 2, a period is missing at the end of the sentence. Additionally, although it is generally understood, it would be helpful to specify that the numbers in parentheses represent the confidence intervals. Please confirm whether these are 95%, 89%, or 99% confidence intervals.

      Apologies, these are 95% CIs. Clarified now in updated legends.

      (7) The statement of "The more recent and robust methods for fragility analysis (EOI) and redaction (ROAR) have potential applications beyond fragile-by-design RCTs, extending to cohort studies, preclinical work, and even ecological studies, as stated by the author" in line 163. Could you please provide references supporting these claims? I believe the relevant references may be included in the EOI paper, but it would be helpful to cite them here as well.

      This approach has recently been used in a new analysis, now cited in the introduction with a fuller description of the method for context. Please see the response to Reviewer 1, point 2.

      (8) Since the study was previously published as a preprint (https://www.medrxiv.org/content/10.1101/2025.08.15.25333793v1.full-text), this should be mentioned in the manuscript.

      Added as a note now.

      (9) It would also be valuable to include a figure illustrating ROAR for the same meta-analyses presented in Figure 1 for EOI, possibly as supplementary material.

      See reply to point 10.

      (10) Finally, it would be interesting to provide plots of both EOI and ROAR for the meta-analyses of all 12 included studies. These graphs could be replicated using the code examples provided by the author in the original EOI and ROAR publications.

      These have now been added to the github repository as supplementary material.

      (11a) Replications of EOI fragility: eoicfunc.R (github): - In the code provided on GitHub, an error occurred in the "EllipseFromEquation" function within eoifunc. This was due to the PlaneGeometry package not being available for the latest version of R. I attempted several installation methods (using devtools, remotes, and GitHub, as well as direct installation from a URL). However, after adjusting the code, I was able to run the analyses. For the full cohort, including all 12 studies using the EOI approach, I obtained a Minimal Experimental Arm only recoding (xi) = 14 and a Minimal Control Arm only recoding (yi) = 15, whereas the authors reported that 5 recodings were sufficient. It appears that differences in code versions or functions might have slightly affected the results. After downgrading R and running the eoic function with PlaneGeometry successfully installed, the fragility index for the EOI approach was 15 rather than 5.

      Apologies for the issue with PlaneGeometry, I will try to fix this for future iterations. The difference you see is an artefact of running EOIFUNC on pooled data, rather than the dedicated EOIMETA function, with the chief difference being that EOIFUNC doesn’t apply WIV correction. If we simply pool events, this is the output:

      Author response image 1.

      If the reviewer uses the EOIMETA function, which employs inverse-variance weighting, then each trial is defined by a vector of events and non-events in each arm. For all 12 studies, this would be (in R code syntax, or imported from the GitHub file):

      Author response image 2.

      Then they will obtain:

      Author response image 3.

      If the reviewer runs a simple pooled analysis with the inverse-variance weighting turned off, they should obtain a similar answer to a simple eoifunc call, apart from the difference due to the zero-count correction. EOIMETA, however, weights the studies, and its output is what is reported in the main paper.
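
      The distinction drawn in this reply, between collapsing all trials into one 2x2 table and weighting trial-level estimates, can be illustrated generically. The Python sketch below contrasts a naive pooled log risk ratio with a fixed-effect inverse-variance estimate. It is an assumption-laden illustration: the function names and the two toy trials are hypothetical, and the weighting actually implemented in EOIMETA may differ in detail.

```python
import math


def log_rr(e_t, n_t, e_c, n_c):
    """Log risk ratio and its approximate variance for one 2x2 table.

    e_t, n_t -- events and total participants in the treatment arm
    e_c, n_c -- events and total participants in the control arm
    """
    estimate = math.log((e_t / n_t) / (e_c / n_c))
    variance = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c
    return estimate, variance


def naive_pooled_log_rr(trials):
    """Collapse every trial into a single table, ignoring study identity."""
    totals = [sum(t[i] for t in trials) for i in range(4)]
    return log_rr(*totals)


def inverse_variance_log_rr(trials):
    """Fixed-effect pooled log risk ratio with inverse-variance weights."""
    stats = [log_rr(*t) for t in trials]
    weights = [1 / v for _, v in stats]
    pooled = sum(w * y for w, (y, _) in zip(weights, stats)) / sum(weights)
    return pooled, 1 / sum(weights)


if __name__ == "__main__":
    # Hypothetical trials: both have a risk ratio of 0.5, but they differ
    # in baseline risk and in allocation ratio.
    trials = [(10, 100, 20, 100), (60, 300, 20, 50)]
    print(math.exp(naive_pooled_log_rr(trials)[0]))
    print(math.exp(inverse_variance_log_rr(trials)[0]))
```

      In this toy case every trial has the same risk ratio, yet the collapsed table gives a different answer because allocation and baseline risk vary between trials. That is one generic reason why a fragility count computed on simply pooled data need not match one computed with study-level weighting.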

      (12) I recalculated the eoic function for Zhang et al. (2019) and found a fragility index (dmin) of 1. FECKUP Vector Length: 0.5722. Minimal Experimental Arm Recoding (xi): 0.7738. Minimal Control Arm Recoding (yi): 0.8499.

      This again appears to be an artefact of using eoifunc rather than eoimeta; with eoimeta, which uses WIV to adjust the studies for heterogeneity effects, this is the reported output:

      Author response image 4.

      (13) Using the previous code (before downgrading R and loading PlaneGeometry), I recalculated the EOI for Zhang et al. (2022) and found Minimal Experimental Arm only recoding (xi) = 55 and Minimal Control Arm only recoding (yi) = 59, results slightly closer to those reported by the authors. After properly loading PlaneGeometry, I recalculated and obtained for Zhang et al. (2022): Fragility index (dmin) = 57; FECKUP Vector Length = 39.948; Minimal Experimental Arm Recoding (xi) = 54.5436; Minimal Control Arm Recoding (yi) = 58.635.

      Again, this appears to be a difference between calling eoifunc and eoimeta - I can replicate this result using EOIFUNC:

      Author response image 5.

      But adjusting for study weighting with eoimeta:

      Author response image 6.

      (14) For Guo et al. (2022), the EOI fragility index was 17 [dmin = 17]. FECKUP Vector Length: 11.3721. Minimal Experimental Arm Recoding (xi): -15.6825. Minimal Control Arm Recoding (yi): -16.5167. However, the authors report an EOI fragility of 38. Since I was able to load PlaneGeometry properly and run eoicfunc.R (from GitHub) without errors, the discrepancies likely reflect minor coding or version inconsistencies rather than software limitations.

      These again stem from using eoifunc on simple pooled data versus eoimeta, which adjusts by study.

      (15) Replications of ROAR fragility: roarfunc.R (github): - For Guo et al. (2022), the ROAR fragility calculated using roarfunc.R was 16 [rmin (Redaction Fragility Index) = 16]. FOCK Vector Length: 15.942. Minimal Experimental Arm Redaction (xc): 15.9442. Minimal Control Arm Redaction (yc): 978.8906. In the main text, the author reports a redaction fragility of 37. What might explain these discrepancies?

      Again, this stems from EOIMETA versus EOIFUNC (and roarfunc calls without weighted adjustment). As the reviewer has observed, the fragility increases when there is no study-level adjustment, which we have now added to the discussion text.

      (16) In generic_run.R, line 6 contains a bug - it is missing a forward slash (/) between the directory path and the filename. The correct line of code should be: pathload = paste0(pathname, "/", filename, exname). The same issue occurs in generalcode.R.

      Apologies, I will correct this in the upload!

      (17) Theoretical framework: Is there any other method available for comparison besides the one proposed by Atal et al.? Could you include a brief literature review describing alternative approaches?

      To my knowledge, there is not – Xing et al (now referenced) covered this earlier in the year, and I have included an expanded background for this purpose. Please see reply to reviewer 1, point 1.

      (18a) There appears to be no heterogeneity in the meta-analysis in terms of effect sizes and I², likely because most values are quite large, yet the included studies address very different populations (e.g., patients with COPD, NSCLC survivors, older adults, women, and GI cancer survivors). This could have been explained more clearly, including how such diverse literature might influence fragility indices or whether there is a logical rationale for combining these studies. Could you perform a sensitivity analysis or provide a conceptual explanation of how the heterogeneity - or lack thereof - across these trials may affect the fragility indices? Although I² values are small, the conceptual heterogeneity among studies suggests that the pooled results may be comparing fundamentally different clinical contexts, which requires clarification.

      I think this is a very pertinent point, I am unsure as to why these authors combined such diverse populations without any consideration of whether they were comparable, but this is a common problem in meta-analysis. I have added the following to the discussion to address this problem:

      “The use of vitamin D meta-analyses in this work was chosen as illustrative rather than specific, but it is worth noting that there are methodological concerns with much vitamin D research (Grimes et al., 2024). The three studies cited in this work report relatively low heterogeneity in their meta-analysis in both effect sizes and I² values, but it is worth noting that the included studies addressed very different populations, including patients with Chronic Obstructive Pulmonary Disease, non-small cell lung cancer survivors, women-only cohorts, older adults, and gastrointestinal cancer survivors. These groups have presumably different risk factors for cancer deaths, and why the authors of these studies combined the cohorts with fundamentally different clinical contexts is unclear. Why the heterogeneity appeared so relatively low in different groups is also a curious feature. This goes beyond the scope of the current work, but serves as an example of the reality that meta-analysis is only as strong as its underlying data and methodological rigor in comparing like-with-like, and the conclusions drawn from them must always be seen in context.”

      Reviewer #3 (Recommendations for the authors):

      (1) Line 156, acronym FI not defined.

      Apologies, this is now defined at the outset as “fragility index”.

      (2) Line 158, typo "He"?

      Apologies again, this was a typo and was supposed to read “the”, fixed now.

      (3) Across the manuscript, I think the "re-coding" phrasing may confuse clinical readers. Maybe rephrasing to "flipping event classification" or "flipping group" would be better.

      Excellent point – this has now been modified at the outset.

    1. eLife Assessment

      This important study combines microscopy and CRISPR screening to identify factors involved in global chromatin organisation, using centromere clustering as a proxy. The authors present solid evidence demonstrating that acute depletion of a range of mitotic regulators alters centromere distribution in interphase. The work will be of interest to researchers studying genome organisation, nuclear architecture, chromosome biology, and the mechanisms linking mitosis to interphase nuclear organisation.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guin and colleagues establish a microscopy-based CRISPR screen to find new factors involved in global chromatin organization. As a proxy of global chromatin organization, they use centromere clustering in two different cell lines. They find 52 genes whose CRISPR depletion leads to centromere clustering defects in both cell lines. Using cell cycle synchronisation, they demonstrate that centromere redistribution upon depletion of these hits necessitates cell cycle progression through mitosis.

      Strengths:

      This manuscript explores the mechanisms of global chromatin organization, a scale of chromatin organization that remains poorly understood. The imaging-based CRISPR screen is very elegant, and the use of appropriate positive and negative controls reinforces the solidity of the findings.

      Weaknesses:

      The manuscript shows interesting observations but leaves a major question unanswered: what is the functional relevance of centromere clustering?

    3. Reviewer #2 (Public review):

      The authors begin by highlighting the importance of genome organisation in cellular compartmentalisation and identity. They focus their study on centromeres - key chromosomal features required for segregation - and aim to identify proteins responsible for their spatial distribution in interphase nuclei. However, none of the experimental data addresses broader aspects of genome architecture, such as individual chromosome territories or A/B compartments. As such, the title of the article may be misleading and would benefit from being more specific, for example: "Identification of factors influencing centromere positioning in interphase."

      Strengths:

      One of the strengths of the paper is the comprehensive CRISPR-based screening and the comparative analysis between two distinct cell lines.

      Including further investigation into factors that behave differently across these cell lines - particularly in relation to expression levels or the unique "inverted architecture" of RPE cells - would have added valuable depth.

      Comments on revisions:

      From the previous review:

      The Authors have undertaken a very minimal revision of the paper. The Authors have addressed some of the comments raised by rewording the text and being slightly more critical in the interpretation of their results, and have added previously published literature.

      They have provided more details on the characterisation of the new cell lines and added some statistical analyses.

      However, I still believe that the title does not reflect the findings, as the work is all about centromere position rather than "interphase genome architecture" as claimed.

      As I said in my previous comments, this will set a precedent and will cause misinterpretations in the field.

      Changes from the previous version:

      While in the new manuscript the Authors have discussed that degradation of NUF2 and SPC24 caused some aberrant nuclear phenotypes, this is at odds with the first screening, where these morphologies were used as part of the exclusion criteria. Some comments would be required.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Guin et al. use a CRISPR KO screen of ~1000 candidates in two human cell lines along with high-throughput image analysis to demonstrate that orderly progression through mitosis shapes centromere organization. They identify ~50 genes that perturb centromere clustering when depleted in both RPE1 and HCT116 cells and validate many of these hits using RNAi. They then use auxin-mediated acute depletion of four factors (NCAPH2, KI67, SPC24 and NUF2) to demonstrate that their effects on centromere clustering require passage through mitosis. They further suggest that lack of these factors during mitosis leads to disorganization of centromeres on the mitotic spindle and these effects persist in the subsequent interphase. Overall, the manuscript is clear, well-written, the experiments performed are appropriate and the data is interpreted accurately. In my opinion, the main strength of this manuscript is the discovery of several hits associated with altered centromere clustering. These hits will serve as a solid foundation for future work investigating the functional significance of centromere clustering in human cells. On the other hand, how the changes in centromere clustering relate to other aspects of interphase genome architecture (A/B compartments, chromosome territories etc) remains unclear and represents the main limitation of this manuscript.

    5. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

      We now demonstrate in Fig. S9 efficient degron-mediated depletion of both NUF2 and SPC24 in cell-cycle arrested cells by Western blotting. We show similar data for siRNA knockdowns. Our siRNA knockdown experiments include a “siDEATH” control that induces cytotoxicity by targeting several essential genes. In Fig. S6a we now show that siDEATH transfection results in strong cytotoxicity and cell death in cycling as well as cell-cycle arrested G1/S and G2/M populations, indicating efficient protein depletion. Additionally, in Fig. S6b we now show depletion of NCAPH2 protein by siRNA knockdown in cycling as well as cell-cycle arrested cell populations by Western blot analysis. We mention these results on page 11 and page 13.

      Reviewer #2 (Public review):

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation - such as components of the condensin and cohesin complexes - are typically essential for viability. Similarly, known effectors of centromere behaviour (e.g., work by the Fachinetti lab) often lead to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implication of this selection criterion should be clearly discussed, as it fundamentally shapes the interpretation of the study's findings.

      We discussed our hit selection criteria on page 8 and in the Methods section. Some of the concerns regarding a bias towards non-essential genes are alleviated by the fact that our screen is limited to a relatively short duration of 72 hours rather than the longer timepoints that are generally used to assess essentiality in pooled CRISPR-KO screens, allowing us to identify genes that may be essential if eliminated permanently. In support of this notion, we identify subunits of the essential condensin and cohesin complexes as hits with only limited effect on cell viability. In this case, the Z-score for change in cell number upon NCAPH2 knockout was -0.26, indicating only a mild reduction compared to the average cell number across all targets.
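
      As a rough illustration of how a cell-number Z-score of this kind can be read, the sketch below standardizes per-target cell counts against the mean and standard deviation across all targets. The exact normalization used in the screen is not stated in this response, so the formula, the function name `z_scores`, and the cell counts are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize each value against the mean and population SD of all values.

    Assumed normalization for illustration; not necessarily the screen's own.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(v - mu) / sigma for v in values]

# Hypothetical cell counts for five knockout targets (not data from the study).
cell_counts = [950, 1000, 1020, 980, 1050]
scores = z_scores(cell_counts)
```

      Under this convention, a score near zero (such as -0.26) means the target's cell number sits close to the average across all targets, while strongly negative scores would flag marked cell loss.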

      Other confounding effects on hit selection due to micronuclei formation, cell cycle effects etc. are minimized as we closely monitor micronuclei formation and cell viability in our screen. Finally, aneuploidy is similarly not a confounding factor in hit identification since, as we previously demonstrated, the Ripley’s K-based clustering score is robust to changes in spot number (Keikhosravi, A., et al. 2025).

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering - and by extension, interphase genome organisation - is not biologically significant?

      We appreciate these points. Given the presence of one centromere on each chromosome, we used centromeres as surrogate landmarks of higher-order nuclear genome organization and considered centromere patterns as a general indicator of overall genome organization. While the relationship of centromere patterns to other genome features is poorly understood in mammalian cells, a link is suggested by observations in other organisms. For example, in yeast, the clustering of centromeres reflects the overall Rabl configuration of chromosomes. Having said that, we agree that our extrapolation to overall genome organization is somewhat speculative, and we have toned down these conclusions throughout the manuscript.

      We agree that one of the most interesting questions emerging from our study is whether centromere clustering has a functional role. In follow-up studies we will use some of the key regulators identified in these screens to perturb the native centromere distribution and assay for various cellular responses, including changes in gene expression and genome integrity. These studies will be the subject of future publications.

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-H2 is a subunit of the condensin II complex that promotes CENP-A deposition during G1. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide more mechanistic insight.

      We thank the reviewer for this comment. We now clarify the relationship of these proteins to centromeres in more detail on page 12. While they all have some relationship to centromeres, as would be expected if they contributed to centromere clustering, they represent multiple distinct pathways and processes.

      The observed effects on clustering are unlikely to be due to aneuploidy, as only very limited aneuploidy is observed in our cells and because the Ripley’s K measurement of centromere clustering is robust to changes in chromosome copy number. Follow-up studies using live cell imaging approaches are currently in progress to address some of these mechanistic questions.

      Finally, the additive effects observed may arise because mild mis-segregation effects are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

      We rephrased the text on page 14 based on the reviewer’s recommendations.

      Reviewer #3 (Public review):

      Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      We now include in Fig. S11 examples of disordered mitotic nuclei observed in the absence of NUF2 or SPC24.

      I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

      We have modified this description on page 4.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Proper characterisation of the cell lines used in the manuscript. Tagged proteins have been known to affect protein levels compared to the parental cell, and where this is the case (or not), it needs to be transparently shown in the manuscript.

      The cell lines to conditionally deplete NCAPH2 and KI67 have previously been published, and they have been characterized to show normal expression levels of the tagged protein (Takagi et al., 2018). We also show quantification of Western blots to compare protein level of tagged SPC24 and NUF2 to that of the untagged proteins in the parental cell line (Fig. S8e-f) and discuss these results on page 11 and page 12.

      (2) Demonstration of protein depletion in the degron cell lines.

      We showed efficient protein depletion in the degron cell lines (Fig. S8c and S8d). In addition, we now show in Fig. S9 depletion of SPC24 and NUF2 in cells arrested at G1/S and G2/M.

      (3) The study examines centromere clustering, but not genome architecture. While it is understood that a complete investigation of genome architecture is beyond the scope of the current study, the interpretation does not match the data. The authors are suggested to pay attention to this point throughout the manuscript and consider their findings in terms of centromere clustering rather than genome architecture, including changing the title accordingly.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and a link to overall genome organization has been suggested in some organisms such as yeast, we have retained the wording in a few select instances, including the title. We also make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      Reviewer #1 (Recommendations for the authors):

      (1) Controls of depletion by western blot in synchronized cells (siRNAs and degrons) are lacking.

      We now show Western blots demonstrating efficient depletion of the target proteins in degron (Fig. S9) and siRNA treated cell-cycle arrested cells (Fig. S6b).

      It would have been very nice to discuss the implications of these findings further. For example, do centromere clustering changes gene expression/repression of pericentromeric heterochromatin expression? Is centromere clustering associated with specific diseases? How is global chromatin organization affecting gene expression/genome stability, etc? Although some of these aspects are unknown, a discussion about them would have been nice.

      We appreciate these interesting points. These questions are the subject of our ongoing follow up studies. We now discuss possible consequences of centromere re-organization on gene expression and genome stability on page 18.

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) Clarify Scope and Avoid Overinterpretation

      (a) The study exclusively investigates centromere positioning, without addressing broader aspects of genome architecture.

      (b) There is no established link presented between centromere positioning and higher-order genome organisation.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and observations in yeast suggest such a link, we have retained the wording in a few select instances. We make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      (c) The exclusion criteria used in the screen should be clearly explained, including the implications of selecting only non-essential or redundant genes.

      We discuss on page 8 and in the Methods section the exclusion criteria used in the screen, including the implications for identifying essential genes.

      (d) The authors should discuss why the identified proteins significantly affect centromere clustering but do not impact cell cycle progression.

      We now discuss this topic briefly on page 9. While some hits are expected to affect both cell-cycle progression and centromere clustering (Fig. S4c), it is not a priori expected that all hits would affect both.

      (2) Supplementary Figure 1

      This figure appears unnecessary. The co-localisation between CENP-C and CENP-A is well established in the literature, and the scoring provided does not add essential new information.

      The data was included in response to repeat questions from a centromere expert. We prefer to retain this data for completeness.

      (3) Differential Hits between Cell Lines 

      For hits that behave differently across cell lines, expression data should be provided. Are the genes equally expressed in both cell types? What is the level of depletion achieved?

      It is possible that cell-type specific hits arise due to differences in expression. Cell-type specific hits may also arise for multiple other reasons, including cancer vs. non-cancer origin, hTERT immortalization, cell growth properties, variation in the underlying DNA sequences of the Cas9 target loci, and the initial state of centromere clustering, to name a few. Each of these possibilities requires additional experiments to identify the exact reason for the cell-type specificity of a given factor. A full analysis of the reasons for cell-type specificity is, however, beyond the scope of the current study.

      (4) Efficiency of Cell Cycle-Specific Degradation

      Degradation efficiency likely varies across cell cycle stages. The authors should provide Western blots showing the extent of protein depletion at each cell cycle block.

      We provide Western blot data in Fig. S9 to demonstrate efficient knockdown of proteins in G1/S and G2/M arrested cells.

      (5) Figure S6 - Validation of New Cell Lines

      Genotyping data for the newly generated cell lines should be included, along with Western blots using protein-specific antibodies (not just the tag), compared to the parental cell line.

      We provide in Fig. S7c-d genotyping data and in Fig. S8e-f Western blot data to compare levels of tagged and untagged proteins.

      (6) Figure S7 - G2/M Block Efficiency

      The G2/M block appears suboptimal after 20 hours in RO-3306, with only ~50% of cells in G2/M and just 21-27% for Ki-67, where most cells remain in S phase. This raises concerns about the interpretation of mitotic depletion effects. It is possible that cells never progressed from G1 or completed S phase without Ki-67. Prior studies (van Schaik et al., 2022; Stamatiou et al., 2024) have shown delayed and uneven replication of centromeric/pericentromeric regions upon Ki-67 depletion during S phase, which could affect the readout. Live-cell imaging would be a more robust approach to confirm mitotic status.

      For KI67, after RO-3306 treatment, 73% and 67% of cells were arrested at the G2/M boundary in the presence or absence of KI67, respectively (Fig. S10a-b). Upon release from G2/M arrest, the proportion of G1 cells increased from 6-13% to 28-60% for all four factors tested (Fig. S10b and d). Please note that our results are not directly dependent on release efficiency, since we use single-cell staging (Fig. 3b) and selectively analyze only G1 populations (Fig. 5c).

      We are currently working towards live cell imaging, but this requires development and characterization of additional cell lines which is beyond the scope of this study.

      Statistical analyses of cell cycle phase distributions should also be included.

      We include statistical analyses of cell cycle phase distributions in Fig. S4c and Fig. S10c-d by performing t-tests with FDR corrections to compare the percentage of cells in G1, S or G2 in the presence and absence of each factor tested.
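
      For readers unfamiliar with FDR correction, the sketch below shows one common procedure, the Benjamini-Hochberg adjustment, applied to a list of p-values. The response does not state which FDR method was used, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the example p-values are assumptions for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from per-phase t-tests (G1, S, G2); not data from the study.
raw_p = [0.01, 0.04, 0.03]
adj_p = benjamini_hochberg(raw_p)
```

      A comparison would then be called significant when its adjusted p-value falls below the chosen FDR threshold.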

      (7) Aneuploidy Assessment

      Aneuploidy scores for the four key proteins should be provided, ideally using centromere-specific FISH probes.

      While an aneuploidy score for each hit would be an interesting piece of information, we showed in a previous publication that the Ripley’s K-based Clustering Score method used here is robust to aneuploidy (Keikhosravi et al., 2025) and aneuploidy would thus not lead to spurious identification of these proteins in our screen.
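
      To give a sense of why a Ripley's K-type statistic need not track spot number directly, the sketch below computes the textbook K estimator without edge correction: the count of point pairs within distance r is divided by the number of ordered pairs and scaled by area. This is not the authors' Clustering Score pipeline (described in Keikhosravi et al., 2025); the function name `ripley_k`, the omission of edge correction, and the coordinates are assumptions for illustration.

```python
from math import dist

def ripley_k(points, r, area):
    """Naive Ripley's K at radius r for 2D points, without edge correction."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, whose distance is at most r.
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and dist(p, q) <= r
    )
    # Dividing by n * (n - 1) normalizes for the number of spots.
    return area * close_pairs / (n * (n - 1))

# Hypothetical centromere spot coordinates in a unit-area nucleus.
spots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_small = ripley_k(spots, r=1.0, area=1.0)
k_large = ripley_k(spots, r=1.5, area=1.0)
```

      Because the pair count is divided by the number of possible pairs, the estimator reflects the spatial pattern of the spots rather than simply how many there are.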

      (8) Add-Back Experiment (Page 14)

      While the add-back experiment is conceptually strong, its execution could be improved. It should be performed on synchronised cells: deplete the protein in G2/M, arrest in thymidine, then release into G1 without the protein to observe the unclustering phenotype.

      Re-expression should occur during the block, followed by release and analysis in the next G1 phase. This would better demonstrate whether clustering defects from the previous division can be rescued.

      We have attempted these types of long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (9) Statistical Analyses

      Several figures lack statistical analysis, which is essential for data interpretation:

      (a) Figure 1B-E

      (b) Figure 3I

      (c) Figure 4B

      (d) Figure 5B, C, G

      (e) Supplementary Figures S4B and S7

      Statistical analyses were performed for a) Fig. 1b-e, b) Fig. 3i, c) Fig. 4b, d) Fig. 5b-c and the details of the test are mentioned in the corresponding figure legends. We also include statistical tests for Fig. 5g, S5b and S7c-d.

      Minor Comments:

      (1) Page 9: "Reassuringly, in line with known centromere-nucleoli association (Bury, Moodie et al. 2020, van Schaik, Manzo et al. 2022)..."

      The citation "van Schaik, Manzo et al. 2022" is incorrect and should be revised.

      We have removed this reference.

      (2) Page 10:

      "...were grouped into six categories: regulators of chromatin structure, kinetochore proteins, nucleolar proteins, nuclear pore complex components..."

      The authors should note that NUP160, listed as a nuclear pore complex hit, is also a kinetochore component during mitosis and may be linked to mitotic defects.

      We now mention this on page 10.

      (3) Page 12:

      "Progression through S phase was equally efficient in the presence or absence of KI67."

      While bulk S phase progression may appear unaffected, refined analyses (e.g., Repli-seq, EdU patterning) have shown delayed replication of centromeric/pericentromeric regions upon Ki-67 depletion. This should be acknowledged, especially given the study's focus on centromeres (see Schaik et al., 2022; Stamatiou et al., 2024).

      Our statement was meant to describe the results we observed in this study. We indicate that overall progression is not affected, but subtle effects may persist, and we cite the relevant references on page 13.

      (4) Page 12:

      "KI67 is a well-known marker of cell proliferation..."

      The first study demonstrating the dependency of chromosome periphery on Ki-67 was Booth et al., 2014, which should be cited.

      This citation has been added.

      Reviewer #3 (Recommendations for the authors):

      (1) On page 14, paragraph 1, the authors suggest that NCAPH2 and SPC24 act independently on centromere clustering. I'm not convinced that this is the right interpretation of the data. Rather, the lack of an additive phenotype following NCAPH2 and SPC24 dual depletion suggests to me that these two proteins are acting in the same pathway.

      We show that knockdown of NCAPH2 and SPC24 results in opposite effects on centromere clustering. However, knockdown of SPC24 in NCAPH2-AID cells produces an intermediate level of clustering compared to NCAPH2 depletion or SPC24 knockdown alone. This indicates additive effects. We have modified our description of these results on p. 14.

      (2) The analysis and experimental design in Figure 5g could be improved. For one, I would add statistical comparisons like the other figure panels. Second, the authors would ideally perform AID depletion in a synchronized G2 population before washout during the subsequent G1. This design might make some of the more subtle changes (e.g., KI67-AID) more obvious.

      We now include statistical analysis in Fig. 5g. We have attempted long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (3) In the discussion, the authors allude to centromere clustering data from the NDC80 complex, HMGA1, and other HMGs but fail to direct the reader to where they may find the data. If these data are in Tables S4 and S5, perhaps the authors could make these tables more reader-friendly?

      For each target, the mean Clustering Score Z-score of two biological replicates is listed in column H of Tables S4 and S5.

      (4) In my opinion, the term 'clustering score' comes across a bit ambiguous. In most cases, this term appears to refer to the distance between centromeric foci but is used occasionally to refer to the number of centromeric spots. For example, on page 9, paragraph 1, line 3, cluster/clustering is used three times but with slightly different meanings. Perhaps the authors can consider using the word 'clustering' to indicate the number of spots, 'dispersion' to indicate distance between centromeres, and 'radial distribution' to indicate distance from the nuclear center? Or other ways to improve the consistency of the descriptive terms.

      We apologize for not being clear. The Clustering Score is a very specific parameter derived from use of a Ripley’s K clustering algorithm as described in Materials and Methods. We now ensure that the term is used correctly throughout and that the other terms are also used consistently.

    1. eLife Assessment

      This manuscript provides a valuable contribution by identifying a stress-responsive circuit and its regulation of anxiety-related behaviors. The evidence is convincing that the supramammillary nucleus contains stress-responsive neurons that increase anxiety-like behaviors when activated, and that ventral subiculum projections to the supramammillary nucleus are also activated by stress and their inhibition alleviates some effects of stress. Evidence that this pathway encodes and is functionally specific to anxiety is, at present, not sufficiently supported and will require future studies. This work offers new insights into how distinct circuits are activated by stress and can regulate emotional behaviors and will be of interest to those interested in brain systems of aversive emotional and behavioral states.

    2. Reviewer #1 (Public review):

      In the revised manuscript, the authors refine their conclusions, narrow their interpretation, and add limited new analyses but have not added additional new data or made fundamental changes in the analyses of their data.

      The central findings are that the SuM contains neurons that are activated by stressors (foot shock and social defeat). Chemogenetic activation of SuM and the neurons genetically tagged as active during foot shocks, which the authors define as Stress Activated Neurons, increases classic anxiety-like behaviors. The subiculum projects to the SuM, and terminals in the SuM from the ventral versus dorsal subiculum are differentially active during elevated plus-maze transitions. Chronic inhibition of vSub neurons that project to SuM mitigates CSDS-induced anxiety-like behaviors.

      Due to limitations in the data and experimental design the findings are felt to remain incomplete. A central limitation is the discordance between the temporal resolution of the behavioral assays and the neural interventions used. This weakens support for the conclusions drawn about the causal roles the SuM and specific vSub projections to SuM (vSub→SuM) may play in anxiety and anxiety-like behaviors. The authors acknowledge this limitation but do not address it experimentally in the revised manuscript. Furthermore, the connection between chronic inhibition of vSub→SuM neurons for 10 days and the alleviation of CSDS-induced anxiety is incomplete. Separately, the use of foot shock and social defeat stressors in connection with SuM neurons, with limited exploration of the potential (or lack thereof) relation between the two groups, further limits the ability to draw conclusions from the data.

      Although a number of interesting points are raised through the experiments, the weaknesses noted will reduce the impact of the work in the field.

    3. Reviewer #2 (Public review):

      This manuscript investigates the neural mechanisms of anxiety and identifies the supramammillary nucleus (SuM) as a critical hub in mediating anxiety-related behaviors. The authors describe a population of neurons in the SuM that are activated by acute and chronic stress. While their activity is not required for fear memory recall, reactivation of these neurons after chronic stress robustly increases anxiety-like behaviors as well as physiological stress markers. Circuit analysis further shows that these stress-activated neurons are driven by inputs from the ventral, but not dorsal, subiculum, and inhibition of this pathway exerts an anxiolytic effect.

      The study provides an elegant integration of techniques linking stress, neuronal ensembles, and circuit function, advancing our understanding of the neural substrates of anxiety. A particularly notable point is the selective role of these stress-activated neurons in anxiety, but not in associative fear memory, highlighting functional distinctions between neural circuits underlying anxiety and fear.

      The recruited neuronal population is activated by acute and chronic stress, though the overlap across stress exposures is partial, suggesting that further studies will be important to define how these neurons respond under other stressors and conditions.

      Overall, this work identifies SuM stress-activated neurons and their ventral subiculum inputs as central elements of anxiety circuitry, providing a valuable framework for future studies and potential targeted interventions for stress-related disorders.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to investigate the mechanisms of anxiety. The paper focuses on the supramammillary nucleus (SuM) based on a fos screen and recordings showing that footshock and social defeat stress increase activity in this region. Using activity-dependent tagging, they show that reactivation of stress-activated neurons in SuM has an anxiety-like effect, reducing open-arm exploration in the elevated zero task. They then investigate the ventral subiculum as a potential source of anxiety-related information for SuM. They show that ventral subiculum (vSub) inputs to SuM are more strongly activated than dSub inputs when mice explore open arms of the elevated zero. Finally, they show that DREADD-mediated inhibition of vSub-SuM projections alleviates stress-enhanced anxiety. Overall the results provide good evidence that SuM contains a stress-activated neuronal population whose later activity increases anxiety-like behavior. They further provide evidence that vSub projections to SuM are activated by stress and their inhibition alleviates some effects of stress.

      Strengths:

      Strengths of this paper include the use of convergent methods (e.g., fos plus electrode recordings, footshock and social defeat) to demonstrate that the SuM is activated by different forms of stress. The activity-dependent tagging experiment showing that footshock-activated SuM neurons are reactivated by social defeat but not sucrose is also compelling because it provides evidence that SuM neurons are driven by some integrative aspect of stress rather than by a simple sensory stimulus.

      Weaknesses:

      The strength of some evidence is judged to be incomplete. The paper provides good evidence that SuM contains stress-responsive neurons, and the activity of these neurons increases some measure of anxiety-like behavior. However, the evidence that the vSub-SuM projection "encodes anxiety" and that the SuM is a key regulator of anxiety is judged to be incomplete. I am not convinced that the identified SuM cells have a specific anxiety function. As the authors mention in the introduction, SuM regulates exploration and theta activity. Since theta potently regulates hippocampal function, there is the concern that SuM manipulations could have broad effects beyond anxiety-like behavior.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses

      As presented, the manuscript has limitations that weaken support for the central conclusions drawn by the authors. Many of the findings align with prior work on this topic, but do not extend those findings substantially.

      An overarching limitation is the lack of temporal resolution in the manipulations relative to the behavioral assays. This is particularly important for anxiety-like behaviors, as antecedent exposures can alter performance. In the open field and elevated zero maze assays, testing occurred 30 minutes after CNO injection. During much of this interval, the targeted neurons were likely active, making it difficult to determine whether observed behavioral changes were primary - resulting directly from SuM neuronal activity - or secondary, reflecting a stress-like state induced by prolonged activation of SuM and related circuits. This concern also applies to the chronic inhibition of ventral subiculum (vSub) neurons during 10 days of CSDS.

      We appreciate the reviewer's concern regarding the timing of CNO administration relative to behavioral testing. The 30-minute interval was selected based on previous studies[1, 2]. This window ensures stable and specific neuronal manipulation while minimizing off-target effects, and it was applied consistently across all experiments. We acknowledge that a shorter interval (~15 min) can be sufficient to produce a biological effect in vivo[3, 4]. We repeated the chemogenetic tests 2-3 times to obtain reliable data for statistical analysis. However, we cannot exclude potential side effects caused by prolonged chemogenetic activation of SuM, because chemogenetics has poor temporal resolution compared to optogenetic manipulation. We agree that employing techniques with higher temporal resolution, such as optogenetics, in future studies would provide an excellent complement to these findings.

      The combination of stressors (foot shock and CSDS) and behavioral assays further complicates interpretation. The precise role of SuM neurons, including SANs, remains unclear. Both vSub and dSub neurons responded to foot shock, but only vSub neurons showed activity differences associated with open-arm transitions in the EZM.

      We agree that the use of multiple stressors (foot shock and CSDS) adds complexity to the interpretation. Our rationale was to test the generality of the SuM response and the role of SANs across different stress modalities (acute vs. chronic). The key finding is that while both vSub and dSub projections to the SuM were activated by the acute stressor of foot shock (Figure 5N-R), only the vSub-SuM pathway showed a significant increase in calcium activity specifically during the anxiety-provoking transition from the closed to the open arms of the EZM (Figure 5I-M). This dissociation suggests a selective role for the vSub-SuM circuit in encoding anxiety-related information, beyond a general response to stress.

      In light of prior studies linking SuM to locomotion (Farrell et al., Science 2021; Escobedo et al., eLife 2024), the absence of analyses connecting subpopulations to locomotor changes weakens the claim that vSub neurons selectively encode anxiety. Because open- and closed-arm transitions are inherently tied to locomotor activity, locomotion must be carefully controlled to avoid confounding interpretations.

      We thank the reviewer for highlighting the important studies linking the SuM to locomotion. We acknowledge this known function and carefully considered it in our analyses. Non-selective activation of the entire SuM did not affect total distance traveled in the open field and elevated zero maze (Supplemental Figure 2 B-C). Although locomotion in the OF and EZM was affected when targeting SANs, we also compared the distance traveled in the central area of the OF to limit, to some extent, the influence of locomotion on our estimate of anxiety-induced avoidance of the central area (Figure 4 I). We agree that future work delineating the specific subpopulations within the SuM that regulate locomotion versus anxiety would be highly valuable.

      Another limitation is the narrow behavioral scope. Beyond open field and EZM, no additional assays were used to assess how SAN reactivation affects other behaviors. Without richer behavioral analyses, interpretations about fear engrams, freezing, or broader stress-related functions of SuM remain incomplete.

      In addition, small n values across several datasets reduce confidence in the strength of the conclusions.

      We acknowledge that the primary focus on OF and EZM tests is a limitation in fully characterizing the behavioral profile of SAN manipulation. These tests were selected as they are well-validated, standard assays for anxiety-like behavior in rodents[5–10]. However, we also included the reward-seeking test, where activation of SANs significantly suppressed sucrose consumption (Figure 4L), suggesting a broader impact on motivational state that is often linked to anxiety. We fully agree with the reviewer that employing a richer behavioral battery—such as tests for social avoidance, conditioned place aversion, or Pavlovian fear conditioning—in future studies will be essential to comprehensively define the functional scope of SuM SANs and to conclusively dissect their role from fear memory engrams.

      Figure level concerns:

      (1) Figure 1: In Figure 1, the acute recruitment of SuM neurons by foot shock is paired with changes in neural activity induced by social defeat stress. Although interesting, the connections of changes induced by a chronic stressor to Fos induction following acute foot shock are unclear and do not establish a baseline for the studies in Figure 3 on activation of SANs by social stressors.

      Thank you for this important comment. We agree that directly linking acute foot shock-induced cFos expression with chronic social defeat stress (CSDS) electrophysiological changes may create an interpretive gap. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We did not intend to imply that the same neuronal population responds identically to both stressors.

      To address this, we have clarified in the text that the purpose of Figure 1 is to show that SuM is responsive to diverse stressors, rather than to establish a direct mechanistic link between acute and chronic activation patterns. The baseline for SAN studies in Figure 3 is established through the TRAP2 tagging protocol following foot shock, independent of the CSDS model. We acknowledge that future studies should compare SAN recruitment across acute vs. chronic stressors to better define their functional overlap.

      (2) Figure 2: The chemogenetic experiments using AAV-hSyn-Gq-DREADDs lack data or images, or hit maps showing viral spread across animals. This omission is critical given the small size of SuM, where viral spread directly determines which neurons are manipulated. Without this, it is difficult to interpret findings in the context of prior studies on SuM circuits involved in threats and rewards.

      Please see Supplemental Figure 2 for the infection area of AAV.

      (3) Figure 3: The TRAP experiments show that the number of labeled neurons following foot shock (Figure 3F) is approximately double that of baseline home-cage animals, though y-axis scaling complicates interpretation. It is unclear whether this reflects true Fos induction, low TRAP efficiency, or baseline recombination.

      We thank the reviewer for pointing out the axis scaling issue. We have modified the y-axis to start from 0. Because the SuM has been reported to play a role in wakefulness in rodents, some basal neuronal activation after i.p. injection of 4-OHT is to be expected.

      Overlap analyses are also limited. For example, it is not shown what proportion of foot shock SANs are reactivated by subsequent foot shock. Comparisons of Fos induction after sucrose reward are also weakened by the very low Fos signal observed. If sucrose reward does not robustly induce Fos in SuM, its utility in distinguishing reward- versus stress-activated neurons is questionable. Thus, conclusions about overlap between SANs and socially stressed neurons remain uncertain due to the missing quantification of Fos+ populations.

      Thank you for the question. We have replaced the reactivation chance graph with a new reactivation percent graph showing the proportion of SANs that were reactivated by subsequent sucrose reward or stress. We used social stress rather than a second foot shock in order to test the potential generality of the foot-shock-tagged neurons. The lower cFos expression after sucrose exposure suggests, first, that the SuM may not be involved in reward regulation, on which we agree with you; and second, that these SANs are more likely to modulate anxiety-like behavior than reward.
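
      For readers unfamiliar with this kind of measure, a reactivation percent can be sketched as the share of TRAP-tagged cells that are also cFos-positive after the second stimulus. The Python snippet below is only a minimal illustration under that assumption; the function name and the cell counts are hypothetical and are not taken from the manuscript or its analysis code.

```python
def reactivation_percent(tagged_cells, double_labeled_cells):
    """Percent of tagged cells that are also cFos+ after re-exposure.

    Both arguments are cell counts for one animal (hypothetical inputs).
    """
    if tagged_cells <= 0:
        raise ValueError("tagged_cells must be positive")
    if not 0 <= double_labeled_cells <= tagged_cells:
        raise ValueError("double_labeled_cells must lie between 0 and tagged_cells")
    return 100.0 * double_labeled_cells / tagged_cells


# Hypothetical counts, for illustration only: (tagged, tagged and cFos+).
example_counts = {"social_stress": (120, 30), "sucrose": (110, 11)}
percents = {
    condition: reactivation_percent(tagged, double)
    for condition, (tagged, double) in example_counts.items()
}
```

      Normalizing to the number of tagged cells keeps the measure comparable across animals with different TRAP labeling efficiency, which is one plausible reason to prefer it over a raw overlap count.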

      (4) Supplemental Figure 3: The claim that "SANs in the SuM encode anxiety but not fear memory" is not well supported. Inhibition of SANs (Gi-DREADDs) did not alter freezing behavior, but the absence of change could reflect technical issues (e.g., insufficient TRAP efficiency, low expression of Gi-DREADDs). Moreover, the manuscript does not provide a positive control showing that SuM SANs inhibition alters anxiety-like behavior, making it difficult to interpret the negative result. Prior work (Escobedo et al., eLife 2024) suggests SuM neurons drive active responses, not freezing, raising further interpretive questions.

      We agree that we did not provide enough data to confirm that SuM SANs have no regulatory effect on fear memory. The relevant statement has been removed to avoid any further misunderstanding.

      (5) Figure 4: The statement that corticosterone concentration is "usually used to estimate whether an individual is anxious" (line 236) is an overstatement. Corticosterone fluctuates dynamically across the day and responds to a broad range of stimuli beyond anxiety.

      Thank you for your kind reminder. Corticosterone/cortisol, the primary stress hormone, is a well-established biomarker whose levels are elevated in response to stress and in anxiety states [11, 12]. Some studies have also reported that administering corticosterone can produce anxiety-like behaviors in rodents [13–16]. We collected the blood samples at the same time point in Figure 4 C-D. We agree that the statement at line 236 was an overstatement and have modified it.

      (6) Figures 5-6: The conclusion that vSub neurons encode anxiety-like behavior is not firmly supported. Data from photo-activating terminals in SuM is shown for ex vivo recording, but not in vivo behavior, which would strengthen support for this conclusion. Both vSub and dSub neurons responded to foot shock. The key evidence comes from apparent differential recruitment during open-arm exploration. However, the timing appears to lag arm entry, no data are provided for closed-arm entry, and there is heterogeneity across animals. These limitations reduce confidence in the authors' central claim regarding vSub-specific encoding of anxiety.

      We thank the reviewer for this important point. To address the concern regarding the in vivo behavioral encoding specificity of the vSub-SuM pathway, we further analyzed the in vivo fiber photometry data. The new analysis revealed that calcium activity in vSub-SuM projection neurons exhibited bidirectional, instantaneous, and specific changes during transitions between the open and closed arms of the elevated plus maze: their activity significantly and immediately decreased when mice moved from the open arm to the closed arm (new results shown in Supplemental Figure 5), and conversely, significantly and immediately increased upon transitioning from the closed to the open arm. However, under the same behavioral events, dSub-SuM projection neurons showed no significant change in activity. We hope this finding strengthens the case for a role of the vSub-SuM pathway in encoding anxiety-like behavior.
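
      A transition-aligned analysis of this kind is commonly built by cutting a window around each arm-transition event, subtracting the pre-event baseline, and averaging across events. The Python sketch below illustrates that general idea only; it assumes a pre-computed signal sampled at a fixed rate and event times given as sample indices, and the function name, window sizes, and toy trace are hypothetical rather than the procedure actually used in the study.

```python
from statistics import mean


def peri_event_average(trace, event_indices, pre, post):
    """Average baseline-subtracted signal around events.

    trace: list of signal values (e.g. dF/F), one per sample.
    event_indices: sample index of each transition event.
    pre, post: number of samples kept before and after each event.
    """
    if pre <= 0 or post <= 0:
        raise ValueError("pre and post must be positive")
    windows = []
    for idx in event_indices:
        start, stop = idx - pre, idx + post
        if start < 0 or stop > len(trace):
            continue  # skip events too close to the edges of the recording
        segment = trace[start:stop]
        baseline = mean(segment[:pre])  # mean of the pre-event samples
        windows.append([value - baseline for value in segment])
    if not windows:
        raise ValueError("no event had a complete window")
    # Average sample-by-sample across all retained events.
    return [mean(column) for column in zip(*windows)]


# Toy trace with two events (hypothetical values, for illustration only).
toy_trace = [0, 0, 1, 1, 0, 0, 2, 2]
average = peri_event_average(toy_trace, event_indices=[2, 6], pre=2, post=2)
```

      Running the same function separately on closed-to-open and open-to-closed event lists gives the two aligned averages that a bidirectional comparison needs, and a matching analysis of locomotor speed around the same events would help address the locomotion confound raised by the reviewers.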

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      (1) From the data presented, the authors conclude that "the SuM is the critical brain region that regulates anxiety" (line 190). This interpretation appears overstated, as it downplays well-established contributions of other brain regions and does not place SuM's role within a broader network context. The data support that SuM neurons are recruited by foot shock and, to a lesser extent, by acute social stress. However, the alterations in activity of SuM subpopulations following chronic stress reported in Figure 1 remain largely unexplored, limiting insight into their functional relevance.

      Thank you for the suggestion. We have revised line 190 to read more cautiously: “In this study, we combined multiple methods to determine whether the SuM is a brain region that involve in modulating anxiety.”

      (2) The limited temporal resolution of DREADD-based manipulations leaves alternative explanations untested. For example, if SANs encode signals of threat, generalized stress, or nociception, then prolonged activation could indirectly alter behavior in the open field and EZM assays, rather than reflecting direct anxiety regulation.

      We discussed the DREADD method in the first part of our response.

      (3) The conclusion that "SuM store information about stress but not memory" (line 240) is not fully supported, particularly with respect to possible roles in memory. The lack of a role in memory of events, as opposed to the output of threat or stress memory, may be true, but is functionally untested in presented experiments. The data do indicate activation of the SuM neuron by foot shock, which has been previously reported (Escobedo et al eLife 2024). The changes in SuM activity following chronic stress (Figure 1) are intriguing, but their relationship to "stress information storage" is not clearly established.

      Thank you for your valuable comments. Foot-shock-activated neurons may play a role in modulating subsequent anxiety-like behaviors as well as emotional memory (fear memory). We realize that we did not fully test all aspects of anxiety and memory, which resulted in some overstatements in the manuscript. It is more appropriate to focus on “anxiety avoidance”, as indicated by the reduced open-arm exploration in the EZM/EPM.

      Reviewer #2 (Public review):

      This manuscript investigates the neural mechanisms of anxiety and identifies the supramammillary nucleus (SuM) as a critical hub in mediating anxiety-related behaviors. The authors describe a population of neurons in the SuM that are activated by acute and chronic stress. While their activity is not required for fear memory recall, reactivation of these neurons after chronic stress robustly increases anxiety-like behaviors as well as physiological stress markers. Circuit analysis further shows that these stress-activated neurons are driven by inputs from the ventral, but not dorsal, subiculum, and inhibition of this pathway exerts an anxiolytic effect.

      The study provides an elegant integration of techniques to link stress, neuronal ensembles, and circuit function, thereby advancing our understanding of the neural substrates of anxiety. A particularly notable point is the selective role of these stress-activated neurons in anxiety, but not in associative fear memory, which highlights functional distinctions between neural circuits underlying anxiety and fear.

      Some aspects would benefit from clarification. For example, how selective is the recruitment of this population to stress compared with other aversive states, and how should one best interpret their definition as "stress-activated neurons" given the relatively modest overlap across stress exposures? In addition, the use of the term "engram" in this context raises conceptual questions. Is it appropriate to describe a neuronal ensemble encoding an emotional state as an engram, a term usually tied to specific memory recall?

      Overall, this work makes a valuable contribution by identifying SuM stress-activated neurons and their ventral subiculum inputs as central elements of the circuitry underlying anxiety. These findings provide a valuable framework for future studies investigating anxiety circuitry and may inform the development of targeted interventions for stress-related disorders.

      We thank the reviewer for raising these important points. We agree that further clarification is warranted. In our study, we compared SAN reactivation across different stimuli: foot shock (acute physical stress), social stress (chronic psychosocial stress), and sucrose reward (non-aversive positive stimulus). As shown in Figure 3, SANs in the supramammillary nucleus (SuM) were significantly reactivated by social stress but not by sucrose reward. Moreover, the c-Fos response in SuM was markedly higher after foot shock compared to home cage controls (Figure 1). While we did not test all possible aversive states (e.g., pain, sickness), our data support that SuM SANs are preferentially recruited by stressors rather than by reward or neutral conditions. We acknowledge that the overlap across stress modalities is not complete, which may reflect differences in stress intensity, duration, or circuit engagement. Future work will systematically compare SAN recruitment across diverse aversive and non-aversive states to further define their selectivity.

      The term “stress-activated neurons” (SANs) here refers to neurons that are reliably activated by at least one type of stressor and can be reactivated by subsequent stress exposure. The partial overlap across stressors likely reflects the diversity of stress responses and the possibility that distinct subpopulations within SuM may encode different aspects of aversive experience. Importantly, chemogenetic activation of SANs was sufficient to induce anxiety-like behavior and elevate corticosterone (Figure 4), supporting their functional role in stress-related behavioral and physiological outputs. We have revised the manuscript to clarify that SANs represent a stress-responsive ensemble rather than a uniform population activated identically by all stressors.

      We appreciate the reviewer’s conceptual caution. In the revised manuscript, we intentionally avoided using the term “engram” to describe SANs. Our focus is on a stress-activated neuronal ensemble that drives anxiety-like behavior, not on memory recall per se. We refer to SANs as an “ensemble” or “population” rather than an engram, consistent with the TRAP-based labeling approach used to capture neurons activated during a specific experience. We agree that “engram” is best reserved for memory-encoding cells and will ensure this distinction remains clear throughout the text.

      Reviewer #3 (Public review):

      Weaknesses:

      The strength of some of the evidence is judged to be incomplete. The paper provides good evidence that SuM contains stress-responsive neurons, and the activity of these neurons increases some measure of anxiety-like behavior. However, the evidence that the vSub-SuM projection "encodes anxiety" and that the SuM is a key regulator of anxiety is judged to be incomplete. The claim that SuM generates an "anxiety engram" is also judged to be incompletely supported by the evidence. Namely, what is unclear is whether these cells/regions encode anxiety per se versus modulate behaviors (like exploration) that tend to correlate with anxiety. Since many brain regions respond to footshock and other stressors, the response of SuM to these stimuli is not strong evidence for a role in anxiety. I am not convinced that the identified SuM cells have a specific anxiety function. As the authors mention in the introduction, SuM regulates exploration and theta activity. Since theta potently regulates hippocampal function, there is the concern that SuM manipulations could have broad effects. As shown in Supplementary Figure 2, stimulating stress-responsive cells in SuM potently reduces general locomotor exploration. This raises concerns that the manipulation could have broader effects that go beyond just changes in anxiety-like behavior. Furthermore, the meaning of an "anxiety engram" is unclear. Would this engram encode stress, the sense of a potential threat, or the behavioral response? A more developed analysis of the behavioral correlates of SuM activity and the behavioral effects of SuM manipulations could give insight into these questions.

      We appreciate the reviewer’s thoughtful critique regarding the specificity of SuM’s role in anxiety and the interpretation of our findings. We acknowledge that SuM has broad functions, including regulating exploration and hippocampal theta. However, our data show that general SuM activation increases anxiety-like measures (reduced open-arm time in EZM, decreased center exploration in OF) without altering total locomotion (Fig. 2, Suppl. Fig. 2). The locomotor reduction in SAN activation experiments (Suppl. Fig. 2F–G) was observed alongside clear anxiety-like behavioral changes (e.g. suppressed reward seeking), suggesting that the effects are not solely due to motor suppression. We agree that the assays we used to estimate anxiety-like behavior rely on the animals' movement during testing, and that this is a limitation when linking the data to anxiety. It is therefore more appropriate to interpret the results as modulation of anxiety-like behavior (anxiety-related avoidance) rather than of anxiety itself. We have revised the manuscript to use more precise wording and avoid overstatement.

      Our fiber photometry data (Fig. 5) show that vSub–SuM projection neurons increase activity specifically when mice enter open arms of the EZM—a behavioral transition associated with anxiety—whereas dSub–SuM projections do not. This activity correlates with anxiety-related behavior, not merely with movement or stress per se.

      We also agree that the term “engram” may be misleading in this context. In the manuscript, we refer to SANs as a “stress-activated neuronal ensemble” rather than an anxiety engram. Our data indicate that these neurons are recruited by stress and that their reactivation increases anxiety-related avoidance of the open arms. We have revised the text to avoid conceptual overreach and to clarify that SuM SANs likely contribute to a state of sustained anxiety/avoidance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting, including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from noting that the subjects were male in the abstract and discussion of the limitations of the exclusion of females.

      Thank you for the suggestion. We have included the full statistical detail in a separate sheet as Table 1. Also, we have modified the title of the manuscript to reflect the sex of the mice.
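
      To make the requested reporting concrete, the sketch below shows how a test statistic and its degrees of freedom can be computed for a two-group comparison using Welch's unequal-variance t-test. This is an illustrative assumption: the tests actually used in the manuscript are not stated in this exchange, and the function name and sample values are hypothetical. The exact p-value and the 95% confidence interval additionally require a t-distribution from a statistics library, which is deliberately left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical measurements, for illustration only.
control = [1, 2, 3, 4, 5]
treated = [2, 4, 6, 8, 10]
t_stat, df = welch_t(control, treated)
```

      Reporting the pair (t, df) together with the exact p-value lets readers check each comparison themselves, including those with p ≥ 0.05.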

      Reviewer #1 (Recommendations for the authors):

      (1) In line 211, the authors state, "we recorded neuronal action potentials via multichannel extracellular recording while the mice were moving in the EPM, a traditional type of maze used to test anxiety in rodents,". However, it is unclear what data is presented in the paper, that is, extracellular recordings from SuM in mice on the elevated plus maze.

      We have deleted the description of multichannel recording data in EPM as the data was removed earlier.

      Minor corrections to the text and figures.

      (2) For bar plots, perhaps clarify how the data is presented. For example, in Figure 4, "The data in B, D, E and I-L are presented as the means ± SEMs," but this does not appear to be plotted as a mean with SEM error bars because the error bars cover all the values.

      Corrected.

      (3) In Figure 5, the white text for EGFP in panel B is very difficult to see.

      Corrected.

      (4) For Figure 5D, it would be helpful to more clearly specify which neurons in SuM were recorded from. Was it SANs or all SuM neurons?

      We did whole-cell recording on all SuM neurons.

      (5) Fos2A-iCreERT2 is mislabeled as "Fos2A-iCreERT" in the methods.

      Corrected.

      (6) The sentence at line 139 "To make sure foot shock induced anxiety won't last until manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7," is unclear as written.

      Thank you for pointing this out. We have modified the sentence to make it clearer: “To make sure mice are on similar basal condition while applying chemo-genetic manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7 (Figure 4 A). The mice that experienced foot shocks showed decreases in the exploration time in the open arms on day 2. However, acute stress-induced anxiety was not detected on day 7 (Figure 4 B), which allow us to compare the reactivation of SANs produced anxiety-like behavior between groups at the same baseline.”

      (7) The details of the viral injections used for ex vivo electrophysiology are not sufficient to understand the experiment and the implications of the data. Which neurons (SANs?) are recorded from, what percent of those had inputs, were the sub-neurons globally labeled or just SANs?

      We performed whole-cell recordings from SuM neurons globally to test whether the projection arises from glutamatergic neurons in the Sub; as shown in Figure 5B, the projection neurons in the Sub exclusively express vGluT1. Given this aim, we did not retain neurons that did not respond to light stimulation, and we therefore cannot calculate the percentage of neurons receiving input. We have added text to the Methods stating clearly that the recordings were made from SuM neurons globally.

      (8) The scale used in Figure 6C renders that data unreadable. 120 to 40% changes in body weight are well beyond the variability in the data.

      We have modified the axis (90 to 110%) to show the body weight change more clearly.

      (9) The dose of CNO used, 5 mg/kg, is high, and using lower doses or other DREADD ligands is worth considering.

      Thank you for your valuable comment. We are aware that others now use relatively lower doses of CNO, or other DREADD ligands reported to have much higher affinity and fewer side effects. The dose of 5 mg/kg was adopted from earlier DREADD studies that reported no obvious side effects in mice [17], and it produced no obvious side effects, e.g. on locomotion (Supplemental Figure 2B), in our experiments. We therefore kept this dose throughout the project for consistency across cohorts. In follow-up experiments building on this project, we are switching to DCZ to avoid any potential side effects of CNO.

      Reviewer #2 (Recommendations for the authors):

      This is a strong manuscript that provides important insights into the role of the supramammillary nucleus (SuM) and its inputs from the ventral subiculum in regulating anxiety. The combination of behavioral, imaging, electrophysiological, and circuit manipulation approaches is impressive, and the distinction the authors propose between anxiety-related and fear-related circuits is conceptually important.

      There are, however, some points that I think need clarification. The authors emphasize that the hippocampus is essential for fear memory recall, yet they do not directly evaluate whether the SuM-hippocampal pathway might contribute differentially to anxiety versus fear memory. Addressing this would help to explain where the dissociation between the two processes arises.

      Thank you for the suggestion. We realized that we did not collect enough data to exclude a role of these SANs in memory, especially fear memory, which, as mentioned above, is formed through strongly emotional training. The data and relevant discussion have been removed to avoid misunderstanding and overstatement.

      I am also not fully convinced about the definition of the "stress-activated neurons" (SANs). The overlap across repeated stress exposures is quite modest (around 20%), which suggests that this population may not be strictly stress-specific but rather a dynamic subset that is preferentially, though not exclusively, engaged by stress. Related to this, the use of the term "engram" raises conceptual questions. Since the classic engram refers to an ensemble encoding and recalling a specific memory, it is not obvious whether it is appropriate to apply the term to a neuronal population that appears to represent a persistent emotional state. The authors should consider justifying this choice of terminology more carefully or adopting a different term.

      Thank you for your important comments. We agree that the SANs in this manuscript are more likely a dynamic subset than an “engram” engaged exclusively by foot shock. That is why we describe this neuronal ensemble as “stress-activated neurons” rather than an “engram”. To avoid further confusion, we have reduced the use of “engram” across the manuscript.

      Some parts of the text also need more precision. For example, the statement in lines 63-65 that "few studies have explored emotion-related engram cells" is potentially misleading, as most engram studies focus on memories with a strong emotional component. The rationale for this claim should be clarified.

      This sentence has been deleted because it was not needed to connect the text and was potentially misleading.

      In Figure 1, the choice of methods is also puzzling: cFos immunostaining is used after shock delivery, while electrophysiology is used for the CSDS paradigm. It would be helpful to explain why different readouts were chosen for different stress models, and whether this may affect the comparability of the results.

      Thank you for this important comment. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We chose different methods because acute stress produces a transient effect, whereas chronic stress produces a long-lasting effect. To our knowledge, cFos is a well-established marker of strong neuronal activation, but it has a short lifespan (~4-6 hours) and therefore suits the acute paradigm better. In vivo recording allows us to compare neuronal activity before and after the chronic experiments within subjects and can reveal cumulative effects, which cFos cannot. To address this, we have clarified the purpose of Figure 1 in the text (lines 112-113): “To investigate if SuM would be responsive to diverse stressors, we next examined whether chronic stress, which different mechanism underlying…”

      Finally, some additional details would strengthen the presentation. The discussion of corticosterone and other physiological markers could be expanded to indicate whether these effects were robust across stress paradigms. Similarly, the relatively modest overlap between SANs activated by different stressors could be framed more explicitly as part of a broader principle of flexible ensemble recruitment in anxiety-related circuits.

      Thank you for your suggestion. We have added more discussion about the corticosterone and the flexibility of SANs in the manuscript. See Line 267-270: “The serum corticosterone concentration can be used as a marker of stress-induced change in the peripheral blood. Previous studies showed serum corticosterone can be increased by various stress stimulation [39–42]; meanwhile, intentionally supplementing the diet with corticosterone can induce anxiety-like behaviors in rodents[43].” and Line 275-281: “However, the reactivation rate of SANs caused by different stressor was relatively lower than the initial activation rate caused by foot shock (Figure 3). This suggests that stress-activated neuronal clusters may have more flexible recruitment principles, with only a small number of neurons potentially encoding emotional information, while most other neurons remain involved in encoding other neural activities. Studies in other field, particularly studies of memory engram, has shown that the sets of neurons activated during learning are dynamic and exhibit high flexibility [44, 45].”

      Overall, the work is of high quality and provides a valuable contribution to the field, but addressing these points would help sharpen the mechanistic claims and ensure that the conceptual framework is as clear and precise as the experimental data.

      Reviewer #3 (Recommendations for the authors):

      (1) Since increased SuM activity is hypothesized to mediate the effects of stress on anxiety-like behavior, a logical step would be to test for necessity by silencing the stress-activated SuM cells.

      We agree this is a logical and valuable experiment. While our current study focused primarily on the sufficiency of SuM/SAN activation to induce anxiety-like behavior, we acknowledge that inhibition experiments would provide critical complementary evidence for necessity. We have added a statement in the Discussion noting that “future studies should examine whether silencing SuM SANs, either during stress exposure or during anxiety testing, can prevent or reduce stress-induced anxiety”. This will help establish a more complete causal role.

      (2) Discuss what is meant by "anxiety engram" and what features of anxiety the labeled cells might encode.

      We concur that “stress-activated neuron (SAN)” is a more precise descriptor than “engram” in this context. We have revised the text to avoid the potentially misleading term “engram” and instead refer to a “stress-activated neuron”. The labeled cells are preferentially reactivated by stress (not reward), and their activation promotes both behavioral avoidance and physiological stress markers (corticosterone). They likely contribute to the maintenance of an anxious state under perceived threat, rather than encoding discrete threat cues or memories.

      (3) A more nuanced analysis of behavioral correlates of SuM activity and/or the behavioral effects of SuM manipulations would strengthen this paper.

      To provide a more nuanced understanding of the behavioral correlates, we have performed additional analyses on our fiber photometry data (now presented in Supplemental Figure 6) and have also planned additional experiments for future studies to deepen our understanding.

      References:

      (1) Jendryka M, Palchaudhuri M, Ursu D, van der Veen B, Liss B, Kätzel D, et al. Pharmacokinetic and pharmacodynamic actions of clozapine-N-oxide, clozapine, and compound 21 in DREADD-based chemogenetics in mice. Sci Rep. 2019;9.

      (2) Koike H, Demars MP, Short JA, Nabel EM, Akbarian S, Baxter MG, et al. Chemogenetic Inactivation of Dorsal Anterior Cingulate Cortex Neurons Disrupts Attentional Behavior in Mouse. Neuropsychopharmacology. 2016;41:1014–1023.

      (3) Guettier J-M, Gautam D, Scarselli M, Ruiz De Azua I, Li JH, Rosemond E, et al. A chemical-genetic approach to study G protein regulation of cell function in vivo. Proceedings of the National Academy of Sciences. 2009;106:19197–19202.

      (4) Wess J, Nakajima K, Jain S. Novel designer receptors to probe GPCR signaling and physiology. Trends Pharmacol Sci. 2013;34:385–392.

      (5) Kraeuter AK, Guest PC, Sarnyai Z. The Elevated Plus Maze Test for Measuring Anxiety-Like Behavior in Rodents. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 69–74.

      (6) Kraeuter AK, Guest PC, Sarnyai Z. The Open Field Test for Measuring Locomotor Activity and Anxiety-Like Behavior. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 99–103.

      (7) Wall PM, Messier C. Methodological and conceptual issues in the use of the elevated plus-maze as a psychological measurement instrument of animal anxiety-like behavior. Neurosci Biobehav Rev. 2001;25:275–286.

      (8) Carobrez AP, Bertoglio LJ. Ethological and temporal analyses of anxiety-like behavior: The elevated plus-maze model 20 years on. Neurosci Biobehav Rev. 2005;29:1193–1205.

      (9) Seibenhener ML, Wooten MC. Use of the open field maze to measure locomotor and anxiety-like behavior in mice. Journal of Visualized Experiments. 2015. 6 February 2015. https://doi.org/10.3791/52434.

      (10) Prut L, Belzung C. The open field as a paradigm to measure the effects of drugs on anxiety-like behaviors: A review. Eur J Pharmacol. 2003;463:3–33.

      (11) Chen Y, Zhou X, Chu B, Xie Q, Liu Z, Luo D, et al. Restraint Stress, Foot Shock and Corticosterone Differentially Alter Autophagy in the Rat Hippocampus, Basolateral Amygdala and Prefrontal Cortex. Neurochem Res. 2024;49:492–506.

      (12) Hassell JE, Nguyen KT, Gates CA, Lowry CA. The Impact of Stressor Exposure and Glucocorticoids on Anxiety and Fear. Curr. Top. Behav. Neurosci., vol. 43, Springer; 2019. p. 271–321.

      (13) Peng B, Xu Q, Liu J, Guo S, Borgland SL, Liu S. Corticosterone attenuates reward-seeking behavior and increases anxiety via D2 receptor signaling in ventral tegmental area dopamine neurons. Journal of Neuroscience. 2021;41:1566–1581.

      (14) Myers B, Greenwood-Van Meerveld B. Elevated corticosterone in the amygdala leads to persistant increases in anxiety-like behavior and pain sensitivity. Behavioural Brain Research. 2010;214:465–469.

      (15) Demuyser T, Deneyer L, Bentea E, Albertini G, Van Liefferinge J, Merckx E, et al. In-depth behavioral characterization of the corticosterone mouse model and the critical involvement of housing conditions. Physiol Behav. 2016;156:199–207.

      (16) Shoji H, Maeda Y, Miyakawa T. Chronic corticosterone exposure causes anxiety- and depression-related behaviors with altered gut microbial and brain metabolomic profiles in adult male C57BL/6J mice. Molecular Brain. 2024;17.

      (17) Manvich DF, Webster KA, Foster SL, Farrell MS, Ritchie JC, Porter JH, et al. The DREADD agonist clozapine N-oxide (CNO) is reverse-metabolized to clozapine and produces clozapine-like interoceptive stimulus effects in rats and mice. Sci Rep. 2018;8.

    1. eLife Assessment

      This structural biology study provides insights into the assembly of the GID/CTLH E3 ligase complex. The multi-subunit complex forms unique, ring-shaped assemblies and the findings presented here describe a "specificity code" that regulates formation of subunit interfaces. The data supporting the conclusions are convincing, both in thoroughness and rigor. This study will be valuable to biochemists, structural biologists, and could lay foundation for novel designed protein assemblies.

    2. Reviewer #1 (Public review):

      Summary:

      GID/CTLH-type RING ligases are huge multi-protein complexes that play an important role in protein ubiquitylation. The subunits of its core complex are distinct and form a defined structural arrangement, but there can be variations in subunit composition, such as exchange of RanBP9 and RanBP10. In this study, van gen Hassend and Schindelin provide new crystal structures of (parts of) key subunits and use those structures to elucidate the molecular details of the pairwise binding between those subunits. They identify key residues that mediate binding partner specificity. Using in vitro binding assays with purified protein, they show that altering those residues can switch specificity to a different binding partner.

      Strengths:

      This is a technically demanding study that sheds light on an interesting structural biology problem in residue-level detail. The combination of crystallization, structural modeling and binding assays with purified mutant proteins is elegant and, in my eyes, convincing.

      Weaknesses:

      This study has no major weaknesses.

      It will be very interesting to see follow-up studies that use the mutants generated here to dive deeper into the biology of RING ligases, or design new mutants of multi-subunit complexes with an analogous methodology.

    3. Reviewer #2 (Public review):

      Summary:

      This is a very interesting study focusing on a remarkable oligomerization domain, the LisH-CTLH-CRA module. The module is found in a diverse set of proteins across evolution. The present manuscript focuses on the extraordinary elaboration of this domain in GID/CTLH RING E3 ubiquitin ligases, which assemble into a gigantic, highly ordered, oval-shaped megadalton complex with strict subunit specificity. The arrangement of LisH-CTLH-CRA modules from several distinct subunits is required to form the oval on the outside of the assembly, allowing functional entities to recruit and modify substrates in the center. Although previous structures had shown that CTLH-CRA dimerization interfaces share a conserved helical architecture, the molecular rules that govern subunit pairing have not been explored. This was a daunting task in protein biochemistry that was achieved in the present study, which defines this "assembly specificity code" at the structural and residue-specific level.

      The authors used X-ray crystallography to solve high-resolution structures of mammalian CTLH-CRA domains, including RANBP9, RANBP10, TWA1, MAEA, and the heterodimeric complex between RANBP9 and MKLN. They further examined and characterized assemblies by quantitative methods (ITC and SEC-MALS) and qualitatively using nondenaturing gels. Some of their ITC measurements were particularly clever, and involved competitive titrations, and titrations of varying partners depending on protein behavior. The experiments allowed the authors to discover that affinities for interactions between partners are exceptionally tight, in the pM-nM range, and to distill the basis for specificity while also inferring that additional interactions beyond the LisH-CTLH-CRA modules likely also contribute to stability. Beyond discovering how the native pairings are achieved, the authors were able to use this new structural knowledge to reengineer interfaces to achieve different preferred partnerings.

      Strengths:

      Nearly everything about this work is exceptionally strong.

      - The question is interesting for the native complexes, and even beyond that has potential implications for design of novel molecular machines.

      - The experimental data and analyses are quantitative, rigorous, and thorough.

      - The paper is a great read - scholarly and really interesting.

      - The figures are exceptional in every possible way. They present very complex and intricate interactions with exquisite clarity. The authors are to be commended for outstanding use of color and color-coding throughout the study, including in cartoons to help track what was studied in what experiments. And the figures are also outstanding aesthetically.

      Weaknesses:

      There are no major weaknesses of note, and in the revision the authors addressed my minor suggestions for the text.

    4. Reviewer #3 (Public review):

      Summary:

      Protein complexes, like the GID/CTLH-type E3 ligase, adopt a complex three-dimensional structure, which is of functional importance. Several domains are known to be involved in shaping the complexes. Structural information based on cryo-EM is available, but its resolution does not always provide detailed information on protein-protein interactions. The work by van gen Hassend and Schindelin provides additional structural data based on crystal structures.

      Strengths:

      The work is solid and very carefully performed. It provides high-resolution insights into the domain architecture, which helps to understand the protein-protein interactions on a detailed molecular level. They also include mutant data and can thereby draw conclusions on the specificity of the domain interactions. These data are probably very helpful for others who work on a functional level with protein complexes containing these domains.

      Weaknesses:

      The manuscript contains a lot of useful, very detailed information. This information is likely very helpful to investigate functional and regulatory aspects of the protein complexes, whose assembly relies on the LisH-CTLH-CRA modules. However, this goes beyond the scope of this manuscript.

      Comments on revisions:

      I am fine with the revised version of the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      GID/CTLH-type RING ligases are huge multi-protein complexes that play an important role in protein ubiquitylation. The subunits of its core complex are distinct and form a defined structural arrangement, but there can be variations in subunit composition, such as exchange of RanBP9 and RanBP10. In this study, van gen Hassend and Schindelin provide new crystal structures of (parts of) key subunits and use those structures to elucidate the molecular details of the pairwise binding between those subunits. They identify key residues that mediate binding partner specificity. Using in vitro binding assays with purified protein, they show that altering those residues can switch specificity to a different binding partner.

      Strengths:

      This is a technically demanding study that sheds light on an interesting structural biology problem in residue-level detail. The combination of crystallization, structural modeling, and binding assays with purified mutant proteins is elegant and, in my eyes, convincing.

      Weaknesses:

      I mainly have some suggestions for further clarification, especially for a broad audience beyond the structural biology community.

      We thank the reviewer for the careful evaluation of our manuscript and for the positive and encouraging assessment of our work. We also thank the reviewer for the constructive suggestions to improve clarity for a broader audience and have revised the manuscript accordingly.

      (1) The authors establish what they call an 'engineering toolkit' for the controlled assembly of alternative compositions of the GID complex. The mutagenesis results are great for the specific questions asked in this manuscript. It would be great if they could elaborate on the more general significance of this 'toolkit' - is there anything from a technical point of view that can be generalized? Is there a biological interest in altering the ring composition for functional studies?

      We thank the reviewer for raising this important point. Beyond addressing the specific pairwise assembly mechanisms analyzed in this study, we agree that the broader significance of this engineering toolkit warrants further discussion. The residue-level understanding of CTLH-CRA interfaces not only explains assembly specificity but also enables rational manipulation of ring composition in a controlled manner. We have therefore expanded the end of the discussion section to outline generalizable strategies for CRA-interface disruption and to highlight potential biological applications of altering ring composition for functional studies.

      (2) Along the same lines, the mutagenesis required to rewire Twa1 binding was very complex (8 mutations). While this is impressive work, the 'big picture conclusion' from this part is not as clear as for the simpler RanBP9/10. It would be great if the authors could provide more context as to what this is useful for (e.g., potential for in vivo or in vitro functional studies, maybe even with clinical significance?)

      We thank the reviewer for this important comment and agree that the broader implications of the more complex Twa1 rewiring were not sufficiently emphasized in the original manuscript. Through the competition ITC experiments (Fig. 5), we aimed to demonstrate a concrete application of the Twa1 rewiring. At the same time, we recognize that additional use cases are conceivable. To address this point, we have expanded the discussion section to clarify the conceptual significance of Twa1 rewiring and briefly outline further potential applications of controlled interface manipulation. These additions aim to better contextualize the broader relevance of this approach beyond the specific mechanistic questions addressed in this study.

      (3) For many new crystal structures, the authors used truncated, fused, or otherwise modified versions of the proteins for technical reasons. It would be helpful if the authors could provide reasoning why those modifications are unlikely to change the conclusions of those experiments compared to the full-length proteins (which are challenging to work with for technical reasons). For instance, could the authors use folding prediction (AlphaFold) that incorporates information of their resolved structures and predicts the impact of the omitted parts of the proteins? The authors used AlphaFold for some aspects of the study, which could be expanded.

      We agree with the reviewer that the transferability of the domain constructs to the corresponding full-length proteins is an important consideration. In the original version of the manuscript, we addressed this point by fitting the experimentally determined CTLH-CRA domain structures of muskelin and RanBP9 into the cryo-EM maps of the full-length complexes (Fig. 5d), demonstrating that the applied truncations and fusion strategies are compatible with the architecture observed in the intact assembly. Following the reviewer’s suggestion, we have further strengthened this analysis by adding a new Supplementary Figure 1. In this figure, the experimentally determined CTLH-CRA domain structures are superposed with full-length AlphaFold predictions. This comparison shows that removal of flexible linker regions, such as those between the CTLH and CRA motifs or at terminal segments, does not alter the overall fold or the binding interfaces of the domains. Together, these analyses support the conclusion that the domain constructs faithfully represent the structural and interaction properties of the full-length proteins.

      Reviewer #2 (Public review):

      Summary:

      This is a very interesting study focusing on a remarkable oligomerization domain, the LisH-CTLH-CRA module. The module is found in a diverse set of proteins across evolution. The present manuscript focuses on the extraordinary elaboration of this domain in GID/CTLH RING E3 ubiquitin ligases, which assemble into a gigantic, highly ordered, oval-shaped megadalton complex with strict subunit specificity. The arrangement of LisH-CTLH-CRA modules from several distinct subunits is required to form the oval on the outside of the assembly, allowing functional entities to recruit and modify substrates in the center. Although previous structures had shown that CTLH-CRA dimerization interfaces share a conserved helical architecture, the molecular rules that govern subunit pairing have not been explored. This was a daunting task in protein biochemistry that was achieved in the present study, which defines this "assembly specificity code" at the structural and residue-specific level.

      The authors used X-ray crystallography to solve high-resolution structures of mammalian CTLH-CRA domains, including RANBP9, RANBP10, TWA1, MAEA, and the heterodimeric complex between RANBP9 and MKLN. They further examined and characterized assemblies by quantitative methods (ITC and SEC-MALS) and qualitatively using nondenaturing gels. Some of their ITC measurements were particularly clever and involved competitive titrations and titrations of varying partners depending on protein behavior. The experiments allowed the authors to discover that affinities for interactions between partners are exceptionally tight, in the pM-nM range, and to distill the basis for specificity while also inferring that additional interactions beyond the LisH-CTLH-CRA modules likely also contribute to stability. Beyond discovering how the native pairings are achieved, the authors were able to use this new structural knowledge to reengineer interfaces to achieve different preferred partnerings.

      Strengths:

      Nearly everything about this work is exceptionally strong.

      (1) The question is interesting for the native complexes, and even beyond that, has potential implications for the design of novel molecular machines.

      (2) The experimental data and analyses are quantitative, rigorous, and thorough.

      (3) The paper is a great read - scholarly and really interesting.

      (4) The figures are exceptional in every possible way. They present very complex and intricate interactions with exquisite clarity. The authors are to be commended for outstanding use of color and color-coding throughout the study, including in cartoons to help track what was studied in what experiments. And the figures are also outstanding aesthetically.

      Weaknesses:

      There are no major weaknesses of note, but I can make a few recommendations for editing the text.

      We are very grateful to the reviewer for this exceptionally positive and thoughtful assessment of our work. We sincerely appreciate the recognition of both the conceptual scope and the technical depth of the study. We are particularly encouraged by the reviewer’s comments regarding the clarity and presentation of the figures. Considerable effort went into ensuring that the structural and biochemical complexity of the CTLH assemblies could be conveyed in a clear and accessible manner, and we are grateful that this was appreciated. We thank the reviewer for the constructive recommendations for textual improvements.

      Reviewer #3 (Public review):

      Summary:

      Protein complexes, like the GID/CTLH-type E3 ligase, adopt a complex three-dimensional structure, which is of functional importance. Several domains are known to be involved in shaping the complexes. Structural information based on cryo-EM is available, but its resolution does not always provide detailed information on protein-protein interactions. The work by van gen Hassend and Schindelin provides additional structural data based on crystal structures.

      Strengths:

      The work is solid and very carefully performed. It provides high-resolution insights into the domain architecture, which helps to understand the protein-protein interactions on a detailed molecular level. They also include mutant data and can thereby draw conclusions on the specificity of the domain interactions. These data are probably very helpful for others who work on a functional level with protein complexes containing these domains.

      Weaknesses:

      The manuscript contains a lot of useful, very detailed information. This information is likely very helpful to investigate functional and regulatory aspects of the protein complexes, whose assembly relies on the LisH-CTLH-CRA modules. However, this goes beyond the scope of this manuscript.

      We thank the reviewer for the detailed review of our manuscript and for the constructive and positive remarks. We greatly appreciate the recognition of the high-resolution structural insights and the value of combining crystallographic data with mutational analyses to elucidate domain-specific interactions. We are also grateful for the acknowledgment that these findings may serve as a useful resource for future functional and regulatory studies of LisH-CTLH-CRA-containing complexes. While such aspects extend beyond the immediate scope of the present study, we hope that the structural framework provided here will facilitate and inspire future investigations addressing these questions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) For the ITC measurements that are less accurate, the authors may want to represent that in the figures with an approximate sign.

      We thank the reviewer for this helpful suggestion. After consideration, we decided not to introduce an approximate sign in the main figures, as this would be inconsistent with the graphical conventions used throughout the manuscript (there is also no equal sign). Since the associated errors are reported directly alongside each K<sub>D</sub> value, we believe that the precision of the measurements is sufficiently conveyed. However, we agree that explicitly marking estimated values can be appropriate in specific cases. We have therefore added approximate signs in Supplementary Fig. 5 for the K<sub>D</sub> estimation of self-association.

      (2) The names of the proteins are from mammals and should probably be capitalized.

      We agree that capitalization is generally appropriate for mammalian protein names. In particular, for proteins such as Rmnd5a, which is identical in sequence between mouse and human, the use of human-style nomenclature would indeed be fully justified. Originally, we chose the current nomenclature to distinguish the proteins studied here from strictly human versions, as most constructs are derived from mouse and one (muskelin) from rat. This approach also avoids inconsistencies between the mouse and rat proteins within the manuscript and maintains alignment with the nomenclature used in our previous publications. For the sake of consistency and continuity, we have therefore retained the original formatting throughout the manuscript.

      (3) For the sequence alignments, it would be good to specify in the legend which organisms these are from, and where the differences are in mouse and rat proteins used in the study, and the human proteins.

      We appreciate this constructive suggestion. We have revised the sequence alignment legends to clearly specify the organism of origin for each sequence included in the analysis. In addition, we have added a new Supplementary Figure 1 presenting the AlphaFold predictions of the mouse proteins and rat muskelin used in this study. Within these models, sequence differences relative to the human proteins are indicated, and variations within the CTLH-CRA domains are explicitly annotated. These additions clarify how the constructs analyzed here relate to their human counterparts.

      (4) A few points about the referencing:

      (a) It was reference 27 that first described the dual-sided interactions where the CRA domain weaves back and forth such that CTLH-CRAN and LisH-CRAC mediate the contacts on the two sides. This should be cited.

      We fully agree and added the reference accordingly.

      (b) To this reviewer's knowledge, it was references 13 and 9 that resolved the daisy-chain of helical LisH-CTLH-CRA interactions around the oval helical structures.

      We agree with the reviewer that references 13 and 9 resolved the helical LisH-CTLH-CRA daisy-chain arrangement around the oval structure. Reference 13 was already cited in the original manuscript, and we have now added reference 9 to appropriately acknowledge this contribution. We have retained reference 14, although it did not resolve the helical daisy-chain architecture, as it described a related oval assembly of CTLH complex components that remains relevant in the structural context discussed.

      (c) A cryo-EM map with RANBP10 was shown at low resolution in reference 8.

      We agree with the reviewer that a low-resolution cryo-EM map including RANBP10 was reported in reference 8. Our original wording was not sufficiently precise and may have given the impression that RANBP10 had not been characterized. Our intention was to convey that, although cryo-EM maps exist, detailed atomic-level information on subunit interfaces was lacking. We have revised the paragraph accordingly to clarify this point and now cite reference 8 explicitly in this context.

      (d) The Discussion requires referencing.

      We agree with the reviewer that additional referencing improves the clarity and contextualization of the Discussion. We have revised the Discussion section accordingly and added appropriate references to support the statements made.

    1. eLife Assessment

      This study presents a valuable contribution by introducing a model-based, Bayesian method for inferring action potentials from calcium imaging data that directly quantifies uncertainty in spike timing through posterior distributions. Using a Monte Carlo particle Gibbs sampling approach, the method achieves temporal resolution and accuracy comparable to existing techniques while offering the key added benefit of principled uncertainty estimates. The underlying methodology and characterization are convincing, and the work will be of particular interest to theoretically oriented neuroscientists seeking rigorous new tools for data-driven parameter inference.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modeled process. The authors focus on the quantification of spike time uncertainties in simulated data and in data recorded at a high sampling rate in cerebellar slices with GCaMP8f, and they demonstrate the high temporal precision that can be achieved with their method to estimate spike timing.

      Strengths:

      - The authors provide solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al. and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - Although the algorithm is compared (in the revised manuscript) to other models to infer individual spikes (e.g., MLSpike), these comparisons could be more comprehensive. Future work that benchmarks this and other algorithms under varying conditions (e.g., noise levels, temporal resolution, calcium indicators) would help assess and confirm the robustness and usability of this algorithm.

      - The mathematical complexity underlying the method may pose challenges for experimentalists who may want to use the method for their analyses. While this is not a weakness of the approach itself, it highlights the need for further validation and benchmarking in future work, to build user confidence.

      Comments on revisions:

      Thank you for addressing the final comments, and congrats on this study!

    3. Reviewer #2 (Public review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Such models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.
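
      To illustrate what a model-based, sequential Monte Carlo approach of this general kind involves, the following is a minimal sketch, not the PGBAR implementation. It assumes a single-exponential (AR(1)) calcium model with Bernoulli spiking and Gaussian noise, and runs a plain bootstrap particle filter with all parameters held fixed; it does not perform particle Gibbs sampling or estimate static parameters. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ar1(n_steps, rate_hz, dt, tau, amp, sigma, rng):
    """Simulate spikes and noisy fluorescence from an AR(1) calcium model."""
    gamma = np.exp(-dt / tau)                     # per-step calcium decay factor
    spikes = rng.random(n_steps) < rate_hz * dt   # Bernoulli spike in each time bin
    fluo = np.zeros(n_steps)
    c = 0.0
    for t in range(n_steps):
        c = gamma * c + amp * spikes[t]           # decay, then jump on a spike
        fluo[t] = c + sigma * rng.standard_normal()
    return spikes, fluo

def bootstrap_filter(fluo, rate_hz, dt, tau, amp, sigma, n_particles, rng):
    """Return filtered spike probabilities P(s_t = 1 | F_1..F_t)."""
    gamma = np.exp(-dt / tau)
    c = np.zeros(n_particles)                     # calcium level of each particle
    p_spike = np.zeros(len(fluo))
    for t, f in enumerate(fluo):
        s = rng.random(n_particles) < rate_hz * dt   # propose spikes from the prior
        c = gamma * c + amp * s
        logw = -0.5 * ((f - c) / sigma) ** 2         # Gaussian log-likelihood
        w = np.exp(logw - logw.max())                # subtract max for stability
        w /= w.sum()
        p_spike[t] = np.sum(w * s)                   # weighted fraction that spiked
        idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
        c = c[idx]
    return p_spike

rng = np.random.default_rng(0)
spikes, fluo = simulate_ar1(500, 5.0, 0.01, 0.1, 1.0, 0.1, rng)
p_spike = bootstrap_filter(fluo, 5.0, 0.01, 0.1, 1.0, 0.1, 500, rng)
```

      The output is a probability per time bin rather than a single spike train, which is the sense in which such methods report uncertainty; a full particle Gibbs scheme would additionally resample the static parameters conditional on a retained trajectory.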

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity, but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the GitHub repository is well-organized. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz).

      Weaknesses:

      The accuracy of spike train reconstructions is not higher than that of other model-based approaches, and lower than the accuracy of a model-independent approach based on a deep network in a regime of commonly used acquisition rates.

      Comments on revisions:

      I have no further comments on the manuscript.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modeled process. The authors focus on the quantification of spike time uncertainties in simulated data and in data recorded at a high sampling rate in cerebellar slices with GCaMP8f, and they demonstrate the high temporal precision that can be achieved with their method to estimate spike timing.

      Strengths:

      - The authors provide solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al. and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - Although the algorithm is compared (in the revised manuscript) to other models to infer individual spikes (e.g., MLSpike), these comparisons could be more comprehensive. Future work that benchmarks this and other algorithms under varying conditions (e.g., noise levels, temporal resolution, calcium indicators) would help assess and confirm the robustness and usability of this algorithm.

      The metrics used for comparison follow the field's benchmarking conventions (see the CASCADE paper, Rupprecht et al. 2021). Developing improved standardized benchmarking methods would indeed be ideal, but this is beyond the scope of this manuscript.

      - The mathematical complexity underlying the method may pose challenges for experimentalists who may want to use the method for their analyses. While this is not a weakness of the approach itself, it highlights the need for further validation and benchmarking in future work, to build user confidence.

      We acknowledge the challenges of understanding the mathematics underlying our method, but such a study is necessary to ensure its accuracy and reliability. We will strive to improve the technique's user-friendliness in future versions.

      Reviewer #2 (Public review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Such models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity, but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the github repository is well-organized.

      Weaknesses:

      On the other hand, the accuracy of spike train reconstructions is not higher than that of other model-based approaches, and clearly lower than the accuracy of a model-independent approach based on a deep network. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz).

      In the revision, Figure 9 shows that temporal accuracy is very similar between PGBAR and the supervised method, CASCADE, and that PGBAR has a lower false positive rate. These results support the effectiveness of unsupervised Monte Carlo sampling, even with a simple autoregressive model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I'd like to thank the authors for their revisions. Their comments have addressed all my concerns, and I thank them for the clarifications. I have no further comments, except a few minor notes that the authors may consider or not:

      - The paragraph starting in line 367 is newly written and not yet as clear and mature as other parts of the manuscript. In several sentences it is roughly clear what is meant, but the wording lacks precision. For example, "distributions of the average time from ground-truth" seems a bit unclear; maybe "distributions of the average time of estimated spikes from ground-truth spikes" instead. Similarly, "the false detection rate, defined as the difference between detected and ground-truth spikes ..." could be rephrased using the difference between "numbers of spikes" instead of the difference between "spikes". But all of this is minor.

      - In the new Figure 9A, the error bars for the MLSpike method seem to be absent. In the same figure legend, it should be "excess" instead of "excess".

      We thank the reviewer for the feedback. We revised the wording of the new paragraph in response to the reviewer’s suggestions, restored the missing error bar in Figure 9, and corrected the figure legend.

      Reviewer #2 (Recommendations for the authors):

      Comparison to CASCADE: as far as I know there are no CASCADE models that have been trained on ground truth data in the regime of very fast (line scan) sampling, which is rarely used. A fair comparison of spike time estimates between PGBAR and CASCADE should take this into account. This can be done by training a new CASCADE model using the dataset of this paper. Given that performance of PGBAR and CASCADE is very similar already now (except for the false positive rate), a CASCADE model optimized for high sampling rate may be expected to catch up with (or even exceed) the performance of PGBAR. At a minimum, this possibility should be discussed.

      While this may be true, retraining a CASCADE model on high-frequency ground-truth data is beyond the scope of this manuscript. Indeed, a retrained CASCADE model optimized for line-scan or GCaMP8f data could improve performance and potentially match or exceed PGBAR, particularly in reducing false positives.

      Our aim, however, is not to benchmark supervised methods under their optimal retraining conditions, but to provide an unsupervised alternative that does not rely on labeled training data. In practice, retraining supervised models is constrained by the availability of suitable ground-truth datasets and by the uncertainty in how the method generalizes to acquisition regimes that differ substantially from the training set.

      We have therefore added a sentence in the Discussion (at the end of the subsection Comparison with benchmark datasets):

      [...] “While retraining supervised methods such as CASCADE on high-frequency or GCaMP8f ground-truth datasets could further improve its performance, limitations in dataset availability and generalization across acquisition regimes motivate complementary, training-free approaches such as PGBAR.”

      As stated in the manuscript, future extensions, such as using nonlinear biophysical models as the generative model for Monte Carlo–based inference, may further improve spike estimation accuracy.

    1. eLife Assessment

      This study presents a well-executed investigation into how the olfactory system disconnects from the environment during sleep and anesthesia, identifying a potential gating mechanism at the earliest synaptic stages of the olfactory bulb. The findings are important, as they challenge current theories by demonstrating that sensory gating occurs in non-thalamic pathways even under controlled airflow conditions. The strength of evidence is solid, supported by rigorous multimodal recordings, although the reliance on anesthetic models to draw conclusions about natural sleep is a limitation that requires further contextualization.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of Serantes et al. produced a well-designed set of experiments to address the mechanisms of olfactory disconnection during sleep. In contrast to other sensory modalities, olfaction is not filtered or potentially gated by the thalamus, potentially opening the door to unimodal sensory stimulation during sleep. Recent work (Schreck, 2022) used optogenetically activated Olfactory Sensory Neurons to show that local field potential and activity across the olfactory pathway not only remained open during sleep but were potentially even accentuated under these brain states. However, their optogenetic manipulation is an artificial perturbation to the system that could override naturalistic early-gating mechanisms. In a set of careful experiments, Serantes et al. show that coupling between airflow and brain activity at the Olfactory Bulb is diminished under sleep and anesthetic brain states. In contrast to a peripheral gating mechanism proposed by Schreck, this lack of respiration-locked activity, measured with EEG and LFP, persists even in the presence of intense respiration and even when nasal airflow is artificially induced and controlled. Their results point to nonthalamic early sensory gating of olfactory information during sleep, which is independent of nasal airflow but dependent on internal brain states. Their work elicits questions about potentially undiscovered mechanisms at the level of the early sensory pathway.

      Strengths:

      The strengths of this paper lie in the level of control afforded by the multiple preps and the wide array of physiological recordings. Specifically, both their control of airflow with a dual tracheotomy and their control of internal states using both sleep and urethane anaesthesia have a cumulative impact on the results.

      The paper is simple, well-written, well executed, has clear questions, describes the literature comprehensively, and points out conflicting results with precision and transparency. The same transparency and judgment should be used on their own results.

      Another strength of the paper is the clear, unambiguous results. The effect sizes presented in the paper are sizable and convincing.

      Weaknesses:

      The paper's shortcomings include open questions and a lack of a full mechanistic understanding of the suggested internal gating process. There are some open questions about the relative importance of airflow sensing vs. odorant sensing. Recent work by Mahajan et al., Sci. Adv. 2025 points to OSN as sensing both odorants and airflow to produce anemotaxis. Potentially, other cells could contribute to anemosensation as well, so that gated or non-gated information might depend on the ratio of airflow to odorant information. Perhaps, optogenetic stimulation of OSN acts as an unnatural sensory stimulation that can alter both olfaction and anemosensation.

      Detailed ablation, pharmacological, and optogenetic experiments may be needed to elucidate the suggested mechanisms and determine the correct answer to the question posed by the authors.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Serantes and colleagues analysed how sleep and anesthesia impact the processing of olfactory inputs, focusing on early sensory processing (occurring at the first or second synaptic contacts). First, they show that the transition to sleep has a major impact on breathing-dependent gamma activity. Second, they show that this decrease originates at the first synaptic contact and is independent of respiration itself. Third, they show a decrease in connectivity associated with neocortical slow waves. These results are very interesting and supported by a robust methodology. However, I have two major concerns regarding this work.

      First, the authors fail to adequately contextualize their work. For example, the impact of sleep on respiration-locked gamma activity was reported several years ago and is, in fact, used in some laboratories to score sleep using data from the olfactory bulb.

      Second, the authors should exercise much more caution when comparing the urethane anesthesia model with NREM/REM sleep cycles. There are very significant differences between the two. Yet, the title and abstract of the article mention only sleep and anesthesia. More concerningly, the results obtained under urethane anesthesia are uncritically generalized to sleep.

      In conclusion, the first finding was already shown in previous studies, and the second and third results were obtained not during sleep but during an anesthetic state that only resembles certain aspects of sleep.

      Strengths:

      The authors deploy an interventional approach that allows them to determine with compelling evidence the relationship of the gamma activity time-locked to breathing and different aspects of breathing, proving in particular that the disconnection is independent of respiratory dynamics. They leveraged invasive recordings that allow them to pinpoint at which level the disconnection occurs.

      Weaknesses:

      (1) My first comment concerns how this work fits within the state of the art. The introduction of the article leaves out very important and highly relevant work.

      (1a) First, "disconnection" is not a defining feature of sleep; "unresponsiveness" is. It is often assumed that this unresponsiveness (which can be directly measured, contrary to disconnection) is due to a form of disconnection, but there has been substantial work over the past decade showing that disconnection is not as extensive as initially expected. It is therefore incorrect, in my view, to state that "most models attribute sensory gating to thalamocortical mechanisms". Most models attribute sensory gating to a combination of thalamocortical and cortical mechanisms.

      (1b) The rationale of the article appears unclear ("the olfactory system-bypassing the thalamus-offers a unique window into earlier stages of sensory disconnection"). If the idea is to investigate gating mechanisms before the thalamus, then any sensory modality would suffice, since even modalities that later relay through the thalamus involve pre-thalamic processing stages. I assume that the authors instead mean that, because olfactory information does not relay through the thalamus, gating mechanisms in the olfactory stream could occur very early. However, this also implies that focusing on olfactory processing would say little about other sensory modalities.

      (1c) Key previous results have been completely overlooked. First, the impact of sleep on respiration-locked gamma activity was reported several years ago (Bagur et al., Plos Biology 2018). Second, important articles investigating olfactory processing during sleep have been overlooked (e.g., Arzi et al., Nature Neuroscience 2012; Arzi et al., Journal of Neuroscience 2014). I am not providing an exhaustive list here, but these articles are not only extremely relevant to the present study; they have also become classics in the sleep literature.

      (2) For most of their findings (Figures 2 to 5), the authors used urethane anesthesia. They show that this pharmacological manipulation results in alternation between periods of high-amplitude delta waves (SWSt) and a desynchronized state (ASt). However, the parallel with NREM and REM sleep, respectively, is rough and insufficiently justified. Differences can already be noted by contrasting the short examples provided in the figures. While NREM and REM sleep differ in terms of muscle tone (EMG), no such difference is discernible between SWSt and ASt. In SWSt, the slow waves appear to overlap with fast activity at the cortical level (M1, S1), which is not typically the case during NREM sleep. In addition, because the time scale is not the same in Figures 1 and 2 (1 s vs 2 s), yet the slow waves appear to have similar durations, it is also possible that the slow waves generated during SWSt and NREM differ. To better support the proposed parallel between NREM and SWSt on the one hand, and ASt and REM on the other, the authors should provide a thorough comparison of these states (spectral features, properties of the slow waves, duration and frequency of each state, etc.). Without this, inferences from results obtained under urethane anesthesia to sleep are not warranted.

      The authors acknowledge this issue in the Discussion ("These findings suggest that there is no functional equivalence between urethane-activated states and REM sleep"), but this caveat should be integrated from the very beginning (title, abstract, and introduction).

      (3) In some graphs, the power spectrum is normalized. Under anesthesia, this normalization was performed "within each animal to the SWSt maximum for that signal". However, I could not find equivalent information for sleep. This is key information needed to correctly interpret the results shown in Figure 1.

      (4) The authors should also clarify their criteria for concluding on the absence or presence of a given effect. For example, in the legend of Figure 1c, they write: "Note the presence of coherence during wakefulness, demonstrating the internalization of the respiratory signal, and its drop during sleep". Unless coherence is exactly zero, some degree of coherence is always "present". Figure 1 instead shows that coherence is modulated across frequencies during wakefulness, with peaks in the delta and theta ranges.

      In Figure 2, they write: "PAC between respiration and OB gamma amplitude was present during ASt but disappeared during SWSt". Again, the authors should clarify what is meant by "disappeared", as they only tested for differences between ASt and SWSt.

      Given that the authors implemented a strategy to test for above-chance coherence using surrogate datasets, they should consistently provide statistical tests showing which conditions or frequency bands exhibit coherence above chance in order to justify claims about the presence or absence of an effect.

      (5) Likewise, comparisons across states should always be supported by statistical tests, for example, in Figure 4. In addition, despite the apparent absence of coherence during SWSt in Figures 4f and 4g (which again should be formally tested), Figure 4h shows an increase in coherence around 2 Hz, which suggests some degree of coherence between nasal airflow and the olfactory bulb.

      (6) Figures should more clearly distinguish results based on a single "representative" animal from population averages. For example, were Figures 4g and 2h computed at the population level?

    4. Reviewer #3 (Public review):

      Summary:

      Sleep is typified by a behavioural attenuation of responsiveness to external stimuli (higher arousal thresholds). There are various mechanisms through which sensory perception could be dampened, and while thalamic and cortical gate points have been well studied, the focus here is on peripheral ones - at the level of the olfactory bulb (OB). While something conceptually similar has been shown in insects, this paper represents an important contribution to understanding attenuation of sensory perception during rodent sleep and anaesthesia.

      This paper shows that respiration-locked potentials and gamma activity in the olfactory bulb, which are important for olfactory coding, are diminished during sleep and when under anaesthesia compared to wake. Further, this state-dependent activity in OB is likely to be locally generated. Using a tracheotomy procedure aimed to dissociate nasal airflow from natural inhalations, authors demonstrate that local field potentials (LFPs) in the OB phase lock with artificially generated air pulses (delivered into the nasal cavity) during the active phase of anaesthesia but not during a more passive state. LFPs did not synchronise with respiratory signals during either anaesthesia state. Lastly, the authors showed that as delta power increased (typical of slow-wave-sleep), the coherence between nasal inhalation rhythms and OB LFP decreased, indicating that as rats experienced something akin to slow-wave-sleep (during anaesthesia), disconnection from the external environment could be augmented. Taken together, the authors argue that the change in activity observed in the olfactory bulb during sleep and anaesthesia provides a non-permissive state for sensory processing and manifests as sensory dissociation.

      Strengths:

      The manuscript is well-written, and the experiments are thorough. Experiments examining coupling of nasal respiration with OB potentials and delta activity are particularly interesting as they point to augmented sensory disconnection during a sleep phase typically associated with higher arousal thresholds.

      Weaknesses:

      (1) An experiment addressing the following points is missing:

      Does odour stimulation that wakes up a subject restore gamma activity and respiration-locked potentials?

      Is OB/respiration desynchrony maintained when presented with a non-rousing stimulus?

      Is waking upon stimulus delivery less likely as delta activity increases and coherence between OB/respiratory rhythms weakens?

      (2) Many of the experiments are performed under anaesthesia, which I understand is for practical reasons. While authors are forthcoming about limitations of using anaesthesia in lieu of natural sleep states, I would have preferred to see more experiments performed on sleeping animals.

    5. Author response:

      We thank the reviewing editor and the reviewers for their careful evaluation of our manuscript “Early sleep dependent sensory gating in the olfactory system”, and for their constructive feedback. We are encouraged by the overall positive assessment of the work.

      In the revised version, we will address all the points raised by the reviewers. Below, we outline the main aspects of the revision.

      (1) Contextualization within prior literature.

      We will expand the text to better situate our findings within the existing literature and clarify the specific contribution of our work, particularly with respect to state dependent changes in olfactory bulb activity.

      (2) Distinction between sleep and urethane anaesthesia.

      We will revise the text to more clearly distinguish findings obtained during natural sleep from those obtained under urethane anaesthesia. While avoiding direct equivalence between states, we will clarify that the comparison is intended to highlight shared features of slow wave brain dynamics associated with sensory gating.

      (3) Clarification of analytical methods and statistical criteria.

      We will provide additional details regarding normalisation procedures, surrogate based analysis, and statistical criteria used to assess the presence or absence of coherence and phase amplitude coupling, ensuring consistency across figures.

      (4) Improvements in figures and terminology.

      We will revise figure annotations to improve clarity (axes, colour scales, units and labelling) and ensure consistent terminology throughout the manuscript.

      We believe these revisions will further strengthen the manuscript while preserving its central conclusions.

    1. eLife Assessment

      The present work provides new insights into detailed brain morphology. Using state-of-the-art methods, it provides compelling evidence for the relevance of sulcal morphology for the precise localization of brain function. The fundamental findings have great relevance for the fields of imaging neuroscience and individualized medicine as ever-improving techniques improve precision to the point where individual brain anatomy is taking centre stage.

    2. Reviewer #1 (Public Review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      Ever-improving techniques allow the detailed capture of brain morphology and function to the point where individual brain anatomy becomes an important factor. This study investigated detailed sulcal morphology in the parieto-occipital junction. Using cutting-edge methods, it provides important insights into local anatomy, individual variability, and local brain function. The presented work advances the field and will stimulate future research into this important area.

      Strengths:

      Detailed, very thorough methodology. Multiple raters mapped detailed sulci in a large cohort. The identified sulcal features and their functional and behavioural relevance are then studied using various complementary methods. The results provide compelling evidence for the importance of the described sulcal features and their proposed relationship to cortical brain function.

    3. Reviewer #2 (Public Review):

      Summary:

      After manually labelling 144 human adult hemispheres in the lateral parieto-occipital junction (LPOJ), the authors 1) propose a nomenclature for 4 previously unnamed highly variable sulci located between the temporal and parietal or occipital lobes, 2) focus on one of these newly named sulci, namely the ventral supralateral occipital sulcus (slocs-v) and compare it to neighbouring sulci to demonstrate its specificity (in terms of depth, surface area, gray matter thickness, myelination, and connectivity), 3) relate the morphology of a subgroup of sulci from the region including the slocs-v to the performance in a spatial orientation task, demonstrating behavioural and morphological specificity. In addition to these results, the authors propose an extended reflection on the relationship between these newly named landmarks and previous anatomical studies, a reflection about the slocs-v related to functional and cytoarchitectonic parcellations as well as anatomic connectivity and an insight about potential anatomical mechanisms relating sulcation and behaviour.

      Strengths:

      - To my knowledge, this is the first study addressing the variable tertiary sulci located between the superior temporal sulcus (STS) and intra-parietal sulcus (IPS).

      - This is a very comprehensive study addressing altogether anatomical, architectural, functional and cognitive aspects.

      - The definition of highly variable yet highly reproducible sulci such as the slocs-v feeds the community with new anatomo-functional landmarks (which is emphasized by the provision of a probability map in supp. mat., which in my opinion should be proposed in the main body).

      - The comparison of different features between the slocs-v and similar sulci is useful to demonstrate their difference.

      - The detailed comparison of the present study with state of the art contextualises and strengthens the novel findings.

      - The functional study complements the anatomical description and points towards cognitive specificity related to a subset of sulci from the LPOJ.

      - The discussion offers a proposition of theoretical interpretation of the findings.

      - The data and code are mostly available online (raw data made available upon request).

    4. Reviewer #3 (Public Review):

      Summary:

      72 subjects, and 144 hemispheres, from the Human Connectome Project had their parietal sulci manually traced. This identified the presence of previously undescribed shallow sulci. One of these sulci, the ventral supralateral occipital sulcus (slocs-v), was then demonstrated to have functional specificity in spatial orientation. The discussion furthermore provides an eloquent overview of our understanding of the anatomy of the parietal cortex, situating their new work into the broader field. Finally, this paper stimulates further debate about the relative value of detailed manual anatomy, inherently limited in participant numbers and areas of the brain covered, against fully automated processing that can cover thousands of participants but easily misses the kinds of anatomical details described here.

      Strengths:

      - This is the first paper describing the tertiary sulci of the parietal cortex with this level of detail, identifying novel shallow sulci and mapping them to behaviour and function.

      - It is a very elegantly written paper, situating the current work into the broader field.

      - The combination of detailed anatomy and function and behaviour is superb.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public Review):

      Strengths

      (1) The definition of highly variable yet highly reproducible sulci such as the slocs-v feeds the community with new anatomo-functional landmarks (which is emphasized by the provision of a probability map in supp. mat., which in my opinion should be proposed in the main body).

      We agree with Reviewer 2 that there is merit to including the probability maps as a main text Figure rather than Supplementary Figure. We have now added it to the main text.

      Weaknesses

      (1) While the identification of the sulci has been done thoroughly with expert validation, the sulci have not been labeled in a way that enables the demonstration of the reproducibility of the labeling.

      Our group was unable to use an approach amenable to calculating inter-rater agreements to expedite the process of defining thousands of sulci at the individual level in multiple regions as this was our first study comprehensively documenting the sulcal organization of this region. Nevertheless, our method followed a rigorous, three-tiered procedure to ensure accurate sulcal definitions were identified in all participants. In the case of this study, authors YT and TG first defined sulci. These sulci were then checked by a trained expert (EHW). Finally, sulcal definitions were finalized by the senior author, an expert neuroanatomist (KSW). We emphasize that this process has produced reproducible anatomical results when charting other regions such as posteromedial cortex (Willbrand et al., 2023 Science Advances; Willbrand et al., 2023 Communications Biology; Maboudian et al., 2024 The Journal of Neuroscience; Ramos Benitez et al., 2024 Neuropsychologia), ventral temporal cortex (Miller et al., 2020 Scientific Reports; Parker et al., 2023 Brain Structure and Function), and lateral prefrontal cortex (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Brain Structure and Function; Willbrand et al., 2023 The Journal of Neuroscience; Willbrand et al., 2024 Brain Structure and Function) across age groups, species, and clinical populations. For the present study, by the time the final tier of our method was reached, we emphasize that a very small percentage (~2%) of sulcal definitions were actually modified. We will include an exact percentage in future publications in LPC/LPOJ.

      Our Methods have been edited to describe these features (Pages 21-22):

      “As this is the first time the sulcal expanse of LPC/LPOJ was comprehensively charted with a focus on pTS, the location of each sulcus was confirmed through a three-tiered procedure for each participant in each hemisphere. First, trained independent raters (Y.T. and T.G.) identified sulci. Second, these definitions were checked by a trained expert (E.H.W.). Third, these labels were finalized by a neuroanatomist (K.S.W.). We emphasize that this procedure has produced reproducible results in our prior work across the cortex (Miller et al. 2021; Voorhies et al. 2021; Yao et al. 2022; Willbrand et al. 2023; Willbrand et al. 2022; Willbrand et al. 2024; Parker et al. 2023; Miller et al. 2020; Willbrand et al. 2022; Willbrand et al. 2023; Maboudian et al. 2024; Ramos Benitez et al. 2024). All LPC sulci were then manually defined and saved as .label files in FreeSurfer using tksurfer tools, from which morphological and anatomical features were extracted. We defined LPC/LPOJ sulci for each participant based on the most recent schematics of sulcal patterning by Petrides (2019) as well as pial, inflated, and smoothed white matter (smoothwm) FreeSurfer cortical surface reconstructions of each individual. In some cases, the precise start or end point of a sulcus can be difficult to determine on a surface (Borne et al., 2020); however, examining consensus across multiple surfaces allowed us to clearly determine each sulcal boundary in each individual. For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres). The specific criteria to identify the slocs and pAngs are outlined in Fig. 1b.”

      Reviewer #3 (Public Review):

      Weaknesses

      (1) The numbers of subjects are inherently limited both in number as well as in being typically developing young adults.

      First, although the sample size of the present study is small in number in comparison to large N, group-level neuroimaging analyses, it is comparable to precision neuroimaging studies examining sulcal features in individual participants (for example, Cachia et al., 2021 Frontiers in Neuroanatomy; Garrison et al., 2015 Nature Communications; Lopez-Persem et al., 2019 The Journal of Neuroscience; Miller et al., 2021 The Journal of Neuroscience; Roell et al., 2021 Developmental Cognitive Neuroscience; Voorhies et al., 2021 Nature Communications; Weiner, 2019 The Anatomical Record; Willbrand, et al., 2022 Science Advances; Willbrand, et al., 2022 Brain Structure & Function; Yao et al., 2022 Cerebral Cortex). We discuss this point in detail in the Limitations subsection of the Discussion (Page 17):

      “This manual method is also arduous and time-consuming, which, on the one hand, limits the sample size in terms of number of participants, while on the other, results in thousands of precisely defined sulci. This push-pull relationship reflects a broader conversation in the human brain mapping and cognitive neuroscience fields between a balance of large N studies and “precision imaging” studies in individual participants (Gratton et al., 2022; Naselaris et al., 2021; Rosenberg and Finn, 2022). Though our sample size is comparable to other studies that produced reliable results relating sulcal morphology to brain function and cognition (for example, Cachia et al., 2021; Garrison et al., 2015; Lopez-Persem et al., 2019; Miller et al., 2021; Roell et al., 2021; Voorhies et al., 2021; Weiner, 2019; Willbrand et al., 2022a, 2022b; Yao et al., 2022), ongoing work that uses deep learning algorithms to automatically define sulci should result in much larger sample sizes in future studies (Borne et al., 2020; Lee et al., 2024, 2025; Lyu et al., 2021). The time-consuming manual definitions of primary, secondary, and PTS also limit the cortical expanse explored in each study, thus restricting the present study to LPC/LPOJ.”

      Second, we used a young adult sample because this is the standard in the field when charting features of sulci for the first time (for example, Paus et al., 1996 Cerebral Cortex; Chiavaras & Petrides, 2000 Journal of Comparative Neurology; Segal & Petrides, 2012 European Journal of Neuroscience; Zlatkina & Petrides, 2014 Proceedings of the Royal Society B Biological Science; Sprung-Much & Petrides, 2018 Brain Structure & Function; Miller et al., 2021 The Journal of Neuroscience; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 Communications Biology; Drudik et al., 2023 Cerebral Cortex). Nevertheless, it is indeed crucial to confirm that this schematic translates to other age groups; this exploration, however, is beyond the scope of the present project and is left for future investigation. We have added text to the Limitations subsection of the Discussion to emphasize these points (Pages 17-18):

      “Additionally, the scope of the present study is limited in that the sample was only in young adults. This sample was selected as it is the standard of the field when charting features of sulci for the first time (for example, Paus et al. 1996; Chiavaras and Petrides 2000; Segal and Petrides 2012; Zlatkina and Petrides 2014; Sprung-Much and Petrides 2018; Miller et al. 2021; Willbrand et al. 2022; Willbrand et al. 2023; Drudik et al. 2023). Nevertheless, it is necessary to explore how well this updated schematic translates to different age groups, species, and clinical populations.”

      Finally, it is worth mentioning that we have begun preliminary analyses on the translatability of this schematic, and have shown that it does hold in a pediatric sample (ages 6-18 years old; Author response image 1).

      Author response image 1.

      Example pediatric participant with all LPC/LPOJ sulci identified in both hemispheres. Incidence rates for the variable PTS identified in the present work in a pediatric sample are included below (N = 79 participants).

      (2) While the paper begins by describing four new sulci, only one is explored further in greater detail.

      We focused on the slocs-v as it has a high incidence rate, making it amenable to our analytic pipelines relating sulci to cortical morphology, architecture, and function, as well as cognition (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 The Journal of Neuroscience; Maboudian et al., 2024 The Journal of Neuroscience). However, we want to emphasize that throughout the paper there are multiple analyses that further describe the three more variable sulci: 1) detailing their sulcal patterning (Supplementary Tables 1-4) and 2) detailing their morphology and architecture (Supplementary Fig. 6). We do agree though that it is a worthwhile endeavor to further describe these sulci—especially if the data is readily available. As such, to complement our behavioral analysis identifying a relationship between the morphology of the consistent sulci and spatial orientation and considering the well-documented relationship between sulcal incidence and cognition (for review see Cachia et al., 2021 Frontiers in Neuroanatomy), we tested whether the number of variable sulci and the incidence of each variable sulcus specifically were related to spatial orientation. This procedure produced null results on all neuroanatomical variables, which we now mention in the Results (Page 11):

      “Finally, as in prior work examining variably-present PTS in other cortical expanses (for example, Amiez et al., 2018; Cachia et al., 2014; Fornito et al., 2004; Willbrand et al., 2024b), we assessed whether the presence/absence of the more variable PTS identified in the present work (slocs-d, pAngs-v, and pAngs-d) was related to spatial orientation, reasoning, and processing speed task performance. We identified no significant associations between the presence/absence of these sulci in either hemisphere with performance on these tests (ps > .05).”

      (3) There is some tension between calling the discovered sulci new vs acknowledging they have already been reported, but not named.

      To resolve this tension, we have revised the text to 1) ensure proper acknowledgment that sulci have been noticed in this region, 2) point out that these sulci were left unnamed and undescribed, and 3) emphasize that one of the primary goals of this project was to comprehensively detail the sulcal organization of this region at a precise, individual-level considering these often-overlooked sulci.

      This is primarily done at the beginning of the Results (Pages 4-5), where we now write:

      “Four previously undescribed small and shallow sulci in the lateral parieto-occipital junction (LPOJ)

      In previous research in small sample sizes, neuroanatomists noticed shallow sulci in this cortical expanse, but did not describe them beyond including an unlabeled sulcus in their schematic at best (Supplementary Methods and Supplementary Figs. 1-4 for historical details). In the present study, we fully update this sulcal landscape considering these overlooked indentations. In addition to defining the 13 sulci previously described within the LPC/LPOJ, as well as the posterior superior temporal cortex in individual participants (Methods) (Petrides, 2019), we could also identify as many as four small and shallow PTS situated within the LPC/LPOJ that were highly variable across individuals and left undescribed until now (Supplementary Methods and Supplementary Figs. 1-4). Though we officially name and characterize features of these sulci in this paper for the first time, it is necessary to note that the location of these four sulci is consistent with the presence of variable “accessory sulci” in this cortical expanse mentioned in prior modern and classic studies (Supplementary Methods). For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres).”

      (4) The anatomy of the sulci, as opposed to their relation to other sulci, could be described in greater detail.

      To detail these sulci above and beyond their relation to other sulci, we document the anatomical metrics of all sulci in Supplemental Figure 6:

      Results (Page 8):

      The morphological and architectural features of all LPC/LPOJ sulci are described in Supplementary Fig. 6.

    1. eLife Assessment

      The study investigates an emerging research field: the interaction between sleep and development. The authors used Drosophila larval sleep as a study model and provide insight into how neuropeptide circuitry controls sleep differently in larval and adult Drosophila. By using a broad range of behavioural and imaging methods and analyses, the authors provide a valuable investigation that demonstrates a larva-specific sleep regulatory neural pathway of Hugin-PK2-Dilps in the Drosophila neurosecretory centre IPC. While some further text clarifications are still required, the revision presents convincing evidence supporting the claims, with new imaging data, sleep parametric analysis, and further clarification addressing the reviewers' comments.

    2. Reviewer #1 (Public review):

      The manuscript investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs) to increase total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons caused reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash on of hugin peptides.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new understanding into how Drosophila sleep-wake circuitry is regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      (1) There is limited discussion of why statistically significant differences are observed in some genetic and temperature controls. This discussion would better support the authors' conclusions.

      (2) The functional connectivity of the huginPC-IPC circuit in larvae could be better supported by chemogenetics using real-time calcium imaging (GCaMP).

      Comments on revisions:

      I would like to thank the authors for the revisions. The inclusion of all sleep metrics, more detailed descriptions in the methods, & a more thorough comparison to other published articles has addressed most of my concerns.

    3. Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the authors performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express PK2-R1 and that silencing IPCs resulted in a significant increase in the sleep amount, which was consistent with the effect they observed in PK2-R1 knockout mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1 dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, showing a connection between Hugin signaling to IPCs, Dilp3 release and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed unbiased genetics screening and successfully identified novel regulators for larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin expressing neurons influence sleep through the activation of IPCs via PK2-R1 with Ca2+ responses and can modulate sleep.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep-regulating mechanisms are conserved across species.

      Weaknesses:

      Previously identified weaknesses have been largely addressed by the authors.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches, including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs) to increase total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons caused reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash-on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash-on of hugin peptides. The conclusions of this paper are somewhat well supported by data, but some aspects of the experimental approach and sleep analysis need to be clarified and extended.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in the regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new understanding into how Drosophila sleep-wake circuitry is regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash-on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      Although the paper does have some strengths in principle, these strengths are not fully supported by the experimental approaches used by the authors. In particular:

      (1) The authors show total sleep amount over an 18-hour period for all the measures of 2nd instar larval sleep throughout the paper. However, published studies have shown that sleep changes over the course of 2nd instar development, so more precise time windows are necessary for the analyses in this study.

      (2) Previously published reports of sleep metrics in both Drosophila larvae and adults include the average number of sleep episodes (bout number) and the average length of sleep episodes (bout length). Neither of these metrics is included in the paper for either the larval sleep or adult sleep data. Not including these metrics makes it difficult for readers to compare the findings in this study to previously published papers in the established Drosophila sleep literature.

      (3) Because Drosophila adult & larval sleep is based on locomotion, the authors need to show the activity values for the experiments supporting their key conclusions. They do show travel distances in Figure 2 - Figure Supplement 1, however, it is not clear how these distances were calculated or how the distances relate to the overall activity of individual larvae during sleep experiments. It is also concerning that inactivation of the PK2-R1-expressing neurons causes a reduction in locomotion speed. This could partially explain the increase in sleep that they observe.

      (4) The authors rely on homozygous mutant larvae and adult flies to support many of their conclusions. They also rely on Gal4 lines with fairly broad expression in the Drosophila brain to support their conclusions. Adding more precise tissue-specific manipulations, including thermogenetic activation and inhibition of smaller populations of neurons in the study would be needed to increase confidence in the presented results. Similarly, demonstrating that larval development and feeding are not affected by the broad manipulations would strengthen the conclusions.

      (5) Many of the experiments presented in this study would benefit from genetic and temperature controls. These controls would increase confidence in the presented results.

      (6) The authors claim that their findings in larvae uncover the circuit basis for larval sleep regulation. However, there is very little comparison to published studies demonstrating that neuropeptides like Dh44 regulate larval sleep. Because hugin-expressing neurons have been shown to be downstream of Dh44 neurons, the authors need to include this as part of their discussion. The authors also do not explain why other neuropeptides in the initial screen are not pursued in the study. Given the effect that these manipulations have on larval sleep in their initial screen, it seems likely that other neuropeptidergic circuits regulate larval sleep.

      We thank Reviewer #1 for the constructive comments. Following the suggestions, we have compared the relative sleep amounts of wild-type controls and Hugin/PK2-R1/IPCs mutants/manipulations between the 6-hour and 18-hour periods in the 2nd instar larval stage and found consistent sleep phenotypes. We have also shown the sleep metrics data for larvae and adults. We have included additional locomotion and feeding behavior data for wild-type controls and Hugin/PK2-R1/IPCs mutants/manipulations, which suggest that the sleep phenotypes of the Hugin/PK2-R1/IPCs mutants/manipulations are not primarily explained by changes in locomotion or feeding behavior. As pointed out, our study cannot exclude the possibility that, in addition to the Hugin/PK2-R1/IPCs axis, other pathways including DH44 act in larval sleep control. We have included these points in the Discussion. Please see the point-by-point responses for details.

      Reviewer #2 (Public review):

      Summary:

      This study examines larval sleep patterns and compares them to sleep regulation in adult flies. The authors demonstrate hallmark sleep characteristics in larvae, including sleep rebound and increased arousal thresholds. Through genetic and behavioral analyses, they identify PK2-R1 as a key receptor involved in sleep modulation, likely via the HuginPC-IPC signaling pathway. Loss of PK2-R1 results in increased sleep, which aligns with previous findings in hugin knockout mutants. While the study presents significant contributions to the field, further investigation is needed to address discrepancies with earlier research and strengthen mechanistic claims.

      Strengths:

      (1) The study explores a relatively understudied aspect of sleep regulation, focusing on larval development.

      (2) The use of an automated behavioral measurement system ensures precise quantification of sleep patterns.

      (3) The findings provide strong genetic and behavioral evidence supporting the role of the HuginPC-IPC pathway in sleep regulation.

      (4) The study has broader implications for understanding the evolution and functional divergence of sleep circuits.

      Weaknesses:

      (1) The manuscript does not sufficiently discuss previous studies, particularly concerning hugin mutants and their metabolic effects.

      (2) The specificity of IPC secretion mechanisms is unclear, particularly regarding potential indirect effects on Dilp2.

      (3) Alternative circuits, such as the HuginPC-DH44 pathway, require further consideration.

      (4) Functional connectivity between HuginPC neurons and IPCs is not directly validated.

      (5) Developmental differences in sleep regulatory mechanisms are not thoroughly examined.

      We thank Reviewer #2 for the positive comments. As suggested, our study cannot exclude the possibility that, in addition to the Hugin/PK2-R1/IPCs axis, alternative pathways including the Hugin/DH44 axis contribute to sleep control in larvae. We have added these points to the Discussion. We have also added data showing mechanistic differences between larval and adult sleep control. Please see the point-by-point responses for details.

      Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the authors performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express PK2-R1 and that silencing IPCs resulted in a significant increase in the sleep amount, which was consistent with the effect they observed in PK2-R1 knock-out mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1 dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, showing a connection between Hugin signaling to IPCs, Dilp3 release, and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae, and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed unbiased genetics screening and successfully identified novel regulators for larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin-expressing neurons influence sleep through the activation of IPCs via PK2-R1 with Ca2+ responses and can modulate sleep.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep-regulating mechanisms are conserved across species.

      Weaknesses:

      The study primarily focused on sleep regulation in Drosophila larvae, showing that the Hugin/PK2-R1 axis is critical for larval sleep but not necessary for adult sleep. The effects of the Hugin axis in the adult are, however, incompletely explained and somewhat inconsistent. PK2-R1 knockout adults also display increased sleep, as does HugPC silencing, at least for daytime sleep. The difference lies in Dilp3/5 mutant animals showing decreased sleep and IPCs seemingly responding with reduced Dilp3 release to PK-2 treatment (Figure 6). It seems difficult to reconcile the authors' conclusions regarding this point without additional data. It could be argued that PK2-R1 still regulates adult sleep, but not via Hugin and IPCs/Dilps.

      Another issue might be that the authors show relative sleep levels for adults using Trikinetics monitoring. From the methods, it is not clear if the authors backcrossed their line to an isogenic wild-type background to normalize for line-specific effects on sleep. Thus, it is likely that each line has differences in total sleep time due to background effects, e.g., their Kir2.1 control line showed reduced sleep relative to the compared genotypes. This might limit the conclusions on the role of Hugin/PK2-R1 on adult sleep.

      We thank Reviewer #3 for the valuable comments. Following the suggestions, we have included additional data on adult sleep phenotypes with IPCs/Dilps and HugPC/PK-2 manipulations. We believe that these additional data further support the idea that the Hugin/PK2-R1/IPCs axis acts differently in larval and adult sleep control.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Show all data as individual data points in the graphs. The use of box-and-whisker plots makes it difficult to determine how much variation there is in each experiment.

      Following the comments, we have changed all graphs to dots-and-whisker plots (Figures 1–6; Figure 1—figure supplements 2–4; Figure 2—figure supplement 1; Figure 3—figure supplements 1 and 3; Figure 5—figure supplement 1; and Figure 6—figure supplements 1 and 3).

      (2) Show all larval sleep metrics (total sleep duration, bout #, bout length, & activity) over the first 6-hour period of 2nd instar development. Larval sleep changes over the course of 2nd instar development so showing an 18-hour period is not as informative for the different manipulations in the study. This also allows for a more thorough comparison to Szuperak et al 2018.

      Following the comments, we have shown all larval sleep metrics (total sleep duration, bout #, bout length, & activity) over the first 6 hours for PK2-R1 KO mutants (Figure 1—figure supplement 5). These PK2-R1 mutant phenotypes are consistent with those seen in our sleep amount data over an 18-hour period (Figure 1—figure supplement 5). We therefore present all sleep phenotype data for 2nd instar larvae over the 18-hour window throughout this paper.

      (3) Show activity values for every experiment. Behavior is based on locomotion, so there is a need to show that larvae in each manipulation do not have locomotive defects.

      Following the reviewer’s comments, we have shown the activity values for each experiment (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). These data indicate that the changes in sleep amounts in each manipulation are not due solely to locomotion alterations. We have thus added the sentence below at lines 151–156 in the manuscript.

      Locomotion changes were not consistently observed upon either activation or suppression of Hug neurons (Figure 3—figure supplement 1), suggesting that changes in sleep amounts are unrelated to locomotor alterations.

      (4) Provide additional explanation as to why PK2-R1 was pursued in the study. There are several candidates in Figure 1 - Figure Supplement 4 (like sNPF-Gal4, Dh31-Gal4, and Dsk-Gal4) that have effects on sleep. These have also not been studied in the context of larval sleep regulation.

      Following the reviewer’s comments, we have added the following sentences at lines 108–114 in the manuscript.

      The role of PK2-R1 in larval sleep, on the other hand, has been unknown to date. Given its strong expression in insulin-producing cells (Schlegel et al., 2016) and its function as a receptor for the neuropeptide Hugin, which modulates feeding (Schoofs et al., 2014), we hypothesized that PK2-R1 might mediate neuropeptidergic signaling that links metabolic and sleep regulation during development. We thus focused on this gene as a candidate connecting behavioral and endocrine sleep control.

      (5) Insulin manipulations are known to disrupt Drosophila development (Rulifson et al, 2002). Therefore, it would be beneficial to show that larvae develop normally in dilp3 and dilp5 mutants by examining the time to pupal formation in these mutants compared to controls. If the mutant larvae take longer to reach the pupal stage, how do the authors know that the 2nd instar control and mutant larvae are the same developmental age? As indicated above, the developmental age of larvae does affect the total amount of sleep, so this could affect the authors' conclusions.

      We agree that this is an important point in this study. In each experiment, we carefully checked the developmental stage of larval progeny by mouth hook analysis and larval size measurement, and used only larvae with characteristics comparable to wild-type 2nd instar larvae. We have added these descriptions to the Methods (lines 411–416).

      (6) Figure 1 data is only supported by homozygous mutants & 1 fairly-broadly expressed Gal4 driver. The authors need to show that inactivation of PK2-R1 neurons with more tissue-restrictive Gal4 driver lines has the same effect as the other manipulations to further support the conclusions. Examining sleep in activation of PK2-R1 neurons with the broadly expressed Gal4 driver & UAS-TrpA1 would also provide better support for the conclusions.

      We agree. Indeed, we tried to narrow the manipulation down to small subsets of neurons using multiple different Gal4 drivers, but unfortunately we did not obtain suitable candidates. Therefore, although our data show that the Hugin/PK2-R1 axis contributes to sleep control in larvae, we cannot rule out the possibility that other axes also function in larval sleep control. We mentioned this point in the original version of the submitted manuscript (lines 134–137).

      (7) Provide more explanation as to how your methods of defining sleep compare/contrast to published papers. It is not clear how many frames = 1 sec in your recordings. The definition of sleep as 12 frames needs to include a time component as well. This allows for easier comparison to other published papers examining Drosophila larval sleep (Szuperak et al 2018; Churgin et al 2019; Poe et al 2023; Poe et al 2024).

      Our recordings were acquired at 0.87 frames per second. We have added this information to the Methods (line 431).
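
      As a rough illustration of the time component the reviewer asks for, the sketch below converts a frame-based criterion into seconds and derives bout number and mean bout length from a per-frame immobility trace. It assumes that the "12 frames" quoted by the reviewer means 12 consecutive immobile frames and uses the 0.87 frames-per-second rate stated above; the function names and the boolean trace are hypothetical and are not taken from the authors' analysis pipeline.

```python
from itertools import groupby

FRAME_RATE_HZ = 0.87   # acquisition rate stated in the response
MIN_SLEEP_FRAMES = 12  # assumed: consecutive immobile frames needed to count as sleep


def frames_to_seconds(n_frames, frame_rate_hz=FRAME_RATE_HZ):
    """Convert a number of frames into seconds at the given frame rate."""
    return n_frames / frame_rate_hz


def sleep_bouts(immobile, min_frames=MIN_SLEEP_FRAMES):
    """Return the lengths (in frames) of immobile runs of at least min_frames."""
    bouts = []
    for is_immobile, run in groupby(immobile):
        length = sum(1 for _ in run)
        if is_immobile and length >= min_frames:
            bouts.append(length)
    return bouts


def sleep_metrics(immobile, min_frames=MIN_SLEEP_FRAMES, frame_rate_hz=FRAME_RATE_HZ):
    """Summarise total sleep, bout number, and mean bout length for one trace."""
    bouts = sleep_bouts(immobile, min_frames)
    total_frames = sum(bouts)
    mean_frames = total_frames / len(bouts) if bouts else 0.0
    return {
        "bout_number": len(bouts),
        "total_sleep_s": frames_to_seconds(total_frames, frame_rate_hz),
        "mean_bout_length_s": frames_to_seconds(mean_frames, frame_rate_hz),
    }


# Hypothetical trace: a 12-frame bout, a 5-frame pause (too short), a 20-frame bout.
trace = [True] * 12 + [False] * 3 + [True] * 5 + [False] + [True] * 20
metrics = sleep_metrics(trace)
```

      Under these assumptions, a 12-frame criterion at 0.87 frames per second corresponds to roughly 13.8 seconds of continuous immobility, which is the kind of figure that would allow comparison with the time-based definitions used in other larval sleep studies.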

      (8) Figure 2 data is only supported by mutants & inactivation with 1 Gal4 driver per cell population. Showing activation of Gal4-expressing cells with UAS-TrpA1 would add more support to the conclusions.

      We already showed reduced sleep amounts in both HuginGAL4>ReaChR and HuginGAL4>TrpA1 larvae (Figure 3 C & D) in the original version.

      (9) Need to clarify in the methods how the authors calculated travel distances as a measure of locomotive activity. It's not clear if this is done during larval sleep experiments or in independent experiments. It is also not clear why the y-axes of Figure 2-Figure Supplement 1 are not consistent across the panels. Finally, the authors do see decreases in locomotive activity in PK2-R1>Kir2.1 and in dilp3 mutants, so the conclusions presented in the results section of the paper need to be modified to reflect those results.

      We calculated travel distances from the same video recording datasets used for sleep quantification. We have added this information to the Methods (lines 431–435). As the reviewer indicated, locomotor activity was reduced in a subset of conditions/mutants, including PK2-R1 > Kir2.1 and dilp3 mutants, and we therefore cannot exclude the possibility that locomotion changes contribute to the sleep phenotypes. On the other hand, a large proportion of the manipulations of Hugin neurons and IPCs caused a sleep increase without significant changes in locomotor activity (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). It is thus likely that Hugin and IPCs contribute to sleep control independently of locomotion, whereas other neurons labeled by PK2-R1 GAL4 might contribute to locomotion control.

      (10) Given the role that hugin neurons play in Drosophila feeding (Schlegel et al, 2016), the authors should include feeding data for the hugin/PK2-R1 manipulations. It is also unclear from the methods if their thresholding for defining sleep can detect feeding behaviors. Changes in feeding behavior could explain some of the reported increases in sleep if feeding is not classified as a waking but is instead picked up as inactivity.

      We agree that this is an important point. Following the reviewer’s suggestion, we have added feeding amounts of the wild-type control and the HuginPC>Kir2.1 larvae (Figure 3-figure supplement 3). These data suggest that feeding amounts of the HuginPC>Kir2.1 larvae are significantly reduced compared to those of the control. Given that our data analysis typically categorized feeding behavior as “moving (not sleep)” (see Materials and Methods) and that HuginPC>Kir2.1 larvae showed increased sleep amounts compared to the wild-type control, it is likely that the increased sleep amounts in HuginPC>Kir2.1 larvae are unrelated to changes in feeding behavior.

      (11) The Hugin-IPC localization data (Figure 3E) would be better supported by the use of more specific synaptic and dendritic markers. Specifically, expressing Syt-eGFP (axon marker) in hugin neurons & DenMark (dendritic marker) in IPCs. Using GRASP or P2X2 to demonstrate the anatomical/functional connections between hugin & IPC neurons would also provide better support for this conclusion.

      According to the reviewer’s suggestion, we have added Syt-eGFP signals in HuginPC neurons (Figure 4—figure supplement 1). We also tried to express DenMark in IPCs, but we could not obtain dilp3>DenMark F1 progeny for unknown reasons. We also applied GRASP to the HuginPC-IPC interaction, but we could not detect obvious GRASP signals. It is well known that peptidergic transmission is often independent of conventional synaptic structures, a mode called volume transmission, in which peptidergic signals can travel over long distances to target neurons. It is thus possible that IPCs receive Hugin signals from HuginPC neurons through volume transmission.

      (12) Figure 4 is missing temperature controls for thermal activation experiments. Also missing genetic control for UAS/+. It would be more convincing to see experiments in Figure 4 with the more specific hug-PC-Gal4 line instead of the broadly expressed hugin-Gal4 line.

      According to reviewer’s comments, we have added the control data in Figure 4.

      (13) Representative images for Figure 4B & 4C would provide better support for the quantifications & conclusions presented.

      According to the reviewer’s suggestions, we show representative images for Figure 4B and 4C (please see Author response image 1). We are, however, concerned that these images would not add to readers’ understanding beyond the quantitative data, so we have decided not to add them to the manuscript.

      Author response image 1.

      mCD8::mCherry (top) and CRTC::GFP (bottom) are shown under high-temperature conditions without ("−") or with ("+") hugin neuron activation. "-" denotes a high-temperature genetic control lacking LexAop-TrpA1, thus no thermogenetic activation occurs. CRTC::GFP is shown in pseudocolor.

      (14) A more zoomed-out image of all the IPC neurons in the bath application of hugin peptides (Figure 5D) would help with the interpretation of the results. It's not clear if the authors only measured the same exact neuron in each IPC cluster or if they examined all of the IPC neurons. If they measured all of the IPC neurons, did they observe similar results across the different neurons? How much variability is there in the response of IPC neurons to hugin peptide application?

      For Figure 5, we obtained images of multiple brains from each genotype and quantified the NLI values from all IPC neurons. For reference, we show plots of the CRTC signals of Figure 5C brain by brain (Author response image 2). We have added detailed information on the CRTC analysis to the Methods (lines 552-554).

      Author response image 2.

      Distribution of CRTC signals across individual brains. Plots of nuclear localization index (NLI) for individual brains, corresponding to the conditions shown in Figure 5C. The x-axis represents each larval brain preparation, and each dot indicates the NLI value of a single IPC neuron. Horizontal bars represent the median within each brain. These plots illustrate variability both within and across individual brains.

      (15) The conclusion that application of Hug peptides results in dilp3 release is not well supported (Figure 5E). There is a large amount of variation in anti-dilp3 signal. Representative images for these quantifications would be beneficial. The authors also don't directly show that dilp3 vesicles are released. They only see a reduction in antibody accumulation in IPCs. Could there be other reasons for the reduction in accumulation in the IPCs? Would changes in dilp3 gene expression or membrane localization cause a reduction in signal? Showing that actual release of dilp3 is affected by Hug peptides using a reporter like ANF-GFP would be more convincing.

      According to the reviewer’s comments, we have added representative images (Figure 5—figure supplement 2). For the ex vivo experiments in Figure 5, we treated the extracted brain tissues with Hugin/NMU peptides for only 5 minutes. It is thus most likely that the reduction of Dilps in IPCs is mediated by Hugin/PK2-R1 signal-dependent secretion, rather than by transcriptional control and/or degradation of Dilps.

      (16) Show all sleep metrics (total sleep duration, bout #, bout length, and activity) for adult sleep experiments. Showing relative total sleep for the adult experiments is confusing & would benefit from plots of total average sleep in minutes for each genotype.

      According to the reviewer’s comments, we have added the sleep metrics in adults (Figure 6; Figure 6-figure supplement 3).

      (17) The authors can't conclude that expression patterns of PK2-R1 & hug between larvae & adults are "almost comparable." They don't track neurons over development or immortalize neurons in larvae & check expression patterns in adults. They need to show some type of quantification to support these claims. Or revise the text to remove this conclusion.

      We agree. We have changed our arguments as follows (lines 211-214).

      Interestingly, the expression patterns of PK2-R1 and Hug as well as the morphology of HugPC neurons in adults appeared to be similar to those in larvae (Figure 6—figure supplement 2), implying that the differential roles of Hug in larvae vs adults are likely due to physiological differences in HugPC neurons and/or IPCs.

      (18) For Figure 6, what effect does genetic inactivation of IPCs have on adult sleep? A more specific manipulation of these cells would provide better support for the conclusion that IPC manipulations have distinct effects on larval & adult sleep. The sleep traces for the hugin manipulation & dilp mutants (Figure 6-Figure Supplement 1) also look inconsistent when comparing genetic controls in (Figure 6-Figure Supplement 1D) or when comparing the dilp mutants. Plotting this data as total sleep amount in the day & night (2 separate graphs) would be beneficial. It would also be helpful to see additional sleep traces for these experiments.

      According to the reviewer’s comments, we have added the sleep amounts of dilp3 and dilp5 adults (Figure 6-figure supplement 1C-D) as well as of IPC-silenced adults (Figure 6-figure supplement 3D), plotted separately for daytime and nighttime sleep.

      (19) For Figure 6, what effect does thermogenetic activation of hugin neurons have on IPC activity? The authors demonstrate in Figure 5 that thermal activation results in an increase in larval IPC activity, but they do not show these experiments in the adult brain. These experiments would provide more support for their conclusion that hugin has differential effects on IPC activity depending on the developmental age (larvae vs adults).

      According to the reviewer’s comments, we performed thermo-activation of hugin neurons and found no significant effects on adult IPCs (see Author response image 3), consistent with the ex vivo data in Figure 6.

      Author response image 3.

      (20) A figure legend is needed for Figure 7. The model is not self-explanatory, nor is there an adequate explanation in the discussion section.

      We have added legends (line 781-785).

      (21) Since hugin is known to be downstream of Dh44 in larvae, the discussion needs to include comparison to published work on Dh44 in larvae (Poe et al, 2023). The hugin receptor, PK2R1, is also expressed in Dh44 & DMS neurons (Schlegel et al, 2016), so a discussion of what role Dh44/DMS neurons may play in their model is necessary.

      We agree. We have added the following text to the Discussion (lines 313-320).

      We cannot rule out the possibility that other neurons could function downstream of HuginPC neurons in sleep regulation. For instance, given that Dh44 neurons in the brain promote arousal (Poe et al. 2023) and are PK2-R1-positive (Schlegel et al. 2016), Hugin might control sleep in part through Dh44 neurons.

      (22) Minor point: Line 97 should say "resulted in a significant sleep increase." Currently, it says "decrease" which is not what is depicted in the figure.

      We appreciate the reviewer’s point. We have corrected this.

      (23) Minor point: Figure 5 should be renamed as Figure 4 since the text describing the results in Figure 5A & 5B occurs before the text describing the results in Figure 4.

      We do understand the point the reviewer raised. However, since Figure 5A explains the experimental setup for all of Figure 5, we would like to keep Figure 5A in its original position.

      Reviewer #2 (Recommendations for the authors):

      First, the study would benefit from a more comprehensive discussion of previous research, particularly the work by Schlegel et al. (2016) and Melcher and Pankratz (2006). A key inconsistency that should be addressed is the observation that hugin mutant larvae exhibit reduced body size and feeding behavior, which may influence Dilp2 secretion. The selective effect on Dilp3 and Dilp5 without affecting Dilp2 warrants further clarification. Conducting conditional gene expression experiments to control hugin, dilp3, and dilp5 expression, along with neuronal activity modulation, would help determine whether the observed effects are direct or secondary consequences.

      According to the reviewer’s comments, we tried to manipulate neuronal activity in IPCs, but unfortunately, expression of Kir2.1 in IPCs caused lethality or very weak animals. Instead, we cited a recent paper that shows differential secretion of Dilp2 and Dilp6 in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added more discussion about selective Dilp3/5 secretion by Hugin-PK2-R1 signals (lines 275-297).

      Second, the specificity of IPC secretion mechanisms should be clarified. Given that IPCs coexpress Dilp2, Dilp3, and Dilp5, it remains unclear how the pathway selectively modulates Dilp3 and Dilp5 while leaving Dilp2 unaffected. Additional experiments, such as electron microscopy, could provide insights into whether anatomical differences in vesicular pools influence peptide secretion. Since hugin mutants are reported to have reduced body size, confirming that Dilp2 secretion remains truly unchanged is crucial for eliminating potential indirect effects.

      We thank the reviewer for the valuable suggestions. Since the mechanisms of selective Dilp secretion in IPCs are outside the main scope of this paper, we would like to attempt detailed EM analysis in future studies. We cited a recent paper that shows differential secretion of Dilp2 and Dilp6 from IPCs in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added more discussion about selective Dilp3/5 secretion by Hugin-PK2-R1 signals (lines 275-297).

      Third, the study should explore the potential role of alternative circuits, such as the HuginPC-DH44 pathway, in sleep regulation. The observation that DH44 mutants exhibit even greater sleep amounts than PK2-R1 mutants suggests the involvement of additional regulatory mechanisms. Prior studies indicate that HuginPC neurons may influence DH44 neuron activity, which could impact sleep. Furthermore, recent findings link DH44 with starvation-induced sleep loss in adult flies. Discussing and experimentally investigating the HuginPC-DH44 axis in larval sleep regulation would provide additional depth to the study.

      As far as we understand, no direct evidence for a HuginPC→DH44 pathway has been reported in either larvae or adults. Instead, DH44 influences Hugin neuron activity in adults (King et al. 2017). We thus examined whether optogenetic DH44 activation could influence HuginPC activity using CRTC analysis, but unfortunately, we could not detect significant changes in HuginPC activity.

      Given that PK2-R1 is expressed in DH44-positive neurons (Schlegel et al. 2016) and that DH44-positive neurons are located in the regions that HuginPC neurons innervate, it is still possible that a HuginPC→DH44 pathway functions in parallel to the HuginPC→IPCs pathway. We feel that this is quite an interesting possibility and would be a good subject for a future paper.

      Fourth, validating the functional connectivity between HuginPC neurons and IPCs using calcium imaging would significantly enhance the study. Employing real-time calcium imaging with GCaMPs would provide direct evidence of synaptic activity between these neuronal populations. Such data would strengthen the claim that the observed sleep regulatory effects result from direct neural communication rather than secondary systemic influences.

      We agree. Indeed, we tried Ca<sup>2+</sup> imaging of HuginPC neurons and IPCs in living larvae as well as using ex vivo preparations, and realized that it was quite technically difficult to obtain reliable Ca<sup>2+</sup> dynamics data in the brain of living larvae/ex vivo brain tissue. Therefore, instead of live Ca<sup>2+</sup> imaging, we performed the CRTC analysis using fixed brain preparations. We have added the mention that we tried Ca<sup>2+</sup> imaging in the larval brain, but it did not work well (line 555-558).

      Finally, a more detailed discussion of developmental differences in sleep regulatory mechanisms would be beneficial. The manuscript should address why genes involved in sleep modulation during development may function differently from their roles in adult sleep regulation. Providing a conceptual framework or experimental evidence to explain these developmental differences would enhance the study's contribution to understanding the evolution of sleep circuits. Clarifying how these findings fit into broader sleep regulation models would increase the impact of the research.

      We agree. We would like to add discussions about how factors/circuits involved in sleep modulation during development may function differently from their roles in adult sleep regulation as follows (line 349-371), as it is rather difficult to discuss why.

      It is thus possible that Hugin/PK2-R1 signaling along the HugPC-IPCs circuitry is suppressed in adults. IPCs in adults receive multiple positive and negative modulatory inputs through GPCRs including the metabotropic GABA<sub>B</sub> receptors (Enell et al., 2010), which suppresses IPC activity and Dilp release in adult IPCs (Enell et al., 2010). It is thus plausible that such negative modulatory inputs to IPCs in adults might counteract with the Hugin/PK2-R1 axis to suppress Dilp release. In addition, our data suggest that Dilps modulate sleep amount in the opposite directions in larvae and adults (Figure 7). Comparing the expression levels and activities of GPCRs in larval and adult IPCs would be essential to better understand how the same modulatory signals over the course of development come to exert differential impacts on sleep. Interestingly, Hugin in adults appears irrelevant for the baseline sleep amount but is required for homeostatic regulation of sleep (Schwarz et al., 2021). Thus, testing if Hugin/PK2-R1 axis is involved in the homeostatic regulation of larval sleep, and how such a system compares to its adult counterpart, may further provide mechanistic insights into how homeostatic sleep regulation matures over development.

      By addressing these aspects, the manuscript will provide a clearer, more robust, and well-supported analysis of larval sleep regulation. These refinements will help improve the study's clarity and impact, ensuring that its findings are effectively communicated to the research community.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 97: "Silencing neurons expressing Oamb and PK2-R1 resulted in a significant sleep decrease?" But there is an increase in sleep amounts from Figure 1A. (Typo error).

      We thank the reviewer for pointing out this typo. We have corrected this typo in the revised version.

      (2) Line 139: "HugPC and IPCs labeled by Dilp3-GAL4 are located in close proximity to each other." While proximity does not equal synaptic connections, direct connectivity of HugPC and IPCs was already shown in larval connectome analyses with HugPC providing the strongest input of larval IPCs (Hückesfeld et al. eLife 2021). This could be cited in this context instead.

      We agree. We have cited this paper in References (line 163).

      (3) Figure 2 Supplement 1: Locomotion speed is affected in PK2-R1 knockouts; what is the significance regarding the observed sleep increase?

      We agree that this is a very important point. As the reviewer pointed out, since locomotion speed was reduced in PK2-R1 KO larvae, the sleep increase in PK2-R1 KO larvae might be due in part to reduced locomotion. On the other hand, silencing IPCs with Kir2.1 increased sleep without significant changes in locomotion (Figure 2; Figure 2-figure supplement 1). Since PK2-R1 is broadly expressed in the nervous system in addition to IPCs (Figure 2), it is thus possible that PK2-R1 neurons other than IPCs contribute to locomotion control.

      (4) Why are Dilp3 levels changing (increasing) in adult IPCs after PK-2 treatment? This is not mentioned in the text and is not discussed at all.

      As the reviewer indicated, this data is unexpected to us. At this moment, we could only assume that PK-2 could act in larval and adult IPCs in a different manner. We have added this sentence in Results (line 211-214).

      (5) It has been shown in other publications that Dilps play a role in sleep regulation (Cong et al., Sleep 2015), this study should be cited.

      We have cited this paper in References (line 224).

      (6) The order of discussing figure panels is sometimes confusing, e.g. Figure 6C is discussed at the very end after 6D-F.

      We agree. Indeed, we discussed this order at length while preparing the first draft. However, we finally decided on the current order, as grouping “sleep phenotype data” and “ex vivo data” should be easier for readers to follow. We thus keep the current order in the revised submission.

    1. eLife Assessment

      This important article reports on the role of specific interneurons in the motion processing circuitry of the fruit fly, and marshals convincing evidence from neural recording, genetic manipulation, and behavioral analysis. A significant result ties the activity of C2/C3 neurons to the temporal resolution of the motion vision system. It remains unclear whether disrupting this pathway affects the dynamics of vision more generally.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Figure 6c. More analysis is required here, since the authors claim to have found a loss in inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), for PD direction for T4 C2 and C3 block. Also I predict that C2&C3 block statistically different from C3 only, why? In any case, it would be good to discuss the clear trend in the PD direction by showing the distribution of responses as violin plots to better understand the data. It would be also good to have some raw traces to be able to see the differences more clearly, not only polar plots and averages.

      The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could test the behavioral experiments with kir2.1 too.

      Comments on revisions:

      I have no further comments.

    3. Reviewer #2 (Public review):

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions and the behavioral output.

      A limitation of the study is that the mediating neural correlates from C2&C3 to T4&T5 are not clarified, rather Mi1 is found to be one of them. In the future, the same set of silencing experiments performed for C2-Mi1 could be extended to C2 &C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Future experiments might also disentangle the parallel or separate function of C2 and C3 neurons.

      In summary, this work advances our current knowledge in Drosophila motion vision and sets the way for further exploring the intricate details of direction selective computations.

      Comments on revisions:

      A label for T5 is missing from Figure 5b. Thank you for addressing our concerns and considering each of our suggestions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      (1) The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Given the prominent connectivity of C2 and C3 to lamina neurons, we actually expected that lamina processing is also affected. We did the experiment of silencing C2 and recording in the lamina neuron L2 and found no significant difference in their response profile (Author response image 1).

      Author response image 1.

      Calcium responses of L2 axon terminals to full-field ON and OFF flashes for controls (grey, N=8 flies, 59 cells) or while genetically silencing C2 using shibire<sup>ts</sup> (magenta, N=4 flies, 26 cells). Traces show mean ± SEM.

      We could include these data in the main manuscript, but we do not really feel comfortable in claiming that C2 and C3 have a specific role in motion processing only, even if it was predominantly affecting medulla neurons. To our knowledge, how peripheral visual circuitry contributes to any other visual behaviors, such as object detection, including the pursuit of mating partners, or escape behaviors, is not well understood. Instead, we added a sentence to the discussion stating that our work does not exclude that, given their wide connectivity, C2 and C3 are also involved in other visual computations.

      (2) Figure 6c. More analysis is required here, since the authors claim to have found a loss in inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), for PD direction for the T4 C2 and C3 blocks. Also, I predict that C2 & C3 block statistically different from C3 only, why? In any case, it would be good to discuss the clear trend in the PD direction by showing the distribution of responses as violin plots to better understand the data. It would also be good to have some raw traces to be able to see the differences more clearly, not only polar plots and averages.

      We apologize: the plots in the manuscript showed the mean across all cells, whereas the statistics were done more conservatively, across flies. We corrected this mismatch, and the figure now shows the mean ± SEM across flies after first averaging across cells within each fly. Thank you for pointing this out. Since we recorded n=6-8 flies per genotype, we did not include violin plots, which would indeed make sense if we showed data for each cell.
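      The two-stage averaging described in this response (cells within each fly first, then the mean and standard error across flies) can be sketched as follows. This is an illustrative example, not the authors' analysis code. The dictionary layout, the function name, and the use of the sample standard deviation are assumptions.

```python
import math
from statistics import mean, stdev


def mean_sem_across_flies(responses_by_fly):
    """Return (mean, SEM, n_flies) with the fly as the statistical unit.

    `responses_by_fly` is a hypothetical dict mapping a fly ID to the
    list of response values of all cells recorded in that fly.
    """
    # Stage 1: collapse each fly to a single value (mean across its cells).
    fly_means = [mean(cells) for cells in responses_by_fly.values()]
    n_flies = len(fly_means)
    if n_flies < 2:
        raise ValueError("need at least two flies to estimate the SEM")
    # Stage 2: mean and standard error across flies, not across cells.
    sem = stdev(fly_means) / math.sqrt(n_flies)
    return mean(fly_means), sem, n_flies
```

      Treating the fly as the unit avoids inflating n with many cells from the same animal, which is why this approach is the more conservative of the two.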

      (3) The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could test the behavioral experiments with Kir2.1, too.

      We have tried this experiment, but unfortunately, flies were not walking well on the ball, and we were not able to obtain data of sufficient quality.

      Reviewer #2 (Public review):

      Summary:

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. This work advances our current knowledge in Drosophila motion vision and sets the way for further exploring the intricate details of direction-selective computations.

      Strengths:

      Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions, and the behavioral output.

      Weaknesses:

      The authors claim that C2 and C3 neurons are required for direction selectivity, as per the publication's title; however, even with their double silencing, the directional T4 & T5 responses are not completely abolished. Therefore, the contribution of this inherited feedback in direction-selective computations is not a prerequisite for its emergence, and the title could be re-adjusted.

      We adjusted the title to “are involved in motion detection.”

      Connectivity is assessed in one out of the two available connectome datasets; therefore, it would make the study stronger if the same connectivity patterns were identified in both datasets.

      We did not assume large differences between the datasets because Nern et al. 2025 described no major sexual dimorphism. To verify this, we have now plotted C2 and C3 connectivity from the three major EM datasets that include C2/C3 connectivity: the female FAFB dataset (Zheng et al. 2018, Dorkenwald et al. 2024, Schlegel et al. 2024), the male visual system (Nern et al. 2025), and the 7-column dataset (Takemura et al. 2015). We see no major differences (Author response image 2 and Author response image 3).

      Author response image 2.

      Relative pre- and post-synaptic counts for C3 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      Author response image 3.

      Relative pre- and post-synaptic counts for C2 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      The mediating neural correlates from C2 & C3 to T4 & T5 are not clarified; rather, Mi1 is found to be one of them. The study could be improved if the same set of silencing experiments performed for C2-Mi1 were extended to C2 & C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Stating the potential T5 mediators more clearly from the connectomic analysis would be equally beneficial. Future experiments might also disentangle the parallel or separate functions of C2 and C3 neurons.

      We fully agree that one could go down this route. Given the widespread connectivity of C2 and C3, and the fact that these are time-consuming experiments with often complex genetics, we had decided to instead study the “compound effect” of C2 and C3 silencing by analyzing T4/T5 physiological properties and motion-guided behavior. We now explicitly explain this logic by saying, “To understand the compound effect of C2 and C3 on motion processing, we focused on the direction-selective T4/T5 neurons, which are downstream of many of the neurons that C2 and C3 directly connect to.”

      Finally, the authors' conclusions derive from the set of experiments they performed in a logical manner. Nonetheless, the Discussion could benefit from a more extensive explanation of the following matters: why the ON-selective C2 and C3 neurons control OFF-generated behaviors, why the T4 & T5 responses after C2 & C3 silencing differ between stationary and moving stimuli, and finally why C2 and not C3 had an effect on T5 DS responses, as the connectivity suggests C3 outputting to two out of the four major T5 cholinergic inputs.

      Apart from the behavioral screen results, we only tested ON edges in our more detailed behavioral characterizations. And while we show phenotypes for the OFF-DS cell T5, it is well established that inhibitory cells that respond to one contrast polarity can function in the pathway with the opposite contrast polarity (e.g., the OFF-selective Mi9 in the ON pathway). We realized that our narrative in the results section was misleading in this regard (we had given the ON selectivity of C2/C3 as one argument why we first focused on the ON pathway) and eliminated this argument.

      For the differential involvement of C2/C3 for T4/T5 responses to stationary and moving stimuli (C2 and C3 silencing affects both T4 and T5 DS responses, but mostly T4 flash responses): We mostly took the disinhibition of flash responses in T4 as a motivation to look more specifically at a potential role in motion-computation. We now added a sentence about the potential emergence of these flash responses to the already extensive discussion paragraph “How could inhibitory feedback neurons affect motion detection in the ON pathway?”

      Last, we added a discussion point about the relationship between C2 and C3 connectivity and the functional consequences, and discussed the fact that C3 connectivity alone does not correlate with a functional role of C3 (alone) in DS computation.

      Reviewer #3 (Public review):

      Summary:

      This article is about the neural circuitry underlying motion vision in the fruit fly. Specifically, it regards the roles of two identified neurons, called C2 and C3, that form columnar connections between neurons in the lamina and medulla, including neurons that are presynaptic to the elementary motion detectors T4 and T5. The approach takes advantage of specific fly lines in which one can disable the synaptic outputs of either or both of the C2/3 cell types. This is combined with optical recording from various neurons in the circuit, and with behavioral measurements of the turning reaction to moving stimuli.

      The experiments are planned logically. The effects of silencing the C2/C3 neurons are substantial in size. The dominant effect is to make the responses of downstream neurons more sustained, consistent with a circuit role in feedback or feedforward inhibition. Silencing C2/C3 also makes the motion-sensitive neurons T4/T5 less direction-selective. However, the turning response of the fly is affected only in subtle ways. Detection of motion appears unaffected. But the response fails to discriminate between two motion pulses that happen in close succession. One can conclude that C2/C3 are involved in the motion vision circuit, by sharpening responses in time, though they are not essential for its basic function of motion detection.

      Strengths:

      The combination of cutting-edge methods available in fruit fly neuroscience. Well-planned experiments carried out to a high standard. Convincing effects documenting the role of these neurons in neural processing and behavior.

      Weaknesses:

      The report could benefit from a mechanistic argument linking the effects at the level of single neurons, the resulting neural computations in elementary motion detectors, and the altered behavioral response to visual motion.

      We agree that we cannot fully draw this mechanistic argument, but we also do not think that this is a realistic goal of this study. Even in a scenario where one would measure the temporal and spatial properties of “all” neurons that are connected to C2 and C3, this would likely not reveal the full mechanisms linking the single neurons to DS computation, but would require silencing specific connections, or specific molecular components of the connection, or could be complemented by models. A beautiful example where such a mechanistic understanding was achieved, recently published in Nature, essentially focused on a single synaptic connection (between Mi9 and T4) (Groschner et al. 2024), and built on extensive work that had already highlighted the importance of these neurons. We would further argue that the field does not have a good understanding of how T4/T5 responses are translated into behavior. Although possible pathways emerge from connectomes, it is for example not understood why the temporal frequency tuning of T4/T5 substantially differs from the temporal frequency tuning of the optomotor response.

      We therefore would like to highlight that the focus of our study was not to connect all those pieces, but rather to highlight the hitherto unknown overall importance of inhibitory feedback neurons for visual computations along the visual hierarchy, from individual neuron properties, via DS computation, to the temporal precision of the optomotor response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood."

      This is incorrect not only because it is referred to as a general statement, but also because many studies have examined inhibition in flies. It may not be solely GABAergic inhibition, but that is just one type. While some discussions later address feedback from horizontal cells in the retina, etc., there is no mention of work on color vision, which requires feedback. Please rephrase.

      We now say “visual motion processing” in this sentence, and added a sentence on color vision: “... color-opponent signalling requires reciprocal inhibition between photoreceptors as well as feedback inhibition from distal medulla (Dm) neurons (Schnaitmann et al., 2018, Heath et al., 2020, Schnaitmann et al., 2024).”

      (2) Line 197: "Because a previous studies" One or many? More importantly, please cite them.

      We corrected this to “a previous study” and now cite Tuthill et al. 2013.

      (3) Line 172: I noticed a few minor grammatical errors and wording issues, such as the use of "we next" twice in one sentence. "To next identify potential GABAergic neurons that are important for motion computation in the ON pathway, we next intersected 12 InSITE-Gal4." I am bad at picking them out, but since I noticed them, I would strongly suggest looking at the text carefully again.

      We deleted one occurrence of ‘next’, thank you for catching that.

      (4) Question to the authors. Why did you use twice independent lines and not checkers for the white noise analysis in Figure 3e?

      We used flickering bars because many visual system neurons tested in our lab respond with a better signal-to-noise ratio as compared to checkerboards. Flickering bars also appear to be more suited to isolate the spatial surround of neurons. This type of stimulus has been successfully used in previous studies to extract receptive fields of neurons in the fly visual system (Arenz et al. 2017; Leong et al., 2016, Salazar-Gatzimas et al. 2016; Fisher et al. 2015, …).

      (5) Line 248: "Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects..." Please state how here. I would need to go to the methods.

      We added a sentence “C2 was silenced by expression of UAS-shibire<sup>ts</sup> (UAS-shi<sup>ts</sup>) for temporal control of the inhibition of synaptic activity.”

      (6) Much of the work in the blowfly uses picrotoxinin to block GABAergic inhibition in the visual motion pathway. It would be useful to mention some of this early work and its results, particularly that of Single et al. (1997). It might be interesting to reinterpret their results.

      Thank you for pointing this out. We added this paragraph to the discussion: ‘Work in blowflies has found a severe impact of GABAergic signaling for DS in LPTCs downstream of T4 and T5 cells, using application of picrotoxin to the whole brain (Single et al. 1997; Schmid and Bülthoff 1988). Although the loss of DS in LPTCs could originate from direct inhibitory synapses onto LPTCs (Mauss et al. 2015; Ammer et al. 2023), the disruption of GABAergic signaling in upstream circuitry, which reduces DS in T4 and T5, may also contribute to the phenotype seen in LPTCs.’

      Reviewer #2 (Recommendations for the authors):

      The following set of corrections aims to better the scientific and presentation aspects of this work.

      (1) The title of the work implies that C2 and C3 neurons are required for motion processing, whereas the study shows their participation in motion computations, which persists post their silencing. Therefore, "Inhibitory columnar feedback neurons contribute to Drosophila motion processing" would be a more appropriate title.

      We rephrased the title to say that inhibitory feedback neurons “are involved in” motion processing.

      (2) The morphology of C2 and C3 neurons, i.e., ramifications in medulla & cell body in medulla and axonal targeting to lamina, implies their feedback role. It would be important to mention the specific feedback loop they participate in and the role of Mi1 more extensively in lines 36, 120.

      We find it hard to speculate on the specific feedback loops that C2 and C3 are involved in from their widespread input and output connectivity. If we had, we would have wanted to support this by functional measurements of this specific loop, which was not the goal of this study.

      (3) In lines 55-89, the authors explore the instances of feedback inhibition within and across species and modalities. For the Drosophila visual example (lines 76-89), given that it also addresses motion circuits, the following studies should be included:

      Ammer, G., Serbe-Kamp, E., Mauss, A.S., et al. Multilevel visual motion opponency in Drosophila. Nat Neurosci 26, 1894-1905 (2023). https://doi.org/10.1038/s41593-023-01443-z. Mabuchi Y, Cui X, Xie L, Kim H, Jiang T, Yapici N. Visual feedback neurons fine-tune Drosophila male courtship via GABA-mediated inhibition. Curr Biol. 2023 Sep 25;33(18):3896-3910.e7. doi: 10.1016/j.cub.2023.08.034.

      We added a sentence on the Ammer et al. finding to the introduction. Since the introduction paragraph focuses on known physiological effects within the visual system, we did not find a good fit for the Mabuchi et al. study, which focuses on serotonergic feedback neurons with a role far downstream in courtship behavior.

      (4) In lines 102-103, the following work should be referenced: Groschner LN, Malis JG, Zuidinga B, Borst A. A biophysical account of multiplication by a single neuron. Nature. 2022 Mar;603(7899):119-123. doi: 10.1038/s41586-022-04428-3.

      We cited a few of the many papers that used “modeling frameworks” and selected the ones focusing on the entire feedforward circuitry. To also give credit to the Borst lab, we instead added Serbe et al. 2016 here.

      (5) In lines 107-108, the Braun et al. (2023) study has not performed Rdl knockdown experiments in T4 cells; hence, it needs to be better clarified in the text.

      We corrected this in the text.

      (6) Even though the dataset was previously published, a summary plot of the different phenotypes would be very helpful to the reader. Moreover, in line 131, as the study focuses on motion vision, it would be better to use "early motion visual processing" rather than "early visual processing."

      We added a summary plot of the behavioral screen data to Supplementary figure 1, and rephrased previous line 131.

      (7) The first result section title excludes C3 neurons, even though in lines 172-179 they are addressed; therefore, the C3 inclusion is suggested as in "GABAergic C2 and C3 neurons control behavioral responses to motion cues". The term "required" should be excluded from the title as the other neuronal types encountered in the InSITE drivers were never quantified; thus, the "behavioral requirement" might come from these other neurons as well.

      From the experiments shown in this paragraph alone we cannot make conclusive claims about C3, as it was also weakly visible in one of our genetic controls in the intersectional strategy that we took (we had written: “This strategy also revealed other GABAergic cell types, including the columnar neuron C3 and the large amacrine cell CT1 which were however also weakly present in the gad1-p65AD control”).

      We changed the title of this paragraph to: A forward genetic behavioral screen identifies GABAergic C2 neurons to be involved in motion detection.

      (8) In line 142, it should be clearly stated that the MultiColor FlpOut technique was used and should also be cited: Nern A, Pfeiffer BD, Rubin GM. Optimized tools for multicolor stochastic labeling reveal diverse stereotyped cell arrangements in the fly visual system. Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):E2967-76. doi: 10.1073/pnas.1506763112.

      We did not use MCFO clones, but simple Flp-out clones, and the genotype and reference for this were given in the methods: UAS-FRT-CD2y+-FRT-mCD8::GFP; UAS-Flp (Wong et al. 2002). To make this clearer, we now also cite (Wong et al. 2002) in the results section.

      (9) In Figure 1c, a description of RFP should be written as it is already in Supplementary Figure 1c.

      We added this to the Figure caption.

      (10) In line 172, "next" is redundant as it was previously used at the beginning of the sentence.

      Removed

      (11) In line 175, based on both figures that the authors refer to, instead of C2, C3 should be written.

      We do indeed see C3 labeled in the images, but also in a gad1-p65AD control. We thus cannot be sure if C3 indeed reflects the intersection pattern. However, the three lines shown in Figure 1d clearly also label C2, which is not seen in the control condition.

      (12) In line 184, a split-C2 line is used (and a split C3 as in Supplementary Figure 2). It would enhance the credibility of the work and even be appropriate afterwards to use the word "requirement" if this split-C2 line was used for behavioral experiments, as in the Gohl et al., 2011, and Silies et al., 2013 studies.

      We are indeed using the same split-C2 line for imaging and for behavioral experiments in Figure 7. We see Figure 1 (and with that, Silies et al. 2013) as a first-pass screen, from which we obtained candidates, which we then more thoroughly tested throughout the remaining manuscript, with more specific lines. We are no longer using the word “requirement”.

      (13) In lines 186-188, is DenMark used as a postsynaptic marker? If yes, an additional control would be the use of Discs-large (DLG) as a postsynaptic marker, as DenMark would not be restricted to postsynaptic densities.

      Yes, we used DenMark as written in the sentence “we expressed GFP-tagged Synaptotagmin (Syt::GFP) to label pre-synapses together with the dendritic marker DenMark (Nicolai et al., 2010)”. Since our claims about widespread C2 and C3 connectivity are further supported by connectomics, we did not use another postsynaptic marker.

      (14) In line 191, L2 is mentioned as presynaptic, whereas in Figure 2b is clearly postsynaptic.

      We write “This revealed that C2 forms several presynaptic contacts with the lamina neurons L5, L1, and L2”. L5, L1, and L2 are hence postsynaptic to C2, which is what is plotted in Figure 2b.

      (15) In line 197, the "a" in "because a previous studies" should be removed, and these studies should be cited as the authors do in line 514.

      Done as suggested.

      (16) In line 1191, the figure title uses the term "required", whereas the plotted data suggest that T4 and T5 responses remain DS after C2&C3 silencing. Rephrasing to "C2 and C3 affect direction-selective.." would be better suited.

      We replaced “required” with “contribute to”

      (17) In the legend of Figure 2b, the "Counts of synapses" is misleading. The number plotted refers to the percentage of synapse counts from the target neuron.

      Corrected.

      (18) A general question about the C2 and C3 ON selectivity: How would the authors explain the OFF deficits from the published behavioral screening in Supplementary Figure 1a? Do the other InSITE neurons contribute to it? This needs to be further elaborated in the discussion.

      A neuron being ON selective does not imply that it is functionally required in the ON pathway only. In fact, Mi9, a major component of the ON pathway (even if not “required” under many stimulus conditions), is OFF selective.

      Furthermore, both we (Ramos-Traslosheros and Silies, 2021) and others (Salazar-Gatzimas et al. 2019) have shown that both ON and OFF signals are combined in ON and OFF pathways, which is further supported by connectomics data. We clarified the transition from physiology to function in the results section, as already explained above.

      (19) In line 216, the authors image from layer M1, but the reasoning behind this choice is missing. The explanation gap intensifies after you proceed with further examining the layer-specific responses in Supplementary Figure 2. Is this because C2 and C3 receive their inputs in M1, as is insinuated in line 219?

      As Supplementary Figure 2 shows, we initially imaged from all layers of the medulla, where C2 arborizes. Because the response properties, including kinetics, weren’t different, we had no reason to believe that C2 is highly compartmentalized. We thus subsequently focused on layer M1, where amplitudes were highest. We clarified this in the text.

      (20) In line 229, it should be clear whether the STRFs come from M1 measurements. STRF analysis in M5, M8, and M9/10 that also verifies the multicolumnar span of C2 and C3 would further strengthen the results. Given the focus of the work on Mi1 and T4/T5, Mi1-C2 connections should be clarified in terms of which medulla layer they form in. Additionally, the reasoning behind showing STRFs from M1 measurements in Figure 3, even though Supplementary Figure 2b implies equal responses in M9/10, where Tm1 and Tm4 also receive output from C3, should be explained.

      We never recorded STRFs in the silenced condition and make no claims about C2 changing spatial properties of Mi1. We added the information that STRFs were recorded in layer M1 to the figure caption. We checked the specific connectivity of C2 and Mi1 and they indeed connect in M1 (Author response image 4), but regardless of this result, there is no evidence for compartmentalization in these columnar neurons.

      Author response image 4.

      Image of a C2 (blue) and Mi1 (yellow) neuron from EM Data (FAFB). Circles depict synapses from C2 to Mi1 in layer M1 of the medulla.

      (21) In Figure 3e, the statistical significance or lack thereof is not visible at the bar plot.

      Consistently throughout the manuscript, we now just indicate if a comparison is significant. If nothing is shown, it means that it is not.

      To clarify this, we added a sentence to the statistics section in the methods now saying: We show significant differences in figures using asterisks (p<0.05 *, p<0.01 **, p<0.001 ***). Non-significant differences are not further indicated.

      Please note that, based on another reviewer comment, we also adapted the analysis of the kernels. This changed the statistics to be significant for the timing of the ON peak response (Figure 3e).
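      The asterisk convention described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds; the function name `significance_stars` is hypothetical and is not taken from the authors' analysis code.

```python
def significance_stars(p_value):
    """Return the asterisk label for a p-value under the stated convention.

    Thresholds follow the sentence quoted from the methods:
    p < 0.05 -> "*", p < 0.01 -> "**", p < 0.001 -> "***".
    Non-significant comparisons get an empty label, i.e. nothing is drawn.
    """
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant: no marker in the figure


if __name__ == "__main__":
    for p in (0.2, 0.03, 0.004, 0.0002):
        print(p, repr(significance_stars(p)))
```

      Returning an empty string for non-significant comparisons mirrors the choice to leave such comparisons unmarked rather than labelling them "n.s.".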

      (22) In line 249, it is mentioned that the strongest C2 connection is Mi1; this does not derive from the data shown in Figure 2b.

      We intended to look at medulla neurons, and Mi1 is the most connected medulla neuron to C2. We clarified that in the text, which now reads: “Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects temporal and spatial filter properties of the medulla neurons that provide direct input to T4 neurons. We chose to test Mi1 as it is the medulla neuron most strongly connected to C2.”

      (23) The result section title "C2 & C3 neurons shape response properties of the ON pathway medulla neuron Mi1" does not include C3 results. This would be fundamental to have. As previously mentioned, the neural correlates of this inhibitory feedback loop should be clearly defined, and the current version of this work evades doing so.

      We corrected the title. As discussed elsewhere, it was not the goal of this study to work the specific contributions of C2 (and C3) to all neurons they connect to, but rather focus on the compound effect for motion detection.

      (24) In line 276, the following work should be cited: Maisak MS, Haag J, Ammer G, Serbe E, Meier M, Leonhardt A, Schilling T, Bahl A, Rubin GM, Nern A, Dickson BJ, Reiff DF, Hopp E, Borst A. A directional tuning map of Drosophila elementary motion detectors. Nature. 2013 Aug 8;500(7461):212-6. doi: 10.1038/nature12320.

      We added the citation.

      (25) In line 273, the title implies the investigation of the spatial filtering of T4 and T5 cells. This does not take place in the respective result section.

      We changed the title to: “C2 and C3 shape temporal and spatial response properties of T4 and T5 neurons.”

      (26) In line 280, Kir2.1 is used, whereas previously thermogenetic silencing with Shibire<sup>ts</sup> was preferred; could the authors elaborate on this choice in the text, for example, genetic reasons?

      We generally prefer shibire<sup>ts</sup> because of its inducible nature. However, our T4/T5 recordings included more stimuli (motion stimuli) than the Mi1 recordings, and the effect of shi<sup>ts</sup>-mediated silencing by pre-heating the flies (as established by Joesch et al. 2010) was not long-lasting enough for these experiments, which is why we used Kir2.1. In a previous set of experiments, we had tried incubating flies while imaging, but this induced too large movements of the brain and T4/T5 recordings were not stable enough.

      (27) In lines 290-291, T5 ON suppression is found to be affected by C2 silencing, but the bar plot in Figure 5b uses the OFF-step data. It would be best if the ON-step data for T5 cells were also plotted.

      ON-step data for T5 are plotted in Supplementary Fig. 3e.

      (28) In line 288, "when C2 was also blocked", "also" should be included, as you are referring to double silencing.

      Sorry for the confusion, we referred to the wrong figure in that sentence. Here, we wanted to point at the increased response of T4 to the ON-step upon C2 silencing, which was quantified in Supplementary Fig. 3e.

      (29) In line 312, it is important to mention in the discussion why it is the case that C2 and not C3 had an effect on T5 DS responses. C2 outputs to Tm1, whereas C3 to Tm1 and Tm4, based on Figure 2b, with Tm1 and Tm4 being one of the four major cholinergic T5 inputs. Hence, it would be natural to think that C3 and not C2 would affect T5 responses.

      We addressed this in the discussion.

      (30) In lines 326-328, it is crucial to mention the neural correlates that connect C2 and C3 to T4 and T5. Additionally, the Shinomiya et al. (2019) study shows C3 to T4 connections, which are mentioned in the discussion and should be cited in line 429.

      We do not think that mentioning neural correlates at this point is crucial, as these sentences were concluding a paragraph in which we link C2/C3 silencing to T4/T5 responses. We also do not know the neural correlates (but for Mi1) so this would not be accurate.

      We have been mentioning C3 to T4 connection in both the results and discussion, and our analysis (Figure 2) stems from the FAFB dataset. We added citations to both results and discussion.

      (31) In Figure 6a, compared to Figure 3b, the term compass plots is used instead of polar plots. It would be best to use one consistent term. Additionally, in Figure 6c, it is not mentioned if the responses across genotypes are the outcome of averaging across subtype responses.

      These two plots are not the same; a compass plot is a sub-category of polar plots. Polar plots, as in Figure 3, show the response amplitude of the neurons to the different directions of motion. Instead, compass plots, as in Figure 6, show vectors that depict the tuning direction and the strength of tuning of individual neurons.

      We added the following sentence to clarify the calculation in Figure 6c: ‘To average responses of all neurons, the PD of each neuron was determined by its maximal response to one of 8 directions shown.’
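      The two quantities discussed in this response (the per-neuron tuning vector drawn in a compass plot, and PD-aligned averaging across neurons) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the eight directions are taken to be evenly spaced at 45°, the tuning vector is taken to be the response-weighted vector sum normalized by the summed responses (which presumes non-negative responses), and the function names are hypothetical. It is not the authors' analysis code.

```python
import cmath
import math

# Assumed stimulus set: 8 evenly spaced motion directions, in degrees.
DIRECTIONS_DEG = [0, 45, 90, 135, 180, 225, 270, 315]


def tuning_vector(responses, directions_deg=DIRECTIONS_DEG):
    """Return (tuning direction in degrees, tuning strength in [0, 1]).

    Assumed definition: response-weighted vector sum divided by the sum of
    responses. Responses are assumed to be non-negative.
    """
    total = sum(responses)
    if total == 0:
        return 0.0, 0.0
    vec = sum(r * cmath.exp(1j * math.radians(d))
              for r, d in zip(responses, directions_deg)) / total
    return math.degrees(cmath.phase(vec)) % 360, abs(vec)


def align_to_pd(responses):
    """Rotate a tuning curve so that its maximal response comes first.

    The preferred direction (PD) is taken as the direction with the
    maximal response, as in the sentence quoted above.
    """
    pd_index = max(range(len(responses)), key=lambda i: responses[i])
    return responses[pd_index:] + responses[:pd_index]


def average_aligned(population):
    """Average PD-aligned tuning curves across neurons."""
    aligned = [align_to_pd(r) for r in population]
    return [sum(column) / len(aligned) for column in zip(*aligned)]


if __name__ == "__main__":
    neuron_a = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]  # PD at 0 degrees
    neuron_b = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # PD at 90 degrees
    print(tuning_vector(neuron_b))
    print(average_aligned([neuron_a, neuron_b]))
```

      Aligning each curve to its own PD before averaging keeps neurons with different preferred directions from cancelling each other out in the population mean.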

      (32) In line 344, the title could be adjusted to "C2 is controlling the temporal dynamics of ON behavior", under the same reasoning of 'requirements' explained before.

      We think that “is controlling” is a stronger claim than “being required”. For a geneticist, the word “required” simply means that there is a(ny) loss of function phenotype, i.e., a reduction in DS when C2 and C3 are silenced/blocked. Many neurons are sufficient but not required to induce a certain behavior (i.e., they can induce a behavior when ectopically activated, but show no significant loss of function phenotype). We therefore consider it remarkable that C2 and C3 silencing indeed shows a significant reduction in DS.

      However, we do not want to overclaim anything, and the title now reads: “T4 tunes the temporal dynamics of ON behavior”

      (33) In Figure 7c, the plot legend should be "deceleration".

      Corrected

      (34) In line 424, the Braun et al. (2023) experiments were performed in T5 cells as previously mentioned.

      Corrected

      (35) In line 435, the authors mention that both ON-selective C2 and C3 neurons act partially in parallel pathways. In Figure 2b, the upstream circuitry between C2 and C3 is identical. How would they explain the functional-connectivity contradiction?

      In terms of acting in parallel pathways, downstream, not upstream, connectivity of C2 and C3 will matter, which is not identical. C2 for example connects to Mi1, L1, and L4, whereas C3 does not. On the other hand, C3 connects to Mi9 and Tm4, which C2 does not.

      (36) In lines 445-447, the authors address C2 and C3 neurons as columnar, whereas they previously showed in Figure 3 that they are multicolumnar.

      Here, we refer to the nomenclature of Nern et al., who use the term “columnar” whenever something is present in each column. We specifically define this by saying “only 15 cells are truly columnar in the sense that they are present once per column and present in each column”. In the results section, we instead talk about “functionally multicolumnar” and changed a sentence in the discussion to say “The spatial receptive fields of C2 and C3 are consistent with the multicolumnar branching of their projections in the medulla” to avoid any such confusion.

      (37) In line 448, "thus" is repetitive, and the extracted view in line 449 does not contribute to the essence of the study.

      Fixed.

      (38) In line 459, the authors refer to inhibition inheritance; this term should be used frequently in the text in case the neural correlates between C2 & C3 and T4 & T5 are not deciphered.

      We think this point is very clear throughout the manuscript now. As one prominent example, we added a sentence to the first paragraph of the discussion saying “Given the widespread connectivity of C2 and C3 to neurons upstream of T4/T5, this effect [on DS tuning] is likely inherited from upstream neurons of T4/T5.”

      (39) In line 521, the transition between sentences is problematic.

      Corrected

      (40) For Supplementary Figure 1, why were the ON-motion deficits not addressed with the antibody approach used for Supplementary Figure 1a?

      The approach using anti-GABA stainings turned out to be largely redundant with the intersectional strategy. Furthermore, the intersectional strategy provided the full morphology of the cell and, hence, led to easier identification of the cell types involved.

      (41) In line 1169, C2 is mentioned, whereas C3 is annotated in the figure.

      Corrected

      (42) A general comment is that Tm1 inputs could be a good candidate for assessing T5 inputs, as performed for Mi1-T4 in Fig.4. Such experiments would enhance the understanding of inhibitory inheritance to T5 responses.

      We fully agree.

      (42) Do the authors have any indication or experiments done regarding the C2&C3 role in T4&T5 velocity tuning? This would be complementary to the direction of this study.

      This is a good idea that we had tried. However, we did not see a difference between control and C2 silencing for the temporal frequency tuning of T4/T5. As velocity is closely related to temporal frequency tuning, we would not expect to see a difference there either.

      While it would have been nice to be able to draw such a link, we would also state that our behavioral data are a bit different: We did not look at temporal frequency tuning per se, and overall, it is not well understood how responses in T4/T5 relate to behavior, as they for example have different frequency tunings (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      (43) As a suggestion, Figure 7 would be better positioned as Figure 4, right after the ON-selectivity finding of C2 neurons.

      We preferred to keep the current order.

      Reviewer #3 (Recommendations for the authors):

      Main recommendation:

      It would be useful to propose a neural circuit model that connects the various observations. One can draw here on the many circuit models for motion vision in the prior literature.

      (1) How might the extended response in upstream neurons Mi1 lead to the inappropriate null-direction responses in T4/T5?

      This is a good question and we can only speculate. Mi1 responses are enhanced upon C2 silencing and T4 responses to full-field flashes are also enhanced. Likely, these motion-independent responses are also seen when the edge travels into the non-preferred direction, whereas this non-motion response would likely be masked by the motion response to the preferred direction. The phenotype seen in T5 is likely inherited from medulla neurons, e.g. Tm1, to which C2 connects. How the delay of the Mi1 response upon C2 silencing may specifically affect ND responses, we don’t know.

      (2) How is the loss of DS in T4/T5 compatible with the continued sensitivity to motion in the turning response? Perhaps the signal from 180-degree oppositely tuned T-cells gets subtracted, so as to remove the baseline activity?

      This is a great question that we cannot answer. Overall, perturbations that affect T4/T5 physiology do not necessarily manifest in equivalent phenotypes when looking at behavioral turning responses. Prominent examples come from silencing core neurons of motion-detection circuits, such as Mi1 and Tm3 (see Figure 4, Strother et al. 2017).

      (3) How do the altered dynamics in upstream neurons relate to the loss of high-frequency discrimination in the behavior? One would want to explain why the normal fly has a pronounced decay in the response even though the motion is still ongoing (Figure 7b left, starting at 0.4 s). That decay is missing in the mutant response.

      That is an excellent question that we unfortunately do not have an answer for. Please note that our visual stimulus is a single edge sweeping across the eye, which might not elicit equally strong responses at each position of the eye or at each time during the stimulus presentation.

      In terms of linking the dynamics of upstream neurons to behavior, we already pointed out above that it is not well understood how responses in T4/T5 relate to behavior, as they for example have different frequency tuning, with T4/T5 neurons being tuned to lower temporal frequencies than the turning behavior of a fly walking on a ball (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      Other recommendations:

      (1) Abstract line 37 "At the behavioral level, feedback inhibition temporally sharpens responses to ON stimuli, enhancing the fly's ability to discriminate visual stimuli that occur in quick succession." It may be worth specifying *moving* stimuli.

      Done as suggested

      (2) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood." This seems overly negative. Subsequent text mentions a number of such instances that are understood, and one could add more from the retina.

      We agree. We rephrased to say ‘motion vision’ and added more examples of known roles of feedback inhibition.

      (3) Line 69: "inhibitory feedback signals from horizontal cells and amacrine cells to photoreceptors and bipolar cells, respectively, are involved in multiple mechanisms of retinal processing, including global light adaptation, spatial frequency tuning, or the center-surround organization (Diamond 2017)." Maybe add the proven role in temporal sharpening of responses, which is of relevance to the present report.

      We added temporal sharpening to that introduction point.

      (4) Figure 1: The text for this figure talks about behavioral motion detection deficits in various lines. Maybe add an example of the behavioral effects to this figure.

      We added a summary plot of the behavioral screen data to Supplementary figure 1.

      (5) Line 325: "the timing of the ON peak tended to be slower for C3 compared to C2 for both the vertical and the horizontal STRF": It's hard to see evidence for that in the data.

      Based on your next comment we reanalysed the kernels of C2 and C3. This resulted in a significant difference in peak timing between C2 and C3. 

      (6) When presenting kernels as in Figure 3d and Figure 4b, extend the time axis to positive times until the kernel goes to zero. This "prediction of future stimuli" allows the reader to see the degree of correlation within the stimulus, which affects how one interprets the shape of the kernel. Also, plotting the entire peak gives a better assessment of whether there are any shape differences between conditions. An alternative is to compute the kernel via deconvolution, which gets closer to the actual causal kernel, but that procedure tends to highlight high-frequency noise in the measurement.

      We replotted the kernels in Figure 3d and 4b to show positive times. The kernels of C2 and C3 stayed at a positive level. Going back through the data we found a severe decrease in GCaMP signal in the first 2 seconds of the recording. We reanalyzed the kernels by ignoring the first seconds. All kernels now go back to zero. The shape of the kernels did not change but we now find a significant difference in peak timing between C2 and C3. Thank you for pointing this out.

      (7) Line 280 "simultaneously blocked C2 and C3 using Kir2.1": First use of that acronym. Please explain what the method is.

      We now explain: “we simultaneously blocked C2 and C3 by overexpression of the inward-rectifying potassium channel Kir2.1”.

      (8) Line 350 "temporal dynamics for C2 silencing": suggests "dynamics of silencing"; maybe better "response dynamics during C2 silencing".

      Edited as suggested

      (9) Figure 7: Explain the details of the stimulus containing two subsequent on edges. What happens between one edge and the next? Does the screen switch back to black? Or does the second edge ride on top of the final level of the first edge? This matters for interpreting the response.

      Yes, the screen turns dark between subsequent edge presentations. We added a sentence to the methods to clarify that. 

      (10) Line 402 "novel, critical components of motion computation.": This seems exaggerated. At the behavioral level, motion computation is mostly unaffected, except for some details of time resolution. Whether those matter for the fly's life is unclear.

      We deleted the word ‘critical.’

      (11) Line 413 "GABAergic inhibition required for motion detection is mediated by C2 and C3": Again, this seems exaggerated. Motion *detection* appears to work fine, but the *discrimination* of two closely successive motion stimuli is affected. The rest of the text does properly distinguish "discrimination" from "detection".

      We changed the title to say: ‘GABAergic inhibition in motion detection is mediated by C2 and C3.’

      (12) Line 489 "Whereas the role of C2 and C3 for the OFF pathway may be more generally to suppress neuronal activity,": Unclear to what this refers. The present report emphasizes that there is no effect on OFF activity (Figure 5).

      We did not see an effect on T5 responses to OFF flashes, as shown in Figure 5, but we found a significant reduction of DS when silencing C2, as well as slightly increased overall responses to all directions for C2 and C3 silencing, which was significant for null directions when silencing C2. This is shown in Figure 6.

      Typos:

      (1) Line 521.

      Fixed

      (2) Line 1170: context of the citation unclear.

      Fixed

    1. eLife Assessment

      This is a solid paper on intermittent fasting that will be of interest to readers. The data presented are certainly valuable as a resource. The findings of both shared and tissue-specific signatures, both at the proteomic and transcriptomic levels, align well with what has been established and bring new insight into metabolic adaptation and its consequences in muscle, cortex, and liver. The organ-specific changes unveiled by proteomics in response to IF reveal unique rewiring of metabolic, signaling, and physiological function.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomics and transcriptomics analysis to investigate the systemic and organ-specific adaptations to IF in males. They found that shared biological signaling processes were identified across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, which reveal both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths:

      This study detected multiple organs including liver, brain and muscle and revealed both conserved and tissue-specific responses to IF.

      Weaknesses:

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are proven to be the largest consumers of ketones, which are also changed by the IF treatment in this study.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, a strong correlation between IF and the molecular adaptations should be time point-dependent.

      (3) The manuscript lacks a "discussion" section, which would detail the significance and weaknesses of the study.

      (4) There is no confirmation for the proteomic and transcriptomic profiling. For example, the important changes in proteomics could be further identified by a Western blot.

    3. Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations.

      They find shared signaling pathways, certain metabolic changes and organ-specific responses that suggest IF might affect energy utilization, metabolic flexibility while promoting resilience at the cellular level.

      Strengths:

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study.

      Weaknesses:

      Poor figures presentation and knowledge of the literature. One sex (male).

      On resubmission the Authors' decision to discriminate the organ-specific from the organ-shared effects of intermittent fasting (IF) also enabled them to more precisely determine the lack of correspondence between transcriptomics and proteomics, i.e., not all transcripts lead to protein translation.

    4. Reviewer #3 (Public review):

      Summary:

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. Main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran out to 4 months which was a nice long-term design.

      (2) The multi-omics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues.

      (3) The authors did not over-step their conclusions and imply an overreached mechanism.

      Weaknesses:

      The weaknesses, which are minor, include use of only male mice and the early start (6 weeks) of the IF treatment. However, the authors have provided justification on why they chose male mice and the time points used in the study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomics and transcriptomics analysis to investigate the systemic and organ-specific adaptations to IF in males. They found that shared biological signaling processes were identified across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, which revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths:

      This study detected multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      We appreciate the recognition of the study’s strengths and the opportunity to clarify the points raised.

      Weaknesses:

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are proven to be the largest consumers of ketones, which is also changed in the IF treatment of this study.

      We agree that the heart and kidney are critical organs in ketone metabolism. Our selection of the liver, brain, and muscle was guided by their distinct metabolic functions and relevance to systemic energy balance, neuroplasticity, and locomotor activity, key domains influenced by intermittent fasting (IF). These tissues also offer complementary perspectives on central and peripheral adaptations to IF. Notably, we have previously examined the effects of IF on the heart (eLife 12:RP89214), and we fully acknowledge the importance of the kidney. We intend to include it in future studies to broaden the scope and deepen our understanding of IF-induced systemic responses.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, a strong correlation between IF and the molecular adaptations should be time point-dependent.

      We appreciate this insightful comment. The 4-month time point was selected to capture long-term adaptations to IF, beyond acute or transitional effects. While we acknowledge that molecular responses to IF are time-dependent, our goal in this study was to establish a foundational understanding of sustained systemic and tissue-specific changes. We fully agree that a longitudinal approach would provide deeper insights into the temporal dynamics of IF-induced adaptations. To address this, we are currently undertaking a comprehensive 2-year study that is specifically designed to explore these time-dependent effects in greater detail.

      (3) The context lacks a "discussion" section, which would detail the significance and weaknesses of the study.

      We appreciate this observation. The manuscript was originally structured to emphasize results and interpretation within each section, but we recognize that a dedicated discussion section would enhance clarity and contextual depth. In the revised version, we will add a comprehensive discussion section addressing broader implications, limitations, and future directions of the study.

      (4) There is no confirmation for the proteomic and transcriptomic profiling. For example, the important changes in proteomics could be further identified by a Western blot. 

      We acknowledge the importance of orthogonal validation to support high-throughput findings. While our study primarily focused on uncovering systemic patterns through proteomic and transcriptomic profiling, we agree that targeted confirmation would strengthen the conclusions. To this end, we have included immunohistochemical validation of a key protein common to all three organs— Serpin A1C. Additionally, we are planning a dedicated follow-up study to expand functional validation of several key proteins identified in this manuscript, which will be pursued as a separate project.

      Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations. 

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses that suggest IF might affect energy utilization, metabolic flexibility, while promoting resilience at the cellular level.

      Strengths:

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study. 

      We appreciate the reviewer’s recognition of the breadth of our study design. By integrating proteomics and transcriptomics across three metabolically distinct organs, we aimed to provide a comprehensive view of systemic and tissue-specific adaptations to IF. This multi-organ, multi-omics approach was central to uncovering both conserved and divergent biological responses.

      Weaknesses:

      (1) The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      We thank the reviewer for this important observation. Our initial aim was to establish a foundational atlas of molecular changes induced by IF across key organs. However, we recognize that clearer framing of the biological questions would enhance interpretability. In the revised manuscript, we have restructured the introduction, results, and discussion to align more explicitly with specific hypotheses, particularly those related to energy metabolism, cellular resilience, and inter-organ signaling. We have also added targeted analyses and clarified how each dataset contributes to answering these questions.

      (2) The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

      We appreciate this feedback and agree that these are important considerations. Regarding figure presentation, we will revise several figures for improved clarity, add more descriptive legends, and reorganize supplemental materials to better support the main findings. On the literature front, we will expand the discussion to include recent and relevant studies on IF, metabolic adaptation, and sex-specific responses. As for the use of only male mice, this was a deliberate choice to reduce hormonal variability and focus on establishing baseline molecular responses. We fully acknowledge the importance of sex as a biological variable and will soon be conducting studies in female mice to address this gap.

      Reviewer #3 (Public review):

      Summary:

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran out to 4 months, which was a nice long-term design.

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues. 

      (3) The authors did not overstep their conclusions and imply an overreached mechanism.

      We sincerely thank the reviewer for acknowledging the strengths of our study design and analytical approach. We aimed to strike a careful balance between comprehensive data generation and cautious interpretation, and we appreciate the recognition that our conclusions were appropriately framed within the scope of the data.

      Weaknesses:

      The weaknesses, which are minor, include the use of only male mice and the early start (6 weeks) of the IF treatment. See specifics in the recommendations section.

      We appreciate the reviewer’s thoughtful comments. The decision to use male mice and initiate IF at 6 weeks was based on minimizing hormonal variability and capturing early adult metabolic programming. We acknowledge that sex and developmental timing are important biological variables. To address this, we are conducting parallel studies in female mice and evaluating IF initiated at later life stages. These follow-up investigations will help determine the extent to which sex and timing influence the molecular and physiological outcomes of IF.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The editor suggested addressing points regarding the young age at diet onset, use of males only, and justification for the choice of tissues analyzed without requiring new data generation.

      We agree that these are important points for context. We have now added a dedicated paragraph to the Discussion section (page 22) to explicitly acknowledge and discuss these as limitations of our study. We justify our initial experimental design choices in the context of the existing literature while acknowledging the valuable insights that studies in females and with different diet onset timings would provide.

      The editor and reviewers recommended a more integrative analysis, suggesting the use of freely available tools, and a deeper discussion to frame the work against the existing literature.

      We thank the editor for this excellent suggestion. In response to this and the detailed points from Reviewer #2, we have performed a new, integrated multi-omics analysis using a latent-variable approach (DIABLO), implemented in the mixOmics R package (version 6.28.0), a state-of-the-art, freely available tool for integrative multi-omics analysis. This new analysis, presented in a new Figure 4 and described in the Results section (pages 20-23), identifies the key sources of variation across tissues and omics layers, directly addressing the request for a true integrative approach. Furthermore, we have thoroughly revised the Results and Discussion to more sharply frame our findings and highlight the new insights gleaned from our study.

      The editor requested clarification on whether mice were fasted at euthanasia and to rephrase the statement on page 12 regarding mitochondrial pathways.

      - We have clarified in the Methods section (page 4) that mice were euthanized at the end of their fasting period, precisely detailing the stage of the IF cycle.

      - We thank the editor for this critical correction. We have rephrased the statement on page 12 to more accurately reflect that we observed a lower abundance of proteins involved in mitochondrial oxidative pathways, and we now carefully discuss the important distinction between protein abundance and functional activity in this context.

      The editor noted that the introduction is missing key citations and should acknowledge foundational work.

      We apologize for this oversight. We have now revised the Introduction to include several key foundational citations that were previously missing, ensuring proper credit to the important work of our colleagues.

      Reviewer #2 (Recommendations for the authors):

      We thank the reviewer for their exceptionally detailed and helpful technical suggestions, which have greatly improved the analytical rigor of our manuscript.

      (1) & (4) 3D PCA and Integrated Multi-Omics Analysis:

      We agree with the reviewer that a more sophisticated integrative analysis was needed. As detailed in our response to the editor, we have replaced the original side-by-side analysis with a proper integrated multi-omics analysis using a latent-variable approach (DIABLO), implemented in the mixOmics R package (version 6.28.0). This new analysis simultaneously models the proteomic and transcriptomic data from all three organs, identifying shared and tissue-specific sources of variation. This directly and more powerfully validates our claim of "conserved and tissue-specific responses." The results of this analysis are now central to our revised Results section, Figure 4, and the supplementary figures (PCA analysis).

      (2) Concordance/Discordance Analysis:

      This is an excellent point. We have now performed a comprehensive analysis of transcript-protein concordance for the differentially expressed molecules in each tissue. A new figure 4 summarizes these findings, and we discuss the biological implications of both concordant and discordant pairs in the Results section.

      (3) Organ-Specific Functional Remodeling:

      We have taken this advice to heart. The new analysis inherently addresses whether the functional remodeling is shared or tissue-specific. 

      (5) Missing Citations:

      We have thoroughly reviewed the literature and added key citations throughout the manuscript, particularly in the Introduction and Discussion, to properly situate our work within the field.

      (6) Starting Results with Supplementary Data:

      As the study design, including the timing of experimental interventions and blood and tissue collections, is summarized in the supplementary figures, the Results and Discussion section begins with those figures. However, we have now renamed the figures according to the eLife style, in which supplementary figures are linked to the main figures. This ensures a more logical and coherent flow.

      (7) Figure Presentation and Explanation:

      We have completely revised all figures to improve their clarity, consistency, and professional appearance. We have also carefully gone through the manuscript to ensure that every panel in every figure is explicitly mentioned and explained in the main text.

      Reviewer #3 (Recommendations for the authors):

      We thank the reviewer for their important comments regarding the model system.

      (1) Sex Differences and Limitations:

      We fully agree that studying sex differences is a critical and profound aspect of dietary interventions. As noted in our response to the editor, we have added a paragraph to the Discussion to explicitly acknowledge this as a key limitation of our current study. We discuss the existing evidence for sex-specific responses to IF and state that this is an essential direction for future research.

      (2) Early Diet Onset and Developmental Programs:

      This is a valuable point. We have added text to the Discussion acknowledging that starting IF at 6 weeks of age could potentially interact with developmental programs. We discuss this as a consideration for interpreting our data and for the design of future studies.

      We believe that our revised manuscript is substantially stronger as a result of addressing these comments. We are grateful for the opportunity to improve our work and hope that you and the reviewers find these responses and revisions satisfactory.

    1. eLife Assessment

      This useful and interesting study provides evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for SARS-CoV-2. The authors provide convincing justification for the conclusion that the inconsistent statistical significance for Omicron is likely due to immune imprinting or original antigenic sin. In this regard, the significance of the findings is stronger as it points to possible challenges for updated vaccine strategies in overcoming immune imprinting.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but for Omicron, the effects were modest and often not statistically significant. The authors provide compelling evidence to support this may be due to immune imprinting.

      This study also builds on prior work with additional experiments to elucidate the mechanisms that contributed to the EABR increased immunogenicity in naive mice including evidence that the vaccine is inducing responses to more RBD epitopes and a potential role for heterodimer formation as a mechanism whereby bivalent vaccines induce cross-reactive B cell responses.

      Strengths:

      Evaluating, in pre-immune mice, a novel SARS-CoV-2 vaccine that was substantially superior in naive mice, as a model for its potential in the pre-immune population.

      Providing insight into a possible role of immune imprinting in shaping immune responses to updated booster immunizations.

      Minor weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting. While parallel controls (mice immunized 3 times with just the bivalent EABR vaccine) were not tested, the authors point to prior published work showing Omicron S antigen is a strong immunogen. This indicates the lower immune responses to Omicron are likely due to immune imprinting (or original antigenic sin) and not due to S immunogen being inherently less immunogenic than the S protein from the ancestral Wu-1 strain.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine, which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. The discussion acknowledges these limitations of their studies and potential limited benefits of the EABR strategy in pre-immune mice vs standard bivalent mRNA vaccine.

      (3) The EABR S mRNA vaccine was superior to the conventional mRNA S vaccine in naïve mice but not in pre-immune mice. The authors expanded the discussion to propose a possible role for immune imprinting in this result which is supported by the data.

    3. Reviewer #3 (Public review):

      Summary:

      The authors evaluated a novel bivalent (Wu1/BA.5 based) mRNA platform that uses the EABR strategy to produce enveloped virus-like particles for vaccination. These were tested as boosters in the context of pre-existing immunity in mice that received two prior immunizations with conventional Wu1 mRNA vaccines. The animal experimental timeline aimed at mimicking the vaccination/booster schedule implemented during the COVID-19 pandemic. The authors tested and compared different booster strategies: (1) conventional Wu1 S protein encoding mRNA vaccine, (2) EABR Wu1 S protein encoding mRNA vaccine that produces enveloped virus-like particles, (3) conventional Wu1/BA.5 S protein encoding mRNA vaccine, and (4) EABR Wu1/BA.5 S protein encoding mRNA vaccine that produces enveloped virus-like particles. The EABR approach (monovalent or bivalent) enhanced the antibody response against Wu1 and Omicron subvariants. Interestingly, the bivalent EABR Wu1/BA.5 mRNA (strategy 4) generated polyclonal sera targeting multiple receptor-binding domain epitopes: these sera were more diverse than those generated with the other tested booster strategies (1 to 3).

      Strengths:

      The monovalent Wu1 S-EABR mRNA booster led to an increase in antibody binding to tested Omicron variants (BA.5, BQ.1.1, XBB.1), while the bivalent Wu1/BA.5 S-EABR mRNA booster led to the highest Ab response against Omicron variants (BA.5, BQ.1.1, XBB.1) in pre-vaccinated mice.

      Neutralization assays showed that the monovalent Wu1 S-EABR mRNA booster induced the highest neutralization activity against Wu1 and, to a lesser extent, against the early Omicron variant BA.1. The monovalent Wu1 S-EABR mRNA booster and bivalent Wu1/BA.5 S-EABR mRNA booster had similar BA.5 neutralizing activity. Neutralizing activity of the different boosters was less pronounced with later Omicron variants BQ.1.1 and XBB.1. However, of the different boosters tested, the bivalent Wu1/BA.5 S-EABR mRNA booster induced the highest neutralizing titers. These results support that the EABR mRNA vaccine strategy helps improve neutralizing activity against different tested Omicron subvariants: a few (1 or 2) mRNA constructs expressing major antigens in enveloped virus-like particles likely provide a novel strategy to elicit an immune response that has the potential to neutralize subsequent variants.

      The EABR enveloped virus-like particle strategy induces a more diverse antibody response, including epitopes not recognized by the other booster strategies: these new epitopes could play a role in neutralizing activity against new future variants.

      Moreover, the bivalent Wu1/BA.5 S-EABR mRNA booster could potentially produce heterotrimeric S proteins to help activation of cross-reactive B cells and increase polyclass antibody responses.

      Weaknesses:

      When it comes to later Omicron variants (BQ.1.1 and XBB.1), there is a discrepancy between epitope binding response and neutralization titers: only a few binding antibodies have neutralizing activity with these later variants, showing a limitation of the EABR strategy.

      The authors showed that the EABR mRNA strategy represents a novel antigen-exposing strategy where antigens are produced at the cell surface and also at the surface of enveloped virus-like particles. This allows the production of novel antibodies in addition to those that would be typically generated against cell surface exposed antigens. These novel antibodies targeting new epitopes could potentially have neutralizing activity.

      Using a bivalent EABR mRNA booster led to higher antibody titers and higher neutralizing activity. The challenge is to select the best antigen target/variant to support neutralizing activity against later virus variants.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for the SARS-CoV-2 booster vaccine. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study presents limitations in interpreting the results. Also, the absence of results showing possible mechanisms underlying the lack of benefit with EABR in the pre-immune makes the findings mostly observational.

      Thank you for your assessment of our study. Respectfully, we do not agree that our study shows a lack of benefit of using the EABR approach. For the monovalent boosters, the S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the regular S mRNA booster, which is consistent with the findings from our prior study in naïve mice. In addition, the bivalent S-EABR booster consistently elicited the highest neutralizing titers against all tested variants, including significantly higher titers against BA.5 and BQ.1.1 than the monovalent S booster. The bivalent S-EABR booster also induced detectable neutralization activity in a larger number of mice than all other boosters.

      Consistent with this analysis, please note that reviewers 1 and 2 commented that “the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant” (reviewer 1) and “the authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting” (reviewer 2).

      We agree with the reviewers’ assessment that the EABR booster-mediated improvements were mostly modest, in particular against the BQ.1.1 and XBB.1 strains. We also acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field.

      Finally, we also wish to point out that we did include experiments that addressed potential mechanistic differences between booster groups. For example, we conducted deep mutational scanning studies to determine polyclonal antibody epitope mapping profiles, showing that bivalent S-EABR boosters induced more balanced targeting of multiple RBD epitopes, which likely contributed to the observed improvements in neutralization. Our work also included cryo-EM studies demonstrating that bivalent S mRNA boosters promote heterotrimer formation, which could potentially drive preferential stimulation of cross-reactive B cells via intra-spike crosslinking. This represents a potential mechanism explaining how bivalent boosters outperformed monovalent boosters in our and many prior studies, which warrants further investigation. Finally, we also performed serum depletion assays, showing that the BA.5 neutralizing activity elicited by the bivalent Wu1/BA.5 S and S-EABR mRNA boosters was primarily driven by cross-neutralizing Abs induced by the primary vaccination series.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      We thank the reviewer for their accurate summary of our study. Please see our comments to the reviewer’s individual points below, as well as our responses to the editor’s assessment above.

      Strengths:

      Evaluating, in pre-immune mice, a novel SARS-CoV-2 vaccine that was substantially superior in naïve mice, as a model for its potential in the pre-immune population.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      The reviewer raises an important point, and we agree that including additional groups receiving three immunizations with the bivalent spike and/or spike-EABR mRNA vaccines would have improved the experimental design. However, we believe that several prior studies have already demonstrated that Omicron S immunogens are not inherently poorly immunogenic compared to the ancestral S; e.g., Scheaffer et al., Nat Med (2022); Ying et al., Cell (2022); Muik et al., Sci Immunol (2022). Based on these prior reports, we conclude that the lower neutralizing titers against Omicron variants in our study are most likely driven by immune imprinting as a result of the initial vaccination series with the ancestral S immunogen.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. This is mostly glossed over throughout the manuscript. The discussion section needs to better acknowledge these limitations of their studies and the limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      We acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field. We added a “Limitations of the study” section at the end of the discussion to address all of these points in detail (lines 570-598 in the revised version).
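
      The trade-off between cohort size and detectable effect described above can be sketched with a standard normal-approximation sample-size formula for comparing two group means. This is only an illustration and not the statistical analysis used in the study: the function name `mice_per_group`, the assumed standard deviation of 0.5 log10 units, and the candidate fold differences are all hypothetical, and the approximation slightly underestimates the number needed for small groups analyzed with a t-test.

```python
import math
from statistics import NormalDist


def mice_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / effect)^2.
    `effect` and `sd` must share units (here: log10 titer). Both are assumptions
    chosen for illustration, not values taken from the study.
    """
    if effect <= 0 or sd <= 0:
        raise ValueError("effect and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 * (sd / effect) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical scenario: titers are log10-transformed with an assumed SD of 0.5.
    for fold in (2, 3, 5):
        n = mice_per_group(effect=math.log10(fold), sd=0.5)
        print(f"{fold}-fold difference: about {n} mice per group")
```

      Under these assumed inputs, smaller fold differences require disproportionately larger groups, because the required number scales with the inverse square of the effect size.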

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

      As we pointed out in our response to the editor’s assessment above, the monovalent S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the conventional monovalent S mRNA booster, which is largely consistent with the findings from our prior study in naïve mice. Although the bivalent S-EABR mRNA booster consistently elicited higher neutralizing titers than the conventional bivalent S mRNA booster, we agree with the reviewer that these improvements were modest and not statistically significant. Overall, neutralizing activity against later Omicron variants, such as BQ.1.1 and XBB.1, was low. We attributed this finding to immune imprinting (see response to point (1) above) and acknowledged that the EABR approach was not able to effectively overcome this effect (see discussion section of the paper, lines 537-558; and “Limitations of the study” section, lines 570-598 in the revised version).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format will have improved binding/neutralization antibody capacity over conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) or bivalent (Wu1 + BA.5) designs. The authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have potential further usage as an antigen.

      We thank the reviewer for their support and for the accurate summary and evaluation of our study.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

      We thank the reviewer for bringing up this important point. We agree that the variants used for this study are now outdated, and it would have been informative to evaluate conventional and EABR boosters against contemporaneous strains. However, as the reviewer correctly pointed out, this type of study requires a substantial amount of time to conduct and will therefore likely always be outdated by the time the data are analyzed and prepared for publication. To accurately assess immune responses against recent or current strains in mice, multiple boosters would have been needed to mimic the pre-existing immune context in the human population in 2025. Assuming intervals of 6-7 months between boosters (as used in this study to mimic booster intervals in the human population as closely as possible), this type of study would have been challenging to conduct, especially given the limited lifespan of mice. Thus, we performed this proof-of-concept study using outdated variants to assess the potential of EABR-modified boosters. We greatly appreciate the reviewer’s understanding and acknowledge this limitation of our study, which is highlighted in the added “Limitations of the study” section in the revised version of the manuscript (lines 570-598).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym RBD in the title should be spelled out.

      We thank the reviewer for raising this point. We made this change in the revised version of the paper.

      (2) Lines 167-168 describe no differences between the cohorts at day 244. It should also be stated that for all timepoints, there are no significant differences.

      We modified the revised manuscript according to the reviewer’s suggestion (line 170).

      Reviewer #2 (Recommendations for the authors):

      (1) Given the focus on developing broad vaccines for future coronavirus outbreaks, it would be particularly informative to test whether the EABR antigens elicit broadened/heightened responses against other (beta)coronaviruses. If enough serum is left, it would seem straightforward to conduct neutralization assays against non-SARS-CoV-2 coronaviruses.

      We thank the reviewer for this valid suggestion. Unfortunately, the extensive analysis of the serum samples, including spike and RBD ELISAs and neutralization assays against multiple variants, deep mutational scanning, and depletion assays, used up the serum samples for most mice. We agree that it would be interesting to investigate whether bivalent EABR boosters elicit pan-sarbecovirus responses in future studies.

      (2) In the bar plots for antibody titer changes, shown as log10 fold change, it is quite hard to interpret the difference between bars (e.g., what is the fold change difference between each bar in the same time point?). A table of mean ± SD values would be helpful.

      That’s a great suggestion. We added a table (Table S1) presenting all the geometric mean neutralization titers for all timepoints and variants in the revised version of the manuscript.
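
      For readers comparing groups in such a table, the relationship between geometric mean titers, fold change, and the log10 fold change shown in the bar plots can be sketched as follows. This is a minimal illustration only: the titer values and group names are hypothetical and are not taken from Table S1 or any figure in the manuscript.

```python
import math


def geometric_mean(titers):
    """Geometric mean of positive titers, computed as exp(mean(ln(titer)))."""
    if not titers or any(t <= 0 for t in titers):
        raise ValueError("titers must be a non-empty sequence of positive values")
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


def fold_change(gmt_test, gmt_reference):
    """Ratio of two geometric mean titers (test group over reference group)."""
    return gmt_test / gmt_reference


if __name__ == "__main__":
    # Hypothetical neutralization titers for two booster groups (not study data).
    group_a = [100, 400, 1600]
    group_b = [50, 100, 200]

    gmt_a = geometric_mean(group_a)
    gmt_b = geometric_mean(group_b)
    fc = fold_change(gmt_a, gmt_b)

    print(f"GMT group A: {gmt_a:.0f}")
    print(f"GMT group B: {gmt_b:.0f}")
    print(f"Fold change: {fc:.1f} (log10 = {math.log10(fc):.2f})")
```

      Because titers are roughly log-normally distributed, a difference between bars on a log10 scale corresponds to a ratio of geometric means, not a difference of arithmetic means.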

      (3) The development of heterotrimers as potential antigens is very interesting, but it seems out of place in the current manuscript. This should likely be in a separate, standalone manuscript.

      We thank the reviewer for commenting on the heterotrimer part of our manuscript. The presented work was not intended to advance the development of heterotrimers as potential antigens. Instead, our findings demonstrate that bivalent spike mRNA vaccines readily generate heterotrimers, which could promote intra-spike crosslinking and potentially impact antibody epitope targeting profiles as suggested by the deep mutational scanning data for the bivalent S-EABR mRNA booster (Fig. 4; Fig. S7-8). We think this is an important consideration that warrants further investigation with regards to the development of future bivalent or multivalent vaccines.

      (4) As a minor note, the sequences of the variants used or accession numbers should be provided in the Methods, since different groups have used different mutations for variants.

      We added the accession numbers for the vaccine strains used in this study (lines 604-605).

    1. eLife Assessment

      These findings are among some of the first to identify a behavioral and neurobiological substrate that disentangles nonassociative from associative fear responses following stress, providing a fundamental push forward in the field. The evidence supporting this is compelling and uses a variety of conceptual and technological approaches. This investigation will be of interest to neuroscientists and behaviourists broadly, as well as clinicians for its relevance to post-traumatic stress disorder.

    2. Reviewer #1 (Public review):

      Summary:

      This study delineates a highly specific role for the pPVT in unconditioned defensive responses. The authors use a novel, combined SEFL and SEFR paradigm to test both conditioned and unconditioned responses in the same animal. Next, a c-fos mapping experiment showed enhanced PVT activity in the stress group when exposed to the novel tone. No other regions showed differences. Fiber photometry measurements in pPVT showed enhancement in response to the novel tone in the stressed but not non-stressed groups. Importantly, there were also no effects when calcium measurements were taken during conditioning. Using DREADDS to bidirectionally manipulate global pPVT activity, inhibition of the PVT reduced tone freezing in stressed mice while stimulation increased tone freezing in non-stressed mice.

      Strengths:

      A major strength of this research is the use of a multi-dimensional behavioral assay that delineates behavior related to both learned and non-learned defensive responses. The research also incorporates high-resolution approaches to measure neuronal activity and provide causal evidence for a role for PVT in a very narrow band of defensive behavior. The data are compelling, and the manuscript is well-written overall.

      Weaknesses:

      Figure 1 shows a small but apparently statistically significant increase in freezing in response to the novel tone in the no-stress group relative to baseline freezing. This was also observed in Figures 2 and 7. The tone presented is relatively high frequency (9 kHz) and high dB (90), making it a high-intensity stimulus. Is it possible that this stimulus is acting as an unconditioned stimulus? In addition, in the final experiment, the tone intensity was increased to 115 dB, and the freezing % in the non-stressed group was nearly identical (~20%) to the non-stressed groups in Figures 1-2 and Figure 7. It seems this manipulation was meant as a startle assay (Pantoni et al., 2020). Because the auditory perception of mice is better at high frequencies (best at ~16 kHz), would the effect seen be evident at a lower dB (50-55) at 9 kHz? If the tone was indeed perceived as "neutral," there should be no freezing in response to the tone. This complicates the interpretation of the results somewhat because while the authors do admit the stimulus is loud, would a less loud stimulus result in the same effect? Could the interaction observed in this set of studies require not a novel tone, but rather a high-intensity tone that elicits an unconditioned response? Along these same lines, it appears there may be an elevation in c-fos in the PVT in the non-stress tone test group versus the no-stress home cage control, and overall it appears that tone increases c-fos relative to homecage. Could PVT be sensitive to the tone outside of stress? Would there be the same results with a less intense stimulus? I would also be curious to know what mice in the non-stressed group were doing upon presentation of the tone besides freezing. Were any startle or orienting responses noticed?

      Comments on revisions:

      Following revision, this reviewer felt all of the above concerns were addressed.

    3. Reviewer #2 (Public review):

      Summary:

      Nishimura and colleagues present findings of a behavioral and neurobiological dissociation of associative and nonassociative components of Stress Enhanced Fear Responding (SEFR).

      Strengths:

      This is a strong paper that identifies the PVT as a critical brain region for SEFR responses using a variety of approaches, including immunohistochemistry, fiber photometry, and bidirectional chemogenetics. In addition, there is a great deal of conceptual innovation. The authors identify a dissociable behavior to distinguish the effects of PVT function (among other brain regions).

      Weaknesses:

      (1) The authors find a lack of difference between the Stress and No Stress groups in pPVT activity during SEFL conditioning with fiber photometry but an increase in freezing with Gq DREADD stimulation. How do authors reconcile this difference in activity vs function?

      (2) Because the PVT plays a role in defensive behaviors, it would be beneficial to show fiber photometry data during freezing bouts rather than exclusively during tone and shock cue presentations.

      (3) Similar to the above point, were other defensive behaviors expressed as a result of footshock stress or PVT manipulations?

      (4) Tone attenuation in Figure 8 seems to be largely a result of minimal freezing to a 115-dB tone. While not a major point of the paper, a more robust fear response would be convincing.

      (5) In the open field test, the authors measure total distance. It would be beneficial to also show defensive behavioral (escape, freezing, etc) bouts expressed.

      (6) The authors, along with others, show a behavioral and neural dissociation of footshock stress on nonassociative vs associative components of stress; however, the nonassociative components as a direct consequence of the stress seem to be necessary for enhancement of associative aspects of fear. Can authors elaborate on how these systems converge to enhance or potentiate fear?

      (7) In the discussion, authors should elaborate on/clarify the cell population heterogeneity of the PVT since authors later describe PVT neurons as exclusively glutamatergic.

      Comments on revisions:

      Following revision, this reviewer felt all of the above concerns were addressed.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Nishimura et al. examines the behavioural and neural mechanisms of stress-enhanced fear responding (SEFR) and stress-enhanced fear learning (SEFL). Groups of stressed (4 x shock exposure in a context) vs non-stressed (context exposure only) animals are compared for their fear of an unconditioned tone, and context, as well as their learning of new context fear associations. Shock of higher intensity led to higher levels of unlearned stress-enhanced fear expression. Immediate early gene analysis uncovered the PVT as a critical neural locus, and this was confirmed using fiber photometry, with stressed animals showing an elevated neural signal to an unconditioned tone. Using a gain and loss of function DREADDs methodology, the authors provide convincing evidence for a causal role of the PVT in SEFR.

      Strengths:

      (1) The manuscript uses critical behavioural controls (no stress vs stress) and behavioural parameters (0.25 mA, 0.5 mA, 1 mA shock). Findings are replicated across experiments.

      (2) Dissociating the SEFR and SEFL is a critical distinction that has not been made previously. Moreover, this dissociation is essential in understanding the behavioural (and neural) processes that can go awry in fear.

      (3) Neural methods use a multifaceted approach to convincingly link the PVT to SEFR: from Fos, fiber photometry, gain and loss of function using DREADDs.

      Weaknesses:

      No weaknesses were identified by this reviewer; however, I have the following comments:

      A closer examination of the Test data across time would help determine if differences may be present early or later in the session that could otherwise be washed out when the data are averaged across time. If none are seen, then it may be worth noting this in the manuscript.

      Given the sex/gender differences in PTSD in the human population, having the male and female data points distinguished in the figures would be helpful. I assume sex was run as a variable in the statistics, and nothing came as significant. Noting this would also be of value to other readers who may wonder about the presence of sex differences in the data.

      Comments on revisions:

      Following revision, this reviewer felt all of the above comments were addressed.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study delineates a highly specific role for the pPVT in unconditioned defensive responses. The authors use a novel, combined SEFL and SEFR paradigm to test both conditioned and unconditioned responses in the same animal. Next, a c-fos mapping experiment showed enhanced PVT activity in the stress group when exposed to the novel tone. No other regions showed differences. Fiber photometry measurements in pPVT showed enhancement in response to the novel tone in the stressed but not non-stressed groups. Importantly, there were also no effects when calcium measurements were taken during conditioning. Using DREADDS to bidirectionally manipulate global pPVT activity, inhibition of the PVT reduced tone freezing in stressed mice while stimulation increased tone freezing in non-stressed mice.

      Strengths:

      A major strength of this research is the use of a multi-dimensional behavioral assay that delineates behavior related to both learned and non-learned defensive responses. The research also incorporates high-resolution approaches to measure neuronal activity and provide causal evidence for a role for PVT in a very narrow band of defensive behavior. The data are compelling, and the manuscript is well-written overall.

      Weaknesses:

      Figure 1 shows a small but apparently statistically significant increase in freezing in response to the novel tone in the no-stress group relative to baseline freezing. This was also observed in Figures 2 and 7. The tone presented is relatively high frequency (9 kHz) and high dB (90), making it a high-intensity stimulus. Is it possible that this stimulus is acting as an unconditioned stimulus?

      We thank the reviewer for this insightful comment. In our view, the freezing behavior elicited by the tone reflects an unconditioned response; accordingly, the tone functions as an unconditioned stimulus. Indeed, in our data we found a modest increase in freezing in the no-stress group during the tone presentation relative to baseline (Figures 1, 2, and 7). This effect, however, was considerably smaller in magnitude than the robust freezing observed in stressed mice. We conclude that prior footshock stress enhances the unconditioned tone response.

      In addition, in the final experiment, the tone intensity was increased to 115 dB, and the freezing % in the non-stressed group was nearly identical (~20%) to the non-stressed groups in Figures 1-2 and Figure 7. It seems this manipulation was meant as a startle assay (Pantoni et al., 2020).

      We appreciate the opportunity to clarify this aspect of the model. In Figure 7, the rationale for selecting a tone amplitude of 115 dB was not to conduct a startle assay. Instead, we sought to determine whether chemogenetic inhibition of the pPVT influenced tone-elicited unconditioned fear in stress-naïve mice. Given our prior experiments demonstrating that a 90 dB tone elicits relatively low levels of freezing in non-stressed groups, we increased the tone amplitude to 115 dB in an attempt to elicit a more robust freezing response that would be sufficient to detect meaningful group differences (i.e., prevent a floor effect). As noted by the reviewer, the 115 dB tone yielded moderate levels of freezing behavior. Although freezing levels were not very high, we believe they were sufficient to avoid a floor effect. There was no effect of pPVT inhibition in this version of the task, which suggests that pPVT is preferentially engaged after stress. Future studies that identify tone parameters capable of eliciting high levels of freezing will be necessary to further strengthen this finding.

      Because the auditory perception of mice is better at high frequencies (best at ~16 kHz), would the effect seen be evident at a lower dB (50-55) at 9 kHz? If the tone was indeed perceived as “neutral,” there should be no freezing in response to the tone. This complicates the interpretation of the results somewhat because while the authors do admit the stimulus is loud, would a less loud stimulus result in the same effect? Could the interaction observed in this set of studies require not a novel tone, but rather a high-intensity tone that elicits an unconditioned response?

      Within our framework, it is important to emphasize that tone intensity (amplitude and frequency), rather than the perceived novelty of the stimulus, is the primary determinant of unconditioned freezing behavior. Moreover, numerous studies have demonstrated that auditory stimuli have the capacity to elicit unconditioned fear responses, as in the case of pseudoconditioning. Accordingly, we agree with the reviewer that decreasing the tone amplitude from 90 dB to 50 dB would diminish the unconditioned freezing response. For example, Kamprath and Wotjak (2004) demonstrated that stress-naïve mice exposed to a 95 dB tone exhibited significantly greater levels of freezing compared to those exposed to an 80 dB tone. This graded effect of tone amplitude on unconditioned freezing was also observed in mice previously exposed to footshock stress. Notably, the authors also reported a plateau effect, such that increases in tone amplitude beyond 95 dB did not further elevate freezing levels. As it relates to our findings, this plateau effect may explain the rather modest changes in freezing behavior that we observed between the 90 dB and 115 dB tones.

      Along these same lines, it appears there may be an elevation in c-fos in the PVT in the non-stress tone test group versus the no-stress home cage control, and overall it appears that tone increases c-fos relative to homecage. Could PVT be sensitive to the tone outside of stress? Would there be the same results with a less intense stimulus?

      Indeed, as the reviewer noted, we observed an increase in PVT c-Fos expression in non-stressed animals exposed to the SEFR tone test relative to homecage controls. The finding is consistent with previous reports demonstrating that PVT neurons are robustly activated by salient stimuli and regulate properties of arousal (Penzo and Gau, 2022). Moreover, the PVT has been shown to exhibit neuronal activity responses that are scaled to stimulus intensity. For example, PVT neurons display increased firing rates in response to a tail shock compared to an air puff (Zhu, 2018). Thus, it is conceivable that a less intense stimulus would evoke a diminished level of c-Fos expression.

      I would also be curious to know what mice in the non-stressed group were doing upon presentation of the tone besides freezing. Were any startle or orienting responses noticed?

      We thank the reviewer for raising this important question. Regarding startle responses, we have found that our standard 90 dB, 9 kHz tone parameter elicits similar degrees of startle between stressed and non-stressed mice (unpublished data). However, Golub et al. (2009) observed effects of prior footshock stress on acoustic startle. Further investigation of behavioral responses expressed during the tone is certainly warranted.

      Reviewer #2 (Public review):

      Summary:

      Nishimura and colleagues present findings of a behavioral and neurobiological dissociation of associative and nonassociative components of Stress Enhanced Fear Responding (SEFR).

      Strengths:

      This is a strong paper that identifies the PVT as a critical brain region for SEFR responses using a variety of approaches, including immunohistochemistry, fiber photometry, and bidirectional chemogenetics. In addition, there is a great deal of conceptual innovation. The authors identify a dissociable behavior to distinguish the effects of PVT function (among other brain regions).

      Weaknesses:

      (1) The authors find a lack of difference between the Stress and No Stress groups in pPVT activity during SEFL conditioning with fiber photometry but an increase in freezing with Gq DREADD stimulation. How do authors reconcile this difference in activity vs function?

      The reviewer points out a curious dissociation. Fiber photometry showed no effect of prior stress on the PVT response during single-shock contextual fear conditioning; however, Gq DREADD stimulation of PVT led to increased postshock freezing during this session. We don’t have a definitive explanation for this dissociation, but we wish to emphasize two relevant points. The first is that in our experience, post-shock freezing during the one-shock contextual fear conditioning session is modest, variable, and an unreliable predictor of long-term contextual fear. Thus, we are hesitant to draw firm conclusions from these data. Second, we did not observe differences in freezing during the SEFL context test, indicating that stimulation of pPVT during conditioning is not sufficient to elicit long-term enhancement of conditioned fear (i.e., SEFL). This suggests that the acute freezing response following shock exposure is mechanistically distinct from expression of conditioned contextual fear. Clearly, further research will be needed to clarify the conditions under which PVT activity regulates / does not regulate freezing.

      (2) Because the PVT plays a role in defensive behaviors, it would be beneficial to show fiber photometry data during freezing bouts rather than exclusively during tone and shock cue presentations.

      We appreciate the reviewer's suggestion. Unfortunately, freezing data are not available for the fiber photometry experiment because the fiber optic patch cable interfered with mouse activity. We now acknowledge this as a limitation in the paper (line #202).

      (3) Similar to the above point, were other defensive behaviors expressed as a result of footshock stress or PVT manipulations?

      In addition to freezing behavior and locomotor activity in the open field, we examined the time and distance spent in the center of the open field arena. Consistent with our previous report (Hassien, 2020), we did not observe significant group differences between stress conditions, nor did we detect differences across the various experiential manipulations. We did not examine other defensive behaviors in this study. Ongoing research in the lab is examining a broader range of defensive behaviors in this paradigm.

      (4) Tone attenuation in Figure 8 seems to be largely a result of minimal freezing to a 115-dB tone. While not a major point of the paper, a more robust fear response would be convincing.

      Although our data indicate that DREADD-mediated inhibition of the pPVT did not attenuate freezing in non-stressed mice, we agree with the reviewer’s assessment that the 115 dB tone elicited only minimal freezing. Therefore, we remain open to the possibility that higher baseline levels of freezing might reveal a significant behavioral effect. We found it challenging to identify a decibel range that reliably evokes robust freezing in non-stressed mice. Future studies could explore varying tone frequencies to achieve a stronger freezing response.

      (5) In the open field test, the authors measure total distance. It would be beneficial to also show defensive behavioral (escape, freezing, etc) bouts expressed.

      We agree this would be valuable information, and we have noted it as a future direction in the discussion.

      (6) The authors, along with others, show a behavioral and neural dissociation of footshock stress on nonassociative vs associative components of stress; however, the nonassociative components as a direct consequence of the stress seem to be necessary for enhancement of associative aspects of fear. Can authors elaborate on how these systems converge to enhance or potentiate fear?

      We appreciate the reviewer for recognizing this important point regarding the mechanistic relationship between nonassociative fear sensitization and associative fear learning that occurs following footshock stress. At present, the majority of research on this topic has been conducted using the SEFL paradigm.

      At the behavioral level, previous studies indicate that manipulations that interfere with or attenuate associative fear memory of the footshock stress event fail to block nonassociative fear sensitization. For example, both SEFL and SEFR persist in animals that have successfully undergone fear extinction training in the footshock stress context (Rau et al., 2005; Hassien et al., 2020). Furthermore, reports also find that infantile or pharmacological amnesia of the footshock stress memory does not occlude the emergence of SEFL (Rau et al., 2005; Poulos et al., 2014). Taken together, associative fear memory of the footshock stress event does not appear to be necessary for fear sensitization.

      If and how the associative and nonassociative mechanisms interact is an interesting question that we are currently investigating. PVT has direct projections to the central and basolateral amygdala, regions well known to mediate conditioned fear acquisition and expression (Penzo et al., 2015). Why PVT activity does not modulate conditioned fear in our hands is intriguing. PVT is a heterogeneous structure with a variety of projections (e.g., Shima et al., 2023), and it is possible that the PVT-Amygdala projections are not hyperactive in our paradigm. As we alluded to above, further research will be needed to understand why stress-induced PVT hyperactivity affects some forms of fear and not others.

      (7) In the discussion, authors should elaborate on/clarify the cell population heterogeneity of the PVT since authors later describe PVT neurons as exclusively glutamatergic.

      The reviewer is correct that additional explanation of PVT cellular heterogeneity is warranted. We now provide clarity on this point in the discussion.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Nishimura et al. examines the behavioural and neural mechanisms of stress-enhanced fear responding (SEFR) and stress-enhanced fear learning (SEFL). Groups of stressed (4 x shock exposure in a context) vs non-stressed (context exposure only) animals are compared for their fear of an unconditioned tone, and context, as well as their learning of new context fear associations. Shock of higher intensity led to higher levels of unlearned stress-enhanced fear expression. Immediate early gene analysis uncovered the PVT as a critical neural locus, and this was confirmed using fiber photometry, with stressed animals showing an elevated neural signal to an unconditioned tone. Using a gain and loss of function DREADDs methodology, the authors provide convincing evidence for a causal role of the PVT in SEFR.

      Strengths:

      (1) The manuscript uses critical behavioural controls (no stress vs stress) and behavioural parameters (0.25mA, 0.5mA, 1mA shock). Findings are replicated across experiments.

      (2) Dissociating the SEFR and SEFL is a critical distinction that has not been made previously. Moreover, this dissociation is essential in understanding the behavioural (and neural) processes that can go awry in fear.

      (3) Neural methods use a multifaceted approach to convincingly link the PVT to SEFR: from Fos, fiber photometry, gain and loss of function using DREADDs.

      Weaknesses:

      No weaknesses were identified by this reviewer; however, I have the following comments:

      A closer examination of the Test data across time would help determine if differences may be present early or later in the session that could otherwise be washed out when the data are averaged across time. If none are seen, then it may be worth noting this in the manuscript.

      Given the sex/gender differences in PTSD in the human population, having the male and female data points distinguished in the figures would be helpful. I assume sex was run as a variable in the statistics, and nothing came as significant. Noting this would also be of value to other readers who may wonder about the presence of sex differences in the data.

      We appreciate the reviewer’s thoughtful feedback and have addressed these points as follows: In the methods section, we clarify that pre-tone and post-tone freezing behavior was averaged because we did not detect a significant effect of time across all experiments (line #474). With regards to sex differences, we clarify in the methods section that we did not detect sex as a statistically significant variable across tests (line #443). In addition, we have revised the figures to denote male and female subjects separately.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Following discussion, the reviewers and editors agreed that the strength of the evidence could be updated to compelling, provided the comments were adequately addressed.

      Reviewer #1 (Recommendations for the authors):

      (1) In the discussion around line 333, there is also data indicating a time-dependent role for PVT in conditioned fear (Quinones-Laracuente 2021; Do-Monte 2015).

      We agree with the reviewer’s assessment and have revised the discussion accordingly (line #364).

      (2) The 129S6/SvEvTac mouse exhibits impaired fear extinction but intact discrimination (Temme, 2014). Was there any rationale for using this line of mice?

      The reviewer is correct that additional explanation is warranted. We have amended the manuscript to include additional rationale for using the 129S6/SvEvTac mouse strain as well as address the findings of Temme, 2014 as they relate to our study (line #94).

      (3) Was there any reason why there were no c-fos results in the PAG and IPBM? You discuss those brain regions and their importance in the circuit in the discussion.

      In the current manuscript, we do show c-fos results for the lPAG, dlPAG, and lPBN (Figure 3). We highlight in the discussion the relevance of these regions in the fear circuit.

      (4) Take a look at Sillivan et al., 2018 for an additional reference in the introduction (around lines 61).

      We thank the reviewer for their suggestion and have included the reference in the introduction (line #63).

      (5) Can the authors show the c-fos data for aPVT and pPVT separately? The authors focus on pPVT for later manipulations, but the c-fos data is collapsed. Along these same lines, were there any corrections for multiple comparisons across the brain regions? While the subsequent experiments firmly support a role for pPVT in unlearned stress-induced fear response, a proper correction for multiple comparisons is warranted.

      We have revised Figure 3 to include c-fos expression for both the anterior and posterior PVT separately. To correct for multiple comparisons, we conducted a two-way ANOVA (Brain Region X Group) with Tukey's-corrected post hoc tests, as detailed in the methods section (line #577).

      (6) Do the authors provide rationale for why they began to focus specifically on pPVT versus aPVT?

      We agree that additional clarity is warranted. We have provided additional rationale for selecting pPVT as our primary focus in the results section (line #197).

      (7) Lines 298-337 of the discussion could be shortened. This long preamble is a summary of the results.

      We agree with the reviewer’s assessment and have revised the manuscript accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional analyses for fiber photometry and open field data to probe for PVT-related changes in defensive behaviors beyond freezing.

      As stated above, we agree with the reviewer that additional behavioral analyses would be valuable. Unfortunately, such measures are not available for the current experiment.

      Reviewer #3 (Recommendations for the authors):

      As mentioned in the weaknesses, just checking for differences across time on the Tests, highlighting the M vs. F datapoints in the figures, and reporting if there are sex differences in any of the analyses.

      In the revised manuscript, we have included separate male and female data points for each figure. In addition, we provided clarity in the methods section reporting a lack of statistically significant sex differences across each experiment (line #443).

    1. eLife Assessment

      This valuable study shows that targeted mutations in specific cassava eIF4E-family genes can reduce infection and disease symptoms caused by cassava brown streak viruses. Through systematic knockouts across the eIF4E gene family, the authors provide convincing evidence that certain double mutants show resistance-associated outcomes. Overall, the work supports practical routes to engineer cassava with improved resistance and clarifies which host factors are relevant for this disease.

    2. Reviewer #1 (Public review):

      It is well established that many potyvirids (viruses in the Potyviridae family), particularly potyviruses (viruses in the Potyvirus genus), recruit (selectively) either eIF4E or eIF(iso)4E, while some others can use both of them to ensure a successful infection. CBSD caused by two potyvirids, i.e., ipomoviruses CBSV and UCBSV, severely impedes cassava production in West Africa. In a previous study (PBI, 2019), Gomez and Lin (co-first authors), et al. reported that cassava encodes five eIF4E proteins including eIF4E, eIF(iso)4E-1, eIF(iso)4E-2, nCBP-1 and nCBP-2, and CBSV VPg interacts with all of them (Co-IP data). Simultaneous CRISPR/Cas9-mediated editing of nCBP-1 and -2 in cassava significantly mitigates CBSD symptoms and incidence. In this study, Lin et al further generated all five eIF4E family single mutants as well as both eIF(iso)4E-1/-2 and nCBP-1/-2 double mutants in a farmer-preferred cassava cultivar. They found that both eIF(iso)4E and nCBP double mutants show reduced symptom severity and the latter is of better performance. Analysis of mutant sequences revealed one important point mutation L51F of nCBP-2 that may be essential for the interaction with VPg. The authors suggest that introduction of L51F mutation into all five eIF4E family proteins may lead to strong resistance. Overall I believe this is an important study enriching knowledge about eIF4E as a host factor/susceptibility factor of potyvirids and proposing new information for the development of high CBSD resistance in cassava. I suggest the following two major comments for authors to consider for improvement:

      (1) As eIF(iso)4e-1/-2 or nCBP-1/-2 double mutants show resistance, why not try to generate a quadruple mutant? I believe it is technically possible through conventional breeding.

      (2) I agree that L51F mutation may be important. But more evidence is needed to support this idea. For example, the authors may conduct a quantitative Y2H assay on the binding of VPg to each of the eIF4E (L51F) mutants. Such data may add as additional evidence to support your claim.

      Comments on revisions:

      (1) The authors explained it is technically challenging to generate quadruple mutant.

      (2) The authors have properly addressed my comment 2.

      I do not have more concerns.

    3. Reviewer #2 (Public review):

      Eukaryotic translation initiation factor 4E (eIF4E) acts as a key susceptibility factor for members of the Potyviridae family, and knockout of eIF4E family members enables the generation of corresponding virus-resistant germplasm. In this study, the authors performed systematic knockout experiments on the members of eIF(iso)4E and nCBP clades in cassava, which demonstrated that simultaneous knockout of the eIF4E-family genes nCBP-1 and nCBP-2 in the cultivar 60444 significantly attenuates Cassava Brown Streak Disease (CBSD) root symptoms and reduces viral titer. The authors further screened for CBP mutants without VPg-binding activity and identified the nCBP-2 L51F mutant, which loses the ability to interact with VPg. In the revised manuscript, the authors have addressed most of my previous questions and revised the relevant content accordingly. Overall, this study is a well-performed work, with extensive explorations carried out particularly in the gene knockout of members of eIF(iso)4E and nCBP. It provides an important value for investigating the functions of eIF(iso)4E and nCBP clade members in the development of disease-resistant germplasm, and the identified nCBP-2 L51F mutant also offers a crucial gene editing site target for the generation of virus-resistant cassava germplasm in future.

    4. Reviewer #3 (Public review):

      In the manuscript, the authors generated several mutant plants defective in the eIF4E family proteins and detected cassava brown streak viruses (CBSVs) infection in these mutant plants. They found that CBSVs induced significantly lower disease scores and virus accumulation in the double mutant plants. Furthermore, they identified an important conserved amino acid for the interaction between eIF4E protein and the VPg of CBSVs by yeast two-hybrid screening. The experiments are well designed; however, some points need to be clarified:

      (1) The authors reported in their previous study that the ncbp1 ncbp2 double mutant plants were less sensitive to CBSV infection and that all the eIF4E family proteins interact with VPg. To identify redundant functions of eIF4E family proteins, they generated mutants for all eIF4E family genes. However, these mutants are defective in different eIF4E genes, and apart from several double mutants, no higher-order mutants (such as triple or quadruple mutants) were generated, so it is hard to identify the redundant function of the eIF4E family genes.

      (2) The authors identified some key amino acids for the interaction between eIF4E and VPg such as the L51, it is interesting to complement ncbp1 ncbp2 double mutant plants with L51F form of eIF4E and double check the infection by CBSVs.

      Comments on revisions:

      The reviewer understands that cassava is not a model plant and that it is hard for the authors to generate multiple genetic mutant plants for experiments, so nothing was done to respond to the comments raised by the reviewer.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It is well established that many potyvirids (viruses in the Potyviridae family), particularly potyviruses (viruses in the Potyvirus genus), recruit (selectively) either eIF4E or eIF(iso)4E, while some others can use both of them to ensure a successful infection. CBSD caused by two potyvirids, i.e., ipomoviruses CBSV and UCBSV, severely impedes cassava production in West Africa. In a previous study (PBI, 2019), Gomez and Lin (co-first authors), et al. reported that cassava encodes five eIF4E proteins, including eIF4E, eIF(iso)4E-1, eIF(iso)4E-2, nCBP-1 and nCBP-2, and CBSV VPg interacts with all of them (Co-IP data). Simultaneous CRISPR/Cas9-mediated editing of nCBP-1 and -2 in cassava significantly mitigates CBSD symptoms and incidence. In this study, Lin et al further generated all five eIF4E family single mutants as well as both eIF(iso)4E-1/-2 and nCBP-1/-2 double mutants in a farmer-preferred cassava cultivar. They found that both eIF(iso)4E and nCBP double mutants show reduced symptom severity, and the latter is of better performance. Analysis of mutant sequences revealed one important point mutation, L51F of nCBP-2, that may be essential for the interaction with VPg. The authors suggest that the introduction of the L51F mutation into all five eIF4E family proteins may lead to strong resistance. Overall I believe this is an important study enriching knowledge about eIF4E as a host factor/susceptibility factor of potyvirids and proposing new information for the development of high CBSD resistance in cassava. I suggest the following two major comments for authors to consider for improvement:

      (1) As eIF(iso)4e-1/-2 or nCBP-1/-2 double mutants show resistance, why not try to generate a quadruple mutant? I believe it is technically possible through conventional breeding.

      (2) I agree that L51F mutation may be important. But more evidence is needed to support this idea. For example, the authors may conduct a quantitative Y2H assay on the binding of VPg to each of the eIF4E (L51F) mutants. Such data may add as additional evidence to support your claim.

      We thank the reviewer for their overall assessment. Regarding a quadruple mutant, we agree that this is a logical next step to investigate. A conventional breeding approach with existing mutant lines, however, is problematic for two reasons: 1) cassava does not flower where this work was conducted, and 2) cassava is subject to inbreeding depression, resulting in both low seed set and considerable heterogeneity among progeny that do arise. Editing existing double mutants is possible, but would require a significant, multi-year investment to produce embryogenic tissue from existing lines and generate the new lines. Cassava has practical limits as a non-model plant. Given these constraints, we conclude that investigating a quadruple mutant is beyond the scope of the current work.

      For investigating the HPL to HPF mutation in other cassava eIF4E-family proteins and their interaction with VPg in yeast, we have now completed this experiment and included the data in the paper. Notably we find that generating this mutant for eIF(iso)4E-2 attenuates VPg interaction without impairing eIF(iso)4E-2 accumulation, while similarly mutating nCBP-1 and eIF(iso)4E-1 results in total and reduced protein accumulation, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors generated single and double knockout mutants for the eIF4E family members eIF4E, iso4E1, iso4E2, nCBP1, and nCBP2 in cassava. While a single knockout of these eIF4E genes did not abolish viral infection, the nCBP1/nCBP2 double knockout mutant displayed the weakest symptoms and viral infection. Through yeast two-hybrid screening, the nCBP-2 L51F mutant was identified, and the mutant was unable to interact with VPg, yet the nCBP-2 L51F mutant could complement the eIF4E yeast mutant. This L51F is a potentially important editing site for eIF4E.

      Strengths:

      This study systematically generated single and double knockout mutants for the eIF4E family members and investigated their antiviral activity. It also identified an L51F site as a potentially important antiviral editing site in eIF4E; however, genetic evidence for its antiviral function remains to be validated.

      Weaknesses:

      (1) The symptoms of the iso4E1 & iso4E2 double-knockout mutant are slightly alleviated, and those of the nCBP1 & nCBP2 double-knockout mutant are alleviated the most. If the iso4E1 & iso4E2 and nCBP1 & nCBP2 mutants are crossed to obtain quadruple-knockout mutant plants, whether the quadruple mutant shows stronger resistance should be further investigated.

      (2) Although the yeast two-hybrid identified the nCBP-2 L51F mutant, there is no direct biological evidence demonstrating its antiviral function. While the 6-amino acid deletion mutant (including L51F) showed attenuated symptoms, this deletion might be sufficient to cause loss-of-function of nCBP-2. These indirect observations cannot definitively establish that the L51F mutation specifically confers antiviral activity.

      (3) Given that nCBP-2 can rescue yeast eIF4E mutants, introducing wild type and L51F nCBP2 into the Arabidopsis iso4e mutant, or introducing viral infectious clones into yeast systems, could clarify whether the L51F mutation (and the same mutations in eIF4E, iso4E1, iso4E2) abrogates their roles as viral susceptibility factors - critical genetic evidence currently missing.

      We sincerely thank the reviewer for their constructive feedback.

      With regards to investigating a quadruple eIF4E mutant, please see our response to reviewer 1.

      The reviewer makes a salient point regarding the nCBP-2 L51F and K45_L51del mutations. Ideally, complementation of the ncbp double mutant with nCBP-2 L51F, followed by viral challenge, would address this question. However, the practical limitations, as noted in our response to reviewer 1, make this difficult within the context of this manuscript. We acknowledge that this is a limitation of our study and have been cautious in not overstating our conclusions.

      Reviewer #3 (Public review):

      In the manuscript, the authors generated several mutant plants defective in the eIF4E family proteins and detected cassava brown streak viruses (CBSVs) infection in these mutant plants. They found that CBSVs induced significantly lower disease scores and virus accumulation in the double mutant plants. Furthermore, they identified an important conserved amino acid for the interaction between eIF4E protein and the VPg of CBSVs by yeast two-hybrid screening. The experiments are well designed; however, some points need to be clarified:

      (1) The authors reported in their previous study that the ncbp1 ncbp2 double mutant plants were less sensitive to CBSV infection and that all the eIF4E family proteins interact with VPg. To identify redundant functions of eIF4E family proteins, they generated mutants for all eIF4E family genes. However, these mutants are defective in different eIF4E genes, and apart from several double mutants, no higher-order mutants (such as triple or quadruple mutants) were generated, so it is hard to identify the redundant function of the eIF4E family genes.

      (2) The authors identified some key amino acids for the interaction between eIF4E and VPg such as the L51, it is interesting to complement ncbp1 ncbp2 double mutant plants with L51F form of eIF4E and double check the infection by CBSVs.

      We thank the reviewer for their assessment and feedback.

      Regarding analysis of higher-order mutants, please see our response to Reviewer #1’s public review.

      For investigation of nCBP-2 L51F in planta, please see our response to Reviewer #2’s public review.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Since nCBP2 can complement a yeast mutant, it indicates that nCBP2 can also complement Arabidopsis. Wild-type nCBP2 should be introduced into the Arabidopsis iso4e mutant to determine whether it can complement Arabidopsis iso4e and whether the virus can re-establish the infection. The nCBP2 L51F mutant should also be introduced into the Arabidopsis iso4e mutant to see if this mutant fails to re-establish the virus infection. Similarly, eIF4E, iso4E1, iso4E2, nCBP1, etc., should be introduced into the Arabidopsis iso4e mutant to determine whether they can truly complement the virus-infected mutant Arabidopsis, while the L51F mutants cannot.

      Arabidopsis encodes multiple eIF4E proteins, an nCBP protein, and an eIF(iso)4E protein, and knocking out the eIF(iso)4e gene specifically confers resistance to TuMV. Introducing cassava nCBP-2 into Arabidopsis eif(iso)4e mutants is unlikely to restore TuMV susceptibility. Because TuMV belongs to a different genus than CBSV, we used the TuMV VPg interaction with Arabidopsis eIF(iso)4E to test the generality of the effect of mutating the eIF4E HPL motif to HPF on potyvirid VPg-eIF4E interaction. However, since this mutation disrupts Arabidopsis eIF(iso)4E’s endogenous translation initiation activity in yeast, this mutant protein is not worth pursuing further. In contrast, cassava eIF(iso)4E-2 L27F retains translation initiation activity and has reduced interaction with CBSV VPg by quantitative yeast two-hybrid. It would be interesting to see if this particular mutant protein could interact with TuMV VPg, and if not, it would then be worth testing for the ability to restore TuMV susceptibility in Arabidopsis eif(iso)4e. Unfortunately, we are unable to pursue these experiments at this time.

      (2) Given that nCBP-2 can complement yeast eIF4E mutants, the authors may introduce viral infectious clones into yeast systems expressing nCBP-2 variants to determine whether nCBP-2 supports viral translation. This approach could further clarify whether the L51F mutation (and mutations in eIF4E, iso4E1, iso4E2) abolishes their roles as viral susceptibility factors.

      This is an intriguing suggestion, but challenging for a few reasons. First, an infectious clone of the CBSV Naliendele isolate does not exist, although we have tried to construct one, without success. There is also no guarantee such a clone could infect yeast. We are aware of yeast being used as a surrogate host for a few plant viruses, such as Tomato bushy stunt virus and Brome mosaic virus, but are unaware of a similar system for any potyvirid. Developing such a system would undoubtedly require a significant investment beyond the scope of this manuscript.

      (3) Phenotypes of all mutant lines with and without virus inoculation in Table 1 should be presented.

      Photos of un-challenged mutants are included in supplemental figures. Representative storage root symptoms for all lines have now been included in the supplemental figures as well.

      (4) In Figure 1c, the results of viral accumulation assays should be presented for additional mutant lines beyond ncbp-1, ncbp-2, ncbp-1 nCBP-2 K45_L51del, and ncbp-1 ncbp-2, particularly eif(iso)4e-1 & eif(iso)4e-2#172 and eif(iso)4e-1 & eif(iso)4e-2#92.

      We have previously found that subtle reductions in visible disease do not always translate to clear differences in viral titer when analyzed by qPCR (Gomez et al., 2018). As such, we focused on lines with the strongest phenotypes in viral titer experiments.

      (5) Inconsistently, the ncbp-1 nCBP-2 K45_L51del line showed reduced symptoms compared to wild-type in Figures 1a and 1b, yet viral accumulation levels were comparable to wild-type in Figure 1c. The explanations for this discrepancy are required.

      Please see our response to (4).

      (6) Root phenotypic data for all mutant lines shown in Figure 1d should be presented.

      Please see our response to (3).

      (7) In Figure 2b, GST control pulldowns showed detectable proteins. This background signal requires explanation.

      It is not uncommon to see weak signal in bead or tag-only negative control pulldown and IP reactions. Importantly, we see strong enrichment of VPg relative to these controls in our experimental samples.

      (8) Contrary to the abstract's implication, Figure 5c indicates that the L51F mutation impacts yeast growth, suggesting potential pleiotropic effects of this mutant.

      We interpret the results to mean that nCBP2 L51F does not fully complement the yeast eif4e mutation, rather than that nCBP2 L51F impacts yeast growth.

      (9) In vivo protein-protein interaction assays (e.g., co-immunoprecipitation) should be performed to complement the in vitro GST pull-down data in Figure 6.

      We appreciate the desire for these experiments and agree that they would bolster our Y2H and pulldown data. Unfortunately, we are not able to complete these experiments at this time, so we have been careful not to overinterpret the data.

      (10) Since the AteIF(iso)4E L28F mutant fails to complement yeast, the authors should test whether introducing the L51F mutation into other family members (eIF4E, iso4E1, iso4E2, nCBP1) preserves their yeast complementation capacity.

      This has now been done for additional cassava eIF4E-family proteins.

      (11) Indicate molecular weight sizes in all Western blots.

      This was done. As differences in buffer formulations between gel types can affect the mobility and thus the apparent molecular weight of markers, we have provided in the methods section the SDS-PAGE gel chemistries and specific protein ladders used in this study. Importantly, we note that, in our experience, the apparent position of certain markers relative to proteins of interest can vary by up to 15 kDa between gel chemistries.

      (12) Figures 4d,e are not provided in the paper. Based on the content of the paper, the description in the paper likely corresponds to Figures 5c, d.

      Thank you for catching this error; it has now been corrected.

    1. eLife Assessment

      This useful study uses in vitro electrophysiology, projection-specific chemogenetics, and different behavioural tasks to investigate the role of Vglut1-expression in basolateral amygdala neurons projecting to the nucleus accumbens in aspects of motivated behaviour. Although the manuscript is clearly written, the strength of the evidence supporting claims about the role of this pathway is incomplete. Currently, the work may be of interest to some behavioural neuroscientists, but additional controls and further clarification of specific analyses would strengthen their broader significance.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to determine whether reward conditioning increases inhibitory regulation of Vglut1-expressing BLA→NAc neurons and whether this inhibition shapes motivated behaviors. They used whole-cell electrophysiology to measure conditioning-induced changes in synaptic inhibition and intrinsic excitability. Subsequently, they employed dual-recombinase chemogenetics to selectively inhibit this projection during behavioral tasks. The goal was to test whether suppressing the activity of Vglut1-expressing neurons would alter reward learning, valuation, and fear discrimination.

      Strengths:

      (1) The combination of electrophysiological and behavioral assessments to dissect the function of Vglut1-expressing BLA→NAc neurons.

      (2) The various behavioral assessments employed to determine the effect of silencing Vglut1-expressing BLA→NAc neurons.

      Weaknesses:

      (1) The introduction underscores the importance of molecular identity and population dynamics when studying the function of BLA→NAc neurons. Yet, the experiments and manuscript provide little to no information about the Slc17a7-expressing population under study. In fact, there is no evidence that the viral manipulations targeted this neuronal population (e.g., extent and specificity of viral transduction). Regarding population dynamics, evidence is meant to be provided by Experiment 1, but the results are difficult to interpret. The control mice were not exposed to the conditioning chambers, stimuli, or food rewards. These exposures may have been sufficient to produce the changes observed in the experimental mice (i.e., they may have had nothing to do with cue-reward learning). Further, the experiments provide no evidence that the observed effects result from prolonged conditioning, since there is no group receiving a single conditioning session.

      (2) The dual-recombinase approach employed does not permit conclusions about the BLA→NAc pathway specifically, because the effects of silencing NAc-projecting BLA neurons could be driven by modulation of activity in other brain regions innervated by these same neurons through collateral projections. This limitation must be clearly acknowledged by the authors, and the manuscript should refrain from making definitive claims about the BLA→NAc pathway per se.

      (3) The experimental parameters and measures used for cued-reward conditioning complicate any firm conclusions about the observed effects. The use of a 2-second cue provides a minimal temporal window to monitor cue-related behavior. This issue is masked in the data presented because what is labeled as "cued responses" includes responses that occur after the cue has terminated and overlap with those triggered by sucrose delivery itself. These post-cue responses cannot be classified as cue-reward responses since the cue is no longer present; they are reward-related responses. Perhaps the z-score calculation addresses this issue, but this is difficult to assess since the authors do not explain how this calculation was performed or what baseline period was used.

      (4) Throughout the manuscript, there is conceptual confusion regarding the fundamental distinction between Pavlovian (cue-outcome) and instrumental (action-outcome) responses. It is unclear why the authors aimed to study both types of conditioning, but greater caution is necessary when interpreting the findings labeled as "instrumental conditioning." First, no evidence is provided that initiation port entries constitute an instrumental or goal-directed response rather than a Pavlovian approach behavior. Second, many of the conclusions are based on analyzing reward port entries-a Pavlovian conditioned response identical to that measured in the cued-reward conditioning task. This conflation undermines claims about instrumental learning.

      (5) The data from the reward valuation and reversal learning experiments are difficult to interpret. The animals are not tested under extinction conditions (with the flavors present but without reward delivery), making it impossible to establish whether their behavior relies on learned associations or ongoing reinforcement. Further, the behavior generated by these procedures appears unreliable, with substantial inconsistencies across figures (compare Figure 4A with Figures 5B, C, G, H).

      (6) The results from the auditory fear discrimination procedure are also difficult to interpret. No conditioning data are presented, and the "enhanced discrimination" could simply reflect reduced overall responding to the CS-. It is not clear how this selective impact on the CS- fits with the authors' conclusions about enhanced associative salience (noting that the meaning of the latter remains obscure).

      (7) The manuscript contains several statements about behavioral outcomes that are not supported by statistical evidence. The list provided here is non-exhaustive, and the authors should carefully correct any conclusions that lack statistical support.

      a) Line 294 (Figure 2F): the control mice gradually reached a similar performance to the experimental mice.

      b) Lines 301-303 (Figures 3D-F): inhibition strengthened the temporal association between initiation and reward consumption.

      c) Lines 337-339 (Figure 4A): both groups increased their preference for 10% sucrose.

      (8) The manuscript suffers from a lack of clarity and/or transparency about experimental parameters and data. Clarifications about the following would be necessary for the reader to confidently interpret the findings.

      a) Number of animals of each sex in each group.

      b) Number of animals excluded and justification.

      c) Analysis of sex differences.

      d) A clarification on the control group used in the electrophysiological experiment.

      e) Whether the same animals progress through multiple behavioral paradigms or if separate cohorts are used.

      f) All protocols should be described in the methods section.

      Without clarifying the points made above, a reliable and fair assessment of the discussion is impossible.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Mercer et al. focused on Vglut1 neurons in the BLA that project to the NAc. They characterized reward conditioning-induced electrophysiological changes in these neurons, including a decrease in membrane excitability and an increase in inhibitory synaptic inputs onto them, and showed the consequences of reducing their activity in enhancing reward-seeking behaviors. Considering that Vglut1 neurons represent the majority of the BLA→NAc projecting neurons, the findings are important for potentially correcting some of the previous biases in understanding the role of BLA-to-NAc projection in reward processing, for example, the notion that this projection generally promotes reward seeking by conveying reward-associated cue information.

      Strengths:

      The paper is clearly written, with results strongly supporting the main conclusions for the most part.

      There are a few weaknesses noted. For example:

      (1) They used a retrograde recombinase strategy to drive DREADD expression in these cells; however, it is not known if they project exclusively to NAc or to other brain regions as well, and whether those other potential regions may mediate the DREADDs (Gi) effects on reward seeking. They also did not show which subregions of the NAc were innervated by these neurons.

      (2) They did not assess potential changes in excitatory synaptic transmission onto these cells after reward conditioning, which leaves a gap in concluding a shift toward inhibition.

      (3) They also did not report on whether the inhibition was specific to Vglut1 neurons.

      (4) Some statistics appear to be missing (Figure 3D-F), suboptimal (Figure 5CEF and HJK using separate t-tests rather than repeated measure ANOVA), unclear (Figure 2I on peak timing or port entry), or based on a low n (Figure 1 Ephys, animal-based manipulations).

      (5) They did not clarify why they used two different doses of the DREADDs ligand Compound 21 at 0.1 or 0.3 mg/kg for different experiments.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Mercer et al. investigates how inhibitory modulation of basolateral amygdala neurons expressing Vglut1 and projecting to the nucleus accumbens (Vglut1BLA→NAc) influences motivated behavior in both appetitive and aversive tasks. Using a combination of whole-cell electrophysiology, chemogenetic inhibition and behavioral tests, the authors demonstrate that (1) reward conditioning increases inhibitory synaptic input and reduces intrinsic excitability of Vglut1BLA→NAc neurons, (2) chemogenetic inhibition of these neurons increases the number of conditioned approaches in a Pavlovian task and the number of nosepoke responses in an instrumental task, elevates reward valuation, and increases fear discrimination and (3) these effects are linked to salience assignment and associative strength, rather than altered learning or reversal flexibility. The work challenges the classical excitatory function usually attributed to the BLA projection to the NAc and highlights an interesting and thought-provoking result. Nevertheless, the study does not address the potential effect of the manipulation on motoric impulsivity, nor does it provide a theoretical framework explaining this unorthodox yet interesting effect.

      Strengths:

      The study establishes the initial finding with a correlational approach that informs a causal study. They find convincingly that Pavlovian conditioning induces an increase in inhibitory inputs onto Vglut1BLA→NAc neurons that leads to reduced excitability. Causality is studied using a powerful dual recombinase chemogenetic strategy to selectively inhibit this population of Vglut1BLA→NAc neurons and determine the effect on different behavioral tasks. The use of different tasks provides convergence on their effect. This surprising finding provokes interest and will stimulate further investigation into the mechanisms underlying these effects.

      Weaknesses:

      Several important aspects of the evidence remain incomplete.

      (1) First, an important aspect of the underlying processes at play remains to be investigated. In all behavioral tasks, the authors find that their manipulation increases responding that they interpret as a facilitation of learning. However, none of the appetitive tasks include a control stimulus that could address the specificity of their effect. Given that on the Pavlovian task, responding to the CS is almost 100%, I suspect that their manipulation may induce motoric impulsivity. This aspect would clearly benefit from additional controls.

      (2) Second, I have several questions about the time-resolved probability of port entries (PSTHs).

      a) There is a mismatch between the results presented in Figure 1. Panel D shows a peak of responses on the PSTH at ~2s on day 5 (my remark applies to all days), suggesting that the average should lie around this value. However, panel C reports a latency to respond at ~4sec. Could the authors double-check their PSTHs?

      b) More generally, the fact that in the Pavlovian task all PSTHs show a peak at almost exactly 2 sec is quite surprising and raises questions about how they are constructed. Sure, the most salient event is the water drop occurring 2s after cue onset. Yet, if mice responded only to these drops, the peak response should occur at 2s+reaction time, which is not the case. Figure 2 shows that on the first acquisition day, responding is already centered around 2s and does not decrease with learning, except for treated animals.

      (3) Several methodological flaws are present.

      a) The authors need to report clearly the statistics. In most cases, the statistical test used is mentioned in the figure caption with a single P-value. Thus, on two-way ANOVAs, I do not know whether the P-value relates to the interaction, the main effects, or the post-hoc tests.

      b) Another important issue is related to the average time-resolved z-score probability of port entries. The bin size used, the smoothing (that is much too strong), and the baseline period used to calculate the z-score are absent from the methods.

      (4) This study reports that manipulating 70% of the glutamatergic projection to the NAc induces an effect opposed to what has been previously reported in many different studies. Such a surprising finding deserves a more elaborate discussion about the mechanism that could be at play.
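
      Points (2) and (3b) above turn on how the time-resolved, z-scored probability of port entries is constructed. The following is a minimal Python sketch of one common construction, offered only to show which parameters would need to be reported; it is not the authors' method. The function name `zscored_psth`, the 0.1 s bin size, the -2 to 6 s window, and the 2 s pre-cue baseline are all illustrative assumptions, and no smoothing is applied.

```python
import numpy as np

def zscored_psth(entry_times, cue_times, bin_size=0.1,
                 window=(-2.0, 6.0), baseline=(-2.0, 0.0)):
    """Per-bin probability of a port entry around cue onset, z-scored to a baseline.

    All defaults are hypothetical values chosen for illustration only.
    """
    entry_times = np.asarray(entry_times, dtype=float)
    n_bins = int(round((window[1] - window[0]) / bin_size))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    centers = edges[:-1] + bin_size / 2

    # One row per trial: port-entry counts in each bin, aligned to cue onset.
    counts = np.zeros((len(cue_times), n_bins))
    for i, cue in enumerate(cue_times):
        counts[i], _ = np.histogram(entry_times - cue, bins=edges)

    # Probability that at least one entry falls in a bin, across trials.
    prob = (counts > 0).mean(axis=0)

    # z-score every bin against the mean and SD of the pre-cue baseline bins.
    base = prob[(centers >= baseline[0]) & (centers < baseline[1])]
    sd = base.std()
    z = (prob - base.mean()) / sd if sd > 0 else np.full_like(prob, np.nan)
    return centers, prob, z
```

      In a construction like this, the location of the peak depends directly on the bin size, the baseline window, and any smoothing kernel applied afterwards, which is why those values matter for interpreting a peak at almost exactly 2 s.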

    1. eLife Assessment

      In this study, the authors investigated how inference about the current task context, by weighting evidence based on surprise and uncertainty in the environment, is encoded in the cortex. Using MEG imaging and an impressive amount of analytic work based on normative decision modeling, they provided solid evidence for the involvement of the visual and parietal cortex. These results are a valuable complement to and extension of a previous study using fMRI measurements, by identifying the candidate regions that are of importance for the inference process, not just for encoding the end product.

    2. Reviewer #1 (Public review):

      This paper presents another excellent, sophisticated analysis from this group of brain-wide neural activity correlated with the tracking of belief about the generative state of a stochastic visual environment under volatile conditions. Whereas previous work focussed on the normative belief-updating dynamics mainly in brain areas related to motor planning, under conditions where the environmental state translates directly to a correct action, here, they abstract the belief-updating DV from a specific action by instead associating the environmental state to a stimulus-response mapping rule, to be used in a simple perceptual decision coming up after the environmental state cues. A decoding analysis shows that a remarkably large portion of the brain has activity correlated with the normatively evolving belief about environmental state and the evidence samples feeding into that belief. What the authors were trying to achieve, however, seems far more general than the above, namely, to study "the algorithmic and neural basis of higher-order internal decisions about behavioural context, formed under multiple sources of uncertainty", and I think that the loose implication of such grand notions (such phrasing brings to mind someone's choice to believe in God, to regulate their behaviour depending on whether they are on a rugby pitch or at church, etc, not how grating orientations link to left/right hand movements) muddies the value of the study. The authors thus may have overestimated the generality of the findings. I hope my impressions are a useful guide to focus the interpretations more.

      Strengths:

      One of the main strengths of the study is that it is a technical tour de force. As reflected in an unusually extensive methods section, the authors put an extraordinary amount of work into rigorous data collection and analysis, and all of it is described in excellent detail. The study also builds in a very valuable way on previous landmark studies on tracking of volatile environmental state linked to correct actions using MEG (Murphy et al 2021) and tracking of volatile stimulus-response mappings using fMRI (van den Brink et al 2023). Here, the environmental state is not directly linked to actions during the cues informing about the state, but instead linked to a stimulus-response mapping rule.

      Weaknesses:

      It is surprising, given this main innovation of abstracting the decision about visual position-distribution from particular actions, that the authors do not engage with the literature using EEG and fMRI to study such 'abstract,' 'motor-independent' or 'domain-general' (synonymous terms) decisions. The discussion, for example, mentions the curious lack of involvement of the frontal cortex, and the possibility of intermingled opposites being represented there; motor-independent EEG decision signals have been characterised by regressing against the absolute value of the differential belief-updating process for this very reason (e.g., see Pares-Pujolras et al 2025). Single-unit studies like Bennur & Gold (2011) have also found activity related to a decision about environmental state (non-volatile motion) even when that state does not yet translate directly to an action, and, like the current study, is instead specified in a later frame of the trial.

      Another weakness, as mentioned above, is that of overgeneralisation. It is not clear how "higher-order, internal decisions" are generally defined, and terms more concretely grounded in the paradigm at hand (as in van den Brink et al (2023)), e.g., 'tracking of environmental state dictating a sensory-motor mapping rule,' would seem more useful. Since this task tracks a belief about a sensory feature and how it maps to motor actions, it may not be as surprising a revelation that a range of sensorimotor areas correlate with it, as compared to more general, truly internal decisions about behavioural context involving no sensory input (e.g., deciding one has become hungry). Similarly, the authors paint the belief-tracking process of Murphy et al (2021) as "lower-order" and the current one as higher-order, but both cases are the same in that a hidden binary generative state needs to be inferred on a continual basis from a series of discrete spatial positions presented visually. The only difference is that in the current case, the belief about the current binary state is not transformed directly into an immediate action choice but rather utilised to map a follow-up stimulus to its appropriate action. These decisions then happen one after the other in sequence, with a contingency, but I'm not sure this constitutes a 'high-level' and 'low-level' in the way implied by the authors.

      The paper left me confused on the question of what these widespread decoding effects reflect - whether all areas directly compute and represent the normative DV in concert, or whether at least some areas reflect other processes that may correlate with the DV. Although the discussion mentions things like feedback modulation in V1, which seems to allow for the possibility that it is not directly involved in DV computation, the phrasing used ('encoding' and 'representation' and never 'secondary modulation') from Abstract to Results tends to imply direct involvement.

      Related to this, it seems that the extensive model comparison was done for behaviour, but not for the activation in each area, which may have suggested some dissociations in role - for example, for areas that showed decoding of the evidence (LLR), at least some of them may more closely correspond to the related lower-level quantity of simply spatial position itself, or the higher-level quantity of the transformed belief update (the change in prior from before to after the current cue). There is a map of areas that correlate with the difference of new vs old prior (if I understand correctly - Figure 4D), but not of areas for which activity conforms better to this belief update than to the objective LLR or location. Aside from such model-defined quantities, a critical factor is spatial attention. The authors highlight that the correlated activation of visual regions may reflect feedback modulations akin to attention in nature, but it might actually reflect attention itself, since it is plausible that subjects would pay more attention to the upper field when it is more likely that the centre of the generative distribution is up there (i.e., belief leans upwards). It seems the data could provide insight into this: If the visual cortical effects reflect a spatial attention modulation towards the likely generative source (upper/lower), then the relationship with prior, coded so that upper and lower have opposite sign, should flip in ventral versus dorsal visual cortex. Figure 4A seems like it could be positioned to answer this, but I can't fully interpret it because the prior coding is not explicit in the methods - the relevant section (lines 989-1001) refers back to the normative model description (without pointing to specific equations), which does not say what states S1 and S2 mean (upper and lower? Correct and incorrect? The former is needed to test for this spatial-specificity expected of attention). 
      Even if there are reasons not to perform extra analyses related to the above, the impressions could guide edits to clarify what the data can and cannot say about what these DV-decoding effects reflect. Finally, it could be acknowledged that because the environmental state (upper or lower field generative source) is directly linked to stimulus-response mapping, even decoding effects that are not spatially-specific could equally reflect a representation of either one of these.
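
      To make the distinction between the evidence (LLR), the prior, and the belief update concrete, here is a minimal Python sketch. It assumes the normative model takes the form of the Glaze et al. (2015) accumulator for volatile environments, which this review does not itself specify; the function names `psi` and `accumulate` and the sign convention for the two states are hypothetical.

```python
import math

def psi(posterior, hazard):
    """Map the posterior log-odds after one sample to the prior for the next.

    Assumed form: psi(L, H) = L + log((1-H)/H + exp(-L)) - log((1-H)/H + exp(L)).
    """
    k = (1.0 - hazard) / hazard
    return posterior + math.log(k + math.exp(-posterior)) - math.log(k + math.exp(posterior))

def accumulate(llrs, hazard, prior=0.0):
    """Return, for each sample, the three quantities a decoder could be regressed on."""
    rows = []
    for llr in llrs:
        posterior = prior + llr              # belief after the current sample
        next_prior = psi(posterior, hazard)  # belief carried into the next sample
        rows.append({
            "llr": llr,                      # objective evidence
            "prior": prior,                  # belief before the sample
            "posterior": posterior,
            "update": next_prior - prior,    # change in prior across the sample
        })
        prior = next_prior
    return rows
```

      Under these assumptions, `llr` and `update` diverge most when the hazard rate is high or the prior is strong, so those are the samples on which an area tracking the transformed update could in principle be told apart from one tracking the objective evidence.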

      The motivation for the decoding analysis running up to the response is not clear - what are the hypotheses here? Is the idea that if these areas truly represented the belief about the currently active context, then they should continue to do so during the response and beyond, since the next trial will begin in the same context as the previous ended? Or is this section tackling a different question? Is it that there is a potential confound in finding the significant decoding during the cue tokens, because it could be driven by the visual responses to the different spatial positions, and there are no such visual responses later at the response?

    3. Reviewer #2 (Public review):

      Summary:

      Calder-Travis et al. investigate how people form decisions about abstract rules in environments that may change over time. They show that individuals adaptively accumulate information, adjusting how much weight they give new evidence depending on how surprising or uncertain the environment is. Using whole-brain recordings (MEG), they further report that signals reflecting beliefs about the current rule are broadly distributed, particularly in visual and parietal regions. They further argue that these belief-related signals cannot be reduced to representations of momentary sensory evidence alone.

      Overall, the behavioral results convincingly demonstrate adaptive evidence accumulation consistent with the normative model. The neural data provide solid evidence for temporally structured belief-related signals that are broadly distributed across cortical regions. However, the evidence for sustained belief maintenance "across" cues and for full dissociation from gaze-related influences in visual cortex is less definitive. These issues temper, but do not undermine, the central conclusions.

      Strengths:

      A major strength of the study is the integration of normative modeling with temporally resolved neural data. The authors exploit the fine temporal scale of the recordings to examine belief updating across distinct task epochs, and they show that neural signals evolve in a manner consistent with the normative model that best captures behavior. This alignment between behavioral modeling and neural dynamics is carefully executed and conceptually coherent.

      Another strength is the authors' cautious interpretation of their findings. They explicitly acknowledge limitations in distinguishing between direct representation of a latent variable and neural modulation driven by that variable. This restraint strengthens the credibility of the conclusions and avoids overstatement.

      Weaknesses:

      (1) Evidence for sustained belief representation across cues

      Behaviorally, the data clearly demonstrate accumulation across sequential cues. However, the neural analyses primarily focus on responses around individual samples (from pre-cue to late post-cue windows). While these analyses demonstrate belief updating following each sample, they do not fully establish whether belief representations are maintained continuously across cues.

      Specifically, it remains unclear whether the neural representation of the prior belief is sustained from the late post-cue period of cue t-1 into the pre-cue period of cue t. Without explicit evidence of such continuity, it is difficult to conclude that the neural signals reflect a maintained belief state rather than repeated sample-locked updating processes. This distinction is important for interpreting the neural mechanism of accumulation.

      (2) Interpretation of belief signals in the visual cortex

      The claim that belief-related signals in the visual cortex cannot be explained by gaze position requires stronger support. The distribution of gaze positions across contexts appears largely non-overlapping, raising the possibility that context-related gaze biases could contribute to the observed neural effects.

      In particular, the "gaze-inconsistent" analysis based on a median split may not fully dissociate belief from gaze if the absolute gaze positions remain systematically different between contexts. As currently presented, the evidence does not fully rule out the possibility that gaze-related modulation contributes to the belief-related signal in visual areas. This affects the strength of the interpretation regarding abstract belief representation in early sensory cortex.

      (3) Clarity and transparency of task and model description

      Several aspects of the task and modeling framework would benefit from clearer exposition. The description of the noise distribution in the context cue would be easier to interpret if the overlapping distributions were visualized explicitly, allowing readers to assess how much accumulation is required versus reliance on strong individual cues. Similarly, the main text would benefit from a clearer explanation of how change point probability and uncertainty are computed (not just in Methods), as these quantities are central to the analyses and interpretation.

      In addition, temporal epochs (e.g., pre-cue, early post-cue, late post-cue) are not clearly defined with specific time ranges in the main text, making it difficult to compare across figures.

      (4) Interpretation of neural dynamics

      Several neural findings are intriguing but underinterpreted. For example, the absence of clear sensory evidence representation in early post-cue epochs in any regions (Figure 4B) is surprising and not discussed. The relative stability of belief-related signals in visual cortex compared to parietal regions (Figure 4E) is also unexpected and warrants interpretation. Additionally, the temporal dynamics of change point probability and uncertainty representations appear different from each other, but such a pattern was not described in detail.

      Clarifying these points would strengthen the interpretability of the results and help readers understand the mechanistic implications.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigated how inference about the current task context is encoded in the cortex, using MEG measurements. Using the same behavioral task that was initially developed for an fMRI study to identify the loci of task context representation, the current results complement and extend the previous study by identifying the candidate regions that are important for the inference process, not just for encoding the end product. They reported widespread modulation of cortical activity by uncertainty in evidence and volatility of task context changes. In comparison, modulation correlated with the decision variable underlying the task context inference process was more restricted to the parietal and visual cortices, particularly in alpha-band activity.

      Strengths:

      (1) The normative model provides a solid computational foundation for disambiguating quantities related to decision variables from those related to task factors (e.g., uncertainty and volatility).

      (2) The MEG technique allows examination of cortical activity that is modulated by the temporally evolving decision variable.

      (3) Rigorous modeling efforts, including comparisons of well-reasoned alternative/reduced models and examinations of diagnostic features using participant-matched simulations.

      Weaknesses:

      (1) There are two major surprises in the results that raise concerns about how to interpret these data. The first is the absence of modulation of prefrontal cortical activity by prior or posterior. As the authors acknowledged, there are extensive single-neuron recording data (e.g., from the Miller group) demonstrating the presence of task rule modulation in the monkey PFC, as well as prior representation in the PFC in the mouse study that they cited. The second surprise is that the strongest modulation by prior/posterior/evidence was almost always observed in the visual cortex, in contrast to the common embodied cognition assumption. A more elaborate discussion of these discrepancies would help contextualize the current results.

      (2) It is not clear why the effects in Figures 2D and E dipped before responses, which is not expected from any of the models. This could potentially affect the interpretation of the MEG signals in late-post-cue or pre-response periods.

      (3) The definitions of the different periods (e.g., early/late post-cue) are vague, making it hard to assess the functional relevance of the signals. For example, is the difference between the early pre-response map in Figure 5B and the late evidence map in Figure 4B due to completely non-overlapping time periods? A diagram of the timing definitions for different task periods would be helpful.

      (4) Perhaps related to #2, it is puzzling that evidence encoding is absent in the visual cortex during the early post-cue period.

      (5) The presentation and discussion of results related to correlated variability assume that the readers have already read their previous paper. A little more elaboration of the significance of this measurement would be helpful.

    1. eLife Assessment

      This important study links blood-derived dietary content to sustained increases in sleep in the mosquito Aedes aegypti. Using multiple independent approaches, the authors provide convincing evidence for blood-induced changes in sleep. These findings have broad implications for understanding how specialized diets regulate sleep across species and for mosquito vector biology.

    2. Reviewer #1 (Public review):

      Summary:

      The presented investigation aims to expand the sleep definition and its relationship with blood meal and/or circadian clock in the mosquito, Aedes aegypti. The authors exhausted the established sleep analytical paradigm and three behaviour toolkits: LAM10, EthoVision, and DART. They also investigated the potential underlying molecular mechanism by using dsRNA injection (LkR) and a KO mosquito (Cyc-/-).

      Strengths:

      The authors presented a very solid dataset showing posture changes and an increase in the arousal threshold of the mosquito after 10 minutes of immobility. This is a major clarification and extension to our understanding of insect sleep beyond Drosophila. Inclusion of analytical parameters such as bout length, waking activity and pDoze/Wake provides a critical reminder for other investigators of the steps needed for defining sleep in a new species. The investigation, with its technical span in behaviour assays, therefore establishes a good standard for mosquito sleep analysis, of the same quality seen in the landmark studies (Shaw et al 2000 and Hendricks et al 2000) for Drosophila sleep. The pioneering data showing a clear effect of blood meal and LkR reduction on locomotion and sleep provide an entry point for further investigations.

      Weaknesses:

      Despite the versatility of the behaviour and transgenic methods in this manuscript, there are two logical gaps in the conclusion, which are related to the effect of blood meal/BSA/LkR KD on A. aegypti sleep:

      (1) Conventionally, a coincidence of sleep increase and locomotion reduction would weaken the certainty of a sleep increase assessment. The authors implied that this concurrence, observed after a blood meal, derives from an internal "drowsy" neural state rather than a physical "cripple", but they did not use the velocity data from their two high-resolution video tracking systems or pDoze/Wake to clarify this.

      (2) The major molecular component underlying the blood meal effect on sleep/locomotion is less certain, because the BSA solution used for feeding contains ATP, which itself is able to enter the haemolymph and potentially exert a sleep/locomotion effect. Additionally, the basal or control sleep recording is done after sucrose feeding. It is, however, unclear from the methods whether this sucrose is also 10%, and whether the observed sleep increase after a blood meal results from the lower sugar level in blood (~0.1%).

    3. Reviewer #2 (Public review):

      Zhang et al. investigate how blood feeding and dietary protein influence sleep in the mosquito Aedes aegypti. The authors first establish a behavioural definition of sleep using postural analysis and arousal threshold measurements, then demonstrate that both blood meals and a bovine serum albumin (BSA)-based protein diet increase sleep for several days. They further show that RNAi-mediated knockdown of the leucokinin receptor (Lkr) enhances sleep, implicating neuropeptide signalling in the regulation of postprandial sleep. The authors propose that elevated sleep persists well beyond the restoration of host-seeking behaviour, suggesting the existence of distinct "opportunistic" versus "determined" host-seeking phases.

      Strengths

      The central question is well-motivated, and the experimental approach is systematic. The use of multiple independent methods to characterise sleep - postural analysis, infrared activity monitoring, videography, and arousal threshold - provides converging evidence. The BSA feeding experiment is a particularly effective demonstration that dietary protein, rather than other blood components, is the key regulator of the sleep increase. The conservation of leucokinin signalling in sleep regulation between Drosophila and Ae. aegypti is a noteworthy finding that adds comparative depth.

      Weaknesses

      (1) Sleep definition.

      The authors settle on a 10-minute immobility threshold, but their own data do not convincingly support this choice. The arousal threshold data (Figure 1G) show no significant difference between the 1-5 min and 6-10 min bins (P=0.246), with significance emerging only at the 11-15 min bin. The postural analysis likewise indicates that sleep-associated postures appear at ~20 min during the day and ~11 min at night. A 15-minute threshold would be better supported by the data as presented. The previous literature used 120 minutes for this species (Ajayi et al. 2022), making this a dramatic shift.
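
      To make the stakes of the threshold choice concrete, the following is a minimal sketch of how scored sleep depends on the immobility criterion. The per-minute activity record, the function name `sleep_minutes`, and the rule that an entire qualifying immobile run is scored as sleep are illustrative assumptions, not the authors' pipeline.

```python
from itertools import groupby

def sleep_minutes(active, threshold_min):
    """Total minutes scored as sleep.

    `active` is a per-minute sequence of booleans (True = movement detected).
    A run of immobility counts as sleep only if it lasts at least
    `threshold_min` minutes; the whole run is then scored as sleep
    (an assumed convention, chosen for illustration).
    """
    total = 0
    for is_active, run in groupby(active):
        length = sum(1 for _ in run)
        if not is_active and length >= threshold_min:
            total += length
    return total

# Hypothetical 60-minute recording: immobile runs of 8, 12 and 20 minutes
# separated by 5-minute bouts of activity.
recording = (
    [True] * 5 + [False] * 8 + [True] * 5 + [False] * 12
    + [True] * 5 + [False] * 20 + [True] * 5
)

for threshold in (10, 15, 120):
    # Prints "10 32", "15 20" and "120 0".
    print(threshold, sleep_minutes(recording, threshold))
```

      In this toy record, moving from a 10-minute to a 15-minute criterion removes the 12-minute bout and cuts scored sleep from 32 to 20 minutes, while a 120-minute criterion scores none of it.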

      (2) Confound of reproduction and sleep.

      The primary experimental paradigm measures sleep beginning at Day 4 post-blood feeding, immediately after oviposition. Animals have undergone gut distension, vitellogenesis, and oviposition, and what is being measured as "sleep" could reflect post-reproductive quiescence or recovery rather than diet-induced sleep per se. The BSA experiment partially addresses this, but since BSA also triggers vitellogenesis and egg production (as the authors note), the confound persists.

      (3) Opportunistic vs. determined host-seeking hypothesis.

      This framework is presented as a key conceptual contribution, but the paper contains no data on host-seeking behaviour. The authors infer two phases from the temporal mismatch between a 72-hour host-seeking suppression window (from prior studies) and elevated sleep through Day 5 (~120 hours). While this is an interesting hypothesis, it requires actual measurement of host-seeking alongside sleep to be substantiated, or at least the caveats need to be discussed more explicitly.

      (4) Statistical approach.

      The methods describe "one-way ANOVA, followed by Mann-Whitney tests with Welch's correction," which is an internally inconsistent combination: Mann-Whitney is non-parametric and does not use Welch's correction (which applies to t-tests). Throughout the figures, F-statistics (parametric) are reported alongside what appear to be non-parametric tests. The statistical framework needs to be clarified and made consistent. Exact sample sizes per group should also be stated explicitly in the methods for all experiments.
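
      The inconsistency can be illustrated with a short sketch that computes both statistics from scratch. The function names and the sleep totals are hypothetical, and only the test statistics are computed, not p-values. Welch's correction acts on sample variances, whereas the Mann-Whitney U statistic depends only on the ordering of the values, so there is nothing for the correction to adjust.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Built from sample means and variances, so it is a parametric test.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample `a` (ties count as 0.5).

    Uses only pairwise comparisons, so it has no variance term.
    """
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical sleep totals (minutes) for two groups.
control = [310, 295, 330, 305, 320]
blood_fed = [400, 520, 380, 610, 450]

t, df = welch_t(control, blood_fed)
u = mann_whitney_u(control, blood_fed)
print(f"Welch t = {t:.2f}, df = {df:.2f}; Mann-Whitney U = {u}")
```

      A consistent analysis would pair a parametric omnibus test with parametric post hoc comparisons, or a rank-based omnibus test (for example Kruskal-Wallis) with rank-based post hoc comparisons.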

    1. eLife Assessment

      This manuscript reports a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete, and there remains some lack of clarity on the methodology and interpretation. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

    2. Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments both from the rodent and the human literature such as splitter cells, lap cells, the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over/under representation of context information.

      My general assessment of the work is unchanged, and I still have some questions requesting methodological clarification.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g. learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be heavily improved. Judgment of generality and plausibility of the results is severely hampered but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is impossible to judge whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work on the field.

      Comments:

      The authors have made strong efforts to improve on their description of the methods, however, it is still very hard to understand. As a result of some of their clarifications, new issues appeared that I was not able to extract in the previous version.

      (1) In particular, I had problems figuring out how the individual dynamical systems are interrelated (sequences, attractor, action, learning). As I understand it now (and I still might be wrong), there is a single discrete-time dynamics in which, at each time step, one action takes place and the attractor and sequence dynamics each advance by one step. Also, synaptic updates happen in every one of those time steps. The authors may verify or correct my interpretations and further improve on their description in the manuscript. It is also confusing that time in the figure panels is given in units of trials, where each trial may consist of multiple (and possibly different numbers of) time steps. Are the thin horizontal red and blue lines time steps?

      (2) As a consequence of my new understanding of the model dynamics, I have developed doubts about the interpretation of the attractor network as context encoding. Since the X population mainly serves to disambiguate sequence continuation right before the action has to be taken (active for only two time steps in Figure 1C?), it could also be considered to encode task space (El-Gaby et al. 2024; doi: 10.1038/s41586-024-08145-x).

      (3) Also, technically, I wonder why the authors introduce the criterion of 50(!) time steps to allow the attractor to converge, if the state of the attractor network is only relevant in one time step to choose the appropriate continuation of the sequence of actions. Are attractor dynamics important at all? What would happen if just the input and output weights to the X population were kept and the recurrent weights were set to 0?

      (4) Figure 3E: How many time steps are the H cells active (red bars?) Figure 4J: What are the units of the time axis?

    3. Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single unit H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Comments on revisions:

      The authors have adequately addressed my concerns. Most importantly, the details of the implementation of the different components of the model are much more clearly described.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

      Thank you very much for your comments. We are very encouraged by your positive feedback. We have revised our manuscript to clarify our model, strengthen its biological justification, and make it more accessible to a broader audience.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting models: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions for expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.
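
      For readers unfamiliar with the Rescorla-Wagner-type rule mentioned above, a minimal sketch of the textbook update follows. The learning rate `alpha`, the initial value `v0` and the reward sequence are illustrative assumptions; the manuscript's actual rule and parameters are not given in this review.

```python
def rescorla_wagner(rewards, alpha=0.1, v0=0.0):
    """Return the value estimate after each trial.

    Update rule: V <- V + alpha * (r - V), where (r - V) is the
    reward prediction error on that trial.
    """
    v = v0
    history = []
    for r in rewards:
        v += alpha * (r - v)
        history.append(v)
    return history

# Two rewarded trials with a learning rate of 0.5: the estimate moves
# halfway towards the reward on each trial.
print(rescorla_wagner([1, 1], alpha=0.5))  # [0.5, 0.75]
```

      When a previously rewarded outcome is omitted, the same update yields a negative prediction error, which is the kind of signal the response below associates with RPE-facilitated remapping.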

      We appreciate the reviewer's careful reading of the manuscript and recognition of the novelty and significance of our work.

      Strengths:

      The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.

      Weaknesses:

      The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.

      We thank the reviewer for the insightful comments.

      To better characterize our model, we added formal descriptions of each task setting and explicitly specified the sources of uncertainty. We revised the schematic figures in Figure 1 to more clearly illustrate our model. An important revision is that we now distinguish between stimulus prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping. SPE-driven remapping is triggered by mismatches between actual sensory stimuli and those predicted from past history and serves to update the current contextual state or to create a new one. In contrast, RPE-facilitated remapping is more likely to occur when executing an action planning sequence associated with recent negative reward prediction errors, possibly due to environmental changes, and promotes exploration of alternative planning sequences.

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X to either transition to a different contextual state or form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator indicates that the action plan, i.e. the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E).”

      In addition, we added Figure 2C-E to clarify the neural representations of external stimuli and contextual states in the X module, as well as the neural representations within the H module. We also clarified the purpose of each model component and discussed plausible biological implementations to justify our modeling choices. Furthermore, we added a schematic illustration of our results related to psychiatric disorders in Figure 5B and revised the corresponding section of the manuscript to explicitly frame these results as a computational hypothesis. We also expanded the discussion to relate our findings to existing computational psychiatry models (see point-by-point responses below).

      We believe that these revisions have improved the clarity of our model and broadened its accessibility to a wider audience.

      Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.

      We appreciate the reviewer's careful reading of the manuscript and recognition of the novelty and significance of our work.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.

      We appreciate the reviewer’s valuable feedback. In the revised manuscript, we have improved the presentation of the methodological aspects by providing a more intuitive and general explanation of the model framework and training procedure. We also rewrote the section on psychiatric implications to more clearly explain how dysfunction in contextual inference occurs in our model. These revisions enhance both the clarity and plausibility of our conclusions.

      More specifically:

      (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the posthoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.

      Thank you for raising the important point.

      To improve readability, we have updated Figure 1 to more clearly illustrate the main model structure and its adaptation to individual use cases. Additionally, we have moved the previous Figure 6 (now Figure S1) to an earlier point in the Results to facilitate understanding of the methodological flow. The Methods section has also been revised to explain the algorithmic structure indicated in Figure S1. These revisions make the methods more self-contained and easier to follow.

      In the revised manuscript, we have clarified that our model is qualitatively related to the Bayes-adaptive reinforcement learning framework (Guez et al., 2013) as follows.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or from abrupt environmental changes. Once the context selector X infers the hidden state, the sequence composer H generates episodic sequences that correspond to trajectories in a search tree, each branch representing possible action–outcome sequences. Just as Monte Carlo tree search explores potential future paths to evaluate expected rewards, H produces hippocampal sequences that simulate future states and rewards based on its learned connectivity. In this way, X defines the context that anchors the root of the tree, while H expands the tree through replay or planning; our model thereby provides a simplified algorithmic implementation of model-based reinforcement learning via tree search planning.”

      (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.

      Thank you for pointing this out.

      In the revised manuscript, we have added explicit examples of simulated neural activity. Specifically, we added new panels in Figure 2C–E showing representative activity patterns from both the Context selector (X) and the Sequence composer (H). We also clarified the distinction between activity in the stimulus domain (externally driven) and the context domain (internally inferred states).

      “Figure 2C illustrates an example of both the environmental state transition and the corresponding contextual state transition of an agent. The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states … are represented in the stimulus domain and the contextual states … are represented in the context domain. … In the example transition shown in Figure 2C, the agent selected an environmental state transition from S2 to S4 in the 2nd, 5th, and 8th trials, which corresponds to a contextual state transition from X2β to X4β in the X module. However, because this transition was not rewarded, no synaptic potentiation occurred among hippocampal neurons. Subsequently, in the 11th trial, the agent attempted an environmental state transition from S2 to S5, corresponding to the transition from X2β to X5β in the contextual states.

      The agent received a reward at S5, and the corresponding hippocampal sequence was strengthened, enabling the agent to acquire the alternation task in the following trials (Figure 2E).”

      (see point-by-point responses below).

      We also added a detailed explanation of our results in Figure 4 as follows.

      “We consider a simplified environment of a probabilistic cueing paradigm (Ekman et al., 2022). In this study, two auditory contextual cues probabilistically predicted distinct visual motion sequences, and fMRI decoding was used to examine the frequency of hippocampal replay. We simplified this task as shown in Figure 4A. ”

      “... This result replicates Ekman et al. (2022), who showed that the probability of the contextual cues is reflected in the statistically significant differences in hippocampal replay probability in humans (Figure 4F).”

      “F, Our model behavior is similar to the human fMRI result of the cue-probability-dependent hippocampal replay (Ekman et al., 2022). Paired sample t-test. **P<0.01.”

      We believe that these revisions make the model description and simulation results more concrete and easier to interpret.

      (3) The literature review can be improved (laid out in the specific recommendations).

      Thank you for pointing this out. We revised the literature review to the best of our ability.

      (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.

      Thank you for your suggestion.

      In the revised manuscript, we added a new paragraph in the Discussion explicitly addressing how results from mice and humans can be integrated.

      “Our model is a functionally modular account of the cortical regions and hippocampus, enabling it to capture experimental findings across species. While hippocampal activity in rodents has been extensively characterized in terms of spatial coding, human hippocampal representations are more often non-spatial and episodic-like (Bellmund et al., 2018; Eichenbaum, 2017). For episodic memory to support flexible behavior, it would be beneficial to retrieve each episode in a context-dependent manner. The episodic contents may vary across species and individuals, yet the fundamental computations—estimating the current context from external stimuli and their history, and flexibly updating this estimate via prediction errors—are likely conserved. Holding context information until the contextual prediction error is detected is analogous to the belief state in model-based reinforcement learning, which is known to improve performance under partially observable conditions (POMDPs) (Kaelbling et al., 1998). Our model provides a simple algorithmic implementation of this principle.”

      (5) As a minor point, the hippocampus is pretty much treated as a premotor network. Also, a Discussion paragraph would be helpful.

      Thank you for pointing this out.

      We define action as a transition from one environmental state to another, and transition-coding hippocampal neurons are used for action-planning. Because our model does not incorporate errors in transitions (actions), the generated hippocampal sequences are perfectly correlated with the executed transitions (actions). However, we acknowledge that computations in the brain are more complex, with contributions from other regions such as the premotor network and the basal ganglia. To clarify this, we added formal representations of state transitions (action) in each task and the following sentences to the manuscript.

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons (Materials and Methods). Note that in the real brain, not only hippocampus but also the premotor cortex and the basal ganglia contribute to action planning and execution (Hikosaka et al., 2002). Here, however, we focus on how simplified planning sequences are learned and composed in a context-dependent manner.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state without errors in action.”

      Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single unit H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      We thank the reviewer for this summary of our model.

      We would like to clarify that the hippocampal Sequence composer (H) is a recurrent network that iteratively composes the next state and the associated sensory stimuli in the sequence based on the current contextual state.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Weaknesses:

      The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.

      We thank the reviewer for suggesting an important direction for future work. The goal of this research is to develop a minimal, functionally modular neural circuit model that provides general insights into how context-dependent behavior can be realized across species, including humans. To simplify our model, we only considered discrete-time environmental states, where the exact length of the time step depends on each environment. Extending the model to a more biologically plausible, continuous-time framework is a promising direction for future work, such as using continuous-time modern Hopfield networks and synfire chains. We modified the Discussion section to clearly point out this direction.

      “... the resolution at which our model should distinguish different contextual states, including the stimulus resolution and time resolution, is hand-tuned in this work. While we used an abstract, gridlike state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, … In realistic, continuously changing environments, such resolutions should be adjusted autonomously. Introducing continuous and hierarchical representations with multiple levels of spatial and temporal resolution would facilitate such adjustments, potentially through mechanisms such as modern Hopfield networks (Kurotov and Hopfield, 2020) or synfire-chain–based hippocampal sequence generation (Abeles, 1982; Diesmann et al., 1999; Shimizu and Toyoizumi, 2025; Toyoizumi, 2012), but this is beyond the focus of the current study”

      Also, we would like to emphasize that our model is not treated as a black box. To improve clarity, we have substantially revised Figures 1 and 2 to include additional details illustrating the neural activity and the internal computational mechanisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments and suggestions for improvement:

      (1) Formal link to model based RL is unclear: a core feature of inference is the role of uncertainty in modulating computation and corresponding circuit dynamics, in particular defining expected and unexpected degree of errors; as far as I understand the degree of tolerable errors within a context is defined by the size of the basin of attraction of the context module (which is dependent on number of items and the structure of correlations across patterns) and in no obvious way affected by sensory uncertainty (unless the inputs from H serve that purpose in a more indirect way). Similarly, most experiments are deemed to have deterministic (unambiguous) maps between sensory inputs and world state (although how the agent's state relates to environmental state is more complex and not completely clear based on the existing text).

      Thank you for raising this important point. Our model bears conceptual similarities to model-based RL frameworks, for example, the optimal-inference formulation that underlies Monte Carlo Tree Search (Guez et al., 2013), as we now clarify in the revised manuscript. These similarities, however, are qualitative rather than quantitative. In particular, the error thresholds that separate expected from unexpected outcomes are manually specified in our model, but their exact values do not appreciably influence the simulation results.

      Concretely, the heuristic threshold for SPE-driven remapping (𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub>) is set to 5 bits, allowing for small misconvergence during recall in the Amari–Hopfield model. For RPE-facilitated remapping, the threshold is set to 𝜃<sub>𝑁𝐺</sub> = 0.7, making the agent sufficiently sensitive to abrupt environmental changes and enabling it to explore some candidate contexts after RPE-facilitated remapping. This simple thresholding scheme is adequate for our largely deterministic simulation setting, where contextual switches are rare and occur abruptly in an otherwise stable and unambiguous environment.
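
      As a rough illustration of the SPE thresholding described above, the following Python sketch compares a predicted and an observed binary stimulus pattern. It is not the authors' code: the function names are hypothetical, and both the use of Hamming distance as the mismatch measure and the strict "greater than" comparison are assumptions made for this example. Only the 5-bit value comes from the response.

```python
from typing import Sequence

# Threshold quoted in the response; how it is compared is an assumption here.
THETA_REMAP_BITS = 5


def hamming_distance(predicted: Sequence[int], observed: Sequence[int]) -> int:
    """Count the positions where two equal-length binary patterns differ."""
    if len(predicted) != len(observed):
        raise ValueError("patterns must have the same length")
    return sum(p != o for p, o in zip(predicted, observed))


def spe_triggers_remapping(
    predicted: Sequence[int],
    observed: Sequence[int],
    threshold: int = THETA_REMAP_BITS,
) -> bool:
    """Return True when the mismatch exceeds the threshold (assumed strict '>')."""
    return hamming_distance(predicted, observed) > threshold
```

      Under these assumptions, a few mismatched bits (for example, from imperfect attractor recall) are tolerated, whereas a larger mismatch is treated as a sensory prediction error.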

      Importantly, our goal in this work was not to achieve Bayesian optimality. Mice and likely humans in certain settings often deviate from optimal inference. Instead, we focus on the qualitative remapping-related processes that support goal-directed planning following epistemic errors. We have clarified this scope in the revised manuscript.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or due to abrupt environmental changes. … However, these conceptual similarities are qualitative rather than quantitative. The goal of this work is not to achieve Bayesian optimality, but rather to show qualitative remapping-related processes that support goal-directed planning following epistemic errors.”

      “Note that we set the remapping threshold 𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub> = 5 bits to allow for small misconvergence during recall in the Amari–Hopfield model.”

      “Note that we set 𝜃<sub>𝑁𝐺</sub> as 0.7 to make the agents sufficiently sensitive to abrupt environmental changes and enable exploring some candidate contexts after RPE-facilitated remapping.”

      (2) Improvement: start describing each task specification in explicit model-based RL terms, then explain how the environmental specification translates into agent operations. Be explicit about what about the process is inferential, in particular, sources of uncertainty.

      Thank you for this important suggestion. Following your recommendation, we revised the manuscript to describe each task explicitly in model-based RL terms. For each task, we now identify the relevant sources of uncertainty, which arise either from imperfections in the agent’s internal model of the environment or from occasional abrupt switches in task rules. We also explain how the agent infers the hidden state from experience to construct an appropriate context representation, enabling the model to perform the task successfully.

      (3) A lot of seemingly arbitrary model choices need additional computational and biological justification; the description of the process is fundamentally an algorithmic one, which includes a lot of if-then type of operations: the dynamics of different elements of the circuit switch between "initialization to landmark/other", "error detected/not", different forms of plasticity on/off, etc., and it is not discussed in any way how this kind of global coordination of different processes is supposed to be orchestrated biologically; e.g., as far as I understand, the sequential structure in H activity is largely hardcoded rather than an emergent property of the learning+neural dynamics.

      Thank you for this important suggestion. We have made a concerted effort to clearly describe the biological context and the relevant literature motivating each of our algorithmic assumptions. Notably, as highlighted in Fig. 1F, we emphasize that the sequential structure in H activity emerges as a consequence of the agent’s exploration and learning. We also explain how the two remapping mechanisms concatenate sequence segments to support long-term planning and to predict both stimuli and rewards.

      About Fig. 1F

      “At the beginning of learning, hippocampal segments are not connected, and H yields only short sequences that generate immediate actions and short-term predictions. As learning continues, the three-factor Hebbian plasticity rule concatenates these segments, thereby creating longer sequences that reflect the task structure (Figure 1F).”

      About “initialization to landmark/other,”

      “While the history-based initialization was introduced to select contextual state based on the history input from H (episodic), the landmark-based initialization was introduced to terminate the episodes that would otherwise continue indefinitely. Biologically, the landmark-based initialization corresponds to the operation of anchoring a contextual state to salient environmental landmarks - such as an animal’s nest - that serve as clear reference points.”

      About “error detected/not,”

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)-driven remapping and reward prediction error (RPE)-facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X to either transition to a different contextual state or form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator indicates that the action plan, i.e. the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E). ”

      About “different forms of plasticity on/off”

      “We used different learning rules for the intra-hippocampal synaptic weights depending on within-episodic and between-episodic segments.”

      “Within-episodic connections, i.e., state-coding to transition-coding synapses, are constantly updated in a reward-independent manner … This modeling is inspired by behavioral time scale plasticity in the hippocampus (Bittner et al., 2017), in which synaptic potentiation occurs for events that are close in time regardless of reward, and such plasticity is believed to support the formation of place cells, etc.”

      “Between-episodic connections, i.e., transition-coding to state-coding synapses, are constantly updated in a reward-dependent manner … This is supported by the finding that dopaminergic neuromodulation gates LTP, enabling preferential consolidation of reward-associated experiences (Lisman and Grace, 2005; Takeuchi et al., 2016).”
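
      The contrast between the two learning rules can be sketched generically as follows. The manuscript's actual update equations are elided in the quotes above, so this is only an illustration under stated assumptions: the multiplicative Hebbian form, the learning rate, and the function names are all hypothetical and are not taken from the paper.

```python
def within_episode_update(w: float, pre: float, post: float, lr: float = 0.1) -> float:
    """Reward-independent Hebbian step (assumed form): potentiate on co-activity."""
    return w + lr * pre * post


def between_episode_update(
    w: float, pre: float, post: float, reward: float, lr: float = 0.1
) -> float:
    """Three-factor step (assumed form): the Hebbian term is gated by reward."""
    return w + lr * pre * post * reward
```

      In this sketch, co-active state-coding and transition-coding units are always linked, whereas the link from a transition-coding unit to the next state-coding unit only grows when a reward signal is present.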

      (4) Improvement: Justify individual design choices by biology whenever possible; in the absence of such justification, provide at least a computational rationale for each such model choice. Additional justification for the neural substrate of different prediction errors.

      Thank you for pointing this out. Following the advice, we have added the computational objectives behind each algorithmic component in addition to the biological motivations described above. In particular, we have completely updated Fig. 1 to help readers better understand the key remapping mechanisms in our algorithm: SPE-driven and RPE-facilitated remapping.

      About the Amari-Hopfield model

      “We employ the Amari–Hopfield model because it allows multiple contexts to be stably maintained and selected in response to stimuli and can be trained via Hebbian plasticity. We assume that similar computations are carried out in prefrontal and entorhinal cortical circuits in the brain.”

      “As one possible biological implementation, we consider context selection in X as the brain-wide evoked potential during which bottom-up information may be integrated with top-down signals to select the current context (Mohanty et al., 2025). In this case, it takes several hundred milliseconds for the contextual states in X to settle (Massimini et al., 2005).”
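
      For readers unfamiliar with the Amari–Hopfield model, the textbook version can be sketched in a few lines of Python. This is a generic illustration, not the Context selector itself: it uses the conventional ±1 coding (the manuscript describes 0/1 stimulus patterns), stores only two hand-picked orthogonal patterns, and all function names are assumptions made for this example.

```python
from typing import List

Pattern = List[int]  # entries are +1 or -1 (textbook convention, assumed here)


def train_hebbian(patterns: List[Pattern]) -> List[List[float]]:
    """Build the weight matrix as a sum of outer products with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights: List[List[float]], probe: Pattern, max_sweeps: int = 10) -> Pattern:
    """Update units one at a time in a fixed order until no unit changes."""
    state = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = state[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two orthogonal stored patterns and a probe with one corrupted unit.
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([pattern_a, pattern_b])
noisy_a = [-1] + pattern_a[1:]
recovered = recall(weights, noisy_a)
```

      Here the corrupted probe falls within the basin of attraction of the first stored pattern, so recall settles back onto it. This is the sense in which stored contexts are "stably maintained" and small recall errors are tolerated.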

      About the default matrix

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      About state-coding neurons and transition-coding neurons

      “The state-coding neurons receive input from X and represent the current contextual state, while the transition-coding neurons send output to X and predict the next contextual state after an action ... One possible biological grounding for this functional separation is that the entorhinal cortex provides contextual inputs to CA3, and CA3 and CA1 generate predictions of the next state through their recurrent architecture (Chen et al., 2024).”

      About the no-good indicator

      “No-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This no-good indicator facilitates RPE-facilitated remapping (see RPE-facilitated remapping section) that leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”

      (5) In particular, the temporal scale at which processes unfold with reference to behavioral time scale actions is fundamentally unclear: what determines the time scale of a sequential element? What stitches them together? What is the temporal relationship between H and X operations? At what time scale do actions happen in terms of those operating scales? How does this align with what is known about hippocampal dynamics during behavior?

      (6) Improvement: make the time scales of different aspects of the process explicit in the text, potentially with additional graphic support.

      Thank you for the questions and suggestions. In this work, we model the agent’s behavior in an abstract grid-world environment with discrete time steps, as is common in classical RL. At each time step, the agent observes a sensory stimulus, makes a plan, and executes an action based on it. The action induces a state transition in the environment. Accordingly, the model includes a single fundamental timescale: the environmental (behavioral) time step.

      The modeled brain dynamics in both X and H are similarly locked to this environmental clock. As clarified in Fig. 1F, each sequence segment corresponds to one behavioral time step. These segments are then chunked based on reward events, enabling longer-horizon planning and prediction.

      The agent’s cognitive operations at each behavioral time step are summarized in Fig. S1. Briefly, the agent infers the contextual state X from the current stimulus and its stimulus history, generates a sequential action plan H with predictions using chunked sequence segments, and then follows the plan when it is sufficiently promising. In addition, when sensory or reward prediction errors occur, the agent reorganizes the synaptic-weight parameters of the context selector and sequence composer. Once the agent becomes familiar with the environment, H typically generates an extended action sequence along with predictions of future stimuli and the resulting reward. The agent then executes this sequential plan, bypassing step-by-step context estimation by X, until a prediction error triggers remapping.

      The revised manuscript includes the following additions.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli. The model operation relies on the environmental (behavioral) time step. At each time step, the agents perform contextual state estimation by Context selector and activate a corresponding hippocampal neuron. Then, this hippocampal neuron initiates sequential activity based on hippocampal synaptic connectivity. Each hippocampal sequence represents a planned course of action and is used to predict a series of external stimuli. … The hippocampal sequence from which actions are generated is updated upon a reward. After the action execution, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent. The algorithmic flow chart of our model is described in Figure S1.”

      (7) As far as I understand it, the existence of splitter cells is directly inherited from the task specification, and to some extent the same can be said about the lap cells; please explain what can be understood from the model simulations that goes beyond what was put into the inputs/reward function for each experiment. Emphasize numerical results that are counterintuitive or where additional predictions about the dynamics come directly from simulating the model but would have been less obvious beforehand.

      The existence of splitter cells in our model is not inherited from the task specification. Instead, it emerges directly from the hippocampal module retaining sensory history (namely, whether the agent approached from the left or right arm), independent of reward structure or other task details. When sensory history is removed from the sequence composer (and, consequently, from the context selector), splitter-cell representations disappear.

      To develop lap-cell representations, immediate sensory history alone is not sufficient. The sequence composer must chunk episodic segments based on rewards to support sufficiently long action plans (i.e., history dependence) that span the multiple laps required by the task. The planning horizon - the length of action sequences - typically increases as animals learn a task. This progressive development of hippocampal sequences and their dependence on reward yields experimentally testable predictions. Notably, as we clarified in Fig. S2, the required sensory history length must also be learned adaptively: if it is too short, the agent cannot solve the task, whereas if it is too long, learning becomes unnecessarily slow.

      In the revised manuscript, we explicitly described the emergent process of splitter cells and lap cells as follows.

      About splitter cells

      “A second contextual state at S2, X2β, was generated through SPE-driven remapping at the second visit of S2 (second trial) due to history mismatch… In our model, the transition-coding neurons exhibit right/left turn-specific firing at S2 after learning is complete (Figure 2E, I), replicating the emergence of splitter cells.”

      About lap cells

      “the task environment changes again and the agents are rewarded for two laps, …. Either the shortest transition, ..., or the one-lap transition, …, is no longer rewarded, which triggers another RPE-facilitated remapping and exploration. During exploration, a history mismatch occurs …, and the contextual states for the second lap … are generated. Finally, the rewarded transition of contextual states and corresponding sequence… is reinforced (Figure 3B).”

      “This task can also be solved by simply preparing temporal contexts with three steps of sensory history (n=3), which is the minimal number to solve this task (see Materials and Methods for Model-free learning). However, it takes much longer to find the correct transition for solving the 1-lap task than with our model because it involves an excessive number of states (Figure S2).”

      “As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent.”

      (8) The partitioning of H subpopulation into current input vs predictive subpopulations seems to fundamentally deviate from known CA1 properties like theta phase processing, where the same neurons encode information about recent past, present, and future at different moments in time within a theta cycle. The existence of such populations (especially since they come with distinct plasticity mechanisms and projection patterns) seems like a strong avenue for validating the model experimentally.

      (9) Improvement: biologically justify the two subpopulations, discuss neural signatures of this distinction that could be used to identify such neurons in experiments

      We thank the reviewer for bridging our model with biological circuits.

      First, we would like to clarify that we do not claim that our H module corresponds to CA1 specifically.

      Rather, we assume that within the broader hippocampal loop (EC–DG–CA3–CA1–EC), subpopulations emerge that preferentially encode the current contextual states and the transitions to the next contextual states. This assumption reflects our hypothesis that the hippocampus implements a mechanism for predicting the next context given the current one. Importantly, this functional separation does not contradict known theta-phase coding in which the same neurons can represent past, present, and future information at different phases of the theta cycle.

      As a possible biological grounding, we particularly emphasize the CA3–CA1 projection. Recent studies have shown that CA1 representations exhibit a temporal delay relative to CA3 activity (Chen et al., 2024), suggesting a circuit-level mechanism by which predictions of upcoming contextual states may be computed based on the current context. In this framework, state-coding and transition-coding functions could be assigned to CA3 and CA1, or dynamically expressed through their interactions. Based on our model, we make testable experimental predictions. Specifically, we predict that neural representations in CA3 and CA1 should precede contextual switching in tasks such as alternation or multiple-lap tasks, and that perturbing CA3–CA1 computations would impair task performance.

      Please note, however, that our model does not characterize the sequence composer’s activity at such fine-grained neuronal timescales. Instead, we model the computation it performs in abstract time steps corresponding to the grid states (e.g., while the animal is at a corner of the maze).

      We have added these points to the Discussion to clarify the biological interpretation and to suggest potential experimental validations of the proposed subpopulation distinction as follows.

      “Our model posits that the Sequence composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state. Consistent with this idea, the temporal lag in CA3→CA1 transmission suggests a functional gradient in which CA3 represents present-oriented information while CA1 carries more future-oriented predictions (Chen et al., 2024), and neurons in both CA3 and CA1 exhibit action-driven remapping and encode action-planning signals (Green et al., 2022). Our framework, therefore, predicts that changes in CA3→CA1 population activity precede behavioral switching in context-dependent alternation in Figure 2 or multi-lap tasks in Figure 3, and perturbation of this input will degrade the behavioral performance.”

      “While we used an abstract, grid-like state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, such as theta cycles (Foster and Wilson, 2007; Wikenheiser and Redish, 2015).”

      (10) The flexibility of the new solution in terms of learning contexts with variable temporal horizons seems an important feature of the model, but one poorly demonstrated in the existing numerical experiments. Could more concrete model predictions be generated by designing an experiment targeted specifically for such scenarios?

      Thank you for raising this point.

      As we showed in Figure S2, in environments with variable temporal horizons, our model performs better than model-free learning (Q-learning) that incorporates temporal context.

      To further demonstrate this point, we added a new task in Figures 3G and H, in which the 1-lap task and the 2+ lap task are alternated. Our model exhibits rapid switching between these tasks, regardless of differences in sequence length or temporal horizon. We added the following text.

      “To demonstrate the advantage of our model in a rapidly switching task that requires different history lengths, we show that an agent trained on both the 1-lap and 2-lap tasks can flexibly alternate between them in a reward-dependent manner (Figure 3G), selectively engaging hippocampal sequences of different lengths according to the current task context (Figure 3H). Together, these results illustrate how hippocampal lap-like representations emerge through learning and enable flexible context switching across tasks with distinct temporal demands.”

      In such a scenario, a subjective representation of laps in the hippocampus is the key to solving the task. As we responded to points (8) and (9), neural representations, especially in CA1, are expected to bifurcate between the 1-lap and 2-lap conditions, and this bifurcation would precede and critically govern the animal’s behavior.

      (11) I found figures confusing/uninformative, specifically in making it explicit what is external task structure and what is the agent's internal representation of it; as a result it is not clear what of the results is trivially inherited from the task specification and what is an emergent property of the model; e.g. Figure 2A described external transition specification according to world model but it is unclear to me if Figure 2B shows the ideal agent state representation across context or a graphical summary of what the agent actually learned from the sensory experience described in A; from the text. Figure 2F is supposed to describe a property of the emergent representation, but what is shown is another cartoon... etc.

      We appreciate the reviewer’s insightful comments regarding the clarity of our figures.

      To clarify the neural representation of the agent and how it links to the action, we have revised Figure 2 and the descriptions in the main text.

      First, Figure 2A schematically depicts the external stimulus as being determined solely by the task. In this task, animals must keep track of the immediately preceding state (S1 or S3) to correctly choose between S4 and S5 upon reaching S2. Without such a memory of prior states, an agent would have no basis for distinguishing which action is appropriate, and therefore cannot selectively move to S4 or S5. Therefore, any reinforcement learning model that does not incorporate at least a one-step state history cannot solve the task.

      To solve the task, S2 must be represented as two distinct contextual states depending on the previous state. Figure 2B therefore illustrates an example of internal representation that separates S2 into X2α and X2β: transitions from S1 to S2 are internally represented as X1 → X2α, whereas transitions from S3 to S2 are represented as X3 → X2β. Although the sensory inputs provided to the model correspond only to the task-defined states in Figure 2A, the combination of the sensory input with contextual states in Context selector successfully achieves this contextual representation of X2α and X2β (see Figure 2C, D). Also, the hippocampal neurons in Sequence composer indicate the next contextual states given the current contextual states, i.e., X2α→X4 and X2β→X5 (see Figure 2E). Thus, combining Context selector and Sequence composer successfully achieves the task requirement indicated in Figure 2B.
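
      The target representation in Figure 2B can be written out as a small lookup, which makes explicit what the agent has to learn. This is a hand-written Python sketch of the end result only. In the model, the split of S2 is described as emerging through SPE-driven remapping rather than being specified in advance, and the dictionary and function names below are hypothetical.

```python
# (previous environmental state, current environmental state) -> contextual state.
# Only S2 needs two contexts; every other state falls back to a default context.
SPLIT_CONTEXTS = {
    ("S1", "S2"): "X2α",
    ("S3", "S2"): "X2β",
}

# Transition predicted from each S2 context, as stated in the response.
NEXT_CONTEXT = {
    "X2α": "X4",
    "X2β": "X5",
}


def select_context(previous_state: str, current_state: str) -> str:
    """Return the contextual state, using the default X<i> when no split exists."""
    default_context = "X" + current_state[1:]
    return SPLIT_CONTEXTS.get((previous_state, current_state), default_context)
```

      With this mapping, the same sensory state S2 leads to different predicted successors depending on the arm the agent came from, which is the splitter-like behavior discussed here.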

      Regarding the reviewer’s concern that Figure 2F (now Figure 2I) appeared to be another cartoon, we have revised the panel to clearly display our result. These results demonstrate that some hippocampal neurons in our model encode the transition from X2α→X4 and X2β→X5. The updated figure clarifies that our hippocampal neurons functionally work similarly to the splitter cells in Wood et al., 2000.

      (12) Improvement: use visuals and captions. Make it clear what is a cartoon, what is a model specification, and what is an actual result. Replace/complement algorithmic cartoons in Figure 1 with a description of the actual result.

      Thank you for raising this point.

      As we explained in the previous point (11), we added Figure 2D and Figure 2E to display the actual neural activity, together with the corresponding annotations in the manuscript, e.g., X2α. We also revised the cartoons in Figure 1 to better describe our model structure.

      (13) Map between model and experimental results is poorly justified: in particular the nature of sensory inputs is not clearly specified, and how the experimental manipulations (e.g. MEC input disruption) map into model manipulations is not intuitive and no justification is provided for the choices beyond that the model ends up matching the experiment by some metric. Also, not clear why a tradeoff of neural resources as implemented in the model makes sense for the clinical case and how this hypothesis deviates from alternative Bayesian accounts invoking imperfections in inference (e.g. relative strength of priors vs likelihood as reported by e.g. P.Series's group, or issues with hierarchical inference more generally along R.Jardri's work).

      Thank you for raising this important point. We have revised the manuscript to clarify the mapping between model components, sensory inputs, and the experimental manipulations, and to further justify the clinical interpretation.

      About sensory inputs

      First, each environmental state in our model is represented as a binary (0/1) pattern. We have added Figure 2D to explicitly illustrate these sensory stimuli and how they are provided to the context-selection module.

      About mapping between model components and brain circuits

      Functionally, we speculate that Context selector (X) corresponds to computations carried out in the prefrontal cortex (PFC) and entorhinal cortex (EC), and Sequence composer (H) corresponds to the hippocampus. Inputs from the PFC are thought to reach the hippocampus via the EC. Therefore, suppression of MEC→hippocampus inputs in Sun et al. (2020) naturally maps onto blocking a subset of the inputs from X to H in our model.

      We clarified this correspondence in the revised manuscript and now explicitly justify why this manipulation matches the biological experiment.

      Relation to Bayesian theories

      We agree that Bayesian accounts have provided influential explanations of psychiatric symptoms by invoking imperfections in inference, such as imbalances between priors and likelihoods (e.g., work by P. Series and colleagues) or disruptions in hierarchical inference (e.g., work by Jardri and others). Our model complements these frameworks by explicitly incorporating sequential structure and context remapping. Rather than treating priors as static or fixed-weight quantities, our model allows contextual representations to be dynamically reorganized based on prediction errors over time. In the SZ-like condition, we assume that an excessively expanded context domain increases the influence of internally generated contextual predictions, causing them to override sensory inputs and resulting in maladaptive behavior with hallucination-like percepts. Importantly, this effect reflects not only stronger priors but also excessive generation and competition of contextual states, leading to unstable and non-reproducible remapping. In contrast, in the ASD-like condition, sensory-weighted context representations limit the ability to flexibly incorporate newly introduced contexts, causing the model to perseverate on an initially learned context and thereby reproduce inflexible behavior. We added a schematic illustration in Figure 5B and expanded the Discussion to clarify this point.

      “When the stimulus domain is relatively underrepresented, the reconstruction of contextual state in the Amari-Hopfield network tends to infer contextual states based on the context domain rather than the stimulus domain. Consequently, it converges to an incorrect attractor that is not assigned to the current environmental state, thereby increasing perceptual error for external stimuli (hallucination-like effects). Moreover, SPE-driven remapping and the corresponding synaptic plasticity occur more frequently. In contrast, when the stimulus domain is overrepresented, the Amari-Hopfield network rarely assigns multiple contextual states to a given environmental state, leading to an overuse of default contextual states (see Figure 5B and Materials and Methods). ”

      “Our model also provides an algorithmic-level account of psychiatric symptoms by changing the relative weighting of sensory-encoding versus context-coding neurons. This implementation is analogous to Bayesian theories linking priors to psychiatric symptoms. In SZ, hallucinations and delusions have been modeled as arising from overly strong top-down priors (Powers et al., 2016) or circular inference, which leads to erroneous belief formation (Jardri et al., 2017; Jardri and Denève, 2013). In our model, we used an underrepresented stimulus domain to increase the relative influence of internally generated context representation in context selection. Crucially, this implementation does not simply strengthen priors but induces excessive generation and competition of contextual states, leading to frequent yet non-reproducible remapping of hippocampal contextual activity and a failure of learning to converge despite repeated experience. In ASD, it has been argued that abnormally high sensory precision reduces the updating of expectations (Karvelis et al., 2018) or leads to sensory-dominant perception, which has been interpreted as weak priors (Angeletos Chrysaitis and Seriès, 2023; Lawson et al., 2014; Pellicano and Burr, 2012). In our framework, we used an overrepresented stimulus domain to increase the relative influence of external stimulus representations in context selection. Importantly, our model captures not only sensory-dominant processing emphasized in previous studies, but also a distinctive impairment in flexibly utilizing newly introduced contexts, reflecting a failure of context reconstruction and resulting in persistent inflexible behavior. Thus, our conjunctive modeling of sensory and context processing complements Bayesian accounts of psychiatric symptoms and provides a mechanistic explanation for the role of sensory processing in maladaptive, inflexible behavior.”

      (14) Improvement: justify choices, explain in more detail relationships with computational psychiatry literature.

      Thank you for pointing it out. As we explained in the previous point (13), we justified our model choice in the revised version.

      Minor comments:

      (1) Typos: "algorism" (pg2), duplicate Sun reference.

      Thank you for finding the typo and the missing reference. We revised accordingly.

      (2) Unclear statements from Methods:

      • "preparing temporal context with three histories" not sure what is meant by this.

      • "... state estimation by the context-selection module becomes less frequent." (Methods/Overview): what is the mechanism?

      • "default pattern" and failure to converge: What is the biological basis for them?

      • Why is the converter function used on some occasions but not others?

      • "new contextual state is prepared": What does that mean?

      We thank the reviewer for pointing out several unclear statements in the Methods section.

      • “preparing temporal context with three histories”

      We now explicitly state the formal description of three histories in the Methods as follows.

      “the state is defined by the recent n-step transition history of task state (i.e. 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> =(𝑆<sub>𝑘</sub>,𝑆<sub>𝑘−1</sub>, ⋯,𝑆<sub>𝑘−𝑛</sub>)<sup>𝑇</sup> , where 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> is the temporal context state, and 𝑆<sub>𝑘</sub> is the environmental state at time 𝑘). We changed n from 0 to 3.”

      • “state estimation by the context-selection module becomes less frequent”

      In our model, context selection is performed every time the agents execute an action sequence generated by Sequence composer. As learning progresses, the Sequence composer comes to predict distant future states and executes coherent action sequences based on these predictions. When no unexpected errors are encountered during execution, context estimation is suppressed, resulting in less frequent context selection. We modified the manuscript as follows.

      “After the action execution, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, the hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent. The algorithmic flow chart of our model is described in Figure S1.”

      • “default pattern”

      In biological systems, it is reported that the frontal cortex shows sensory modality-specific representation without prior learning (Manita et al., 2015). We refer to these innate modality-specific sensory representations as the default pattern. In the early stages of learning, we assume that no stable contextual representations have yet been formed in the brain, and therefore, a default pattern uniquely driven by external stimuli is used as the context representation. Even during intermediate stages of learning, the context selector may fail to converge to a specific state. In such context-uncertain environments, it has been reported that agents often rely on previously learned or habitual action choices (psychological inertia), which is evident in ASD patients.

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      “This default implementation is analogous to psychological inertia, particularly under uncertainty (Ip and Nei, 2025; Sautua, 2017), which has been reported to be more pronounced in ASD patients (Joyce et al., 2017).”

      • Why is the converter function used only in some cases?

      The converter function A(stim → context) was introduced to compose the default pattern (one-to-one mappings between stimuli and contexts), as we described above. In other cases, the Hopfield dynamics were used to select contextual states; therefore, we did not use the converter function.

      • “new contextual state is prepared”

      Thank you for pointing this out.

      The term “prepared” was inaccurate. We revised it to “generated”.

      In the case of remapping, we assumed that X generates a new random neural activity pattern in its contextual domain and stores it as a new contextual state. We described this process as “a new contextual state is generated”.

      (3) Please explain the mapping between hippocampal sequences to actions in more detail for each task.

      • Why 9 attempts before rejection?

      • Why all the variations on Hebb?

      We appreciate the reviewer’s request for clarification. Below, we provide additional explanations point by point.

      Mapping between hippocampal sequences and actions

      In this research, we defined action as the transition from one environmental state to another environmental state. The hippocampal sequences predict the transition of environmental states; therefore, they correspond to a set of action plans from the current environmental state. In the revised manuscript, we added the formal definition of environmental states and actions in each task.

      • Why 9 attempts before rejection?

      These repetitions ensure adequate exploration of the contextual states in X and the episodic sequence in H before committing to an action. Increasing the number of attempts excessively causes the reward value function to be dominated by a single highest-scoring sequence, thereby causing excessive exploitation and narrowing behavioral variability. While the exact number 9 is not critical (the qualitative results are robust to moderate changes), we selected this value because it provides a good balance between exploration and exploitation and produces the clearest visualizations in our figures. We have clarified this in the Methods as follows.

      “We set the number of attempts before rejection to nine, providing a balance between exploration and exploitation and serving as a good compromise for visualization.”

      • Why all the variations on Hebbian learning?

      We consider three loci of plasticity in our model: the X module, the H module, and their reciprocal connections. Within the H module, synaptic connections that link episodic segments—specifically from transition-coding neurons to state-coding neurons—are assumed to follow a reward prediction error–dependent, supervised form of Hebbian learning. This choice reflects the need to selectively reinforce transitions that lead to successful outcomes. In contrast, all other synaptic updates in the model are assumed to follow reward-independent, activity-based Hebbian learning. These learning rules support the unsupervised formation and stabilization of contextual representations and action execution.

      In addition to the basic Hebbian rule, we introduced biologically motivated constraints, such as upper and lower bounds on synaptic weights and heterosynaptic depression, which weakens non-potentiated synapses. Importantly, these mechanisms do not alter the fundamental nature of Hebbian learning but increase the stability of our model.
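
      The constraints described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hebbian_update`, the parameters `lr`, `decay`, `w_min`, and `w_max`, and the specific form of the heterosynaptic term are all assumptions chosen for illustration.

```python
import numpy as np

def hebbian_update(W, pre, post, lr=0.1, decay=0.02, w_min=0.0, w_max=1.0):
    """One Hebbian step with heterosynaptic depression and hard weight bounds.

    W[i, j] is the weight from presynaptic neuron j to postsynaptic neuron i.
    pre and post are binary (0/1) activity vectors.
    All parameter values are hypothetical, not taken from the manuscript.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    # Potentiation: 1 where the pre- and postsynaptic neurons are both active.
    coactive = np.outer(post, pre)
    # Heterosynaptic depression (assumed form): synapses onto an active
    # postsynaptic neuron whose presynaptic partner was silent are weakened.
    hetero = np.outer(post, 1.0 - pre)
    W_new = W + lr * coactive - decay * hetero
    # Upper and lower bounds keep the weights from growing without limit.
    return np.clip(W_new, w_min, w_max)
```

      Repeated co-activation drives a weight toward `w_max` and no further, which is one simple way bounds of this kind can stabilize an otherwise unbounded Hebbian rule.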

      (4) For Q learning: please clarify "the state is defined by the recent transition history of task state."

      As you suggested, we clarified the statement by adding the following sentences in the Methods. “To highlight the advantage of our model, we compared it to the Q-learning with temporal contexts, namely, the state is defined by the recent n-step transition history of task states (i.e. 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> =(𝑆<sub>𝑘</sub>,𝑆<sub>𝑘−1</sub>, ⋯,𝑆<sub>𝑘−𝑛</sub>)<sup>𝑇</sup> , where 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> is the temporal context state, and 𝑆<sub>𝑘</sub> is the environmental state at time 𝑘).”
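
      As a rough illustration of this definition, the sketch below builds the temporal context state as a tuple of the last n + 1 environmental states and uses it as the key of a tabular Q-learning update. The class name `HistoryQLearner`, the discount factor `gamma`, and the standard one-step update are assumptions; the value `alpha=0.4` follows the control-model learning rate stated in this response.

```python
from collections import defaultdict, deque

class HistoryQLearner:
    """Tabular Q-learning whose state is the n-step history (S_k, ..., S_{k-n})."""

    def __init__(self, n, actions, alpha=0.4, gamma=0.9):
        self.n = n
        self.actions = list(actions)
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor (assumed value)
        # Keeps at most n + 1 environmental states, most recent first.
        self.history = deque(maxlen=n + 1)
        self.q = defaultdict(float)     # Q[(temporal_state, action)]

    def observe(self, env_state):
        """Push the new environmental state and return the temporal context state."""
        self.history.appendleft(env_state)
        return tuple(self.history)

    def update(self, s, a, reward, s_next):
        """Standard one-step Q-learning update on temporal context states."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error
        return td_error
```

      With n = 0 the state reduces to the current environmental state alone, so the same class covers the whole range of n considered in the comparison.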

      (5) What is the purpose and biological justification for the NG addition to RW?

      Thank you for raising this point. The prediction-error–based update of each sequence’s value function 𝑅 alone cannot distinguish between two fundamentally different cases:

      (a) the value of a sequence has genuinely decreased, or

      (b) the sequence remains useful, but it is just not appropriate in the current context.

      This distinction is essential for modeling context-dependent switching of behavioral strategies. To address this, we introduced the No-good (NG) indicator. NG allows the agent to temporarily mark certain sequences as unsuitable without altering their long-term value, thereby facilitating short-term exploration of alternative sequences. In other words, NG provides a mechanism for transiently suppressing a previously valid sequence in case of contextual changes, while preserving the underlying value learned in past experiences.

      This mechanism is consistent with several lines of biological evidence. First, extinction learning after fear conditioning does not erase the original fear memory but instead forms a new memory trace, known to be stored in the medial PFC (Milad & Quirk, 2002). This suggests that animals may switch to a different contextual representation rather than simply downgrading the value of the conditioned stimulus, supporting the idea of temporarily suppressing a sequence without modifying its intrinsic value.

      Second, recent studies in the ventral hippocampus show that dopamine D2–expressing neurons in the ventral subiculum promote exploration specifically under anxiogenic contexts (Godino et al., 2025). This finding is consistent with the short-term exploratory behavior enabled by our NG mechanism. Thus, we added the following statement to the manuscript:

      “No-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This no-good indicator facilitates RPE-facilitated remapping … that leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”

      Together, these biological findings provide a conceptual basis for modeling NG as a context-sensitive, transient modulation that encourages exploration without overwriting previously learned sequence values.
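
      The distinction between suppressing a sequence and devaluing it can be sketched as follows. This is a hypothetical illustration, not the model's code: the class `SequenceSelector`, the countdown timer, and `ng_duration` are assumptions standing in for whatever mechanism makes the suppression transient.

```python
class SequenceSelector:
    """Picks the highest-valued sequence that is not currently marked no-good."""

    def __init__(self, values, ng_duration=3):
        self.values = dict(values)          # long-term value R per sequence
        self.ng_duration = ng_duration      # how long a no-good mark lasts (assumed)
        self.ng_timer = {name: 0 for name in self.values}

    def select(self):
        available = [s for s in self.values if self.ng_timer[s] == 0]
        # If every sequence is suppressed, fall back to the full set.
        candidates = available or list(self.values)
        return max(candidates, key=lambda s: self.values[s])

    def mark_no_good(self, name):
        # Suppress the sequence without touching its stored value.
        self.ng_timer[name] = self.ng_duration

    def tick(self):
        # The suppression is transient: each step moves it closer to expiry.
        for name in self.ng_timer:
            self.ng_timer[name] = max(0, self.ng_timer[name] - 1)
```

      Because `mark_no_good` never modifies `values`, a suppressed sequence regains its original priority once the mark expires, which corresponds to case (b) rather than case (a).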

      (6) Missing details about H network size

      Thank you for pointing it out.

      We used 300 neurons for H. We indicated it as below.

      “We model the hippocampus with an N = 300 binary recurrent neural network.”

      (7) S1 figure: learning is slower even in the early, easy phases of learning when the temporal dependence should not matter; how are learning rates calibrated across models?

      Thank you for raising this point. In our model, the learning rate was fixed at 0.15, whereas the control model (now shown in Figure S2) uses a higher learning rate of 0.4, independent of temporal context.

      Learning appears slower even in the early, easy phases because the state space expands as the number of temporal contexts increases. A larger state space makes it more time-consuming to identify and reinforce the appropriate state transitions. This effect is especially evident in the easy phases, where the number of temporal contexts prepared in the model exceeds the number that the task requires.

      Importantly, unlike the control model, which postulated a fixed number of temporal contexts, our model gradually increases the number of temporal contexts depending on prediction error. This adaptive mechanism allows the model to achieve fast learning during early, easy phases while still enabling more complex learning in later phases.
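
      To give a sense of scale for the state-space expansion described above, the short sketch below counts an upper bound on the number of distinct n-step history states. The figure of five environmental states is a hypothetical example, and the bound assumes every ordering of states is reachable, which a real task would usually not allow.

```python
def history_state_count(num_env_states, n):
    """Upper bound on distinct n-step history states: |S| ** (n + 1)."""
    return num_env_states ** (n + 1)

# Hypothetical task with 5 environmental states, n ranging from 0 to 3.
counts = [history_state_count(5, n) for n in range(4)]
# counts == [5, 25, 125, 625]
```

      Even under this crude bound, a fixed n = 3 leaves the learner with far more states to visit than n = 0, which is consistent with slower learning in phases where no history is needed.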

      Reviewer #2 (Recommendations for the authors):

      (1) "Hippocampal neurons show sequential activity...." The authors should include more classical references for hippocampal sequential activity at this point, too.

      Thank you for your suggestion. We added the following citations:

      Skaggs and McNaughton, 1996; Wilson and McNaughton, 1993

      (2) "...called remapping" also here, please reference classic work (Bostock, Muller, ...)

      As suggested, we added the following citations:

      Bostock et al., 1991; Muller and Kubie, 1987

      (3) "Several theoretical models..." What I miss here are models that explain remapping by inputs from the grid cell population, and/or the LEC (see Latuske 2017 for review), still widely considered the standard mechanism. Also, the models by Stachenfeld et al. 2017, Mattar and Daw 2019, and Leibold 2020 specifically address context dependence. Accordingly, "A comprehensive model that can explain the formation of context-dependent hippocampal sequences of various lengths through remapping, while relying on a biologically plausible learning process,..." somewhat overstates the novelty of the current paper.

      Thank you for pointing this out and for suggesting relevant citations. We agree with the reviewer that inputs from MEC and LEC to the hippocampus constitute a fundamental mechanism underlying remapping. However, in our view, a key open question in the remapping field is how MEC and LEC estimate the current context and convey this information to the hippocampus in a manner that supports goal-directed behavior. While previous studies have addressed remapping at the representational level and the hippocampal sequence at planning, the overall relationship between remapping, reinforcement learning, and planning has not yet been explained within a single unified model. In this work, we propose a simple and biologically plausible model that integrates an Amari–Hopfield network for context selection with hippocampal sequences, providing an account of coordination under goal-directed behavior. To more accurately position the novelty of our contribution, we have revised the manuscript as follows.

      “While previous works have explored hippocampal sequential activity for planning (Jensen et al., 2024; Mattar and Daw, 2018; Pettersen et al., 2024; Stachenfeld et al., 2017) and hippocampal remapping for contextual inference (Low et al., 2023) separately, they have yet to elucidate how these two aspects jointly enable flexible behavior. A simple biologically plausible model-based reinforcement learning model that uses the Amari-Hopfield model for context selection and hippocampal sequences of various lengths as a state-transition model for long-horizon planning, relying on remapping driven by prediction errors to form state representation, would thus provide valuable insights into the neural mechanisms underpinning context-dependent flexible behavior.”

      (4) Please properly introduce nomenclature "C2α, C2β, S2,...." S is sometimes used for stimulus, sometimes for location (state?), or even action?

      Thank you for pointing it out. We acknowledge that the annotation of Cn (e.g., C1, C2…) was not straightforward. Therefore, we changed the annotation to Xn (e.g., X1, X2, …) in order to indicate the contextual state of X.

      We define Sn (e.g., S1, S2…) as the external input given by the environment and represented in stim. domain of X, while Xn (e.g., X1, X2…) is the subjective contextual state generated by the agent and represented in the context domain of X. As a reference, we added the neural representation of X in Figure 2D and added the following text below.

      “The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states (e.g., S1, S2…) are represented in the stimulus domain, and the contextual states (e.g., X1, X2α…) are represented in the context domain.”

      (5) "Our model replicates this result by blocking the synaptic transmission from most of the neurons in the context domain of X to H (Figure 3F).". Does this mean the X module is hypothesized to be in the EC?

      Thank you for the thoughtful question. In our model, the X module is intended as a functional abstraction that combines the roles of several brain regions known to contribute to contextual representation, including the prefrontal cortex (PFC) and the entorhinal cortex (EC). Although X is not necessarily meant to correspond to a single anatomical region, we consider it likely that the contextual information represented in X would reach the hippocampus (H) (CA3 and CA1) primarily through the EC. Thus, the experimental manipulation shown in Figure 3F (suppression of medial EC axons in the hippocampus) is interpreted in our framework as weakening the input from X to H.

      We added the following texts in the Discussion section.

      “We speculate that Context selector is implemented across multiple brain regions with varying degrees of resolution, including a part of the entorhinal cortex and prefrontal cortex.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state.”

      (6) Discussion "model-based reinforcement learning": Please detail where the model is here. In my understanding, the naive agent does not have a model (this would be model-free then?).

      Thank you for asking.

      Unlike model-free reinforcement learning, where each action is evaluated step by step, we use hippocampal sequences for multiple-step prediction and action planning. This is the “model” in our research. As you mentioned, initially, animals do not have a “model”, but Sequence composer gradually chunks the episodic segments to compose a longer sequence.
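
      One way to picture how one-step episodic segments could be chained into a multi-step plan is the sketch below. It is a deliberately simplified assumption: the function `compose_sequence`, the dictionary of deterministic transitions, and the loop check are illustrative and do not reproduce the Sequence composer's neural implementation.

```python
def compose_sequence(transitions, start, max_len):
    """Chain one-step predictions into a multi-step sequence.

    transitions maps a contextual state to its predicted next contextual state.
    An empty mapping corresponds to a naive agent with no learned segments.
    """
    sequence = [start]
    current = start
    while len(sequence) < max_len and current in transitions:
        current = transitions[current]
        if current in sequence:   # stop rather than cycle forever
            break
        sequence.append(current)
    return sequence
```

      With no learned transitions the function returns only the starting state, and each newly learned segment lengthens the plan that can be rolled out before acting.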

      (7) "...can change the attractor dynamics in the hippocampus (34)": What is (34)? I also would doubt that one can make such absolute statements about the human hippocampus.

      Thank you for pointing out the missing citation. We corrected it accordingly.

      Rolls E. 2021. Attractor cortical neurodynamics, schizophrenia, and depression. Transl Psychiatry 11. doi:10.1038/s41398-021-01333-7

      (8) "To the best of our knowledge, this is the first model that describes the formation of context-dependent hippocampal activity through remapping and its contribution to flexible behavior." See "Several theoretical models...".

      Thank you for pointing this out. We admit that it was an overstatement. We corrected it accordingly.

      “To the best of our knowledge, this is the first model that uses associative memory for describing the formation and switching of context-dependent hippocampal activity through remapping and its contribution to flexible behavior.”

      (9) "We speculate that the context-selection module is implemented across multiple brain regions..." How would an attractor network be implemented over "multiple brain regions"?

      We thank the reviewer for raising this important conceptual question. Context information in realistic environments is likely to have a hierarchical structure. We therefore speculate that multiple brain regions may jointly support context selection by maintaining different levels or components of this hierarchy. In particular, the prefrontal cortex (PFC), medial entorhinal cortex (MEC), and lateral entorhinal cortex (LEC) have all been implicated in representing contextual or task-state information at different levels of abstraction. These regions are known to exhibit attractor-like dynamics and to provide inputs to the hippocampus. Thus, an attractor network spanning multiple regions could arise, with different areas stabilizing distinct components of the contextual representation, depending on the timescale of memory, task demands, or sensory features.

      We used the Amari–Hopfield network as a functional abstraction to explain such multi-regional interactions underlying context representation, rather than to provide a one-to-one mapping onto a specific brain region. How region-specific attractor dynamics jointly contribute to maintaining global contextual information and enabling context switches in response to prediction errors remains an important direction for future research.

      Methods:

      (10) "... agents move through discrete environmental states characterized by distinct external stimuli.": How is this exactly implemented? What is the neural representation of these states, xi? What is the difference to a "landmark"?

      We appreciate the reviewer’s thoughtful question regarding the implementation and neural representation of environmental states. In our model, each environmental state is represented as a binary stimulus pattern provided to the stimulus-domain neurons in Context Selector. Specifically, for each state, we constructed a pattern in which half of the neurons are set to 1 and the other half to 0. We chose this design because, in the Amari–Hopfield model, memory performance is maximized when stored patterns contain approximately equal proportions of 0 and 1. For clarity, we have added an illustration of these stimulus patterns in the revised Figure 2D.

      Regarding the reviewer’s question about landmarks: in our framework, a landmark denotes an environmental state for which the contextual state is uniquely determined, regardless of the preceding transition history. For simplicity in this study, we designated the initial environmental state in each task (S0 or S1) as the landmark. Importantly, in our implementation, landmarks do not differ from other states in terms of their stimulus pattern; their special role arises solely from the task structure, not from additional sensory properties.

      In real environments, what constitutes a landmark likely varies depending on stimulus saliency and the agent’s prior experience. Determining how landmarks should be optimally defined or learned is an interesting direction for future work.

      (11) How are different contexts represented for the same stimulus xi^stim?

      We added an example of neural activity in X in Figure 2D, illustrating the distinction between the stimulus domain and the context domain. While the activity in the stimulus domain depends on the external stimulus, the contextual domain consists of uncorrelated random neural states. We exploit a key property of the Amari–Hopfield network to associate each contextual state with a given external stimulus.
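
      The association property mentioned above can be sketched with a small binary associative memory. The code is an assumption-laden illustration, not the authors' network: it uses the conventional +1/-1 coding instead of 0/1, random patterns, 200 units instead of 1,200, and the hypothetical names `store`, `recall`, `n_stim`, and `n_ctx`.

```python
import numpy as np

def store(patterns):
    """Hebbian outer-product weights for +1/-1 patterns (one per row), zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, cue, steps=10):
    """Synchronous binary updates in discrete time until a fixed point or `steps`."""
    x = np.asarray(cue, dtype=float).copy()
    for _ in range(steps):
        x_new = np.where(W @ x >= 0, 1.0, -1.0)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

rng = np.random.default_rng(0)
n_stim, n_ctx = 100, 100                      # sizes of the two domains (assumed)
# Each stored pattern concatenates a stimulus part and a random context part.
patterns = rng.choice([-1.0, 1.0], size=(3, n_stim + n_ctx))

W = store(patterns)
cue = patterns[0].copy()
cue[n_stim:] = 0.0                            # present the stimulus only
recalled = recall(W, cue)                     # context part is filled in by the dynamics
```

      Presenting only the stimulus part leaves the context units undetermined, and the recurrent dynamics complete them from the stored association.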

      (12) "...and its stimulus domain ??stim becomes identical to ??xistim ." Does that mean every stimulus is an attractor in the context net? How can that work with only 1200 neurons? Is that realistic for real-life environments? Neuron numbers would need to increase dramatically.

      As you mentioned, we assigned each stimulus to a corresponding attractor in the Context selector (X). An Amari–Hopfield network with 1,200 neurons can store approximately 10–20 attractors, which is sufficient to solve the tasks considered in this study. We adopted the Amari–Hopfield network for its simplicity and conceptual clarity; however, in biological neural systems, it is not necessary to construct such rigid attractors for every stimulus. For example, modality-specific neural projections exist in the brain and are sometimes sufficient to form loose attractor states across different stimuli. In addition, the prefrontal cortex is known to support working memory, which may also serve as a form of contextual representation incorporating recent history. Thus, we propose that multiple brain regions cooperate to implement the Context selector.

      (13) How are WHX and WHH initialized?

      Thank you for pointing this out.

      We set the initial condition of all W to 0. We added the following text in the Method section.

      “Note that the initial synaptic weights of 𝑊<sup>𝐻𝑋</sup> and 𝑊<sup>𝑋𝐻</sup> are all 0.”

      (14) It is unclear why the hippocampus separates into state and transition neurons. Why cannot one pattern serve both purposes?

      Thank you for asking about this important point.

      We use two kinds of hippocampal neurons because state-coding neurons represent the current contextual state, whereas transition-coding neurons predict the following contextual state given the current one. This separation enables the model to predict multiple scenarios from the current contextual state and to choose the sequence most suitable for the environment.

      We rewrote the following sentences in the manuscript.

      In the Results section,

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons”

      In the Methods section,

      “The state-coding neurons receive input from 𝑋 and represent the current contextual state, while the transition-coding neurons send output to 𝑋 and predict the next contextual state after an action i.e., T(𝑋<sub>𝑘+1</sub>|𝑋<sub>𝑘</sub>,𝑎<sub>𝑘,𝑘+1</sub>).”

      (15) "the agents execute actions according to this sequence." How are the actions defined? Are they part of the state?

      We thank the reviewer for raising this important point. In our model, an action is defined as the transition from a given environmental state to the next environmental state. To avoid ambiguity, we have added a formal mathematical definition of actions for each task in the revised manuscript. In our framework, the transition-coding neurons in Sequence Composer (H) predict the upcoming environmental state, and thus the hippocampal sequence intrinsically contains the representation of an action. Consequently, the sequence generated before actions functions as the agent’s internal action planning process.

      (16) "Because the input source for the state-coding neuron and the transition coding neuron differ (the former is selected from ??, while the latter is selected from ??), the same hippocampal neuron could occasionally be used for both state-coding and transition-coding across different contextual states. This is evident when an excessive number of contextual states are prepared, especially in the SZ condition. This phenomenon degrades state estimation at X (eq.3)." I have no idea what you want to convey here, .... and how is state estimation related to Equation 3?

      We appreciate the reviewer’s feedback and agree that our original explanation was unclear. Our intention was to clarify why context estimation deteriorates specifically in the SZ condition.

      In our model, state-coding neurons in the hippocampus represent the current contextual state, and transition-coding neurons predict the next contextual state given the current contextual state. Under normal conditions, these two sets of neurons remain sufficiently distinct, allowing accurate prediction of the upcoming contextual state, which is conveyed to X. However, when an excessively large number of contextual states are stored in the SZ condition, representations in the hippocampus begin to overlap. As a result, some hippocampal neurons are inadvertently recruited for both state-coding and transition-coding across different contextual states. This overlap disrupts H’s ability to accurately predict the next contextual state.

      This degraded prediction directly affects the state-estimation process in X (Eq.3), because Eq.3 relies on receiving an accurate predicted next state from H. When this signal becomes ambiguous, X may converge to an incorrect contextual state, potentially mimicking hallucination-like inference errors.

      We have rewritten the relevant passage in the manuscript to clarify this mechanism as follows.

      “When the number of contextual states increases - particularly in the SZ condition - representational overlap arises between hippocampal state-coding and transition-coding neurons.

      This overlap makes the prediction of the next contextual state by the transition-coding neurons unreliable. The degraded prediction from H, in turn, corrupts the initial condition for context selection in X (Eq. 3), leading to hallucination-like behavior.”

      (17) The figures hardly show simulated activity. Consider displaying more neuronal simulations to help the reader grasp the workings of the model.

      Thank you for your suggestion. We indicated the neural activity of X and H in Figures 2D and 2E, respectively, to show the overview of our model.

      (18) Figure 5: What is the "Hopfield count"?

      Thank you for pointing this out. The definition of the Hopfield count was ambiguous. We added an explicit explanation of “context selection” and its possible outcomes (correct association, hallucination-like, and default contexts) in Fig. S1. To clarify our claim, we replaced the count-based measure with the probability of selecting hallucination-like and default contexts during context selection. Accordingly, we removed the term “Hopfield count” and revised the caption of Figure 5 as follows.

      “The result of context selection (see Figure S1). The probability of wrong stimulus reconstruction (hallucination-like effects) is plotted in red, and the probability of default context usage due to failures in context reconstruction (see Materials and Methods) is plotted in blue.”

      (19) Figure 6: Consider moving this upfront.

      Thank you for the suggestion. We moved Fig.6 to Fig.S1 and introduced it earlier in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      I was a bit confused about the implementation, which may not be autonomous, meaning there are numerous stages that require intervention from outside the X-H network (see Figure 6). It seems that the X network might wait to converge before providing input to H, rather than having the entire network evolve in parallel. There are also aspects to the implementation that seem rather ad hoc, such as the "no-good indicator".

      Thank you for the thoughtful comments. We would like to clarify several points regarding the implementation and its biological motivation.

      First, regarding the concern that the X–H interaction may not be fully autonomous:

      In our framework, the convergence time of the X module under external sensory input is assumed to be on the order of several hundred milliseconds, consistent with the timescale of stimulus-evoked cortical population dynamics observed in biological systems. Especially when hippocampal input is present, X does not need to explore the full attractor landscape. Instead, it quickly settles into an attractor located near the hippocampal cue, which substantially shortens the convergence time.

      Second, although our current implementation proceeds in an algorithmically sequential manner for clarity, we do not intend to imply that the brain performs these steps sequentially. Biologically, the states of X and H are expected to co-evolve and mutually constrain each other through recurrent interactions. The sequential algorithm in the model is therefore a practical choice for implementation, not a theoretical claim about strict temporal ordering in the neural system.

      Finally, the “no-good indicator” is introduced to suppress hippocampal sequences transiently and thereby accelerate switching behavior. Our no-good indicator is most consistent with the biological findings on D2-expressing neurons in the hippocampus. We added the following text below.

      About the no-good indicator

      “The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025)”

      Besides the hippocampus, similar mechanisms, namely the temporary suppression of recently visited or low-value attractor states, have been proposed in the biological decision-making and working-memory literature, providing conceptual support for the no-good indicator in our model.

      After exposure to a new context, a new memory/context is stored in the X network. As the storage of a new memory requires synaptic plasticity, this step would presumably take a significant amount of time in an animal.

      Thank you for raising this important point. We agree that the formation of a new memory or context requires synaptic changes, and it is well established that processes such as tagging during wakefulness and consolidation during sleep take considerable time. However, once a context has been learned, switching between contexts can be achieved just by moving between attractors in the X network. This mechanism allows for rapid, context-dependent behavior without requiring new synaptic modifications each time. Our study focuses on this aspect of fast context-dependent switching rather than the initial memory formation.

      My understanding is that the Amari-Hopfield network should be evolving in continuous time and not be binary. But there were no time constants mentioned, and the equations were not provided, and it seems that the elements of X were binary units, rather than analog. This should be clarified.

      Thank you for the comment.

      Although some models use continuous firing rates and continuous time (Ramsauer et al., 2021), the original Amari-Hopfield model uses binary neurons operating in discrete time steps. As noted in our responses to comments (5) and (6) from Reviewer 1, we considered only a discretely time-stepped environment, for which the timescale is arbitrary. At each environmental state where the current contextual state is selected, the Amari-Hopfield network typically takes about ten iterations to converge.

      We added the following sentence to the text.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli.”
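
      As a rough illustration of the binary, discrete-time dynamics described in this response, the following is a minimal sketch of an Amari-Hopfield network with Hebbian weights that counts the iterations needed to settle from a noisy cue. It is not the authors' implementation: the network size, number of stored patterns, noise level, synchronous update rule and all function names are assumptions chosen for illustration.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for +/-1 patterns (one per row); no self-connections."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, state, max_iters=50):
    """Synchronous sign updates in discrete time until a fixed point is reached.

    Returns (final_state, iterations_used). Synchronous updating is a
    simplification assumed here; asynchronous updating is also common.
    """
    state = state.copy()
    for it in range(1, max_iters + 1):
        new_state = np.where(w @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            return state, it
        state = new_state
    return state, max_iters

rng = np.random.default_rng(0)
n_units = 200                                    # hypothetical network size
patterns = rng.choice([-1, 1], size=(3, n_units))  # three stored "contexts"
w = train_hopfield(patterns)

# Corrupt 15% of the first pattern to mimic sensory noise, then recall.
cue = patterns[0].copy()
flipped = rng.choice(n_units, size=30, replace=False)
cue[flipped] *= -1
final, iters = recall(w, cue)
overlap = float(final @ patterns[0]) / n_units   # 1.0 means perfect recall
print(f"iterations: {iters}, overlap with stored pattern: {overlap:.2f}")
```

      With so few stored patterns the cue settles within a handful of updates; the iteration count in the actual model depends on its size, memory load and update scheme, none of which are specified here.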

      Figure 3 is aimed at replicating the lap cell finding of Sun et al, 2020. In panel E, a comparison is made between the data and the model. Are the cells in the model the entire population of H neurons (state and transition), or just a subset? Does the absence of the "ghosts" (the weaker off diagonal responses seen in the experimental data) imply that the network is not encoding that it is in the same location, but a different lap? Why is there not any true sequentiality (i.e., why do all H units go on at once)?

      Thank you for your insightful comments. Throughout this study, we used 300 neurons for the Sequence composer (H); however, for simplicity, we constrained the model such that only a single H neuron was active at each time point. As a result, most other neurons remained silent. Accordingly, in Fig. 3E, we display only neurons with firing activity, and silent neurons are not shown.

      As you correctly inferred, hippocampal neurons in our model encode lap identity rather than the same physical location across laps. This design choice reflects our focus on hippocampal neurons representing contextual states, rather than place-coding neurons, as only the former contributes directly to contextual behavior in our framework. As shown in Fig. 3E, hippocampal neurons exhibit clear sequential activity with “episode-like” representations corresponding to individual laps. Nevertheless, we believe that incorporating a mixture of context-coding neurons and place-coding neurons is an important direction for future work, as illustrated in Fig. S3.

      We revised the caption of Fig. 3E as follows.

      “E, The comparison of (Left) lap cells in the hippocampus in the 4-lap task (Sun et al., 2020) and (Right) our results of active neurons in the H module.”

      Typo "but also makeS predictions".

      Thank you for pointing this out. We have corrected it.

    1. eLife Assessment

      This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting modules: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions for expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.
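
      For readers unfamiliar with the Rescorla-Wagner-type rule mentioned in this summary, the following is a minimal sketch of the textbook delta rule for a single state's expected reward. It is not taken from the manuscript: the learning rate, the constant reward and the function name are assumptions for illustration only.

```python
def rescorla_wagner(rewards, alpha=0.1, v0=0.0):
    """Textbook delta rule: V <- V + alpha * (r - V).

    `alpha` (learning rate) and `v0` (initial value) are illustrative
    choices, not parameters reported in the manuscript.
    """
    v = v0
    history = []
    for r in rewards:
        v += alpha * (r - v)   # move the estimate toward the observed reward
        history.append(v)
    return history

# A state that is always followed by reward 1.0.
values = rescorla_wagner([1.0] * 50)
print(f"first update: {values[0]:.3f}, after 50 trials: {values[-1]:.3f}")
```

      With a constant reward the estimate approaches that reward geometrically, and the prediction error (r - V) shrinks by a factor of (1 - alpha) on each trial.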

      Strengths:

      The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.

      Weaknesses:

      The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.

    3. Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.

      More specifically:

      (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the posthoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.

      (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.

      (3) The literature review can be improved (laid out in the specific recommendations).

      (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.

      (5) As a minor point, the hippocampus is pretty much treated as a premotor network. A Discussion paragraph on this point would also be helpful.

    4. Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Weaknesses:

      The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.

    5. Author Response:

      We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches that require processing at each state transition, our model uses sequential activity (= transition model) to predict environmental changes in the long term by leveraging episode-like representations.

      While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior—reminiscent of that observed in biological systems—can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing dynamic separation of prediction errors due to sensory noise versus those due to contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms in the context of psychiatric disorder-like behaviors, as illustrated in Figure 5. Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model—namely, the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H), where X sends the information on estimated current context to H, and H returns a future prediction based on the episode to X. This interaction forms a loop enabling the current context estimation and its reselection.

      The key advantage of this architecture is its ability to flexibly adjust the temporal span of episodes used for inference and control. Because our model forms and stores episodes of variable length depending on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting contexts enables rapid switching between these timescales. This flexibility offers a potential solution to a challenge in MBRL, the assignment of credit across variable time scales, without requiring explicit optimization. To better illustrate this important feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.

      Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.

      We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.

    1. eLife Assessment

      This important paper presents a rigorous and comprehensive deep mutational analysis of the kinase TYK2, revealing how single amino acid substitutions influence protein abundance, signaling activity, and responses to pharmacological inhibitors. By combining high‑quality experimental design with dose‑response signaling assays and multiple inhibitor conditions, the authors generate a robust dataset that identifies variants across all domains of TYK2, including clusters at functionally critical sites and protein-protein interfaces. The study highlights mutations that drive drug resistance or potentiation and shows that reduced TYK2 abundance aligns with protective autoimmune‑associated variants, underscoring the therapeutic relevance of modulating TYK2 stability. Overall, the work provides compelling insights with clear implications for biochemistry, immunology, clinical genetics, and drug discovery.

    2. Reviewer #1 (Public review):

      Summary:

      In this compelling study, Howard et al. use deep mutational scanning to probe essentially all possible single amino acid substitutions in the TYK2 tyrosine kinase, and identify those that modulate signaling function and protein abundance. The methodological approach is elegant and thorough, and the results identify numerous examples of amino acid substitutions that have been previously reported to modulate TYK2 function, validating the approach.

      Substitutions that are LOF with respect to IFN-α signaling but not protein abundance are particularly interesting and are widely dispersed across the protein. They include known functionally critical sites such as the active site and activation loop of the kinase domain, as well as the allosteric site within the regulatory pseudokinase domain, but also hundreds of additional sites. The approach is then used to study the effects of substitutions on kinase inhibition using several JAK family inhibitors that target the pseudokinase domain. By assessing variant effects at both high and low drug concentrations, they are able to identify variants that mediate resistance or conversely potentiate inhibition, respectively. These map to distinct sites on the pseudokinase domain. Finally, the authors show that several TYK2 variants, most notably the P1104A substitution, previously shown to protect against autoimmune disease, correspond to substitutions that reduce protein abundance in their screen. Combining their DMS data with autoimmune phenotype and TYK2 genotype data uncovered a general dose relationship between autoimmunity and TYK2 abundance, and the authors propose that this might justify targeting TYK2 protein levels with degraders.

      Strengths:

      This is a nicely executed, well-written study with good figures and a clear presentation.

      Weaknesses:

      The only substantial critique I have is that while the paper makes a compelling case for the validity and power of the approach, the authors could perhaps go further in their interpretation of their data, particularly with regards to identifying functionally important sites and connecting them to putative allosteric sites and functionally relevant protein-protein interfaces in the context of what is known about JAK family kinase structure and function. An attempt is made to interpret the data in light of a composite structural model of full-length TYK2 engaged with the IFNAR1 receptor (Figure 2C), but much more could be said about this. Below, I list several examples where additional insight might be gleaned.

      (1) The discussion of gain-of-function variants is limited. Given that tight regulation is a general theme of kinase signaling and gain-of-function mutations are a common disease mechanism, these mutations could be particularly interesting. Could the authors comment on patterns of gain versus loss? Are there gain-of-function signaling variants that work in an IFN-α dose-dependent versus dose-independent manner?

      (2) The discussion of the signaling-specific variants (LOF in signaling but not abundance) is interesting but could be expanded. Can the authors comment on which regions of the pseudokinase/kinase interface, for instance, are affected, since this allosteric communication is a critical and unique aspect of JAK family protein function? Can something be said about what the 6 activation loop substitutions are doing?

      (3) The cytokine signaling screen was performed at several different levels of IFN-α cytokine stimulation. The authors state that these data were used to identify quantitative variant effects (p7), but the cytokine dose response data are not widely discussed in the manuscript. Is it not possible that valuable information about the strength of substitution effects could be gleaned from this? One might expect that simple loss of function mutants that, e.g. completely destroy catalytic activity, will be LOF at all levels of stimulation, whereas mutations that have more nuanced "tuning" or allosteric effects on signaling might display LOF at low cytokine stimulation levels but be restored at high stimulation levels. Such information could be of potential functional importance and interest. Could the authors comment on this?

      (4) In general, the variant data could be interpreted more specifically in light of the available detailed structural information about TYK2 and JAK kinases generally. For instance, could the resistance versus potentiation variants be interpreted in this context to hypothesize what they might be doing?

    3. Reviewer #2 (Public review):

      Howard et al. describe a set of deep mutational scanning (DMS) experiments applied to TYK2, which is a drug target implicated in autoimmune disease. By assaying protein abundance (stability) effects as well as immune signaling, the authors are able to disentangle variant effects that may be directly involved in protein activity (and therefore potentially druggable) from variant effects that are due to loss of protein or general structural instability. By performing these assays under multiple conditions, including the presence of various concentrations of small molecules, they develop a clear picture of which sites in TYK2 may be most relevant for intervention or targeting. Overall, the work represents a very compelling example of DMS for understanding protein biology and candidate drug mechanisms.

      The work is very thorough, with multiple DMS assays described and compared/contrasted. This greatly enhances the impact and interpretability of any individual assay performed.

      The authors have made improvements to the state of the art in terms of wet-lab assay design as well as the analysis of FACS-based deep mutational scans.

      The potential mechanism of loss of protein abundance in TYK2 being protective for autoimmune disease is clear, but the estimates of the effect size in more physiologically relevant settings vary quite a bit and might be quite small. Are there examples that could be cited of other similar disease mechanisms where a 10% loss in abundance is associated with a clinical phenotype?

    4. Reviewer #3 (Public review):

      Summary:

      In the paper "Deep mutational scanning reveals pharmacologically relevant insights into TYK2 signaling and disease", the authors perform a comprehensive deep mutational scan of the kinase TYK2, a protein of pharmacological interest due to its central role in multiple immune-related phenotypes. The study assesses two key functional phenotypes: protein abundance and IFN-α-dependent signaling. The signaling assays were conducted across a dose-response range under various inhibitor conditions, allowing for an in-depth characterization of TYK2 activity and regulation. Both the experimental design and data analysis were executed with rigor and transparency, yielding a dataset that appears highly reliable. The authors provide strong evidence and a scientifically grounded interpretation of their results.

      The paper presents the results of a deep mutational scan based on two assays: an IFN-α-stimulated signaling assay and a protein abundance assay. These measurements are further supported by variant classifications from AlphaMissense and ClinVar, providing a framework for functional interpretation. Building on these data, the authors propose four potential pharmacological applications of their screening system at the end of the first results section.

      First, they demonstrate that the combined analysis of abundance and IFN-α signaling identifies potential allosteric sites, focusing on variants with normal protein stability but reduced signaling activity. Through this approach, they detect two previously uncharacterized allosteric regions (Results Section 2).

      Second, they explore how the screen can be used to predict variant-specific drug responses or resistance mechanisms (Results Section 3). This is achieved through assays involving two different inhibitors, which reveal both resistance- and potentiation-associated variants.

      Third, they assess the relative functional consequences of ligand and inhibitor dosing by performing IFN-α and inhibitor dose-response experiments (1, 10, and 100 U/mL IFN-α; IC99 and IC75 inhibitor concentrations; Results Section 3).

      Finally, the authors investigate how specific human variants, such as P1104A and I684S, may inform therapeutic modality selection (Results Section 4). Although these variants exhibit no detectable effect on IFN-α signaling within this experimental system, they substantially impact protein abundance. By integrating data from the UK Biobank, the authors further demonstrate that protective effects against autoimmune disease are associated with altered protein abundance rather than differences in IFN-α signaling, highlighting the distinct mechanistic basis of TYK2's clinical relevance.

      Strengths:

      Overall, we found this paper rigorous, well-written, and easy to follow. As such, we think this is an exceptional example of a deep mutational scanning manuscript, and this dataset will be invaluable to the field. We particularly appreciate that the authors could explore sensitivity to inhibitor concentration across multiple doses of the inhibitor.

      Weaknesses:

      Despite the authors' rigorous experimentation and thoughtful interpretation, the study leaves several important mechanistic questions unresolved, as is common in any study. While the data provide clear functional patterns, the underlying biophysical and biochemical explanations remain insufficiently explored. For instance, in point 1, the identification of two novel allosteric sites is intriguing, yet the paper does not elaborate on the structural basis or mechanistic rationale for their regulatory effects. In point 2, resistance and potentiation variants are described for two distinct inhibitors, but it remains unclear why certain variants respond specifically to one compound and not the other. In point 3, higher inhibitor concentrations appear to diminish allosteric interactions, though the reasons why some sites are affected while others are not are left unexplained. Finally, in point 4, the observation that protein abundance, but not IFN-α signaling, correlates with autoimmune protection is compelling but mechanistically ambiguous. These gaps do not detract from the technical excellence of the work; rather, they highlight opportunities for future studies to clarify the molecular and pharmacological mechanisms underlying TYK2 regulation and to deepen the translational insights drawn from this comprehensive mutational scan. We hope that the authors could provide more direction and mechanistic context in the discussion section to guide readers toward these next steps.

    5. Author response:

      We thank the reviewers for their excellent and thoughtful comments and suggestions, along with their strong support of the work. We agree with the general feedback that there is opportunity for further mechanistic dissection of the data from a variety of interesting angles. This was a fascinating project to work on because of all of the possible directions, and we attempted to highlight a diversity of compelling findings. We wish we had time to devote to answering more of the open mechanistic questions, but, given competing priorities, we are unfortunately unable to do them justice at this time. At the suggestion of a reviewer, we have made results available through MaveDB (accession numbers urn:mavedb:00001270-a and urn:mavedb:00001271-a) as a way to empower others to explore more.

    1. eLife Assessment

      The authors establish solid theoretical principles for designing brain perturbations under the assumption that brain activity evolves under a linear model. By prioritizing low-variance components, resonant frequencies, and hub nodes, this framework provides an important foundation for optimizing information gain, neural state classification, and the control of neural dynamics. However, the lack of investigation of model mismatch makes the study incomplete.

    2. Joint Public Review:

      Summary:

      Inferring so-called "functional connectivity" between neurons or groups of neurons is important both for validating models and for inferring brain state. Under the assumption that brain dynamics is linear, the authors show that the error in estimating functional connectivity depends only on the eigenvalues of the covariance matrix of the observed data, and it is the small eigenvalues, corresponding to directions in which the variance of the brain activity is low, that lead to large estimation errors. Based on this, the authors show that to achieve low estimation error, it's important to excite the resonant frequencies and perturb well-connected hubs. The authors propose a practical iterative approach to estimate the functional connectivity and demonstrate faster convergence to the optimal estimate compared to passive observation.

      Strengths:

      The main contribution of the study is the derivation of an explicit expression for the error in functional connectivity that depends only on the covariance matrix of the observed data. If valid, this result can have a profound impact on the field. The study also motivates the current shift to closed-loop experiments by demonstrating the effectiveness of active learning in the system using perturbation, in comparison to passive estimation from resting-state activity. Finally, the relative simplicity of the model makes its practical applications straightforward, as the authors illustrate in the context of brain state classification and neural control.

      Weaknesses:

      The derivation of the main error term misses some important steps, which complicates peer review at this stage. In particular, factorisation of the covariance into noise and the inverse of the observation covariance matrix needs a more thorough justification. The cited sources do not contain the derivation for a noise term with full covariance, which is essential for deriving this error term.

      The practical recommendation at the end of the paper also requires clearer guidance on how the design perturbations are constructed, and how many times and for how long the system is stimulated in each iteration of the experiment.

      Finally, there is no analysis of model mis-specification. In particular, the true dynamics are unlikely to be linear; the noise is unlikely to be either Gaussian or uncorrelated across time; and the B matrix is unlikely to be known perfectly. We're not suggesting that the authors consider a more complex model, but it's important to know how sensitive their method is to model mismatch. If nothing can be done analytically, then simulations would at least provide some kind of guide.

    3. Author response:

      We thank the editors and reviewers for their careful reading of our manuscript and for their insightful comments. We appreciate the opportunity to clarify several aspects of the derivations and experimental design, and we will revise the manuscript accordingly. Below we provide responses to the major weaknesses raised by the reviewers.

      The derivation of the main error term misses some important steps, which complicates peer review at this stage. In particular, factorisation of the covariance into noise and the inverse of the observation covariance matrix needs a more thorough justification. The cited sources do not contain the derivation for a noise term with full covariance, which is essential for deriving this error term.

      Thank you for pointing this out. We agree that the derivation of the main error term should be presented more explicitly to facilitate peer review. In the revised manuscript, we will explicitly cite the relevant equation numbers from the references to make each step of the argument easier to follow. We will also revise the text to more clearly discuss the assumption on the noise covariance matrix.

      The practical recommendation at the end of the paper also requires clearer guidance on how the design perturbations are constructed, and how many times and for how long the system is stimulated in each iteration of the experiment.

      Thank you for this helpful suggestion. We agree that the practical implementation of the experimental design should be explained more clearly. In the revised manuscript, we will provide a more explicit description of how the input perturbations are constructed in each iteration. To more clearly explain how many times and for how long the system is stimulated, we will clarify the stopping criterion used in the iterative procedure and the time length of the external inputs. As shown in Eq. (8), the estimation error scales approximately as 1/T, so longer measurements improve accuracy. For clearer guidance, we will add additional explanations on the relation between the stimulation time and estimation accuracy, as well as on the role of iterative input design.
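
      The approximate 1/T scaling mentioned above can be checked numerically on a toy system. The following is a minimal sketch, not the authors' method: it assumes a hypothetical 2x2 stable matrix `A`, unit-variance white Gaussian noise, no external input, and plain least-squares estimation. All function names are illustrative.

```python
import numpy as np

def simulate_and_estimate(A, T, rng, noise_std=1.0):
    """Simulate x[t+1] = A x[t] + noise for T steps and return the least-squares estimate of A."""
    n = A.shape[0]
    x = np.zeros((T + 1, n))
    for t in range(T):
        x[t + 1] = A @ x[t] + noise_std * rng.standard_normal(n)
    # Solve x[:-1] @ A.T ~= x[1:] in the least-squares sense.
    A_hat_T, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
    return A_hat_T.T

def mean_squared_error(A, T, n_repeats, seed=0):
    """Average squared Frobenius error of the estimate over independent simulated runs."""
    rng = np.random.default_rng(seed)
    errors = [np.sum((simulate_and_estimate(A, T, rng) - A) ** 2)
              for _ in range(n_repeats)]
    return float(np.mean(errors))

# Hypothetical stable connectivity matrix (an assumption for illustration only).
A = np.array([[0.5, 0.1],
              [-0.2, 0.3]])

for T in (200, 800, 3200):
    print(T, mean_squared_error(A, T, n_repeats=40))
```

      If the error scales roughly as 1/T, each fourfold increase in recording length should reduce the printed error by about a factor of four. The sketch says nothing about how the scaling behaves under designed inputs or model mismatch.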

      Finally, there is no analysis of model mis-specification. In particular, the true dynamics are unlikely to be linear; the noise is unlikely to be either Gaussian or uncorrelated across time; and the B matrix is unlikely to be known perfectly. We're not suggesting that the authors consider a more complex model, but it's important to know how sensitive their method is to model mismatch. If nothing can be done analytically, then simulations would at least provide some kind of guide.

      We thank the reviewer for raising this important point. We agree that it is important to understand how sensitive the proposed method is to model mismatch. While our current theoretical analysis assumes linear dynamics with Gaussian noise for analytical tractability, real systems may deviate from these assumptions in several ways, including nonlinear dynamics, temporally correlated noise, or imperfect knowledge of the input matrix B. To address this concern, we will add simulation experiments to examine the robustness of our method under several types of model misspecification. These simulations will provide practical guidance on how deviations from the assumed model affect estimation performance. We will include these results and discuss their implications in the revised manuscript.

    1. eLife assessment

      This important study uses state-of-the-art, multi-region two-photon calcium imaging to characterize the statistics of functional connectivity between visual cortical neurons. Although alternative interpretations may partially account for the data, the study provides solid evidence that functionally distinct classes of neurons convey visual information via parallel channels within and across both primary and higher-order cortical areas.

    2. Reviewer #1 (Public review):

      Summary:

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons, and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Strengths:

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. To control for potential influences of behaviour-related top-down modulation of noise correlations, the manuscript uses measurements of pupil dynamics as a proxy for behavioural state and shows that this top-down modulation cannot explain the stability of noise correlations across stimuli.

      Weaknesses:

      The interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicate the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

    3. Reviewer #2 (Public review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimuli. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap, behavioral state). The paper demonstrates the robustness of the activity clustering analysis and of the activity correlation measurements. The paper shows convincingly that the correlation structure observed with grating stimuli is present in the responses to naturalistic stimuli. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. A methodological issue that does not seem completely addressed is whether the calcium imaging measurements with their limited sensitivity amplify the apparent dependence of noise correlations on the similarity of tuning. Although the paper shows that noise correlation measurements are robust to changes in firing rates / missing spikes, the effects of receptive field tuning dissimilarity are not addressed directly. The calcium responses of mouse visual cortical neurons are sharply tuned. Neurons with dissimilar receptive fields may show too little overlap in their estimated firing rates to infer noise correlations, which could lead to underestimation of correlations across groups of dissimilar neurons.

    4. Reviewer #3 (Public review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights on the correlation structure of visual responses across multiple areas.

      Strengths:

      The measurements of shared variability across multiple areas are novel. The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NC as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are among the largest contributors to visual cortex activity and exert a multiplicative gain on sensory evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not responding (see Lin et al, Neuron 2015 for a similar point).

      In the new version of the manuscript, behavioral modulations are explicitly considered in Figure S8. New analyses show that most of the variance of the neuronal responses is driven by the stimulus, rather than by behavioural variables. However, the new analyses still do not address whether the shared noise correlation in cotuned neurons is also independent of behavioral modulations.

      As behavioral modulations are not considered, this confound affects the conclusions, including the conclusion that activity is communicated unmixed across areas (results in Figure 4), as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain the results without the need of discrete broadcasting channels or any particular network architecture and should be addressed to support the main claims.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels, as stated in the title of the paper. This discreteness is based on an unbiased clustering approach on the tuning of neurons, followed by a manual grouping into six categories with relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

    5. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The main point converged on a potential alternative interpretation that top-down modulation to the visual cortex may be contributing to the NC connectivity we observed. For this revision, we address that point with new analysis in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the pupil-modulated portion from the neural activity, the cross-stimulus stability of the NC was preserved (Fig. S8D).
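
      The logic of regressing out a behavioral variable and re-measuring correlations can be sketched on synthetic data. This is an illustrative toy, not the analysis behind Fig. S8: the pupil trace, the shared latent signal, the weight of 0.2, and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 5000

# Synthetic stand-ins (assumptions, not recorded data).
pupil = rng.standard_normal(n_samples)    # arousal proxy
shared = rng.standard_normal(n_samples)   # shared variability unrelated to pupil

def make_neuron(pupil_weight=0.2):
    """Toy activity: shared latent + weak pupil modulation + private noise."""
    return shared + pupil_weight * pupil + rng.standard_normal(n_samples)

def regress_out(y, x):
    """Return (fraction of variance of y explained by x, residual of y after a linear fit on x)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var(), resid

y1, y2 = make_neuron(), make_neuron()
r2_1, res1 = regress_out(y1, pupil)
r2_2, res2 = regress_out(y2, pupil)

corr_before = float(np.corrcoef(y1, y2)[0, 1])
corr_after = float(np.corrcoef(res1, res2)[0, 1])
print(f"variance explained by pupil: {r2_1:.3f}, {r2_2:.3f}")
print(f"pairwise correlation before / after removing pupil: {corr_before:.3f} / {corr_after:.3f}")
```

      In this toy setting the pairwise correlation barely changes after the pupil component is removed, because the pupil accounts for only a small share of each neuron's variance. Whether a one-dimensional pupil signal captures the relevant behavioral state in real recordings is a separate question that the sketch does not address.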

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, whereas mice were engaged in spontaneous locomotion in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network are necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the contribution from the top-down input was increased (i.e., to more than 1/3 of the contribution from the stimulus) could the cross-stimulus NC connectivity be generated by the top-down modulation (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation was not playing a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicate the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.
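
      The argument in the preceding paragraph can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of the recorded data: the bell-shaped tuning curves, the gain standard deviation of 0.3, and the preferred stimuli are hypothetical, and the three model neurons share no connectivity at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_trials = 8, 200

def tuning(pref, width=1.0):
    """Hypothetical bell-shaped tuning curve over n_stim circular stimulus conditions."""
    d = np.abs(np.arange(n_stim) - pref)
    d = np.minimum(d, n_stim - d)
    return 10.0 * np.exp(-0.5 * (d / width) ** 2)

# One shared multiplicative gain per (stimulus, trial), standing in for behavioral state.
gain = 1.0 + 0.3 * rng.standard_normal((n_stim, n_trials))

def responses(pref):
    """Gain-modulated tuned response plus private additive noise."""
    return tuning(pref)[:, None] * gain + rng.standard_normal((n_stim, n_trials))

def noise_corr(r1, r2):
    """Correlate trial-to-trial residuals after removing each stimulus mean."""
    z1 = (r1 - r1.mean(axis=1, keepdims=True)).ravel()
    z2 = (r2 - r2.mean(axis=1, keepdims=True)).ravel()
    return float(np.corrcoef(z1, z2)[0, 1])

neuron_a = responses(pref=2)
neuron_b = responses(pref=2)   # cotuned with neuron_a
neuron_c = responses(pref=6)   # dissimilar tuning

nc_cotuned = noise_corr(neuron_a, neuron_b)
nc_dissimilar = noise_corr(neuron_a, neuron_c)
print(f"NC cotuned: {nc_cotuned:.2f}, NC dissimilar: {nc_dissimilar:.2f}")
```

      With a purely multiplicative shared gain, the cotuned pair shows a much larger noise correlation than the dissimilar pair, even though the model contains no connections between neurons. The sketch only shows that the confound is possible in principle; it does not indicate how large the effect is in the actual recordings.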

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and stated a concern that the noise correlation can be explained by top-down modulation. We have addressed this concern carefully in the revision; please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem to be addressed (or is addressed only indirectly; I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if at any time at all. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual stimuli explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavior studies are outside the scope of the current manuscript, we have included additional analysis and discussion on whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the-art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NC as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not responding (see Lin et al, Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. It seems that, following the authors' claims, it would be important to establish whether belonging to the same group, rather than tuning similarity, is the defining feature for showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints to network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion, we now clearly state the study was carried out in mice in the Abstract and Introduction.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion, we have clarified this in the Fig. 4 legend.

      Each HVA has a SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs per that V1-HVA pair (summed over all SFTF groups). The trend is that for HVAs with a bias towards a particular SFTF group, there are also more high NC pairs in that SFTF group, and thus it is consistent with the model on the right side. This is not circular because it is possible to have a SFTF bias in an HVA and have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: that the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message more clear.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility: "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependency of correlations on similarity of tuning is not an artifact of how the data was acquired is in my mind missing and if that is the case it is crucial that this be addressed.

      At each step of data analysis, we performed control analysis to assess the fidelity of the conclusion. For example, on the spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experimental units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We do take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.
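      For readers unfamiliar with the hierarchical bootstrap the reviewer recommends, the following is a minimal sketch, not the analysis used in the manuscript. It resamples animals first and then neurons within each sampled animal, so that animal-to-animal variability enters the confidence interval. The function names, the animal IDs and the toy values are all illustrative assumptions.

```python
import random
import statistics


def hierarchical_bootstrap_mean(data_by_animal, n_boot=2000, seed=0):
    """Bootstrap the grand mean by resampling animals, then neurons within animals.

    data_by_animal: dict mapping a (hypothetical) animal ID to a list of
    per-neuron values. Returns the list of bootstrap means.
    """
    rng = random.Random(seed)
    animals = list(data_by_animal)
    boot_means = []
    for _ in range(n_boot):
        # Level 1: resample animals with replacement.
        sampled_animals = rng.choices(animals, k=len(animals))
        values = []
        for animal in sampled_animals:
            neurons = data_by_animal[animal]
            # Level 2: resample neurons within the chosen animal.
            values.extend(rng.choices(neurons, k=len(neurons)))
        boot_means.append(statistics.fmean(values))
    return boot_means


def percentile_interval(samples, alpha=0.05):
    """Two-sided percentile interval from a list of bootstrap samples."""
    ordered = sorted(samples)
    lo = ordered[int((alpha / 2) * len(ordered))]
    hi = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lo, hi


# Toy data (made up for illustration): three animals with different baselines.
toy = {
    "mouse_a": [0.10, 0.12, 0.08, 0.11],
    "mouse_b": [0.20, 0.22, 0.19],
    "mouse_c": [0.05, 0.07, 0.06, 0.04, 0.05],
}
boot = hierarchical_bootstrap_mean(toy, n_boot=1000, seed=1)
ci_low, ci_high = percentile_interval(boot)
```

      With few animals, the interval from this procedure is typically wider than one obtained by pooling all neurons, which is the over-inflation concern raised in the comment.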

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand how correlation is the best way to quantify the consistency of class center with a subset of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class of different GMMs was identified by matching the center of the class.

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections', which I presume means correlations, is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion, we have clarified this part in the figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion, we have improved our Methods section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
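      As a hedged illustration of the two quantities discussed in this exchange, the sketch below computes signal correlation (Pearson correlation of trial-averaged responses across stimuli) and noise correlation (Pearson correlation of trial-to-trial fluctuations around each stimulus mean). The z-scoring of residuals per stimulus is one common convention and is an assumption here; the manuscript's exact definition may differ. The array layout, function name and synthetic data are likewise illustrative.

```python
import numpy as np


def signal_and_noise_correlation(responses):
    """responses: array of shape (n_stimuli, n_trials, n_neurons).

    Returns (signal_corr, noise_corr), each of shape (n_neurons, n_neurons).
    """
    # Signal correlation: correlate trial-averaged tuning curves across stimuli.
    mean_resp = responses.mean(axis=1)                    # (n_stimuli, n_neurons)
    signal_corr = np.corrcoef(mean_resp, rowvar=False)

    # Noise correlation: remove each stimulus mean, z-score within stimulus,
    # then pool all trials and correlate the residual fluctuations.
    residuals = responses - mean_resp[:, None, :]
    sd = residuals.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                                     # avoid division by zero
    z = (residuals / sd).reshape(-1, responses.shape[-1])
    noise_corr = np.corrcoef(z, rowvar=False)
    return signal_corr, noise_corr


# Synthetic example: neurons 0 and 1 share tuning and trial-to-trial noise;
# neuron 2 has opposite tuning and independent noise.
rng = np.random.default_rng(0)
n_stimuli, n_trials = 8, 50
tuning = np.linspace(1.0, 8.0, n_stimuli)[:, None]        # (n_stimuli, 1)
shared = rng.normal(size=(n_stimuli, n_trials))
private = rng.normal(size=(n_stimuli, n_trials, 3))
toy = np.stack(
    [
        tuning + shared + 0.3 * private[..., 0],
        2.0 * tuning + shared + 0.3 * private[..., 1],
        -tuning + private[..., 2],
    ],
    axis=-1,
)
signal_corr, noise_corr = signal_and_noise_correlation(toy)
```

      In this toy case the first two neurons have high signal and noise correlation, while the third is anti-correlated in signal and roughly uncorrelated in noise, which shows that the two measures can dissociate.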

      Number of visual stimulation trials is not stated in Methods. It is only stated in the figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of visual cortex can be used to identify the visual cortex.

      It is true that areas can be mapped using the GCaMP signals, but that is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well-refined routine in our lab for over a decade and is part of our standard protocol. One advantage is that the data (from intrinsic signals) are of the same nature every time. This enables us to use the same mapping procedure no matter which reporters the mice express (and in what pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that the larger intra-group NCs observed simply reflect a multiplicative gain on cotuned neurons could be addressed using pupil and/or face recordings: Does pupil size or facial motion predict NCs and if factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.
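      The comparison the reviewer proposes, pairs matched for signal correlation (SC) but belonging to the same or to different tuning groups, could be sketched as follows. This is an illustrative outline under stated assumptions, not an analysis from the manuscript: the function name, the per-pair input arrays, the bin edges and the synthetic data are all hypothetical.

```python
import numpy as np


def matched_sc_comparison(sc, nc, same_group, bin_edges):
    """Per SC bin, mean NC of same-group pairs minus mean NC of different-group pairs.

    sc, nc: per-pair signal and noise correlations.
    same_group: per-pair boolean, True if both neurons share a tuning group.
    Returns an array with one difference per bin (NaN if a bin lacks either kind of pair).
    """
    sc = np.asarray(sc, dtype=float)
    nc = np.asarray(nc, dtype=float)
    same_group = np.asarray(same_group, dtype=bool)
    bin_idx = np.digitize(sc, bin_edges) - 1
    diffs = []
    for b in range(len(bin_edges) - 1):
        in_bin = bin_idx == b
        same = nc[in_bin & same_group]
        different = nc[in_bin & ~same_group]
        if same.size and different.size:
            diffs.append(same.mean() - different.mean())
        else:
            diffs.append(np.nan)
    return np.array(diffs)


# Synthetic "continuous" null: NC depends on SC only, not on group membership.
rng = np.random.default_rng(0)
n_pairs = 4000
sc = rng.uniform(-1.0, 1.0, n_pairs)
same_group = rng.random(n_pairs) < 0.5
nc = 0.2 * sc + rng.normal(0.0, 0.05, n_pairs)
edges = np.linspace(-1.0, 1.0, 6)
diffs = matched_sc_comparison(sc, nc, same_group, edges)
```

      Under this continuous null the per-bin differences stay close to zero. Consistently positive differences in real data would indicate an effect of group membership beyond tuning similarity.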

      I also found many places where the manuscript needs clarification and/or more methodological details:

      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised Figure 1E (now Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. eLife Assessment

      This useful study presents a simple homeostatic-plasticity model in spiking E-I networks to link spontaneous critical dynamics with representational drift and relatively stable stimulus-response geometry in mouse visual cortex. However, the evidence is incomplete because key concepts and analysis details are not well defined, controls are limited, and several results might be the result of specific methodological choices (e.g., dimensionality reduction, aggregation, or tuned parameters) rather than a robust mechanism. As a result, the work currently supports an interesting correlation between these phenomena, but not a clear causal account.

    2. Reviewer #1 (Public review):

      Summary:

      The authors study criticality and drift in spontaneous activity observed in visual cortex of mice from existing data, and relate it to a model based on homeostatic plasticity. The main phenomena are power laws and an alignment across different neural representations that is maintained through drift.

      Strengths:

      The authors should be commended for making the effort of relating their model to experimental data. The mechanism that they propose has the advantage of being simple, and could unify various phenomena.

      Weaknesses:

      Introduction/abstract: General wording: the notion of reliability, which is key to the paper, is not explicitly defined anywhere. The authors refer to some notion of information being preserved, but again, this is not clearly explained. A good example is the sentence "identical input signals exhibit significant variability but also share certain reliability across sessions". Depending on the definition of reliability, the sentence could be a contradiction. A similar issue appears when the authors talk about "restricted" representation. I get what they want to say, but it's not properly defined. "One example is the recent studies about stimulus-evoked..." The sentence explains that there are examples, but provides no citations! Also note the mismatch between "One" and "exampleS".

      Fig. 1:

      • The method to fit the power law is not detailed in the methods (just a vague reference to a package). This is a problem because some methods like least squares don't do well on power laws, and particularly for neuroscience due to low sampling (Wilting & Priesemann, Nat. Commun.).

      • The "olive" curve is not "olive". Olive is dark green, and the color is purple. The problem appears in the subsequent figure.
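      To make the point about fitting concrete, here is a minimal sketch of the standard maximum-likelihood estimator for a continuous power-law exponent, alpha = 1 + n / sum(ln(x_i / x_min)), as an alternative to least-squares fits on log-log histograms. It is not the package the authors used. Avalanche sizes are discrete, so the continuous form is an approximation, and the choice of x_min, the function name and the synthetic data are assumptions made for illustration.

```python
import math
import random


def powerlaw_mle_exponent(samples, x_min):
    """Maximum-likelihood exponent of p(x) ~ x^(-alpha) for x >= x_min.

    Returns (alpha_hat, approximate standard error).
    """
    tail = [x for x in samples if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + n / log_sum
    stderr = (alpha - 1.0) / math.sqrt(n)
    return alpha, stderr


# Synthetic check: draw from a power law with known exponent by inverse-transform
# sampling, x = x_min * (1 - u)^(-1 / (alpha - 1)) with u uniform on [0, 1).
rng = random.Random(0)
true_alpha = 1.5
samples = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(20000)]
alpha_hat, alpha_se = powerlaw_mle_exponent(samples, x_min=1.0)
```

      The estimate recovers the known exponent on synthetic data. A real analysis would also need to choose x_min and assess goodness of fit, and this sketch does not address the subsampling issue raised in the comment.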

      Fig. 2:

      • The number of neurons is very small (19). This is very odd, since the original dataset has a lot of neurons. Also, the authors seem to pick age 97 and 102, but do not explain why those two points have any relevance.

      • If you run a correlation, you need to state which correlation it is (Pearson, Spearman?). It also matters whether the variables are normalized or not, and there is no shuffling control.

      • The authors mention "low dimensional", but don't explain what method they use (looks like t-SNE to me).

      • The authors use the word "signal" while in the text they refer to the "mean activity". Are those the same?

      • "We reproduced previous results showing that low-dimensional embeddings of mean population response vectors for different signals remain similar across sessions" The blue and green clusters that the authors report as being close across sessions are not close. Red-green-grey seem to remain closer, but even that is quite a stretch.

      • Correlation across matrices is strange. Since the authors did not clarify the actual formula or method, the correlation of 0.5 in Fig. 2E could be simply due to the fact that all the variables are pre-selected to be positive (or above threshold). This would also have an important effect on the angle (Fig. G). In fact, it would explain how come the correlation does not decrease with Delta T (which is what would be expected from drift).

      • Whenever the authors run a statistical analysis, it would help to run a shuffled control.
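      One way to implement the shuffled control requested here is a permutation test on the correlation between two similarity matrices. The sketch below is an assumption-laden illustration, not the authors' method: it uses Pearson correlation of the upper triangles and builds the null by permuting the stimulus labels of one matrix. The function names and synthetic matrices are hypothetical.

```python
import numpy as np


def upper_triangle(mat):
    """Off-diagonal upper-triangular entries of a square matrix, as a 1-D array."""
    i, j = np.triu_indices(mat.shape[0], k=1)
    return mat[i, j]


def matrix_correlation_with_shuffle(mat_a, mat_b, n_shuffles=1000, seed=0):
    """Pearson correlation between two similarity matrices with a label-shuffle null.

    Rows and columns of mat_b are permuted together, which breaks the
    correspondence between stimuli while keeping the distribution of entries.
    Returns (observed correlation, null correlations, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(upper_triangle(mat_a), upper_triangle(mat_b))[0, 1]
    n = mat_b.shape[0]
    null = np.empty(n_shuffles)
    for s in range(n_shuffles):
        perm = rng.permutation(n)
        shuffled = mat_b[np.ix_(perm, perm)]
        null[s] = np.corrcoef(upper_triangle(mat_a), upper_triangle(shuffled))[0, 1]
    p_value = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, null, p_value


# Synthetic example: two noisy copies of the same symmetric similarity structure.
rng = np.random.default_rng(1)
base = rng.random((10, 10))
base = (base + base.T) / 2
noise_a = rng.normal(0.0, 0.05, (10, 10))
noise_b = rng.normal(0.0, 0.05, (10, 10))
mat_a = base + (noise_a + noise_a.T) / 2
mat_b = base + (noise_b + noise_b.T) / 2
observed, null, p_value = matrix_correlation_with_shuffle(mat_a, mat_b)
```

      Because the shuffle preserves the values in each matrix, including their sign and range, any correlation that survives it cannot be explained by the entries being pre-selected as positive.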

      Self-organised criticality emerges through homeostatic plasticity:

      • The authors refer a lot to reference 35, but it's not clear what the difference is between their work and that one.

      • The text provides a general overview and refers to the methods for details. Since most of the results are based on that model, I suggest putting it in the main text (although this is an opinion, not a dealbreaker).

      • Especially, mention which populations we are talking about, what the numbers of neurons in each are, and how they are connected.

      • Fig. 4 has a lot of the same weaknesses as Fig. 2. In fact, the results on E are very similar, despite the fact that the matrices in D are clearly not the same.

      Enhanced Neural representation through self-organised criticality: The phase transition seems to be an observation over a computational model, but I don't see much analysis. It would be nice to have some order parameter, although the plots are convincing without it. The authors do spend time talking about co-spiking and silent periods though, but don't actually plot this. The only reference is to S4, which actually only seems to cover the super-critical state.

      Fig. 6:

      • It might be true that the accuracy peaks at the critical point, but it's really hard to call it significant. The authors should run multiple models and assess significance.

      • I don't entirely see the point of C. What does it mean for the model? And although I assume it is on the same experimental data, the authors do not mention it.

      Fig. 7:

      • Plot is squeezed, and has low resolution.

      • Since the authors didn't clarify whether they have II connections or not (some models use them, some don't), or whether their plasticity applies to inhibitory neurons, it is very hard to assess what the differences between A and B are.

      References: There is a fair amount of work on computational models of criticality. I am particularly thinking of the work of Bruno Del Papa, "Criticality meets learning: Criticality signatures in a self-organizing recurrent neural network". Experimentally, there are works showing that the so-called spontaneous activity is actually very reliable (if you record enough neurons): Nguyen, Nghia D., et al. "Cortical reactivations predict future sensory responses." Nature 625.7993 (2024): 110-118.

      An important point missing in this work is that it assumes that spontaneous activity is somehow intrinsically generated. This is not necessarily true of cortical areas (where it could easily come from hippocampus).

    3. Reviewer #2 (Public review):

      This work attempts to reconcile the concepts of critical neural dynamics with short-term reliable responses and long-term drifting responses. This is an important question, because critical dynamics are typically associated with unpredictable population responses to perturbations. Instead, this paper demonstrates that recordings from the mouse visual cortex include typical avalanche statistics in their spontaneous state as well as clustered within-session responses to natural movies. The authors find that a spiking neural network with homeostatic plasticity on inhibitory coupling captures the correlation-based metrics observed in experiments and that this network self-organizes into a critical state.

      Strengths:

      The structure of the manuscript is clear, and the line of argumentation is easy to follow. The question raised is valid, and the model employed to answer it is adequate. While I am unsure if representation should be equated with reliable responses, I find the framework of reliable responses well-suited to compare experimental and numerical data.

      Weaknesses:

      • The claim that the presented model "self-organizes to the critical spontaneous state" is incompatible with Fig. 6 showing that the inhibitory timescale is a control parameter of the transition from subcritical to supercritical avalanche statistics.

      • The notion of "drift" implies to me a gradual change on long timescales. This is demonstrated in Ref. [47] for a model including two different types of plasticity. Also, such a drift over time was observed in Ref. [11] Fig.3C. In the present work, we can see from Fig. 2E that the correlation drops immediately to a plateau. Instead, the model actually shows some decay of correlations, expected from the ongoing plasticity. This challenges the claim that the "model successfully reproduce[s] both representational drift and [...]". Instead, the model of [47] does reproduce representational drift.

      • The claim that "spontaneous self-organized criticality serves as [...] functional mechanism for maintaining reliable information representation under continuously changing networks" is not justified by the above-raised points.

      • From the methods, I understand that the dimensionality reduction in Fig.2C and Fig.4C is a result of independent t-SNE. Since t-SNE to my knowledge starts with a random projection of data to then optimize the embedding, the resulting orientation of independent runs cannot be compared such that statements like "rotation of low-dimensional representations as in Fig. 2C, where nodes (centers of the same-color clusters) change their positions across sessions (top panel and bottom panel), but their relative positions remain stable" are not possible.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses computational modeling of a spiking E-I network with homeostatic inhibitory plasticity and aims to show that self-organized criticality arising from the homeostatic mechanism can result in representational drift as well as reliable stimulus representation, because the geometric representation of stimuli remains restricted.

      Strengths:

      This paper provides a framework to link critical spontaneous state, homeostatic inhibitory plasticity, representational drift, and stimulus population response reliability.

      Weaknesses:

      The study does not show a causal (or necessary/ sufficient) relationship between criticality at the spontaneous state, representational drift, and reliable stimulus presentation. The study only reports an observation that these features could co-exist. However, it does not show how the criticality of the spontaneous state could restrict the manifold for stimulus response.

    1. eLife Assessment

      Using a combination of innovative and robust techniques, this study outlines cell-type-specific translational landscape changes that occur in the spinal cord neurons in the early and late phases of nerve injury. The authors provided compelling evidence suggesting an essential role of protein synthesis regulation in the chronic phase of neuropathic pain. Although additional mechanisms contributing to late-phase neuropathic pain beyond altered PV+ neuron excitability remain to be elucidated, this is a fundamental and significant study toward a comprehensive understanding of the molecular pathways involved in neuropathic pain.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript compares transcription and translation in the spinal cord during the acute and chronic phases of neuropathic pain induced by surgical nerve injury. The authors chose to focus their investigation on translation in the chronic phase due to its greater impact on gene expression in the spinal cord compared to transcription.

      (1) The study is significant because the molecular mechanisms underlying chronic pain remain elusive. The role of translational regulation in the spinal cord has not been investigated in neuroplasticity and chronic pain mouse models. The manuscript is innovative and technically robust. The authors employed several cutting-edge techniques such as Ribo-seq, TRAP-seq, slice electrophysiology, and viral approaches. Despite the technical complexity, the manuscript is well-written. The authors demonstrated that inhibition of eIF4E alleviates pain hypersensitivity, that de novo protein synthesis is more pronounced in inhibitory interneurons, and that manipulating mTOR-eIF4E pathways alters mechanical sensitivity and neuroplasticity.

      (2) Strengths: innovation (conceptual and technical levels), data support the conclusions.

      Comments on revisions:

      The authors did a great job addressing my comments.

    3. Reviewer #4 (Public review):

      Summary:

      The significance of this study lies in its focus on translational regulation in the late phase of neuropathic pain, using both genetic and pharmacological approaches, with specific emphasis on parvalbumin-positive (PV⁺) inhibitory interneurons in the spinal cord. The authors are very responsive to all the reviewers' comments.

      Strengths:

      I did not review this manuscript in the first round. However, the authors have been highly responsive to the reviewers' comments and have substantially strengthened the study. They conducted new behavioral experiments that yielded informative negative results (Fig. 6A and 6B). These findings demonstrate that targeting translational control in PV neurons is sufficient to reverse SNI-induced reductions in PV neuron excitability, but insufficient to ameliorate behavioral phenotypes. This suggests that additional cell types and pathways contribute to late-phase neuropathic pain.

      Weaknesses:

      Only the withdrawal threshold was measured to assess neuropathic pain. Some studies only used female mice. However, the authors appropriately discuss the study's limitations in the final two paragraphs and have added experimental details to improve clarity. Overall, the manuscript has been significantly improved.

    4. Reviewer #5 (Public review):

      Summary:

      This study investigates the molecular mechanisms underlying the maintenance of neuropathic pain, specifically focusing on the role of mRNA translation in the spinal cord. Using the Spared Nerve Injury (SNI) model, the authors demonstrate that while both transcription and translation are active in the early phase, the chronic phase (day 63) is uniquely characterized by a shift toward translational control. They identify spinal inhibitory neurons, particularly parvalbumin-positive interneurons, as key sites of this translational regulation.

      Strengths:

      Technical Rigor: The use of Ribo-seq and TRAP-seq allows for a high-resolution view of the "translatome," which more accurately reflects the functional protein output than standard mRNA-seq.

      Novelty: The study uncovers that reducing a single translation initiation factor (eIF4E) specifically in the CNS is sufficient to provide long-lasting relief from established chronic pain.

      Addressing Disinhibition: The electrophysiological evidence showing that increased translation in PV+ neurons reduces their excitability provides a clear mechanism for the "spinal disinhibition" typically seen in chronic pain.

      Weaknesses:

      Cell-Type Sufficiency: New experiments in the revision show that while inhibiting translation in PV+ neurons restores their individual excitability, it is not sufficient on its own to reverse behavioral pain hypersensitivity. This suggests that the maintenance of chronic pain likely involves translational changes across a broader network of cell types, including other inhibitory neurons or non-neuronal cells like microglia. This does not have to be resolved in the current study, but providing some framework to account for potential mechanisms might help the audience.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the role of transcriptional and translational controls of gene expression in dorsal root ganglia and lumbar spinal cord in neuropathic pain in mice. Using ribosome profiling (Ribo-seq) and translating ribosome affinity purification (TRAP), they show changes in transcriptomic and translational gene expression at the peripheral and central levels rapidly after nerve injury. While translational changes in gene expression remained elevated for more than two months in both DRGs and the spinal cord, transcriptomic regulation was absent in the spinal cord long after the onset of neuropathy. Disrupting mRNA translation in dorsal horn neurons using antisense oligonucleotides reduced mechanical withdrawal threshold and facial expression of pain. Using fluorescent noncanonical amino acid tagging (FUNCAT), the authors further show that de novo protein expression primarily occurs in inhibitory neurons in the superficial dorsal horn after nerve injury. Accordingly, a selective increase in translational control of gene expression in spinal inhibitory neurons, or a subset of mainly inhibitory neurons expressing parvalbumin (PV), using transgenic mice, led to a decrease in the excitability of PV neurons and mechanical allodynia. In contrast, decreasing the translational control of spinal PV neurons prevented the alteration of the electrophysiological properties of the PV cells induced by nerve injury.

      Strengths:

      This is a well-written article that uncovers a previously unappreciated role of gene expression control in PV neurons, which seems to play an important part in the loss of inhibitory control of spinal circuits typically seen after peripheral nerve injury. The conclusions are generally well supported by the data.

      Weaknesses:

      The study would benefit from further clarifications in the methods section and a deeper analysis of gene expression changes in mRNA expression and ribosomal footprint observed after nerve injury.

      We have improved the description of the methods and clarified the rationale underlying the presentation of gene expression changes. We have also added lists of the top differentially expressed genes at both the translational and transcriptional levels to Figure 1, and improved the description of the datasets in the Supplementary Materials.

      Antisense oligonucleotides used to reduce translation by disrupting eIF4E expression were administered i.c.v. It is unknown if the authors controlled for locomotor deficits, which might add confounds in the interpretation of behavioral results. A more local route should have been preferable to avoid targeting brain regions, which could potentially affect behavior.

      Thank you for raising this important point. We used i.c.v. administration to specifically target the central nervous system (CNS) without affecting the peripheral nervous system, as this is the recommended approach for selectively targeting the CNS using ASOs. Intraspinal administration of ASOs (into the spinal cord parenchyma) at an effective dose for long-term effects is not feasible. Intrathecal administration is possible but would result in exposure of the DRGs to the injected ASO and therefore would not be specific to the CNS.

      To rule out potential locomotor deficits, we have now subjected mice to the rotarod and open field tests to assess motor function. We found no differences between eIF4E-ASO– and control-ASO–injected mice (Fig. 2J, K).

      In the revised version of the manuscript, we now better explain the rationale for i.c.v. injection. Moreover, we discuss the potential supraspinal effects of eIF4E-ASO in the Limitations section, while also describing the lack of motor phenotypes in the rotarod/open field tests.

      Only female mice were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology, but both sexes were used for behavior experiments.

      Our manuscript involves various complicated techniques and analyses. Due to limited resources, we therefore opted to use only females for expensive and labor-intensive experiments, such as Ribo-Seq, TRAP, FUNCAT, and electrophysiology, while using both sexes for behavioral studies.

      We now clearly acknowledge this limitation in the revised manuscript.

      The conditional KO of 4E-BP1 using transgenic animals should be total in the targeted cells. However, only a partial reduction is reported in Figure S2 in GAD2, PV, Vglut2, or Tac1 cells. Again, proper methods for quantification of fluorescence in these experiments are lacking.

      We apologize for the oversight; we have now updated the description of the methods for IHC signal quantification. Although genetic ablation is indeed expected to result in a complete loss of signal, in practice, previous studies employing IHC, but not Western blotting, for 4E-BP1 have also shown only a partial reduction in signal. This is likely because the 4E-BP1 antibody partially detects other epitopes. Using the same antibody, we and others have shown complete elimination of the band corresponding to 4E-BP1 in spinal cord and DRG tissue (e.g., PMID: 26678009).

      The elegant knockdown of eIF4E using AAV-mediated shRNAmir shows a recovery of the electrophysiological intrinsic properties of PV neurons after injury. It is unclear if such manipulation would be sufficient to reverse mechanical allodynia in vivo.

      Thank you for this concern, which was also raised by other reviewers. We have now performed two additional experiments, which revealed that suppressing the mTORC1–eIF4E axis in spinal PV neurons (using AAVs expressing eIF4E-shRNA in spinal PV neurons [Fig. 6A] and transgenic mice expressing non-phosphorylatable 4E-BP1 in PV neurons [Fig. 6B]) is not sufficient to alleviate neuropathic pain. These new findings need to be reconciled with our other results showing that eIF4E downregulation in PV neurons prevents the SNI-induced reduction in their excitability, and that ASO-mediated suppression of eIF4E, which affects all cell types, alleviates neuropathic pain.

      Together, these results suggest that targeting translational control in PV neurons is sufficient to reverse SNI-induced reduction in PV neuron excitability, but is not sufficient to prevent behavioral phenotypes, which likely require changes in other cell types and/or additional pathways, as well as other alterations within PV neurons. We have now included these new results in the revised manuscript (Fig. 6A and Fig. 6B) and revised the text accordingly. These changes include toning down the role of translational control in PV neurons after SNI in driving behavioral hypersensitivity.

      Reviewer #2 (Public review):

      Summary:

      I reviewed the manuscript titled "Translational Control in the Spinal Cord Regulates Gene Expression and Pain Hypersensitivity in the Chronic Phase of Neuropathic Pain." This manuscript compares transcription and translation in the spinal cord during the acute and chronic phases of neuropathic pain induced by surgical nerve injury. The authors chose to focus their investigation on translation in the chronic phase due to its greater impact on gene expression in the spinal cord compared to transcription.

      (1) The study is significant because the molecular mechanisms underlying chronic pain remain elusive. The role of translational regulation in the spinal cord has not been investigated in neuroplasticity and chronic pain mouse models. The manuscript is innovative and technically robust. The authors employed several cutting-edge techniques such as Ribo-seq, TRAP-seq, slice electrophysiology, and viral approaches. Despite the technical complexity, the manuscript is well-written. The authors demonstrated that inhibition of eIF4E alleviates pain hypersensitivity, that de novo protein synthesis is more pronounced in inhibitory interneurons, and that manipulating mTOR-eIF4E pathways alters mechanical sensitivity and neuroplasticity.

      Strengths:

      Innovation (conceptual and technical levels), data support the conclusions.

      Weakness:

      Confusion about the sex of the animals. It is unclear whether eIF4E ASO affects translation, and in which cells. It has not been determined whether modulating translation in PV⁺ neurons impacts neuropathic pain behaviors.

      We thank the reviewer for their thoughtful comments. In the revised version of the manuscript, we better explain that both sexes were used for behavioral experiments, whereas only females were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology experiments.

      ASOs are not known to be intrinsically cell-type-specific; therefore, we do not expect differential effects on excitatory versus inhibitory neurons. We demonstrated that eIF4E-ASO reduces the levels of eIF4E, a key translation initiation factor that is rate-limiting for cap-dependent translation.

      Moreover, in the revised manuscript we included two additional experiments (Fig. 6A and Fig. 6B) showing that decreased eIF4E-dependent translation in PV neurons is not sufficient to alleviate neuropathic pain, despite its effects on excitability measures. We have updated the manuscript to reflect these important new findings.

      Reviewer #3 (Public review):

      Summary:

      This study provides evidence for translational changes in inhibitory spinal dorsal horn neurons following chronic nerve injury. Gene expression changes have been widely studied in the context of pain induction and provided key insights into the adaptation of the nervous system in the early phases of chronic pain. Whereas this is interesting biologically, most patients will arrive in the clinic beyond the acute phase of their injury, thus limiting the translational relevance of these studies. Recent studies have extended this work to highlight the difference between acute and chronic pain states, potentially explaining the cascading factors leading to chronic pain, and hopefully how to prevent this in vulnerable populations. The present study suggests that translational changes within spinal inhibitory populations could underlie long-term chronic pain, leading to decreased inhibition and heightened pain thresholds.

      Strengths:

      The approaches used and the broad outcomes of the manuscript are interesting and could be an exciting development in the field. The authors are using approaches more common in molecular biology and extending these into neuroscientific research, getting into the detail of how pathology could impact gene expression differentially across the course of an injury. This could open up new areas of research to selectively target not only defined populations but additionally help alleviate pain symptoms once an injury has already reached the maintenance phase. There is an opportunity to delve into what must be a very large data set and learn more about what genes are differentially translated and how this could affect circuit function.

      Weaknesses:

      Whereas the authors approach a key question in pain chronicity, the manuscript falls a little short of providing any conclusive data. The manuscript was in some areas very difficult to follow. Terminology was not always consistent or clear, and the flow of the manuscript could use some attention to highlight key areas. Whereas the overall message is clear in the summary, this would not necessarily be the case when reading the manuscript alone.

      To improve the clarity and flow of the manuscript, we made changes to the text, including the addition of intermediate summaries and further explanations of terms and experiments.

      The study claims to show that translational control mechanisms in the spinal cord play a role in mediating neuropathic pain hypersensitivity, but the studies presented do not fully support this statement. The authors instead provide some correlation between translation and behavioural reflex excitability (namely vfh and Hargreaves).

      It is difficult to fully interpret the work, as there are a number of inconsistencies, namely the range of timings pre- and post-injury, lack of controls for manipulations, the use of shmiRNA versus lineage deletions, and lack of detailed somatosensory testing. It is not completely clear how this work could be translatable as is, without a deeper understanding of how translational control affects circuit function and whether all of this is necessarily bad for the system, or whether this is a positive homeostatic adaptation to the hyperexcitability of the circuit following injury.

      A large portion of the work is focussed on showing an inhibitory-selective change in translation following chronic nerve injury. The evidence for this is however lacking. Statistics to show that translational effects are restricted to inhibitory subpopulations are inadequate. The author's choice of transgenic lines is not clear and seems to rely on availability rather than hypothesis.

      Although we agree with some of the criticism, we have reservations regarding other points raised by the reviewer. To address several of the concerns, we added new experiments (Fig. 2J, 2K, 6A, and 6B). We also made changes to the text to improve readability and to better explain the rationale for the study and our focus on inhibitory neurons.

      For example, we clarify that we do not state that changes in mRNA translation in the spinal cord during the chronic phase of neuropathic pain occur exclusively in inhibitory neurons. Although we observe changes in general protein synthesis, assessed using FUNCAT, in inhibitory but not excitatory neurons after SNI, alterations in the translation of specific transcripts, assessed using the TRAP approach, are observed in both excitatory and inhibitory neurons.

      The second part of the paper focuses on inhibitory neurons because these neurons demonstrate larger translational changes. We now clearly indicate that alterations in excitatory neurons are also likely important during the chronic phase of SNI. This conclusion is further supported by newly added results (Fig. 6A and Fig. 6B), showing that targeting eIF4E-dependent translation in spinal PV neurons using two different approaches is not sufficient to reverse pain hypersensitivity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Analysis of gene expression in Figure 1 lacks clarity, and the data do not effectively guide the reader toward their intended purpose. A list of the most dysregulated genes at the transcriptional level, the translational level, or both, would help the reader fully appreciate the outcome of this analysis. Similarly, what is the message conveyed by Figures 4 D-G?

      As requested, we have now included the top 10 upregulated and top 10 downregulated genes at both the translational and transcriptional levels in Figure 1. We also expanded the main text and figure legends to clarify that Supplementary Figure 1 includes volcano plots for all conditions, and that Supplementary Table 1 contains the complete datasets. In addition, we expanded the figure legends to explain the organization of the data in Supplementary Table 1. Finally, we provide pathway analyses of translationally regulated genes in the spinal cord, as this condition is the primary focus of the study.

      Figure 4D–G shows the top 15 translationally upregulated and downregulated genes in inhibitory neurons at days 4 (D) and 60 (E), and in Tac1⁺ excitatory neurons at days 4 (F) and 60 (G) (four conditions in total) after SNI. These panels convey that translational regulation of specific transcripts occurs in both inhibitory and excitatory neurons. Panel 4H further demonstrates that, although translational changes are observed in both neuronal populations, a greater number of genes are altered in inhibitory neurons. We have improved the readability and flow of this section to better convey this message.

      Details about how AHA was quantified in Figure 3 are missing. It is unclear how and where the cells were selected for quantification. Objective criteria for expression/no expression of AHA in the cells are not indicated. Additionally, the signal seems to have somehow been normalized over images from the contralateral side. It is difficult to understand what the bar graphs actually represent in panel C. One would interpret them as percentages of excitatory/inhibitory cells expressing AHA.

      We apologize for the lack of clarity. We have now expanded the description of the analyses in the figure legend and in the Methods to better explain the results shown in Fig. 3. The imaged cells were selected based on specific criteria, such as lamina location and cell type. In panel C (the anisomycin experiment), values were normalized to the control group. In all other panels, no normalization was applied, and the values represent the AHA integrated density on maximum-intensity projection images (averaged per mouse). We also describe the number of sections and cells per mouse, as well as other technical details, as requested.

      In addition, a few minor changes should be made:

      (1) Rephrase Introduction: "Peripheral nerve injury can cause neuropathic pain, a chronic pain condition [...]." Neuropathic pain is not necessarily chronic.

      This sentence was reworded to read “Peripheral nerve injury may result in neuropathic pain, a debilitating condition with limited effective treatment options”.

      (2) Host species for secondary anti-mouse antibodies are provided but not for the anti-rabbit (donkey?). Also, check for consistency in the methods section. The methods (p. 21) mention two secondary antibodies and an apparent third antibody named "anti-HRP-conjugated antibody." Please provide information about this antibody, or remove it.

      Thank you for flagging this. The inadvertent repetition of “anti-HRP-conjugated antibody” has been removed.

      (3) Provide primary antibody hosts on page 22.

      The hosts of all primary and secondary antibodies are now provided.

      (4) Define PBST on page 21 and PBS-T on page 22.

      We defined PBST in the revised manuscript (0.2% Triton-X100 in PBS).

      (5) Specify the filter sets used for fluorescent microscopy.

      We specified the filter sets used for fluorescent microscopy.

      (6) Change the legend to 50% withdrawal threshold for vF behavior tests.

      We addressed this by making the requested change in all relevant legends.

      Reviewer #2 (Recommendations for the authors):

      Major:

      (1) The authors need to show that eIF4E ASO (Figure 2) reduces translation in both inhibitory and excitatory neurons.

      ASOs are not intrinsically cell-type specific, as they do not contain promoters or regulatory elements and act wherever they enter cells and engage RNase H1. However, differences in ASO effects across cell types can arise from variability in uptake, intracellular trafficking, RNase H activity, or target mRNA expression levels.

      In our study, we used eIF4E-ASO as a general approach to demonstrate that eIF4E-dependent translation contributes to SNI-induced hypersensitivity, particularly at the chronic phase. We show a marked reduction in eIF4E levels in the spinal cord of eIF4E-ASO–injected mice compared with controls. We do not claim that the effects of eIF4E-ASO are mediated by a specific cell type; rather, they may involve excitatory neurons, inhibitory neurons, and non-neuronal cells, such as microglia and astrocytes, among others.

      Notably, while eIF4E can promote general translation during development, in adult mice it predominantly regulates cap-dependent translation of specific mRNAs without having a major effect on overall protein synthesis. In our case, the partial reduction in eIF4E is unlikely to substantially affect general translation, as assessed by AHA incorporation, and would instead require TRAP or Ribo-Seq to detect transcript-specific translational changes. We now better explain the rationale for the eIF4E-ASO experiment and clearly state that the effects observed cannot be attributed to a specific cell type.

      In addition, our new results showing that inhibition of eIF4E-dependent translation in PV neurons is not sufficient to alleviate SNI-induced mechanical hypersensitivity suggest that translational changes in other neuronal and/or non-neuronal cell types contribute to hypersensitivity. This important point is now more clearly explained in the revised manuscript, and the role of PV neurons is toned down throughout the paper.

      (2) In Figure 5, it is necessary to show the effect of eIF4E-shRNA in PV+ neurons on neuropathic behaviors (von Frey and MGS).

      To address this important concern, we performed two new experiments, both of which showed that inhibiting the mTORC1–eIF4E axis in parvalbumin neurons is not sufficient to alleviate neuropathic pain. First, we injected PV-Cre mice with AAV-eIF4E-shRNAmir and a scrambled control. We found that downregulating eIF4E in spinal PV neurons has no effect on SNI-induced mechanical hypersensitivity. We used a second, complementary approach to validate this finding. Specifically, we generated transgenic mice in which a non-phosphorylatable form of 4E-BP1 is expressed in PV neurons. Because non-phosphorylatable 4E-BP1 acts as a translational suppressor of eIF4E, this approach is functionally similar to eIF4E deletion.

      Altogether, our findings indicate that cell-type–non-specific suppression of eIF4E using ASOs is sufficient to alleviate neuropathic pain, particularly at the chronic phase. In contrast, while activation of eIF4E-dependent translation in PV neurons (via 4E-BP1 deletion) induces pain hypersensitivity, suppression of eIF4E-dependent translation in PV neurons inhibits SNI-induced decrease in PV neuron excitability but does not alleviate pain hypersensitivity. Thus, increased eIF4E-dependent translation in PV neurons is sufficient to induce pain hypersensitivity, but targeting this pathway in PV neurons alone is not sufficient to reverse neuropathic pain.

      Potential explanations for these findings include: (1) the presence of other important mechanisms in PV neurons (e.g., changes in synaptic transmission) that are translation independent; (2) the insufficiency of correcting reduced PV neuron excitability to alleviate hypersensitivity; and (3) an essential role for mRNA translation in other neuronal and/or non-neuronal cell types in neuropathic pain. We have updated the manuscript to include these potential explanations in the Discussion section.

      Moderate:

      (1) In Figure 2, MGS should be performed at earlier time points as well.

      We performed MGS when von Frey testing, which is less noisy and less labor intensive in our hands, suggested altered phenotypes.

      (2) In Figure 4B, the gene markers are different in Gad2+ and Tac1+ cells. Please show the 12 markers for both cell types.

      We now better explain the selection of the markers.

      (3) In Figure 5, MGS should be performed to test if the effect is limited to mechanical sensation/reactivity or extends to nociception. Additionally, do these mice exhibit altered locomotion and grip strength?

      As described above, we added experiments involving downregulation of eIF4E and expression of a mutant non-phosphorylatable 4E-BP1 in PV neurons. We performed von Frey testing, which showed no effect of suppressing the mTORC1–eIF4E axis on mechanical hypersensitivity under these conditions. Given these negative results, we did not proceed with mouse grimace scale (MGS) analysis.

      (4) In Figure S2E, the reduction of eIF4E does not appear to be specific to GFP+ cells.

      We have now replaced the representative images in this figure.

      (5) Can chronic neuropathic pain be reduced by enhancing 4E-BP1 specifically in PV+ neurons?

      We added the experiment proposed by the reviewer in Fig. 6B. We found that enhancing 4E-BP1 activity, by expressing a non-phosphorylatable form of 4E-BP1 in PV neurons, is not sufficient to alleviate neuropathic pain hypersensitivity.

      (6) Why did the authors not use PainFace for the MGS?

      We began using manual, blinded MGS scoring, as originally described by Mogil and colleagues in 2010 (PMID: 20453868), for this project before PainFace became available around 2019 (e.g., Tuttle and Zylka) and in later versions (e.g., PMID: 39024163). For consistency, we therefore continued using the same approach throughout the experiments.

      (7) In Figures 2A-C, the labeling of the bar graphs seems incorrect: is it 4E-BP1 or eIF4E immunoreactivity?

      Thank you very much for noticing this; we have corrected the mistake.

      (8) In Figure 1, present the data by sex.

      We performed sequencing analyses only in females. This decision was based on the large number of mice and experimental conditions required for both Ribo-Seq (n = 15 mice per replicate, 3 replicates per condition, and 2 time points for SNI/Sham, ~180 mice total) and TRAP (n = 3 mice per replicate, 3 replicates per condition, 2 time points, and 2 genotypes [Tac1 and GAD2] for SNI/Sham), as well as the high cost of sequencing. Behavioral experiments were performed in both sexes. This information is clearly indicated in the Methods section, and we have now also included it in the Limitations section of the paper.
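      As a rough check of the animal numbers quoted above, the arithmetic can be sketched as follows. The helper name and the assumption that each time point includes one SNI and one Sham group are ours; the TRAP total is not stated in the response and is derived here only under that same assumption.

```python
def total_mice(mice_per_replicate, replicates, time_points, groups=2, genotypes=1):
    """Total animals for a design; `groups` assumes one SNI and one Sham group per time point."""
    return mice_per_replicate * replicates * time_points * groups * genotypes

# Ribo-Seq: 15 mice per replicate, 3 replicates per condition, 2 time points, SNI/Sham.
ribo_seq_total = total_mice(15, 3, 2)

# TRAP: 3 mice per replicate, 3 replicates, 2 time points, SNI/Sham, 2 genotypes (Tac1, GAD2).
trap_total = total_mice(3, 3, 2, genotypes=2)

print(ribo_seq_total, trap_total)
```

      The Ribo-Seq product (15 × 3 × 2 × 2) matches the ~180 mice stated in the response.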

      (9) While the methods state that all behavioral testing was done with equal numbers of male and female mice, it seems that several experiments were done only in females. In the absence of a strong justification, all experiments should be conducted in both sexes.

      As explained above, due to the very large number of mice required for some experiments and the high cost of sample processing and sequencing, only behavioral experiments were performed in both sexes. We now clearly describe the sex of the animals used in each experiment in the figure legends.

      Minor:

      (1) In Figure 3, the legend is confusing and lacks labels.

      We expanded the Fig. 3 legends and added labels, as requested.

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript needs to be made clearer and more specific. As it stands, the logic and flow are difficult to follow. Figure legends are not always indicative of the figure and are inconsistent.

      Regarding timelines:

      The logic of the different timelines is not clear. Either explain why different times post-injury were chosen between experiments or keep them consistent. It seems a key message here is that the timing is important. It therefore follows that the authors should be strict about this in their own experiments. Figure 1: 4 and 63 days. Figure 2: Day 3 and weeks 8 and 12. Figure 3: Days 4 and 60. Figure 4: Days 4 and 60. Figure 5: 6 weeks. Figure S1: 4 and 60. Clarifying why these timings were used in each case and showing at the transcript level that these are most appropriate would be needed.

      We thank the reviewer for carefully reviewing our manuscript. We focused on early versus late time points. For the sequencing experiments, we performed Ribo-seq at day 4 for the early time point and day 63 for the late time point, whereas TRAP analyses (and FUNCAT) were performed at day 4 for the early time point and day 60 for the late time point. These differences (day 60 versus day 63) were due to logistical issues related to sample collection. In our view, there are no major biological differences between day 60 and day 63 for the late time points, particularly because we do not perform direct comparisons across different experiments.

      In other experiments, we used several time points (e.g., day 3, as well as 6, 8, and 12 weeks) either to follow the development of phenotypes or based on previous publications regarding the timing of specific effects. We now acknowledge the potential limitation of using slightly different time points in the Limitations section of the paper.

      Regarding the use of inhibitory and excitatory markers: The comparisons they made between subpopulations seem a little random; for one, the number of Tac1-positive cells in the dorsal horn is not equal to that of PV, and so the comparison seems inappropriate.

      The number of cells from each subpopulation should not affect the number of DEGs. Because these analyses were performed on bulk mRNA rather than at the single-cell level, the comparisons are made between SNI and control groups within each subpopulation. Thus, the number of differentially translated genes is determined per cell type, not per individual cell.

      The lack of any semblance of variability or statistics with regard to gene changes makes it difficult to assess whether these comparisons were justified experimentally. Pax2 is a developmentally regulated transcription factor, with reduced levels in the adult. Using Pax2- NeuN+ to label excitatory interneurons is therefore not appropriate for comparison. A more appropriate comparison would be to use vGluT2 and GAD67. Similarly, the use of the GAD2Cre seems a poor choice. This is a restricted population of interneurons that have been suggested to have specific roles in presynaptic inhibition. If the authors were interested in this subpopulation for that reason, then they should state so.

      Pax2 is commonly used as a marker of inhibitory neurons in the spinal cord (e.g., PMID: 36323322) because, in the adult dorsal horn, Pax2 protein remains expressed in nearly all inhibitory neurons, including both GABAergic (GAD65/67+) and glycinergic (GlyT2+) neurons. VGluT2 marks terminals of IB4-binding peripheral sensory neurons as well as those of spinal cord excitatory interneurons in lamina II of the dorsal horn, complicating the analyses. We attempted using Lmx1b for excitatory neurons (Pax2 for inhibitory and Lmx1b for excitatory) but could not obtain a specific and robust signal using different commercial antibodies (we have no access to a non-commercial Pax2 antibody).

      Regarding Cre lines, Gad2-Cre has been extensively used to target GABAergic neurons in the spinal cord. Although it is not expressed in purely glycinergic neurons, it is expressed in GABAergic and mixed GABA/glycine interneurons. Gad2-Cre is more restricted to superficial dorsal laminae I–III, which are relevant to pain processing, than Gad1-Cre, which may also capture low-level GABAergic neurons in deep laminae and ventral horn inhibitory neurons. Moreover, the developmental profiles differ: Gad1-Cre is expressed earlier, at embryonic stages during inhibitory neuron development, whereas GAD2 is expressed later, in post-mitotic and mature inhibitory neurons. Because of these considerations (higher specificity to the dorsal horn and later developmental expression), we used the Gad2-Cre mouse line in our experiments.

      Regarding cKO experiments:

      It is unclear whether the deletion of Eif4ebp (which is not "ablation" as stated in the manuscript) has had any effect on the PV/GAD2 cells themselves seeing as this deletion would be a lineage deletion. One would imagine that altering transcription in such a population from early development would affect a host of neuronal and circuit properties, such as connectivity, dendritic branching, etc. The authors should show that the circuit properties were not broadly changed, not least as PV is expressed throughout the nervous system and in muscles. This could in itself explain the hypersensitivity described in their results. Experimenters should repeat the AAV shRNAmir experiments in non-injured animals, and not just control animals with the scrambled sh.

      We agree with the concerns related to potential developmental effects. Although it is nearly impossible to reliably and comprehensively demonstrate that circuit properties were not altered in our cKO mice, our manuscript presents several lines of evidence supporting a role for translational control in specific cell types in the regulation of gene expression and nociception independent of developmental effects. First, our translational gene expression analyses were performed in adult WT mice and reflect SNI-induced changes in gene expression at the translational level, assessed using complementary approaches. In addition, the effects of eIF4E ASO delivered to adult animals support a role for translational control in the regulation of SNI-induced pain hypersensitivity at later stages.

      Moreover, downregulation of eIF4E in PV neurons using an AAV-based approach in adult mice affects their SNI-induced excitability, further supporting a role for translational mechanisms in regulating PV neuron plasticity after peripheral nerve injury in adulthood. To acknowledge the potential developmental effects associated with 4E-BP1 deletion using Tac1-Cre, Gad2-Cre, and PV-Cre mouse lines (with PV-Cre beginning expression postnatally), we have included an explicit limitation statement in the Discussion of the revised manuscript.

      We also thank the reviewer for highlighting the distinction between deletion and ablation, and we have corrected this terminology in the revised manuscript.

      Regarding pain:

      A large sticking point within the study is the lack of clarity of the populations they are targeting. Many of the populations mentioned are not expressed solely in the dorsal somatosensory horn and instead are also expressed in the ventral motor horn. This is particularly important with regard to the sensory tests they are performing, which rely on reflex responses. It seems these results, although interesting, are not proof of a pain effect, but rather showing changes in vfh-behaviour. To show this is a pain-specific event, and not just correlative or reflexive, the authors should perform further behavioural tests beyond vfh, Hargreaves, and the grimace scale, such as low threshold touch, rotarod, etc. How much of this effect is due to changes in reflex excitability? Would the authors expect similar results for all neuropathic models but not for chronic inflammatory states for example? Western Blot analysis at the moment is for the whole cord, which could imply changes in the ventral or intermediate horn, it could help strengthen the study to show that these changes are selective to the dorsal cord.

      We have now added a new experiment showing that eIF4E-ASO has no effect on motor function in the rotarod and open field tests (Fig. 2J, K). In addition, the eIF4E-ASO experiment included in the original submission reflects supraspinal behavior, as assessed by MGS. Overall, our study includes numerous experiments and datasets. While we agree with some of the reviewer’s concerns, the extensive additional work requested, including additional neuropathic and inflammatory pain models, further assays of supraspinal behavior, Western blot analyses restricted to the dorsal horn, additional Cre lines and markers, and other analyses, is not feasible within the scope of the current manuscript.

      Notably, in the revised manuscript, we have added new experiments (Fig. 2J, 2K, 6A, 6B) that we believe address the most critical concerns raised by the reviewers, and we have revised the text to more clearly acknowledge the limitations of the study.

      Regarding patch clamp studies:

      An increase in rheobase alone in the PV cells would not in itself account for the changes seen in behaviour, seeing as the authors are suggesting this is a selective effect for von Frey and not radiant heat, for example. The authors should therefore show a change in mechanically-evoked firing of PV/GAD2 cells either by dorsal root stimulation in slice, or by cfos or equivalent marker of activation following sensory stimulation. The title of this figure is also misleading- it is not clear how there is any proof of promotion of plasticity in the experiments shown.

      In the original submission, in addition to an increase in rheobase, we also demonstrated decreased spiking activity in response to a range of stimulating currents (Fig. 4). We agree that assessing mechanically evoked responses of PV neurons would be informative; however, such studies are beyond the scope of the current manuscript.

      To address the final concern, we modified the title of Fig. 5 and the related text. Moreover, the newly added data showing that inhibition of translation in PV neurons does not alleviate SNI-induced hypersensitivity prompted us to tone down, throughout the manuscript, the link between translational changes in PV neurons and pain hypersensitivity.

    1. eLife Assessment

      In this manuscript, Wafer and Tandon et al. present a thoughtful and well-designed genetic screen for regulators of adipose remodeling using zebrafish as a model system. This work is valuable because it uncovers several genes associated with adipose tissue hyperplastic and hypertrophic morphology and diet-induced remodeling that have considerable potential health impact. The rigorous phenotypic analyses and compelling evidence make this work a key resource for the field.

    2. Reviewer #1 (Public review):

      In this manuscript, Wafer and Tandon et al. present a thoughtful and well-designed genetic screen for regulators of adipose remodeling using zebrafish as a model system. The authors cross-referenced several human adipocyte-related transcriptomic and genetic association datasets to identify candidate genes, which they then functionally tested in zebrafish. Importantly, the authors devised an unbiased microscopy-based screening platform to document quantitative adipose phenotypes with whole animal imaging, while also employing rigorous statistical methods. From their screen, the authors identified 3 genes that resulted in robust adipose phenotypes out of a total of 25 that were tested. Overall, this work will be an important resource for the field because of the genes identified from the screen, the quantitative screening pipeline, and the rigorous phenotypic analysis.

      Comments on revisions:

      The authors have far exceeded my expectations with their revised manuscript. All my questions and concerns from the original manuscript have been addressed by the authors. The additional data and analysis in Figure 6 and Supplementary Figure 8 are compelling and have greatly improved the manuscript.

    3. Reviewer #2 (Public review):

      This manuscript by Wafer, Tandon et al. presents exciting new approaches for using the zebrafish CRISPR screening and imaging system to identify genes that are associated with hyperplastic and hypertrophic adipose morphology. The paper establishes valuable screening pipelines in zebrafish to identify genetic regulators that affect adipose tissue morphology by combining CRISPR with an imaging-based, comprehensive adipose spatial analysis platform. Starting from a human transcriptomic dataset with differentially expressed genes that separate small and large adipocytes, they eventually identified 3 genes that induce hyperplastic or hypertrophic phenotypes in zebrafish. Of these, they focused on the foxp1 gene, a transcription factor known to regulate tissue development. They discovered that the foxp1 mutant displays basal hypertrophic morphology and fails to undergo hypertrophic remodeling in response to a high-fat diet, suggesting a link between adipose tissue development and diet-induced remodeling response. Overall, this manuscript is extremely well-written, the data presented are quite compelling, and the identified novel genes that are associated with adipose tissue hyperplastic and hypertrophic morphology and diet-induced remodeling are very exciting.

      Strength:

      (1) Obesity remains a worldwide public health concern. The mechanisms underlying adipose tissue hypertrophic and hyperplastic adaptation remain unclear.

      (2) This manuscript combined multiple omic datasets to identify candidate genes and performed a CRISPR-based screening to identify genes underlying adipose tissue development and adaptation. This new method will open opportunities that will facilitate our understanding and testing of new genetic mechanisms underlying the development of obesity.

      (3) Using the screening approach, this paper successfully identified new genes that are associated with adipose tissue LD size change. More importantly, the paper provided further validation using a stable CRISPR line to show the phenotype in basal and HFD conditions.

      (4) The experiments are extremely well-designed. Sample sizes are large. Statistical analysis is rigorous. Overall, this is a very high-quality study.

      Author's response to the previous comments/weakness:

      (1) In this revised manuscript, the authors provided new comprehensive spatial analyses of foxp1a and foxp1b mutants in basal conditions as well as in response to high-fat feeding. The new data confirmed their initial findings and beautifully illustrated the spatiotemporal dynamics of the adipocytes in response to high-fat diet feeding.

      (2) The authors have addressed all my comments, and I do not have further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for the thoughtful and constructive comments on our manuscript. We have carefully addressed all points raised, and believe the manuscript is substantially improved as a result. In particular, we have performed:

      - Comprehensive spatial analysis of stable mutants. Following Recommendations for the authors comment #1, we performed spatial analysis by binning the anterior-posterior axis into 200 µm strata. This analysis validates our initial conclusions and reveals striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants.

      - Substantially enhanced the statistical rigour of the screen analysis. We have implemented stratified Kolmogorov-Smirnov tests (within-experiment testing, then combined via Fisher's method) alongside linear mixed models to control for batch effects. In the revised manuscript, we now focus on three hypertrophy genes – foxp1b, txnipa and mmp14b – which are robustly validated by both methods.

      - Normalisation of adipose area to body size. To address concerns about developmental delay (Recommendations for the authors #2), we now normalise adipose area to standard length. With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity (updated from our original analysis), while the hypertrophic LD morphology remains highly significant - demonstrating the phenotype is independent of body size and not a developmental delay.

      - Revised title. As suggested by Recommendations for the authors comment #6, we have changed the title to: "A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish"

      - Extensive code and analysis availability. We now provide all code and extensive analysis pipelines in interactive HTML documents at https://github.com/jeminchin/zebrafish_adipose_morphology_screen
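
      To illustrate the combination step named above, the following is a minimal sketch of Fisher's method applied to per-experiment p-values. It is not taken from the repository linked above: the function name and the example p-values are hypothetical, and the within-experiment Kolmogorov-Smirnov tests are assumed to have been run already and to be independent of one another.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom (k = number of p-values).
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)

# Hypothetical p-values from three separate experiments (illustration only).
combined_p = fisher_combine([0.04, 0.10, 0.03])
print(f"combined p = {combined_p:.4f}")
```

      Testing within each experiment first keeps batch differences from inflating the distance between mutant and control distributions; the combined value then summarises the evidence across experiments.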

      Joint Public Review:

      We thank the reviewers for their thoughtful assessment of our work and their recognition of the rigorous experimental design, statistical approaches, and the utility of both the identified genes and screening pipeline for the field. We address their concerns below.

      Weakness:

      Distinguishing developmental patterning from adipose tissue plasticity

      We appreciate this important distinction and agree that separating developmental from adaptive effects is a key challenge in the field. We would like to make several points in response:

      First, we acknowledge this limitation in our discussion and have now expanded this section to more explicitly address the interpretive boundaries of our approach. Our screening platform was intentionally designed to capture the outcome of genetic perturbation across development and early adaptation, as these processes are inherently intertwined during the establishment of adipose tissue.

      Second, regarding the suggested analysis of lipid droplet size along the AP axis in response to HFD: we have now performed this analysis and include it as new Fig. 6 and new Supplemental Fig. 8 & 9. These data validate our initial conclusions and reveal striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants. Further, these data provide additional resolution on regional responses to dietary challenge.

      Third, we note that our stable mutant validation experiments (Figure 6) do begin to disentangle these effects by examining both baseline and HFD-challenged conditions in animals with constitutive genetic loss. However, we agree that definitive separation would require temporally controlled genetic manipulation, which we now acknowledge as an important future direction.

      Lack of tissue-specific manipulations

      We agree that tissue-specific approaches would strengthen mechanistic conclusions and have acknowledged this limitation in our revised discussion. The current study was designed as a discovery-focused screen to identify candidate regulators, with the understanding that mechanistic dissection would require follow-up studies employing tissue-specific tools.

      We note that adipocyte-specific Cre/lox or Gal4-UAS approaches in zebrafish are feasible and represent an important next phase of investigation for the most promising candidates identified here, rather than a requirement for the current screening study. We have added text explicitly framing our findings as establishing genetic associations that warrant future tissue-autonomous investigation.

      Recommendations for the authors: 

      (1) Analysis: In Figure 6, the authors state that foxp1b mutants "fail to undergo further hypertrophic remodeling in response to a high-fat diet (HFD)." Foxp1b mutant juveniles are already hypertrophic before the high-fat diet. After a high-fat diet, these mutants reach mean lipid droplet diameters similar to WT, approximately 65 µm, which the authors state earlier in the manuscript are "a potential upper limit of LD growth at this developmental stage." The authors should perform additional analysis of their existing data. Specifically, determine lipid droplet size by binning the AP axis as shown in Figure 3. The rationale is that lipid droplet size differences in response to HFD may be more evident when not considering the anterior populations of lipid droplets that have already reached maximum steady state size for this juvenile stage. This would not require any new experiments, just reanalyzing data similar to how they did in Figure 3.

      We thank the reviewer for this excellent suggestion. We have performed the requested spatial analysis by binning the AP axis into 200 µm strata (Figure 3 approach). These data can be found in new Fig. 6H-M, and new Supplemental Figs 8 & 9. This new analysis verifies our initial conclusions, and also reveals several very interesting spatiotemporal dynamics:

      (i) Baseline hypertrophy in foxp1b mutants across AP strata

      In support of our initial conclusion that foxp1b mutants have larger LDs at baseline, the spatial analysis confirms that on a control diet (baseline), foxp1b mutants have significantly larger LDs than WT across strata 1-5 (new Fig. 6I), ranging from +22.2 µm larger in stratum 1 to +17.8 µm larger in stratum 5 (all FDR-adjusted p < 0.05, linear mixed effects model). Extended analysis across all 15 strata is shown in Supplemental Figs. 8 & 9. By contrast, and also in support of our initial conclusion, foxp1a mutants showed no baseline hypertrophy on control diet (all strata p > 0.10, Supplemental Fig. 8).

      (ii) foxp1b mutants show a profoundly blunted hypertrophic response to HFD

      Using paired analysis (same fish on both control diet and after 14 days of high-fat diet) with a linear mixed effects model, we quantified the effect of HFD across all strata:

      (A) Anterior/oldest strata (1-6): WT + HFD increases LD diameter by +25.1-28.1 µm (+52-58%, p < 0.0001). By contrast, foxp1b mutants + HFD increase LD diameter by only +7.5-11.7 µm (+12-19%, p < 0.003). Therefore, in the oldest/most anterior regions, containing the largest LDs, the hypertrophic response of foxp1b mutants to HFD is ~57% weaker than that of WT.

      (B) Posterior/newer strata (7-15): WT + HFD undergo significant increases in LD diameter of +17.7-23.7 µm (p < 0.024). However, in foxp1b mutants there is no significant hypertrophic response at all (p > 0.068), and hypertrophic effect sizes decline from +6.8 µm (stratum 7) to +0.4 µm (stratum 15).

      (C) Overall effect: Averaged across all strata, WT + HFD LDs show +24.4 µm increase (p < 0.0001), whereas foxp1b mutant LDs only show a +7.7 µm increase with HFD (p = 0.020). Therefore, foxp1b mutants show a 68% reduction in hypertrophic growth in response to HFD compared to WT (Fig. 6K).
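
      The 68% figure follows directly from the two averaged effect sizes quoted in (C). As a minimal worked check (the helper name is hypothetical; only the two reported averages are used as inputs):

```python
def percent_reduction(reference_change, mutant_change):
    """Return how much smaller the mutant response is, as a percentage of the reference."""
    if reference_change <= 0:
        raise ValueError("reference_change must be positive")
    return 100.0 * (1.0 - mutant_change / reference_change)

# Averages across all strata reported above: WT +24.4 µm, foxp1b +7.7 µm.
overall = percent_reduction(24.4, 7.7)
print(f"{overall:.0f}% reduction")  # prints "68% reduction"
```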

      The consequence of these spatial dynamics is that WT SAT LDs - which start 22 µm smaller than foxp1b mutants on a control diet - undergo massive hypertrophy across all regions/strata in response to a HFD. Meanwhile, foxp1b mutants - starting larger than WT - show only a modest, spatially restricted response. This results in a convergence in LD size in early/anterior strata, but WT LDs actually surpass foxp1b mutant sizes in late/posterior strata (strata 14-15: WT +14.7 µm larger on HFD, p = 0.028; Supplemental Figs. 8 & 9).

      By contrast, foxp1a mutants retain the capacity for HFD-induced hypertrophy but show a ~35% weaker response than WT (p = 0.023) – significantly less severe than the 68% reduction in foxp1b mutants. Interestingly, foxp1a mutants after HFD show a reduction in the AP gradation of LD size observed in WT and foxp1b mutants (uniform +14.4 µm across all strata versus WT range of +26.4 µm anteriorly to +16.6 µm posteriorly), suggesting that foxp1a may regulate spatial heterogeneity in adaptive responses to HFD (Fig. 6L-M).

      (iii) Developmental ceiling or impaired adaptive capacity?

      The reviewer raises an important question about whether anterior adipose LDs have reached a "developmental ceiling." After conducting the spatial analysis suggested by the Reviewer, we now believe several lines of evidence support an intrinsic defect in HFD-induced hypertrophy in foxp1b mutants, rather than reaching a developmentally determined limit:

      First, foxp1b mutants show reduced responses across ALL strata, not just anterior regions. The attenuation extends throughout the entire AP axis (57% reduction in strata 1-6, complete loss of response in strata 7-15). If anterior adipocytes had simply reached a size ceiling, we would expect normal responses in posterior regions where cells are smaller - but we don't observe this.

      Second, in posterior/newer regions of SAT (strata 14-15) the hypertrophic response to HFD in foxp1b mutants is so limited that WT LDs actually become larger than foxp1b mutant LDs (+14.7 µm larger, p = 0.028; Supplemental Fig. 9). This demonstrates that these LD sizes are not developmentally limiting and argues for intrinsic hypertrophic defects in response to HFD.

      Third, foxp1a mutants provide an important control. These mutants show no baseline hypertrophy (all strata p > 0.10) yet still exhibit blunted hypertrophic responses to HFD (~35% reduction, p = 0.023), proving that reduced HFD responses can occur independently of baseline hypertrophy.

      We have updated the Results and Discussion to reflect these new conclusions. Methods have been updated to include the spatial analysis approach.
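
      For readers unfamiliar with the stratified approach, the binning step can be sketched as follows. This is an illustrative outline only, not the analysis code used for the manuscript: it assumes each lipid droplet has an AP position measured in µm from an anterior reference point, and the function names and example measurements are hypothetical.

```python
from collections import defaultdict

STRATUM_WIDTH_UM = 200.0  # bin width along the AP axis, as stated in the response

def stratum_index(ap_position_um, width_um=STRATUM_WIDTH_UM):
    """Map an AP position (µm from the anterior reference, >= 0) to a 1-based stratum."""
    if ap_position_um < 0:
        raise ValueError("AP position must be non-negative")
    return int(ap_position_um // width_um) + 1

def mean_diameter_by_stratum(droplets):
    """Average LD diameter per stratum.

    droplets: iterable of (ap_position_um, diameter_um) pairs.
    Returns a dict {stratum: mean diameter in µm}, ordered anterior to posterior.
    """
    grouped = defaultdict(list)
    for position, diameter in droplets:
        grouped[stratum_index(position)].append(diameter)
    return {s: sum(v) / len(v) for s, v in sorted(grouped.items())}

# Hypothetical measurements: two droplets in stratum 1, one in stratum 2.
example = [(50.0, 60.0), (150.0, 70.0), (250.0, 40.0)]
print(mean_diameter_by_stratum(example))  # prints {1: 65.0, 2: 40.0}
```

      Per-stratum means computed this way would then feed a per-stratum comparison between genotypes and diets, such as the linear mixed effects models described above.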

      (2) Adipose morphogenesis in WT is a function of standard length, as shown by the authors. At juvenile stages, foxp1 mutants are both smaller and have reduced adipocyte coverage, while adults show normal body length and very subtle adipose phenotypes. Can the authors demonstrate that the observed defects in foxp1 mutant juveniles are bona fide phenotypes rather than a developmental delay?

      We thank the reviewer for this key point. We agree it is critical to distinguish true foxp1b-dependent phenotypes from potential developmental delay. Importantly, our data strongly argue against a simple developmental delay. We show that LD size scales with body size in Fig. 3G, with smaller zebrafish having smaller LDs and larger zebrafish having larger LDs. In contrast to a developmental delay, our data show that foxp1b single and foxp1a;foxp1b double mutants are smaller (reduced standard length) but have larger LDs (Fig. 6E,G). This dissociation between body size and LD size is the opposite of what would be expected from developmental delay.

      To account for the body size difference, we have now normalised adipose area to standard length (Fig. 6F). With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity, whereas foxp1a;foxp1b double mutants remain significantly reduced. This represents a change from our original analysis and we have updated the text accordingly. Critically, despite normalised adipose area showing only a trend in foxp1b singles, the hypertrophic LD morphology remains highly significant (Fig. 6G), demonstrating that the morphological phenotype is robust and independent of overall body size.

      We have clarified this interpretation in the Results and Discussion.

      (3) What was the rationale for selecting one amongst paralogous genes for the screen? For example, why did the authors choose ptenb rather than ptena?

      (4) Point 3 is particularly relevant for the final six genes that resulted in adipose phenotypes. Why did the authors choose not to target both paralogs, given that multiplexed F0 CRISPR targeting is feasible in zebrafish (PMID: 29974860)?

      We answer Points 3 & 4 together here.

      We used the DIOPT (DRSC Integrative Ortholog Prediction Tool) orthology tool to identify the zebrafish paralogue with the highest orthology score to each human gene. This tool integrates predictions from 20 orthology databases to generate a composite score. We selected the paralogue with the highest DIOPT score for each gene. For example, we selected ptenb over ptena because it showed a higher predicted orthology to human PTEN.

      We acknowledge this approach has important limitations, including orthology scores not necessarily predicting functional equivalence (ie, the "most orthologous" paralogue may not be the one with the most relevant adipose tissue function in zebrafish). We acknowledge that this may mean we have missed genuine hits - testing only one paralogue means we could fail to identify genes where the "less orthologous" paralogue has the relevant adipose function.

      Our findings with Foxp1 paralogues both validate this approach and reveal its limitations. The higher-scoring paralogue foxp1b (DIOPT score = 13/19) showed the more severe phenotype, validating our prioritisation. However, the lower-scoring paralogue foxp1a (DIOPT score = 5/19), which we tested subsequently, showed a distinct but significant phenotype (altered spatial patterning) – a finding that would have been missed had we not pursued secondary validation.

      For future screens where comprehensive hit identification is the goal, multiplexed targeting of all paralogues would be valuable, though this may complicate interpretation of paralogue-specific phenotypes. We have discussed this in the Discussion.

      (5) General framework and limitations: The analysis platform presented in the manuscript cannot separate the developmental effects from adipose tissue plasticity/remodeling. Potential approaches that may help address this concern include: (a) establishing a baseline model to illustrate how WT fish respond to high-fat diet (HFD); (b) showing how mutants with hyperplasticity (opposite effects of foxp1 mutants) respond to HFD; (c) examining whether foxp1 gene expression level changes in response to HFD. However, these approaches (especially a and b) would require extensive experimental work and may be beyond the scope of this study. Without further evidence or data support of adipose tissue plasticity and remodeling, the author may want to emphasize in the background and discussion sections how adipose tissue development may affect plasticity and adaptation, and soften the tone of how genes may directly regulate adipose tissue plasticity and adaptation.

      We thank the reviewer for this comment about the relationship between adipose development and plasticity/remodelling. We agree this is an important issue as we are looking in juvenile fish that are still growing. Therefore, when we feed them HFD and see LDs get bigger – is this diet-induced remodelling or just accelerated normal development (ie, growth that would happen anyway, but occurring faster due to more nutrients)?

      To address the reviewer's specific suggestions:

      (A) Baseline model of WT HFD response: We have now performed detailed spatial analysis of WT responses to HFD (new Fig 6H-M, Supplemental Figs. 8 & 9). This analysis establishes a comprehensive baseline for hypertrophic responses to HFD in developing adipose tissue. In summary, WT fish show robust, statistically significant and spatially-graded hypertrophic responses to HFD across the entire AP axis, with responses ranging from +28.1 µm anteriorly to +17.7 µm posteriorly.

      We agree with the Reviewer that separating developmental from adaptive processes in growing juvenile fish is challenging. Importantly, we believe foxp1a mutants provide compelling genetic evidence that we are studying adaptive responses rather than purely developmental processes. foxp1a mutants have normal baseline LD sizes on control diet (demonstrating foxp1a is not required for developmental adipose expansion), yet when challenged with HFD show significantly reduced hypertrophic expansion and reduction of spatial gradient. This genetic dissociation strongly argues we are observing adaptive capacity rather than developmental growth rate.

      (B) Hyperplastic mutants:

      We agree that analysis of hyperplastic mutants would provide valuable complementary information about tissue remodelling capacity. However, as the reviewer anticipated, this would require: (1) generating stable lines of the appropriate hyperplastic mutants, (2) conducting paired HFD feeding studies, (3) performing spatial morphometric analysis comparable to our foxp1 studies, and (4) potentially distinguishing hyperplastic vs hypertrophic contributions to expansion. We agree this constitutes substantial additional experimental work beyond the scope of the current manuscript, though it represents an important direction for future studies.

      (C) foxp1 expression changes in HFD:

      Unfortunately, we do not have SAT samples from HFD-treated fish preserved for RNA analysis, and therefore cannot assess whether foxp1 expression levels change in response to dietary challenge. This would be valuable for future studies to determine whether foxp1 genes are dynamically regulated during metabolic adaptation or function as constitutive regulators of adaptive capacity.

      Following the Reviewer's guidance, we have revised throughout the manuscript to more carefully distinguish developmental patterning from metabolic adaptation.

      (6) Title: In the absence of experimental results that can distinguish between developmental effects from adipose tissue plasticity/remodeling, such as those mentioned above, the manuscript title is not accurate and should therefore be revised to be something like "hyperplastic and hypertrophic adipose morphology."

      We have now altered the title as the Reviewer suggested to “A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish”

      Minor:

      (7) In mouse studies, deleting foxp1b in adipose tissue protects mice from diet-induced obesity, while overexpressing foxp1b in adipose tissue promotes diet-induced obesity (Liu et al., Nature Communications, 2019). These overall phenotypes and foxp1b-mediated effects appear to contradict what is observed in the zebrafish model. Can the authors also provide more evidence/discussion on why such a difference occurs when comparing zebrafish and mouse models?

      We thank the reviewer for this important comparison. We believe the apparent contradictions reflect (1) differences in adipose tissue thermogenic capacity - between species possibly, but also between functionally distinct depots and (2) whole-organism versus tissue-specific experimental approaches.

      (1) Different adipose tissue biology: browning-prone vs browning-resistant adipose

      Liu et al. (2019, PMID: 31699980) demonstrated that adipose-specific deletion of Foxp1 in mice increases thermogenesis and browning of SAT, with protection from diet-induced obesity (DIO) and improved insulin sensitivity. Conversely, Foxp1 overexpression impaired adaptive thermogenesis and promoted DIO. Mechanistically, Foxp1 directly represses β3-adrenergic receptor transcription, thereby inhibiting the thermogenic program. Strikingly, mouse Foxp1-deleted adipocytes displayed smaller, multilocular lipid droplets characteristic of brown/beige adipocytes.

      These morphological outcomes initially appear opposite to our zebrafish findings: mouse Foxp1 mutants have smaller adipocytes (due to browning), while zebrafish foxp1b mutants have larger lipid droplets (hypertrophy). We believe this fundamental difference may reflect the propensity of adipose tissue to undergo adaptive thermogenesis.

      While it was recently discovered that zebrafish possess thermogenic epicardial adipose tissue (PMID: 38507414), in general zebrafish adipose is not considered thermogenic, and zebrafish as ectotherms are thought to lack adaptive thermogenesis for thermoregulation. The exact thermogenic potential of zebrafish adipose remains to be fully characterised, but potential differences in thermogenic capacity between mouse and zebrafish adipose may help explain the distinct phenotypic outcomes.

      Importantly, Liu et al. studied mouse inguinal subcutaneous WAT - the depot most prone to browning in rodents. It remains unclear what role Foxp1 plays in browning-resistant mammalian WAT depots, where thermogenic conversion does not readily occur. In such depots, Foxp1 loss might produce phenotypes more similar to our zebrafish findings - dysregulated white adipose function without browning.

      The above hypothesis suggests that browning responses may mask other roles for Foxp1 in WAT. Interestingly, although not quantified in the paper, Liu et al.’s Foxp1 overexpression model (Ap2-Foxp1) appeared to reduce adipocyte size despite suppressing Ucp1 expression and reducing lipolysis. These data suggest more complex roles and indicate that Foxp1’s control of adipocyte size might extend beyond simply regulating thermogenesis and may involve coordinating the balance between hyperplastic versus hypertrophic expansion.

      Furthermore, human subcutaneous WAT is not as prone to browning as mouse inguinal WAT. Human browning occurs primarily in specialised depots (e.g. supraclavicular, deep neck), while the majority of human adipose tissue represents constitutive white adipose with limited thermogenic capacity. Therefore, it remains an open question whether FOXP1's primary physiological role in humans relates to thermogenesis regulation (in specialised depots) or white adipose metabolic control (in the majority of adipose tissue). Zebrafish findings examining constitutive WAT function (admittedly, the lack of adaptive thermogenesis in zebrafish is presumed at this stage) may be more relevant to human adipose than they initially appear.

      (2) Whole-organism vs tissue-specific effects on metabolic health

      A second apparent contradiction concerns metabolic outcomes: mouse adipose-specific Foxp1 deletion improves metabolic health (Liu et al.), whereas our zebrafish whole-organism foxp1b mutants display metabolic dysfunction (baseline hypertrophy, impaired HFD response, hyperglycaemia and fatty liver). We believe this discrepancy reflects comparison of whole-animal mutants (zebrafish) to tissue-specific deletions (mouse), rather than opposite adipose tissue functions.

      Critically, Foxp1 has established roles in hepatic glucose metabolism. Zou et al. (PMID: 26504089) demonstrated that hepatic Foxp1 inhibits expression of gluconeogenesis genes and decreases hepatic glucose production and fasting blood glucose by competing with Foxo1 for binding of insulin responsive gluconeogenic genes. In line with these observations, we observe fatty liver and hyperglycaemia in foxp1a;foxp1b double mutant zebrafish (data not shown), suggesting that the metabolic dysfunction in our whole-animal mutants may be driven primarily by hepatic Foxp1 loss rather than adipose-specific effects.

      We have expanded on the points raised here in the Discussion.

      (8) Line 522-524: "The major phenotype in foxp1a mutants was impaired adipose expansion following HFD, suggesting failure to respond to diet-induced stress signals". In the presented Figure 6j, foxp1a mutant expands adipose LD size following HFD, similar to the control, which is contradictory to the statement above. Please clarify.

      We thank the reviewer for highlighting this apparent inconsistency and apologise for imprecise wording. These measurements are actually consistent but refer to different scales of analysis.

      Tissue level (Supplementary Fig. 7): foxp1a mutants show significantly reduced total adipose expansion (based on whole-animal Nile Red images) compared to wild-type fish on HFD—this is what we refer to as "impaired adipose expansion."

      Cellular level (Fig. 6L-M): At the individual adipocyte level, foxp1a mutants show statistically significant increases in LD diameter following HFD. However, the magnitude is reduced by ~35% compared to wild-type (mutants: +14.4 µm; WT: +22.2 µm; p = 0.023).
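
      The ~35% figure follows directly from the two HFD-induced increases quoted above. A minimal sketch of the arithmetic, in which the function and variable names are illustrative and not taken from the manuscript:

```python
def relative_reduction(mutant_delta: float, wt_delta: float) -> float:
    """Fractional shortfall of the mutant response relative to wild type."""
    return (wt_delta - mutant_delta) / wt_delta


# HFD-induced increase in LD diameter (µm), as quoted in the response above.
wt_delta = 22.2
mutant_delta = 14.4

reduction = relative_reduction(mutant_delta, wt_delta)
print(f"{reduction:.1%}")  # prints "35.1%"
```

      The mutants therefore still respond to HFD, but with roughly two thirds of the wild-type magnitude.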

      We have revised the text to more precisely state "reduced adipose expansion" rather than "impaired expansion" to avoid implying complete failure to respond.

    1. eLife Assessment

      This potentially valuable study investigates the interaction of two integral membrane proteins (Cdhr1a and Pcdh15b) and their roles in cone-rod dystrophy. Convincing evidence using loss-of-function mutants demonstrates that both proteins are required for cone maintenance and survival. There is insufficient evidence to support the subcellular localization and the proposed heterodimeric interaction of the two proteins from distinct subcellular compartments. The methodologies are unclear, and the statistical methods and analysis are improperly applied.

    2. Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Comments on the revised version of the manuscript:

      The authors adequately addressed previous comments related to lack of details on quantitative and statistical analyses and methods. In this regard, I believe the revised manuscript presents a stronger analysis of the data. I also appreciated the revised discussion section, which better contextualizes their new data with previous observations in different animal models.

      The authors provided additional evidence in Fig 1C-H for the co-localization of pcdh15b and actin within CPs using immunolabeling with super resolution imaging. This data firmly supports their other observations. A similar approach tends to also show co-localization of actin and cdhr1a, although the authors suggest that the pattern of expression is less overlapping, which would be expected if cdhr1a is predominately expressed in the OS membranes whereas pcdh15b is predominantly expressed in the CP membranes. In my opinion the data presented to support this separation is still not that convincing. Moreover, the authors show that both cdhr1a and pcdh15b are expressed in CPs using immuno-TEM (Fig 1I). This is a difficult question to address experimentally, and it is, of course, still plausible that pcdh15b within the CP membrane and cdhr1a within the OS membrane are interacting in trans. However, I just don't think that the data unequivocally support mutually exclusive localization of these proteins as suggested by the authors and depicted in the model in Fig 1J.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that, like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs, closely apposed to PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. These data suggest a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding are also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Fig 4F, 6E) as well as other morphometric data (Fig 7I in particular). Relatedly, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether analysis was done in an automated and/or masked manner.

      Comments on revisions:

      Most of my concerns were addressed in this revised version.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al. investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and that the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al. suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty for this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data as presented are inadequate proof of this model.

      Strengths:

      The in vitro data supporting the ability of Pcdh15b and Cdhr1a to bind are well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially given that this would be the first characterization of a zebrafish cdhr1a mutant.

      This is a large body of data.

      Weaknesses:

      (1) I have serious concerns about the quality of the imaging here. The premise that cdhr1a/pcdh15 juxtaposition is evidence for the two proteins mediating the connection between outer segments and calyceal processes requires very careful microscopy. The SIM images have two major issues - one being that the red and green channels are misaligned and the other being evidence of bleed through between the channels. This is obvious in Fig 2A but likely true across all the panels in Fig 2, and possibly applies to confocal images in Fig 1 as well. The co-labelling with actin shows very uneven, punctate staining for actin bundles.

      (2) The newly added TEM and transverse sections include colored regions that obscure the imaging.

      (3) The quantification should be done with averages from individual fish. Counting individual measurements as single data points artificially inflates the significance. Also, the cone subtypes are still lumped together for analysis despite their variable sizes.

      (4) I highlighted previously that the measurement of calyceal processes was incorrect. The redrawn labels in Fig 7 are now more accurate, although still difficult to interpret. However, the quantification in Fig 7O is exactly the same. How can that be if the measurement region is now different?

      (5) Lower magnification views would provide context for the TEM data.

      (6) The statement describing the separation between calyceal processes and the outer segment in the mutants is still not backed up by the data.

      (7) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs". This is now referenced, but incorrectly. Also, the issue of pigment interference was not addressed.

      (8) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.
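
      Point (3) above asks for one value per fish rather than one value per measured structure. A minimal sketch of that aggregation step, assuming measurements are stored as (fish ID, length) pairs; the data and names below are hypothetical and do not come from the manuscript:

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(measurements):
    """Collapse (fish_id, value) pairs to a single mean value per fish."""
    by_fish = defaultdict(list)
    for fish_id, value in measurements:
        by_fish[fish_id].append(value)
    return {fish_id: mean(values) for fish_id, values in by_fish.items()}


# Hypothetical outer segment lengths (µm): six measurements from three fish.
measurements = [
    ("fish1", 10.0), ("fish1", 12.0), ("fish1", 11.0),
    ("fish2", 9.0), ("fish2", 9.5),
    ("fish3", 13.0),
]

means = per_fish_means(measurements)
n = len(means)  # the sample size for statistics is the number of fish, not 6
```

      Treating the six raw measurements as independent would overstate the sample size; the per-fish means are the values a downstream test would use.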

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immuno-gold-TEM protocol and showcase co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in Figure 1. All of the images showcasing CDHR1 localization across species concentrate on the PNA-positive R/G cones. Larger fields of view were not collected, as we prioritized the highest resolution possible and therefore collected small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the WB are consistent with our previous results (Piedade 2020), and we believe they arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures but a monolayer culture. We did not observe any differences in total cell number between the different transfections. As such, we do not feel proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one-rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs per cell; we simply included all CP measurements in our analysis. All our CP measurements (actin, cdhr1a or pcdh15) were taken in the presence of a counterstain (WGA, prph2, gnb1 or PNA) to provide a landmark for accurate measurement and to ensure association with the correct cell type. Our new Figure 7 now includes cone OS counterstaining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin, cdhr1a or pcdh15) were taken in the presence of a counterstain (WGA, prph2, gnb1 or PNA) to ensure accurate measurement and association with the correct cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we had an n=5 (individual fish) analyzed for each genotype/time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using ANOVA followed by Tukey's post hoc test for multiple pairwise comparisons. This new analysis did not significantly alter the statistical significance outcome of the study.
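
      For illustration, the omnibus step of such an analysis can be sketched on per-fish means, so that each of the n=5 fish per genotype contributes one independent value. This is a minimal standard-library sketch with hypothetical numbers, not the authors' analysis; it computes only the one-way ANOVA F statistic and does not implement the Tukey post hoc test:

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for groups of independent values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-fish mean OS lengths (µm), five fish per genotype.
wt = [10.0, 11.0, 12.0, 11.0, 11.0]
het = [10.0, 10.0, 11.0, 11.0, 10.0]
mut = [8.0, 9.0, 8.0, 9.0, 8.0]

f_stat, df_between, df_within = one_way_anova_f([wt, het, mut])
```

      With three genotypes and five fish each, the degrees of freedom are 2 and 12, whereas treating hundreds of individual measurements as independent would inflate the within-group degrees of freedom. A post hoc comparison would then be run on the same per-fish values; SciPy's `scipy.stats.tukey_hsd` is one option, mentioned here as a suggestion only.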

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree; this point has been addressed in our revised manuscript. Additionally, we have included data from 1- and 2-year-old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.
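
      The criterion for synergy stated above (same direction of change in each single mutant, and a double-mutant effect larger than the sum of the two) can be written down explicitly. The following is a minimal sketch of that criterion only; the function name and all numeric values are hypothetical and are not measurements from the manuscript:

```python
def classify_interaction(single_a: float, single_b: float, double: float) -> str:
    """Classify a double-mutant effect; inputs are changes relative to WT."""
    additive = single_a + single_b
    # Synergy requires both single mutants and the double mutant to shift
    # the measurement in the same direction.
    same_direction = single_a * single_b > 0 and single_a * double > 0
    if same_direction and abs(double) > abs(additive):
        return "synergistic"
    if same_direction:
        return "additive or sub-additive"
    return "not synergistic (directions differ)"


# Hypothetical pattern resembling the one described: one single mutant
# increases length, the other shows no change, the double mutant decreases it.
described = classify_interaction(single_a=2.0, single_b=0.0, double=-5.0)

# Hypothetical textbook synergy: both decrease, the double exceeds the sum.
textbook = classify_interaction(single_a=-1.0, single_b=-1.0, double=-4.0)
```

      Under this definition, a pattern in which one single mutant lengthens the structure and the double mutant shortens it cannot be classified as synergistic, which is the reviewer's point.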

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer and have revised the discussion accordingly.

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that, like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs, in close apposition to PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. These data suggest a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used for quantitation of our data (please see comments from reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in the cones, while in rods it may serve to regulate the release of newly formed disks (as previously proposed in mice). We do plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data supporting the ability of Pcdh15b and Cdhr1a to bind are well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially given that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 µm in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immuno-gold-TEM protocol and provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and they fail to address how that reconciles with the new model of mediating CP-OS interaction, or whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants, so it is unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol which differed from the Miles et al paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern which might be lost/hampered in the procedure used in Miles et al 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an overlooked error in figure preparation; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.
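
      As a minimal sketch of the per-fish averaging raised in point (5), the snippet below collapses individual cell measurements to one mean per fish before any group comparison, so that the fish rather than the cell is the unit of analysis. The record layout, the function name `per_fish_means`, and all numeric values are hypothetical illustrations, not data or methods from the manuscript. A post-hoc test designed for multiple comparisons (for example Tukey's HSD) would then be applied to these per-fish values rather than to individual cells.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(measurements):
    """Collapse per-cell measurements to one mean per fish, grouped by genotype.

    `measurements` is an iterable of (genotype, fish_id, value) tuples.
    This layout is an assumption made for illustration only.
    """
    by_fish = defaultdict(list)
    for genotype, fish_id, value in measurements:
        by_fish[(genotype, fish_id)].append(value)

    by_genotype = defaultdict(list)
    # Sort so the order of per-fish means is deterministic.
    for (genotype, _fish_id), values in sorted(by_fish.items()):
        by_genotype[genotype].append(mean(values))
    return dict(by_genotype)


# Hypothetical OS lengths (arbitrary units); not real data.
cells = [
    ("WT", "fish1", 10.0), ("WT", "fish1", 12.0),
    ("WT", "fish2", 11.0),
    ("mut", "fish3", 6.0), ("mut", "fish3", 8.0),
    ("mut", "fish4", 7.0), ("mut", "fish4", 9.0),
]
fish_level = per_fish_means(cells)
```

      With this layout, the sample size per genotype is the number of fish, not the number of cells measured.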

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript. We have also clarified that CP measurements were made based on a counterstain for the cone/rod OS so that the actin signal was only CP associated. We have included the counterstain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have performed additional SIM as well as immuno-gold TEM to support our conclusions; see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the small length of rod CPs most likely reflects their interaction with newly forming discs rather than a connection with mature discs, which are enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references in which rod CPs have been found to be shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counterstain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript has included these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in Supplemental Figure 1 is distinct and represents a different in situ experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added to our discussion, highlighting that it is not possible to assess rod-cone dystrophy in the pcdh15b model, as the mutation is lethal by 15dpf, which is still before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin) and myo7aa all have either no expression in photoreceptors or very low levels, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      Changed pcdh15b color to blue

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdhr1b in cdhr1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showing these data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bonafide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and extends along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on a co-stain for either rod or cone OSs.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      This data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) has been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies

      (3) Missing references.

      We have added missing references

      (4) Indicate the mean or median for bar graphs.

      The materials and methods section now specifies that all of our graphs depict a mean value

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separate from cones in the zebrafish retina and are therefore easily identified by location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. eLife Assessment

      In this important study, the authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance in ovarian cancer using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors convincingly identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need. This work will be of interest to researchers focused on ovarian cancer.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need.

      Strengths:

      The manuscript utilizes state-of-the-art proteomic analysis, entailing data-independent acquisition methods, an approach that maximizes the robustness of identified proteins across cell lines. The authors focus their analysis on several drugs under development for the treatment of ovarian cancer and utilize straightforward thresholds for identifying proteomic adaptations across several drugs on the OVSAHO cell line. The authors utilized three independent and complementary approaches to predicting drug synergy (NetBox, GSEA, and Manual Curation). The drug combination with the most robust synergy across multiple cell lines was the inhibition of MEK and CDK4/6 using PD-0325901+Palbociclib, respectively. Additional combinations included PARPi (rucaparib) and the fatty acid synthase inhibitor (TVB-2640). Collectively, this study provides important insight and exemplifies a solid approach to identifying drug synergy without large drug library screens.

      Weaknesses:

      The manuscript supports its findings by describing the biological function(s) of targets using referenced literature. While this is valuable, the number of downstream targets for each initial target is extensive; thus, the current work does not attempt to elucidate the mechanism of the drug synergy. Responses to drugs are quantified 72 hours after treatment and exclusively focused on cell viability and protein expression levels. The discovery phase of experimentation was solely performed on the OVSAHO cell line. An additional cell line(s) would increase the impact of how the authors went about identifying synergistic targets using bioinformatics. Ovarian cancer is elusive to treatment as primary cancer will form spheroids within ascites/peritoneal fluids in a state of pseudo-senescence to overcome environmental stress. The current manuscript is executed in 2D culture, which has been demonstrated to deviate from 3D, PDX, and primary tumours in terms of therapeutic resistance (DOI: 10.3390/cancers13164208). Collectively, the manuscript is insufficient in providing additional mechanistic insight beyond the literature, and its interpretation of data is limited to 2D culture until further validated.

      Comments on revisions:

      The reviewer has no further recommendations for the authors.

    3. Reviewer #2 (Public review):

      Summary:

      Franz and colleagues combined proteomics analysis of OVSAHO cell lines treated with 6 individual drugs. The quantitative proteomics data was then used for computational analysis to identify candidates/modules that could be used to predict combination treatments for specific drugs.

      Strengths:

      The authors present solid proteomics data and computational analysis to effectively repeat at the proteomics level analyses that have previously been done predominantly with transcriptional profiling. Since most drugs either target proteins and/or proteins are the functional units of cells, this makes intuitive sense.

      Weaknesses:

      Considering the available resources of the involved teams, performing the initial analysis in a single HGSC cell line is certainly a weakness/limitation. During the revision, additional cell lines were used for verification.

      The data also show how challenging it is to correctly predict drug combinations. In Table 2 (if I read it correctly) the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect. It also shows how variable the response was in the different HGSC cell lines used for combination treatment. The success rate will most likely continue to drop as more sophisticated models are being used (i.e., PDX). Human patients are even more challenging.

      It would most likely be useful to more directly mention/discuss these caveats in the manuscript. This was added to the discussion during the revision. Overall the authors have responded to previous suggestions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need.

      Strengths:

      The manuscript utilizes state-of-the-art proteomic analysis, entailing data-independent acquisition methods, an approach that maximizes the robustness of identified proteins across cell lines. The authors focus their analysis on several drugs under development for the treatment of ovarian cancer and utilize straightforward thresholds for identifying proteomic adaptations across several drugs on the OVSAHO cell line. The authors utilized three independent and complementary approaches to predicting drug synergy (NetBox, GSEA, and Manual Curation). The drug combination with the most robust synergy across multiple cell lines was the inhibition of MEK and CDK4/6 using PD-0325901+Palbociclib, respectively. Additional combinations included PARPi (rucaparib) and the fatty acid synthase inhibitor (TVB-2640). Collectively, this study provides important insight and exemplifies a solid approach to identifying drug synergy without large drug library screens.

      Weaknesses:

      The manuscript supports its findings by describing the biological function(s) of targets using referenced literature. While this is valuable, the number of downstream targets for each initial target is extensive; thus, the current work does not attempt to elucidate the mechanism of the drug synergy. Responses to drugs are quantified 72 hours after treatment and exclusively focused on cell viability and protein expression levels. The discovery phase of experimentation was solely performed on the OVSAHO cell line. An additional cell line(s) would increase the impact of how the authors went about identifying synergistic targets using bioinformatics. Ovarian cancer is elusive to treatment as primary cancer will form spheroids within ascites/peritoneal fluids in a state of pseudo-senescence to overcome environmental stress. The current manuscript is executed in 2D culture, which has been demonstrated to deviate from 3D, PDX, and primary tumours in terms of therapeutic resistance (DOI: 10.3390/cancers13164208). Collectively, the manuscript is insufficient in providing additional mechanistic insight beyond the literature, and its interpretation of data is limited to 2D culture until further validated.

      We appreciate your positive remarks on the use of NetBox, GSEA, and human curation for predicting anti-resistance effects of second drugs. Regarding the weaknesses you identified:

      Mechanistic Insight: We agree that our current work interprets findings using prior published knowledge and does not attempt to infer detailed mechanisms of drug resistance of the nominated drug combinations. Our primary goal with this study was to establish a robust, unbiased proteomic and computational pipeline for proposing anti-resistance drug combinations, rather than to fully characterize the downstream molecular effects for each combination or to prove causation. To get closer to mechanistic insight, meaning detailed hypotheses of causative interactions, one would need to investigate anti-resistance effects in other pre-clinical materials as a crucial next step for the most promising combinations identified. This was out of scope for us. We assume the proposed combinations are useful for focussed follow-up in the community.

      Discovery Phase on a Single Cell Line: Our discovery phase was focused solely on the OVSAHO cell line due to its resemblance to surgical ovarian cancer samples. Including additional cell lines in the initial proteomic-response discovery phase plausibly would have enhanced the generalizability. But this was not done due to resource constraints. However, we did perform more extensive validation of the effect of drug combinations on proliferation in several cell lines to explore broader applicability.

      2D Culture Limitations: We are fully aware of the limitations of 2D cell culture models, especially in the context of ovarian cancer, where in clinical reality interactions with the microenvironment and other effects can have significant roles in therapeutic resistance. We also recognize that in lab experiments 2D culture does not fully recapitulate the complexities of 3D tumors, PDX models, or primary patient tumors. We have added citations to the relevant literature (including the reference you provided), and have emphasized in the Discussion that our findings serve as a strong foundation for future experimental tests (validation) in more physiologically relevant experimental model systems.

      Reviewer #2 (Public review):

      Summary:

      Franz and colleagues combined proteomics analysis of OVSAHO cell lines treated with 6 individual drugs. The quantitative proteomics data were then used for computational analysis to identify candidates/modules that could be used to predict combination treatments for specific drugs.

      Strengths:

      The authors present solid proteomics data and computational analysis to effectively repeat at the proteomics level analyses that have previously been done predominantly with transcriptional profiling. Since most drugs either target proteins and/or proteins are the functional units of cells, this makes intuitive sense.

      Weaknesses:

      Considering the available resources of the involved teams, performing the initial analysis in a single HGSC cell line is certainly a weakness/limitation.

      The data also shows how challenging it is to correctly predict drug combinations. In Table 2 (if I read it correctly), the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect. It also shows how variable the response was in the different HGSC cell lines used for the combination treatment. The success rate will most likely continue to drop as more sophisticated models are being used (i.e., PDX). Human patients are even more challenging.

      It would most likely be useful to more directly mention/discuss these caveats in the manuscript.

      Thank you for your summary and positive comments. Regarding the weaknesses you identified:

      Initial Analysis in a Single Cell Line: We concur with your assessment that performing the initial analysis in a single HGSC cell line (OVSAHO) is a limitation. As mentioned in our response to Reviewer #1, resource limitations caused this decision, and we acknowledge that a broader initial screen would have strengthened generalizability. We added this limitation in the discussion section, emphasizing use of diverse cell lines in the initial protein response profiling as an area for future work.

      Challenges in Predicting Drug Combinations and Variability: We appreciate the observation regarding the challenges in predicting the effect of drug combinations and the variability of antiproliferative effects observed in different HGSC cell lines (Table 2). As with any predictive method, our computational-experimental pipeline is not guaranteed to identify additive or synergistic interactions with absolute certainty, but it generates data-informed hypotheses to be considered in the presence of other available observations. We now emphasize in the Discussion that while our computational pipeline provides plausible anti-resistance candidates, the precise results (extent of additivity or synergy) differ between cell lines. This underscores that experimental validation across diverse physiological models, such as PDXs or organoids (not just additional cell lines), is an essential criterion of validity of the generated hypotheses. We also underscore the (obvious) challenge of the ultimate translation of pre-clinical experiments to therapeutic effects in humans.

      In revision, we have clarified in detail the expectation of predicted synergy implied by the reviewer’s comment, “the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect”. This reflects a misunderstanding of our goals. The predictions are for drug effects that are anti-resistant, such that the proteomic response to one drug is counteracted by the second drug. The predicted effect is not synergy. Indeed, a useful anti-resistance effect does not require synergy, because additivity is sufficient: if cells are resistant to the original drug, the second drug plausibly still has an antiproliferative effect, as it targets the cellular processes that are increased in activity (upregulated) in response to the first drug. We therefore deleted the red synergy color in Table 2 to avoid the potential conclusion from our results that without synergy, there is no benefit to a drug combination. In fact, additive drug combination effects are in themselves beneficial. For clarity on this point, we added coloring in Table 2 to highlight the small number of combinations that did not work well, in that the combination was clearly antagonistic, using a combination index cutoff of CI >= 2.0; we clarify this point in the Discussion.
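
      To make the distinction between synergy, additivity, and clear antagonism concrete, the following is a minimal sketch that assumes a Loewe-style combination index of the form CI = d1/D1 + d2/D2, where d1 and d2 are the doses used in combination and D1 and D2 are the doses of each drug alone that give the same effect. The response does not state how CI was computed, so this formula, the function names, and the `tolerance` band around CI = 1 are illustrative assumptions; only the CI >= 2.0 antagonism cutoff comes from the text above.

```python
def combination_index(d1, d2, d1_alone, d2_alone):
    """Loewe-style combination index (assumed form, for illustration).

    d1, d2: doses of each drug used together to reach a given effect.
    d1_alone, d2_alone: dose of each drug alone giving the same effect.
    """
    return d1 / d1_alone + d2 / d2_alone


def classify(ci, antagonism_cutoff=2.0, tolerance=0.1):
    """Label a CI value.

    antagonism_cutoff=2.0 follows the cutoff quoted in the response;
    the tolerance band around 1.0 is a hypothetical choice.
    """
    if ci >= antagonism_cutoff:
        return "clearly antagonistic"
    if ci < 1.0 - tolerance:
        return "synergistic"
    if ci <= 1.0 + tolerance:
        return "approximately additive"
    return "sub-additive, below the antagonism cutoff"


# Hypothetical doses in arbitrary units; not values from the study.
ci_synergy = combination_index(1.0, 2.0, 4.0, 4.0)   # 0.25 + 0.5
ci_additive = combination_index(2.0, 2.0, 4.0, 4.0)  # 0.5 + 0.5
ci_antagonism = combination_index(4.0, 4.0, 4.0, 4.0)  # 1.0 + 1.0
```

      Under this reading, both the "synergistic" and the "approximately additive" labels would count as useful anti-resistance outcomes, and only the first label returned by `classify` corresponds to the combinations highlighted as not working well.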

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2b. This figure would be more impactful if presented as an upset plot with the same Venn diagram embedded. I am not sure Figure 2C accurately supports the statement: "Frequently affected proteins generally had expression level changes in the same direction across all drug perturbations (Figure 2c), indicating a potential general stress response." It would be beneficial if the authors could present the data in a way that shows the number of genes with similar directional groupings. Likewise, the color scheme for this figure is hard to interpret as grey is the most negative value and values are preselected for absolute fold-change. Please consider colors with a stronger contrast.

      Authors should consider uploading MS files to the PRIDE or MASSIVE repository.

      We have addressed these very useful suggestions. We have edited Figure 2b to include the requested upset plot. It serves to illustrate the intersection of proteins responding to different perturbation conditions; due to figure space constraints, we limit the figure to entries with counts of at least 15. We have added the number of proteins with consistent directional changes to the Figure 2c caption and the text.

      For Figure 2c, we have edited the color bar legend to better reflect the colors that appear in the heatmap.

      We have added our mass-spectrometry drug-response dataset to the ProteomeXchange Consortium via PRIDE with accession number PXD066316.

    1. eLife Assessment

      This valuable computational study presents a conceptually simple and biologically plausible reinforcement-learning framework for motor learning based on policy-gradient methods. The evidence supporting the conclusions is convincing, including rigorous mathematical derivations of learning rules for the mean and variance of motor commands and simulation results for three sets of experimental data, based on three different motor learning tasks from the literature. However, there is a lack of a clear description of the specific conditions under which this framework yields unique mechanistic insights or predictive values, hence falling short of qualifying as a "general theory of motor learning". The work will be of interest to researchers in computational motor learning and motor neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      This study proposes a simple and universal reinforcement-learning framework for understanding learning in complex motor tasks. Central to the framework is a policy-gradient algorithm, in which motor commands are updated not via the gradient of the reward with respect to policy parameters, but via the gradient of the policy itself, scaled by reward information. The authors demonstrate that this scheme can reproduce learning dynamics that have been reported in previous empirical studies.
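
      To make the distinction in the summary concrete, here is a minimal sketch of a REINFORCE-style mean update for a one-dimensional Gaussian policy. It is an illustrative simplification, not the manuscript's equation 4; the function name and the scalar setting are assumptions. The update uses only the gradient of the log-policy and a scalar reward signal, never the derivative of the reward function itself.

```python
def reinforce_mean_step(mu, sigma, action, reward, baseline, lr):
    """One REINFORCE update of the mean of a 1-D Gaussian policy.

    The gradient of log N(action; mu, sigma^2) with respect to mu is
    (action - mu) / sigma^2. It is scaled by the reward relative to a
    baseline; no gradient of the reward function is required.
    """
    score = (action - mu) / sigma ** 2
    return mu + lr * (reward - baseline) * score


# Hypothetical trial: the sampled action was above the mean and was
# rewarded more than the baseline, so the mean moves toward that action.
print(reinforce_mean_step(mu=0.0, sigma=1.0, action=1.0,
                          reward=1.0, baseline=0.0, lr=0.1))
```

      If the same action had earned less than the baseline, the mean would move away from it by the same amount.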

      Strengths:

      The key contribution of this study lies in its application of a policy-gradient algorithm to describe motor learning processes. This idea is biologically plausible, as computing the gradient of the policy with respect to its parameters is likely to be substantially easier for the nervous system than computing the gradient of the reward with respect to policy parameters. The authors present three representative examples showing that this scheme can capture several aspects of motor learning dynamics. Notably, providing such a unified description across different tasks has been difficult for conventionally proposed learning frameworks, such as supervised learning.

      Weaknesses:

      While this scheme is valuable in that it captures certain aspects of learning dynamics, I find that its overall significance is limited for the following reasons.

      (1) The empirical results examined in this study primarily demonstrate that motor learning drives performance toward the spatial task goal while reducing variability. Given that the policies are expressed using Gaussian distributions and that their parameters (i.e., the mean and covariance matrix) are updated during learning, it is not surprising that the proposed scheme can reproduce these results by fitting the parameters to the data.

      (2) The proposed framework assumes that the motor learning system relies on the gradient of the policy with respect to its parameters. However, I am not convinced that this assumption is always appropriate, because in all three empirical studies examined here, explicit spatial error information is available. In such cases, the motor learning system could, in principle, compute the gradient of the error with respect to the policy parameters directly, without relying on a policy-gradient mechanism.

      (3) Most importantly, it remains unclear how the proposed scheme advances our understanding of the underlying learning mechanisms beyond providing a descriptive account of the learning process. While the framework offers a compact mathematical description of learning dynamics, it is uncertain how it can yield novel mechanistic insights or testable predictions that distinguish it from existing learning models.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Haith applies, and to some extent extends, the theoretical framework of policy gradient (PG) and the derived REINFORCE learning rules to human motor learning. This approach is coherent because human motor skill learning is characterized by improvements in both accuracy and precision (the inverse of variance), and REINFORCE provides update rules for both the mean and the variance of the motor commands.

      Weaknesses:

      The mean update (equation 4) is given in task space (i.e., angle and velocity for the skittle task), but the covariance update (equation 5) is given in eigenvector space. This formulation appears to have been provided for computational convenience, as it ensures that the variances are always positive by exponentiating the eigenvalues. However, this eigenspace formulation is somewhat artificial and complex (notably the update rule for the orientation of the covariance matrix) and seems far from biological reality. A simpler alternative, suggested by the author, is to provide the full covariance matrix, including crossed terms, and derive equations to update the diagonal variance terms and the cross-terms (perhaps after a transformation to keep all elements positive if needed). This would provide a simpler and more biologically plausible update to the covariance matrix terms, in the spirit of the original REINFORCE algorithm. The author suggests that he has derived the update rule for the cross terms, so this should be relatively easy to write and update, especially for the skittle learning rules. If the author wishes to keep their rules in simulations, then the two mathematical rules could be presented in the methods or a supplementary material section.

      The discussion about binary rewards and the increase in variance in previous experiments is potentially interesting. However, I do not understand why variance cannot increase with the policy-gradient RL update. Surely, equation 5 can lead to both an increase and a decrease in variance depending on the reward prediction error and the noise (for example, suppose the noise at trial i is small and leads to a smaller reward than the baseline; variance would increase). It would be interesting to see detailed simulation results for the skittle task showing changes in both mean and variance across a few consecutive trials, with both increases and decreases in reward prediction errors. These results could then be compared in simulations with those of a task with discrete binary rewards.
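
      The sign argument in the preceding paragraph can be checked with a one-dimensional sketch. This is an assumed simplification of the manuscript's equation 5 (which operates in eigenvector space): the policy is Gaussian with standard deviation exp(log_sigma), and the function name is hypothetical.

```python
import math


def reinforce_log_sigma_step(log_sigma, noise, rpe, lr):
    """One REINFORCE update of log(sigma) for a 1-D Gaussian policy.

    noise: the sampled deviation from the mean on this trial.
    rpe:   reward prediction error (reward minus baseline).
    The gradient of log N(mu + noise; mu, sigma^2) with respect to
    log(sigma) is noise^2 / sigma^2 - 1.
    """
    sigma = math.exp(log_sigma)
    score = (noise / sigma) ** 2 - 1.0
    return log_sigma + lr * rpe * score


# Small noise combined with a worse-than-baseline reward: variance grows.
print(reinforce_log_sigma_step(0.0, noise=0.1, rpe=-1.0, lr=0.1))
# Small noise combined with a better-than-baseline reward: variance shrinks.
print(reinforce_log_sigma_step(0.0, noise=0.1, rpe=+1.0, lr=0.1))
```

      In this simplified form the update increases variance whenever the sign of the reward prediction error matches the sign of (noise^2 / sigma^2 - 1), so both directions of change are possible trial by trial.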

      Generalization is a major feature of human learning, but it is not discussed or studied here. In fact, in the de novo task simulations, there can be no generalization because the values are modeled as running averages for each target rather than derived from a critic network. Can the author discuss this point and, ideally, show generalization results in simulations, say in the skittle task?

      The application of the model to reproduce the Shmuelof et al. data is, at the same time, justified (because one of their main results is an improvement in precision, which Policy Gradient directly addresses) and somewhat "forced," as the author approximates curved movements with a series of straight-line movements. The author therefore needs to specify multiple via points with PG updating and a reward function that also enforces smoothness. The justification for the Guigon 2023 model seems somewhat artificial because it mainly applies to slow movements. Can the author comment and discuss alternatives that do not require via points, drawing from the robotics literature if needed (Schaal's Dynamic Movement Primitives come to mind, for example)?

      Policy Gradient requires both a "noisy" and a clean "pass", making it non-biological in its simplest form. Legenstein et al. (2010) and Miconi (2017) provided biologically plausible forms for the mean update. Since Policy Gradient is proposed as a model of human motor learning, can the author discuss the biological plausibility of the proposed learning rules and possible biologically plausible extensions?

    1. eLife Assessment

      This study addresses an important gap in drug discovery by delivering a rigorous, large-scale evaluation of widely used co-folding methods for predicting ligand-bound protein complexes and virtual screening. A key strength is the comprehensive benchmarking framework, which leverages structures and chemical compounds that were absent from the AI models' training sets, thereby providing particularly compelling and unbiased evidence of co-folding performance. The findings clearly delineate the complementary roles of deep learning-based co-folding and physics-based docking, offering practical guidance for their rational integration into drug discovery workflows. Although the conclusions are convincing, improvements in the test cases, presentation, and usability can further strengthen the overall impact.

    2. Reviewer #1 (Public review):

      The authors conducted a comprehensive benchmarking and evaluation of co-folding platforms, including AlphaFold3, Boltz-2, Chai-1, and the docking algorithm Dock3.7, which employs a physics-based scoring function that incorporates van der Waals interactions, electrostatics, and ligand desolvation energies. The system of interest was the SARS-CoV-2 NSP3 macrodomain (Mac1), an increasingly popular antiviral target, and the ligand sets comprised 557 unseen ligand poses (keeping the training for these co-folding platforms in mind). Additionally, the authors investigated whether the co-folding models could distinguish true ligands from non-binding small molecules. The study is thorough, with extensive statistical support and consensus across multiple metrics (chemoinformatics for quantifying ligand similarity and efficacy). The questions that the authors aim to address are whether the co-folding models struggle with memorization, whether they can distinguish between a true and a false binder, whether they replicate experimental binding affinities and efficacy, and how they compare to the physics-based docking algorithm (Dock3.7).

      Strengths:

      Overall, this is a scientifically solid paper. The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment.

      Weaknesses:

      My main concern is that the study's aim is a bit unclear. Modern benchmarking studies comparing physics-based docking with deep learning-based co-folding approaches (e.g., AF3, Boltz-2, Chai-1, and others) are increasingly expected to go beyond aggregate performance metrics. In addition to rigorous dataset construction, transparent methodology, and appropriate statistical evaluation, high-impact benchmarks typically provide actionable guidance on when each method class is most appropriate, reflecting their distinct inductive biases and practical constraints. Failure-mode analyses that link performance differences to protein flexibility, ligand chemistry, or binding-site characteristics are particularly valuable, as they move comparisons beyond "scoreboard" assessments toward mechanistic understanding. While full biological validation is not expected, qualitative interpretation grounded in physical and biological principles strengthens conclusions. Providing reproducible workflows or reference pipelines is not mandatory, but it is increasingly viewed as a best practice because it facilitates adoption and helps contextualize results for practitioners.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Kim et al. evaluates the performance of three modern AI-based methods in predicting complex structures and binding affinities between proteins and chemical compounds. An honest 'prospective' evaluation is achieved by studying benchmark structures and chemical compounds that did not exist in the PDB at the time the AI structure prediction models (AlphaFold3, Chai-1, Boltz-2) were trained.

      Strengths:

      (1) The study addresses an important question in modern computational biology and drug discovery, and establishes the strengths and limitations of the three tools in solving various computational chemistry tasks, including compound pose prediction, active-inactive discrimination, and potency ranking.

      (2) The conclusions are based on examination of four separate targets and respective compound datasets, where for one of the targets, the authors also obtained numerous X-ray structures to serve as experimental answers for the binding pose prediction task.

      (3) The study reports relationships between structure prediction confidence, predicted energies (DOCK3.7), and affinity predictions (Boltz-2) with the geometric accuracy of compound pose prediction as well as the experimentally measured potency.

      (4) One of the key findings is the limited ability of co-folding methods to predict conformational rearrangements, which does not correlate with their ability to predict binding poses of the compounds inducing these rearrangements.

      (5) The findings could serve as useful guidelines for computational chemists in selecting appropriate software and scoring schemes for each task.

      Weaknesses:

      While I consider this a solid study, several aspects would need to be addressed to make it really strong:

      (1) DOCK3.7 docking and scoring experiments were performed using one experimental structure of Mac1, selected from dozens of structures based on a criterion that is not sufficiently well justified. For sigma2 receptor, dopamine D4 receptor, and AmpC β-lactamase, it is not clear which structures or models were selected for docking at all. It is well known that geometry predictions, scoring, and active-inactive ROC AUCs are all strongly influenced by the selected structure. It would be important to attempt Mac1 docking using all available experimental Mac1 structures, or at least against representative structures in various conformations; it would also be quite insightful to compare results to docking of the same compound sets to AF3, Boltz-2 and Chai-1 predicted structures of Mac1. The same goes for the docking studies of sigma2, D4, and AmpC β-lactamase.

      (2) For binding affinity predictions, as a control, authors should consider compound co-folding with an unrelated protein, or even with a pseudo-peptide that consists of a few random single amino acids - this would provide an honest baseline for such predictions.

      (3) ROC curves in Figure 3 and elsewhere should be shown, and AUCs quantified and reported, on a log- or square-root-scaled x-axis to emphasize early enrichment, which is the area of practical significance for these predictions. For example, Figure 3A currently suggests that the pose prediction performance of AF3 exceeds that of Boltz-2, whereas the early enrichment is clearly better for Boltz-2.

      (4) 'Trained set' in figures and text should probably be 'training set'? Or otherwise explain this new term the first time it is introduced.

      (5) Figure 1 illustrates a projection onto the first two principal components of a space that apparently had only one (scalar) metric for each compound pair (% maximum common substructure or Tanimoto coefficient); the authors need to better explain the principle behind this analysis and visualization.
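
      As an illustration of point (3) above, the following is a minimal sketch of an AUC computed on a log-scaled false-positive-rate axis, which weights early enrichment more heavily than the standard linear AUC. The function name, the lower bound fpr_min = 0.001, and the normalization to the range 0 to 1 are assumptions for illustration; ties in score are not handled specially.

```python
import math


def log_auc(scores, labels, fpr_min=1e-3):
    """Area under the ROC curve with log10(FPR) on the x-axis.

    scores: higher means more likely active.
    labels: 1 for actives, 0 for inactives/decoys.
    The ROC is treated as a step function and integrated exactly over
    FPR in [fpr_min, 1], then divided by the width of that interval in
    log10 units so that a perfect ranking scores 1.0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    area = 0.0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            # Each inactive moves the curve right; TPR is constant on that step.
            left = max(fp / n_neg, fpr_min)
            fp += 1
            right = max(fp / n_neg, fpr_min)
            area += (tp / n_pos) * (math.log10(right) - math.log10(left))
    return area / -math.log10(fpr_min)


# Toy ranking: both actives are scored above both inactives.
print(log_auc([4.0, 3.0, 2.0, 1.0], [1, 1, 0, 0]))
```

      With this normalization, two methods with similar linear AUCs can differ noticeably if one of them recovers its actives earlier in the ranked list.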

    4. Reviewer #3 (Public review):

      Summary:

      This study's core conclusions are well-supported by data. It is shown that co-folding outperforms docking in known ligand pose/affinity prediction (validated by RMSD and IC₅₀ correlation), struggles with false-positive discrimination in virtual screens (lower AUC values), and is complementary to docking (non-correlated errors, distinct strengths in drug discovery stages).

      Strengths:

      (1) Unprecedented prospective design with 557 novel Mac1-ligand complexes ensures rigorous, independent evaluation of co-folding methods.

      (2) Comprehensive comparison of 3 co-folding tools (AlphaFold3, Chai-1, Boltz-2) with DOCK3.7 across diverse targets and metrics enables nuanced performance assessment.

      (3) The study clearly demonstrates complementary roles of co-folding (superior pose/affinity prediction for known ligands) and docking (better hit prioritization), and addresses deep learning memorization concerns via ligand similarity analysis.

      Weaknesses:

      (1) Limited generalization to diverse protein families (e.g., no ion channels/transporters).

      (2) Ambiguity in the mechanism underlying co-folding's failure to predict rare conformational changes.

      (3) Virtual screen comparison is unbalanced (docking-prioritized hit lists bias results).

    1. eLife Assessment

      In this study, the authors describe the degradation of HDACs in late HSV-1 infection and attempt to link this phenomenon to HDAC export to the cytoplasm and to DNA damage response. However, the evidence is incomplete, as many of the experiments are lacking in rigor. As a result, mechanistic links to the proposed model are weak.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors propose that HSV-1 infection degrades the class I histone deacetylases HDAC1 and HDAC2. The MDM2 E3 ubiquitin ligase from the DNA damage response pathway is responsible for ubiquitinating these HDACs that are subsequently degraded via proteasomes. The authors hypothesize that HDAC degradation will cause hyperacetylation of viral chromatin and enable viral gene transcription.

      Strengths:

      The ubiquitination of HDAC1 & HDAC2 by Mdm2 and the mapping studies are clear.

      Weaknesses:

      (1) Degradation of HDACs is observed late, at least 12-24 h post-infection (1 PFU/cell). Viral genes have been transcribed by that point, and the virus has replicated its genome. The kinetics do not match the proposed model.

      (2) The authors need to connect these findings with their story. As of now, these findings are correlative. For example, what is the impact of MDM2 depletion on viral gene expression and progeny virus production? Leptomycin B is not specific to the HDAC cytoplasmic translocation, and its effect on the infection could be due to its effect on ICP27.

      (3) The time point when the inhibitors were added to the cultures has not been stated in any experiment. If inhibitors were added with the virus, viral gene expression would be blocked.

      (4) The authors need to present late gene expression data in all the experiments where drugs have been used.

      (5) Figure 1A, ICP4 is not detected up to 12 hours post-infection of HeLa cells with 1 PFU/cell. This cannot be true.

      (6) Leptomycin B blocks nuclear/cytoplasmic shuttling of ICP27 that brings viral mRNAs to the cytoplasm to be translated. So, the effect of LMB is not specific to the HDACs.

      (7) The key experiment is to use the degradation-resistant form of HDAC1 to evaluate its impact on viral gene transcription.

      (8) In the experiment where Mdm2 was depleted, the authors need to demonstrate the effect on the infection. ICP4 expression is not enough. How about growth curves? After Mdm2 depletion, ICP4 expression increases, which may contradict the authors' findings. An analysis of alpha and gamma gene expression is important.

      (9) Why did the authors analyze a liver HSV-1 infection and not a more relevant skin infection?

    3. Reviewer #2 (Public review):

      Summary:

      The authors discovered that HDAC1/2 are degraded in HSV-1 and PRV infections. They attempted to establish a new mechanism by which HDAC1/2 are translocated to the cytoplasm to be degraded in HSV-1 infection, and the degradation causes changes in histone acetylation to affect the DDR pathway.

      Strengths:

      (1) Interesting findings of HDAC1/2 degradation during HSV-1 and PRV infection, and it may impact more than the virology field.

      (2) Significant work to identify the ubiquitin site in HDAC1/2 and K63 linkage.

      Weaknesses:

      (1) Insufficient evidence to support the mechanism described by the authors.

      (2) Expansion of the conclusion to alphaherpesvirus without studying the intended mechanism in PRV infection.

      Overall, there may be a correlation between HDAC1/2 level, ATM/ATR phosphorylation, and HDAC1 translocation during the HSV-1 infection. However, core evidence supporting the mechanism that a) HDAC1 export causes its degradation, and b) degradation of HDAC1 causes histone acetylation changes and DDR activation has not been sufficiently demonstrated.

    4. Reviewer #3 (Public review):

      The authors state that infection of cells by the alphaherpesviruses HSV-1 or PRV leads to a proteasome-dependent reduction in levels of HDAC1 and HDAC2 and that this leads to chromatin hyperacetylation, a DNA damage response, and greater replication of these viruses. Previously, other authors reported no change in levels of HDAC1 and HDAC2 after HSV-1 infection of human cells, but this paper is neither cited nor commented on in this new submission. The experiments are poorly designed. For instance, most of the time points analysed are way beyond the time needed for HSV-1 replication and are therefore not biologically relevant. The infections are done with a dose of virus that does not ensure that all cells are infected synchronously, but rather infection spreads from cell to cell with multiple rounds of replication. Some essential controls are missing. Additionally, this reviewer feels that the data presented do not support the conclusions drawn. Currently, links are not established between a reduction in HDAC1/2 and other phenomena such as hyperacetylation of histones, a DDR, and altered virus replication. The paper does not identify which HSV or PRV protein(s) induce reduction in HDACs, nor how the HDACs mediate antiviral activity; what are the HSV-1 or PRV protein targets? Lastly, the paper is not well prepared, and it does not adequately refer to prior literature.

    1. eLife Assessment

      This useful study examines patterns of clonal reproduction and somatic mutations in 'Pando', a massive, quaking aspen clone consisting of ~47000 stems. Because the study relies on relatively low-coverage, reduced-representation genomic resequencing data for the detection of somatic mutations, the evidence provided for several of the primary conclusions about clone age and the relationship between mutation accumulation and geographic distance is incomplete.

    2. Reviewer #1 (Public review):

      Summary

      The authors use reduced-representation sequencing (GBS) across samples from the quaking aspen clonal stand Pando to identify putative somatic mutations, which were used to estimate clone age, and evaluate whether somatic variation shows spatial structure across the grove. This is a compelling and charismatic system to look at somatic mutation in plants. They report little sharing of putative somatic mutations as a function of distance and interpret this as evidence for weak mutation transmission or homogenization over time, potentially driven by rapid root growth and clonal spread dynamics. They use mutations to estimate clone age. The authors are generally upfront and commendably transparent about limitations in sequencing depth and mutation calling. The paper addresses an interesting research system, but struggles to overcome limitations in the suitability of the data.

      Strengths.

      This is a fantastic system and an interesting set of questions. The authors' GBS data does a great job distinguishing Pando from its neighbors, which is an important first step in studying the history of this clone.

      The manuscript is upfront and highlights the need for improved data to refine inference, for example: "Higher-coverage whole-genome sequencing, and ideally single-cell sequencing of defined meristem lineages, will be needed to refine mutational and evolutionary parameter estimates in this iconic organism."

      It also states that "either we are missing roughly 80% of true somatic mutations or only 20% of the mutations we detect are true positives."

      I appreciate that the authors report an age estimate range that considers the breadth of potential false negatives and positives.

      Weaknesses

      I am still not sure whether the paper overcomes issues with the use of GBS for somatic mutation calling.

      I found it difficult to reconcile the manuscript's description of the call set as "conservative" with the reported validation tests (calibrated by looking at retained variants detected in 2 of 8 technical replicates). How was this threshold determined? A mutation with 2/8 has quite low reproducibility, which could reflect either substantial false negatives under low depth (true variants frequently dropping out) or false positives that recur sporadically due to library- or sequencing-specific artifacts. Without stronger internal diagnostics or external validation, it is hard to determine which applies here.

      The GBS sequence space and genomic distribution could be more clearly explained. According to the methods, "The total number of base pairs sequenced (129,194,577) was estimated using angsd, and reduced following the proportion of base pairs that we filtered out because of low coverage (48%)." What do the 129M base pairs represent? Is that 129M/genome length, or is it the number of aligned base pairs (i.e., 1M genome covered x129 depth)? In addition, it would help to summarize where GBS loci fall across the genome (genic vs intergenic vs TE; repetitive vs unique), since these can have substantially different somatic mutation rates (Meyer et al. 2025). Without additional summary/descriptive statistics, it is hard to interpret both missingness and "rate".

      Statistical concerns about some results. In the Figure 3 legend, the authors state that the sample-level relationship between shared variants and distance is significant: "Pearson correlation coefficient ... is −0.02, 95% CI = [−0.05, 0.00], which is significantly different from a randomized distribution (P < 0.001) (B)." However, as plotted in Figure 3B, the observed correlation (−0.02) appears to fall well within the bulk of the randomized distribution of correlation coefficients. If the reported P value is intended to be permutation-based (i.e., the tail probability under the randomized null), it is unclear how P could be < 0.001 given that the observed value does not appear extreme relative to the null.
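
      The concern above can be made concrete with a minimal sketch of a two-sided permutation test for a Pearson correlation. This is an assumed, simplified procedure (shuffling one variable across observations), not the authors' method; for pairwise distance data a Mantel-style permutation of sample labels would be more appropriate. The function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the observed correlation.

    The p-value is the fraction of shuffles whose |r| is at least as
    large as the observed |r|, with the usual +1 correction so it can
    never be smaller than 1 / (n_perm + 1).
    """
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= abs(r_obs):
            extreme += 1
    return r_obs, (1 + extreme) / (1 + n_perm)


# Toy data with an exactly zero correlation: every shuffle is at least
# as extreme, so the p-value is 1.
print(permutation_p_value([1, 2, 3, 4], [1, -1, -1, 1]))
```

      Under this definition, an observed coefficient that sits inside the bulk of the permuted distribution is matched or exceeded by many shuffles, so its p-value cannot be close to 0.001.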

      The developmental program of plant stem cell layers is essential, but not discussed much. In a root-spreading clone, expectations about mutation sharing depend strongly on how new ramets arise developmentally (root-derived meristem initiation) and how layered meristems partition mutations across tissues (e.g., L1/L2/L3). I was surprised there was not a substantial discussion of the layer specificity of somatic development and mutation accumulation in plants, especially as it relates to mutations that would be shared between roots and shoots given potential layer-specific growth of roots. The current analysis seems to focus on comparisons within tissue types (e.g., leaves between ramets), but does not report informative tests between tissues and within ramets (e.g., in heavily sampled trees, whether a ramet's root, shoot, and leaves share a subset of variants; whether neighboring ramets share root-lineage variants more than shoot-lineage variants). It would help to articulate expectations and clarify what the data can and cannot test. Relatedly, for "mutation rates" in aging material, it would be good to discuss which meristem layer(s) each tissue is likely sampling and how layer-specific mutation dynamics (e.g., reported differences between L1 vs L2 lineages) could influence rate and therefore age estimates (Goel et al. 2024, Amundson et al. 2025).

      Developmental mosaicism makes expected allele fractions lower than discussed in the paper. The supplement states, "However, because the Pando clone is triploid, it reduces our expectation for fixation of a mutation to 0.33", but this ignores layer-specific stem cells in plant development. It is true that if calls are made against a haploid reference, a new somatic mutation in a triploid background is expected at an allele fraction of ~1/3, but only if it is fixed in 100% of cells. Layer-specificity (e.g., L1 vs L2 vs L3 restriction) or polyclonal founding events will push expected allele fractions substantially lower. Therefore, at ~12-14× depth (or a minimum of 4×), these allele fractions translate into only a handful (or even 0) of alternate reads (the expectation is <<33%).
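
      The arithmetic behind this point can be sketched with a simple binomial read-sampling model. The assumptions are mine, not the manuscript's: the mutation sits on one of three chromosome copies, reads are sampled independently, and sequencing error and reference bias are ignored. The function names and the example cell fractions are hypothetical.

```python
from math import comb


def expected_alt_reads(depth, cell_fraction, ploidy=3):
    """Expected number of alternate reads at a site.

    cell_fraction: fraction of sampled cells carrying the mutation
    (1.0 means fixed in all cells). With one mutated copy out of
    `ploidy`, the expected allele fraction is cell_fraction / ploidy.
    """
    return depth * cell_fraction / ploidy


def prob_at_least(k, depth, cell_fraction, ploidy=3):
    """Probability of observing at least k alternate reads (binomial)."""
    p = cell_fraction / ploidy
    return sum(comb(depth, i) * p ** i * (1 - p) ** (depth - i)
               for i in range(k, depth + 1))


# Fixed in all cells versus restricted to a third of sampled cells,
# both at 12x depth.
for fraction in (1.0, 1 / 3):
    print(fraction,
          expected_alt_reads(12, fraction),
          prob_at_least(2, 12, fraction))
```

      For example, a mutation present in a third of sampled cells has an expected allele fraction of 1/9, which at 12× depth corresponds to about 1.3 alternate reads on average.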

      Within-tree replicate consistency was unclear. The manuscript hints at multiple samples/replicates per tree (e.g., Figure S2), but it is not clear how often the same putative somatic variants are recovered across samples from the same ramet and tissue. A simple reproducibility summary would be extremely helpful: for variants called in one sample, what fraction are recovered in other samples from the same tree (by tissue), what variant allele fractions, and how do their spectra compare to mutations unique to a single sample?

      The manuscript did not provide supplemental tables or mutation calls. Supplemental tables containing pre-filter and/or post-filter calls (or some other structured data file with flags indicating various quality metrics, REF vs ALT depths at minimum, REF call, and ALT call) would substantially improve transparency and ability to evaluate the work.

    3. Reviewer #2 (Public review):

      Summary:

      The topic of the paper is intriguing as it sets out to age one of the potentially largest living organisms, a tree clone (Pando), using shallow genome resequencing of a large number of replicate samples. The key result is that the Pando clone is several tens of thousands of years old, which is of high interest to plant genomics and evolutionary ecology.

      Weaknesses:

      Unfortunately, the claims are not matched by the available data and their analysis. Probably, the results also cannot be resurrected using modified analyses, as the available data are not suited to reliably detect somatic genetic variation as a means to age clonal plants.

      In order to reliably age clones, one needs to consider the full process by which clone mates genetically diverge from one another over time, which starts with a plant's apical meristem (SAM). From this, all above-ground tissues such as twigs and branches, as well as leaves, are derived, which has been beautifully worked out now in oaks and many fruit trees (e.g., doi: 10.1101/2023.01.10.523380; 10.1101/2024.01.04.573414). For the accumulation and propagation of fixed somatic genetic variation, only the processes in the SAM matter. Hence, it makes little sense to look at tissue-specific mutations unless one is invoking non-cell-division-induced mutations through UV light. Those, however, would remain undetected with the present low-coverage sequencing as they can no longer leave the mosaic status, as that tissue is essentially non-dividing.

      Somatic genetic drift (https://www.nature.com/articles/s41559-020-1196-4) is the foundation for the fixation of somatic genetic variation and hence, for ageing (plant) clones. It requires quantitative modeling of the processes at the cell-line level when new modules, here, aspen trees are formed, in particular N (cell population size) and N0 (founder cell size).

      Calibrations have to be made using the mutation and fixation rate at the somatic cell lineage level, ideally also with some empirical data. In trees such as aspen, it would be very easy to obtain calibration points of branch tips that have physically and thus genetically diverged upon a defined TCA to directly determine the rate of accumulation of somatic genetic variation by direct dendrochronology (i.e., counting tree rings).

      Instead, in the present work, a mutation rate from another tree species is taken, which will introduce a lot of uncertainty into the estimates, given that tree SAMs divide at a very different pace (see doi 10.1093/evolut/qpae150). It is clear that a small difference in the assumed mutation rate, e.g., a higher one, would conversely reduce the age estimate considerably.
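
      To make the sensitivity to the assumed rate concrete, here is a minimal sketch under a strict-clock simplification, which is not the phylogenetic model used in the manuscript. The divergence value and the three rates are hypothetical numbers chosen only for illustration.

```python
def clone_age_years(divergence_per_site, rate_per_site_per_year):
    """Age implied by a strict clock: accumulated divergence divided by rate."""
    if rate_per_site_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return divergence_per_site / rate_per_site_per_year


# Hypothetical somatic divergence accumulated along one lineage (per site).
divergence = 2.0e-6

# Three hypothetical per-site, per-year somatic mutation rates.
ages = {rate: clone_age_years(divergence, rate) for rate in (5.0e-11, 1.0e-10, 2.0e-10)}
```

      Under this simplification the age scales inversely with the rate, so a fourfold difference in the borrowed rate translates directly into a fourfold difference in the estimated age.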

      I am doubtful that a conventional phylogenetic model based on coalescence, such as the one employed here, can be utilized, as it assumes a sexually recombining population and hence variable sites. A model simulation on an asexually evolving population would be needed to check this.

      In order to reliably call somatic genetic variation, a decent coverage of short-read sequences is needed, definitely > 15x, which was not achieved in the present dataset. This is particularly relevant as a fixation in one of the three haploid chromosome sets would just amount to a read frequency of only 0.33. A coverage of only 4x reads per called site seems very low to me; in other words, the filtering steps do not seem to be very rigorous to me. It is also difficult to follow the logic of several ad hoc adjustments that were made to compensate for the low coverage of sequencing, in particular, the common panel and the replicate identical samples. Why choose 80% in the latter?
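
      The coverage argument can be illustrated with a simple binomial calculation. This is a sketch under stated assumptions, not a description of the authors' caller: reads are treated as independent draws, the expected alternate-read frequency is 1/3 as for a variant fixed in one of three chromosome sets, sequencing error is ignored, and the requirement of at least two supporting reads is a hypothetical threshold.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, vaf, k):
    """Binomial probability of observing at least k alternate reads
    at a site covered by `depth` reads when the true allele fraction is `vaf`."""
    return sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(k, depth + 1)
    )


expected_vaf = 1 / 3      # fixed in one of three haploid chromosome sets
min_alt_reads = 2         # hypothetical calling threshold

p_depth_4 = prob_at_least_k_alt_reads(4, expected_vaf, min_alt_reads)
p_depth_15 = prob_at_least_k_alt_reads(15, expected_vaf, min_alt_reads)
```

      Under these assumptions a true variant is seen in two or more reads about 41% of the time at 4x (33/81), compared with roughly 98% at 15x. The calculation covers sensitivity only and says nothing about false positives, which are the larger concern at low depth.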

      There are alternative, non-sequencing-based ways to double-check the accuracy of somatic SNP calls (e.g., described here https://www.nature.com/articles/s41559-020-1196-4), which could have been employed at least once to evaluate the error rates for the specific sequencing strategy.

      I also suggest that any future study employ mutation callers developed for cancer somatic mutation detection, which are now increasingly used for that purpose in both clonal plants and trees.

      What worries me is that there is a poor correlation between physical and genetic distance. This lack of correlation among spatial and genetic structure, for example, the star-like phylogeny presented in Figure 6d, indicates a large fraction of false positives rather than some special, as yet unexplained processes of local mutation accumulation that the authors claim to have discovered.

      Finally, the work is not properly embedded into the current literature. For example, recent developments of molecular clocks were not considered, such as the development of a dedicated somatic genetic clock that precisely addresses this question (https://www.nature.com/articles/s41559-024-02439-z). Also, older but nevertheless significant work that aged aspen clones using microsatellite markers is not mentioned (http://dx.doi.org/10.1111/j.1365-294X.2008.03962.x).

    1. eLife Assessment

      This important study explores whether complex structures that are lost during evolution can re-evolve, which is a long-standing debate in evolutionary and developmental biology. The authors demonstrate that re-evolution can occur if the gene regulatory network that underlies the development of complex traits is maintained. The evidence supporting its conclusions is solid and the work will be of interest to those studying the evolution and development of complex traits.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Vasquez-Correa and colleagues describes the expression pattern of the ocelli (simple eye) gene regulatory network in ants. They correlate the expression pattern of these genes with the presence and absence of ocelli in different classes and species of ants. The presence of ocelli is a polyphenic trait in ants - understanding the molecular and developmental underpinnings of polyphenic traits is of significant interest to evolutionary biologists, developmental biologists, and ecologists. The authors propose that the presence of the latent expression of the ocellar network in classes of ants that do not display ocelli in the adults may underlie the re-evolution of ocelli within the ant lineage.

      Strengths:

      The strengths of the manuscript are that it is well written, the images are of the highest quality, and the data support the conclusions of the authors.

      Weaknesses:

      One improvement that could be made is to include imaginal discs of the queen ants as well as scanning electron images of the ocelli of the queen ant to match the pupal stage images of the worker and soldier ants. A second improvement is to attempt a gene knockdown using RNAi or similar methods to ensure that the genes that are being studied are, in fact, responsible for ocelli development in the ant.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript titled "Latent gene network expression underlies partial re-evolution of a polyphenic trait in the worker caste of ants" by Vasquez-Correa et al. aimed to study genetic mechanisms underlying developmental plasticity, especially binary polyphenism in queen vs worker ant castes. This is an interesting question regarding the extent to which phenotypic traits were altered, lost or regained, and how molecular pathways (upstream vs. downstream) can facilitate this process.

      In ants, reproductive castes (queens and males) develop wings as well as 3 ocelli for mating flights and other activities, while worker castes are wingless, and in some species, they have either no or a reduced number of ocelli. The phylogenetic analysis showed that in the Camponotini ant clade, the one-ocellus phenotype re-evolved in three species independently. The authors analyzed the conserved developmental pathways between Drosophila (well-established) and ants using HCR (a high-quality in situ hybridization technique). They found that although upstream genes for the development of ocelli (otd and hh) showed similar expression between castes, downstream genes (toy, eya, and so) had reduced or no expression in workers of C. floridanus, and this differential expression may lead to partial or complete loss of ocelli. Consistently, workers develop rudimentary tissues, suggesting that they initiate the ocellus developmental process but somehow stop it before adulthood.

      Strengths:

      Evo-devo approaches to reveal conserved molecular pathways of ocellus development. High-quality HCR provided convincing evidence of the expression of key genes in ocelli, eyes and antenna throughout larval development.

      Using HCR, the authors showed differential expression of downstream genes in males vs. soldiers vs. minor workers of C. floridanus, which might explain phenotypic differences between castes.

      Weaknesses:

      Although the molecular pathway is conserved, the mechanism underlying the lack of ocelli in workers remains unclear. In C. floridanus, it could be explained by the evidence of no expression of certain developmental genes, but in other species, e.g. Polyrhachis rastellata, is their expression intact, or reduced? There is no control male.

      In addition, HCR in species with partial re-evolution (if their genomes have been sequenced) would be useful to understand the mechanism. For example, there might be differential spatial expression between medial and lateral ocelli.

    4. Reviewer #3 (Public review):

      Summary:

      This paper examines the loss and re-evolution of specific organs during the evolution of ants. The authors show that these organs, the ocelli, disappear and are re-evolved in different ant species and in different ant castes within these species. The authors show that this is linked to a conserved GRN discovered in Drosophila, that appears to underlie the development of the ocelli, and demonstrate that this GRN appears to remain active in the developing heads of ants that have no ocelli, implying that it is the evolutionary latency of this GRN that allows loss and subsequent evolution.

      Strengths:

      This manuscript has outstanding imaging of a very difficult developing organ, and the key data, fluorescence in situ hybridisation, is done well and clearly shows what the authors wish to demonstrate. The methods are well described and underpin the whole work.

      The authors convincingly demonstrate that gene expression patterns imply the conservation of the ocellus gene regulatory network from Drosophila to ants. They further show that this network is present even in ants that don't produce an adult ocellus, but do show that in those species, loss of a developing nascent ocellus (which they identify) occurs at the same time as an interruption in the expression of the key genes in the GRN. All of this data is beautifully presented and explained.

      Weaknesses:

      There is one key weakness in that there are no functional studies that indicate that the GRN actually does make the ocellus, though the expression patterns are convincing. This applies to loss of the ocellus as well. It would be nice to see that transient loss of the ocelli GRN might lead to loss of ocelli in ant species that have them. These are very difficult things to achieve, as the key genes have earlier developmental roles, such that CRISPR knockouts would not be interpretable, and transient RNAi in the head capsules of developing pupal ants would be challenging.

    1. eLife Assessment

      This important study provides new insight into the regulation of cell organization and division in Trypanosoma brucei through the control of a kinesin motor protein by a polo-like kinase. The authors present solid evidence from rigorous biochemical and imaging analyses showing that phosphorylation modulates kinesin function and cellular organization. However, direct in vivo evidence that PLK phosphorylates kinesin-G is lacking.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript identifies the orphan kinesin KIN-G as a substrate of Polo-like kinase (TbPLK) in Trypanosoma brucei and demonstrates that phosphorylation of Thr301 inhibits KIN-G microtubule binding and disrupts its cellular function. Using a combination of in vitro kinase assays, phosphosite mapping, microtubule binding and gliding assays, and in vivo complementation with phosphomimetic and phosphodeficient mutants, the authors link TbPLK-mediated regulation of KIN-G to defects in centrin arm integrity, FAZ elongation, Golgi organization, flagellum positioning, and division plane placement. The study provides a mechanistic advance in understanding how TbPLK regulates centrin arm biogenesis and integrates KIN-G into the growing regulatory network controlling hook complex and FAZ assembly. Overall, the work is technically strong, internally consistent, and builds logically on previous studies from this group and others.

      Strengths:

      A major strength of the manuscript is the clear mechanistic link between phosphorylation of Thr301 and loss of microtubule binding activity. The use of phosphomimetic (T301D) and phosphodeficient (T301A) mutants in an RNAi-rescue framework provides a clean and convincing demonstration of functional relevance in vivo. The integration of biochemical assays with detailed cell biological phenotyping (centrin arm length, FAZ elongation, basal body segregation, and cytokinesis markers) is particularly effective and makes the central conclusion robust. The observed phenotypic cascade from centrin arm defects to FAZ and division plane abnormalities is also well aligned with existing models of trypanosome morphogenesis.

      Weaknesses:

      My (more or less main) concern relates to the interpretation of the Golgi phenotype. The conclusion that phosphorylation of KIN-G "impairs Golgi biogenesis" is currently based on fluorescence microscopy using TbGRASP and Sec13 markers and on quantification of the number and distribution of Golgi/ERES puncta in binucleated cells. While these data convincingly demonstrate altered Golgi/ERES number and spatial organization, they do not distinguish between true defects in Golgi biogenesis or duplication and alternative possibilities such as fragmentation, vesiculation, or mislocalization of Golgi membranes. Given the central role of Golgi-centrin arm organization in the proposed model, ultrastructural analysis (for example, by EM or electron tomography) would greatly strengthen this aspect of the study by providing direct evidence for structural alterations of the Golgi and its association with the centrin arm and ERES. Such data would elevate this part of the manuscript from a descriptive fluorescence phenotype to a true structural cell biological insight. I appreciate that this experiment goes beyond the current dataset, but it would substantially enhance the mechanistic depth of the Golgi-related conclusions and strengthen the causal chain linking centrin arm defects to Golgi abnormalities. However, I have to confess, the inclusion of such data would make this reviewer particularly enthusiastic about the work. If this is not feasible, I would recommend tempering the wording of "Golgi biogenesis" to a more conservative description, such as altered Golgi organization or duplication, and explicitly acknowledging the limitations of fluorescence-based analysis for this conclusion.

      An additional conceptual point concerns the dual role of TbPLK in centrin arm regulation. TbPLK is known to promote centrin arm biogenesis through phosphorylation of TbCentrin2, yet in this study, TbPLK phosphorylation of KIN-G negatively regulates centrin arm assembly. This dual positive and negative regulatory role is intriguing but could be discussed more explicitly. The manuscript would benefit from a clearer conceptual framework addressing how phosphorylation of KIN-G might serve as a temporal or spatial switch to restrain KIN-G activity at specific stages of centrin arm assembly.

      Finally, a schematic model summarizing the proposed regulatory pathway from TbPLK phosphorylation of KIN-G to centrin arm assembly, FAZ elongation, division plane placement, and Golgi organization would aid the reader.

    3. Reviewer #2 (Public review):

      Summary:

      The authors identify KIN-G as an in vitro substrate for phosphorylation by TbPLK and show that several of the in vitro P-ated sites, including T301, overlap with P-ation sites seen in live cells. The authors further show that PLK-mediated P-ation inhibits KIN-G binding to microtubules in vitro, as does a KIN-G-T301D mutant, and that expression of a KIN-G-T301D phospho-mimic in T. brucei phenocopies KIN-G RNAi knockdowns, producing defects in cell division, morphogenesis of the centrin arm, FAZ and other cellular structures, as well as a misplaced cytokinesis furrow.

      Understanding cytoskeletal rearrangements that drive cell division in T. brucei is an important and unresolved problem, so the work addresses important questions that are of great interest. PLK and KIN-G have previously been shown to be important for cell division and morphogenesis of cytoskeletal structures that drive cell division in T. brucei. The current work advances our understanding by suggesting a potential mechanism by which PLK and KIN-G might participate, namely through PLK-dependent P-ation to control KIN-G MT binding activity.

      Strengths:

      The authors use a rigorous combination of biochemistry, phosphoproteomics, cell biology, and mutant analysis to support their conclusion that PLK-mediated P-ation of KIN-G negatively regulates KIN-G microtubule binding, and this may explain the observation that a KIN-G T301 phosphomimic mutant blocks cell division and perturbs biogenesis of cytoskeletal structures that drive cell division and morphogenesis. Combining rigorous and informative in vitro studies with mutant analysis in live cells is a great strength. The work is solid and important, though a few pieces are needed to fully connect the in vitro findings with the in vivo observations, as detailed below.

      Weaknesses:

      Overall, I find this work to be solid and to provide an important addition to our understanding of mechanisms controlling cell division in T. brucei. The biochemistry, in particular, is rigorous and convincingly demonstrates PLK can P-ate KIN-G, altering its MT-binding ability. Analysis of phospho-mutants of KIN-G in live T. brucei supports the conclusion that P-ation of KIN-G at T301 negatively affects KIN-G function in vivo. I think, however, that the results fall short of supporting the title, because, although the data convincingly show that PLK can phosphorylate KIN-G at T301 in vitro, and that T301 is P-ated in vivo, they do not formally demonstrate (nor even test) whether PLK is the kinase responsible for this phosphorylation in vivo (experiments to address this seem quite feasible). I also do not see where the authors try to reconcile the absence of phenotype for KIN-G-T301A with the implied importance of KIN-G phosphorylation by PLK in cell division, which calls into question the need for P-ation of KIN-G-T301 in cell division. Suggestions for addressing these concerns are provided below.

      My two main questions are:

      (1) What is the biological relevance of KIN-G P-ation at T301?

      a) The authors report no defect for the KIN-G-T301A mutant, so what then is the need for T301 P-ation, if the cell gets along fine without it? One step toward addressing this would be to ask what fraction of KIN-G shows P-ation at T301. Although published studies indicate P-ation at T301, it isn't known what percentage of KIN-G in the cell is P-ated. One might anticipate, for example, that T301-P is a small minority of the population in asynchronous cultures and that T301 P-ation increases at specific cell cycle stages.

      b) Published work links PLK to cell division, FAZ elongation, etc... The current work suggests that one role of PLK is to P-ate KIN-G at T301. In contrast, however, the current work also indicates that P-ation of KIN-G at T301 is unnecessary for normal cell division, FAZ elongation, etc....

      c) Some experiments or at least commentary on points a and b above would strengthen the paper.

      (2) Is PLK the kinase that P-ates KIN-G T301 in vivo?

      a) The authors show PLK P-ates T301 (and other residues) in vitro, and that T301 is P-ated in vivo. To bring the analysis full circle, it would be informative to examine KIN-G P-ation in a PLK mutant or upon inhibition of PLK with published inhibitors. This seems to be a very doable experiment with the tools available.

    4. Reviewer #3 (Public review):

      Summary:

      Here, the authors investigate the role of the Trypanosoma brucei polo-like kinase TbPLK in the function of flagellum-associated cellular structures in trypanosomes. They set out to test the hypothesis that a key substrate of TbPLK is the kinesin protein KIN-G, and that TbPLK phosphorylation of KIN-G regulates its functions in cells.

      Strengths:

      Using in vitro biochemistry with purified proteins, the authors convincingly demonstrate that TbPLK phosphorylates KIN-G at 29 sites. Moreover, they convincingly show that phosphorylation at one site, T301, impairs the binding of purified KIN-G to purified microtubules. Using immunofluorescence-based imaging approaches, they also show that TbPLK colocalizes with KIN-G at centrin arms during the early S-phase of the cell cycle. Centrin arms are structures that are located near the basal body and flagellum and are important for new flagellum biogenesis, Golgi positioning, and cell division. To evaluate the function of KIN-G phosphorylation in cells, they depleted KIN-G by RNAi, simultaneously expressed phospho-mimetic (T301D) and phospho-ablative mutant proteins, and used immunofluorescence to examine the impact on flagellum-associated cellular structures. They show that expression of the phospho-mimetic mutant KIN-G-T301D causes the following defects: reduced cell proliferation, disruption of centrin arm and Golgi biogenesis, impairment of FAZ elongation and flagellum positioning, and misplacement of the cell division plane. The data convincingly support the conclusion that KIN-G phosphorylation on T301 plays an important role in regulating the cellular functions of this kinesin motor protein.

      Weaknesses:

      Some of the broader conclusions are not directly supported by the data. For example, the title states "Polo-like kinase phosphorylation of the orphan kinesin KIN-G negatively regulates centrin arm biogenesis in Trypanosoma brucei," but the data do not directly address the specific role of TbPLK in phosphorylating KIN-G in cells. Moreover, some of the more specific conclusions in the paper, for example, that "phosphorylation of KIN-G" causes various cellular defects, are a bit of an overstatement. The supporting data rely on the expression of a phospho-mimetic mutant of KIN-G. Presumably, phosphorylation in cells is a normal part of KIN-G regulation, and it is not just phosphorylation, but rather hyperphosphorylation that is being mimicked by the mutant. Some rewording of the specific conclusions is warranted, and the broader conclusion would be better supported with additional experimental evidence.

    1. eLife Assessment

      This valuable study uses a large cohort of clinical malaria cases collected over 18 years to address a critical knowledge gap regarding the role of PfEMP1 variants across distinct severe malaria syndromes. The conclusions are potentially of importance and interest to those who study malaria severity, but the evidence is incomplete, largely due to a lack of clarity on data inclusion and the correct use of statistical tests. More up-to-date data analysis methods would further strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Severe childhood malaria is associated with three main overlapping syndromes: impaired consciousness (IC), respiratory distress (RD), and severe malaria anaemia (SMA). One central feature of severe malaria, driven by host and parasite factors, is the sequestration of parasitized red blood cells in vascular beds, leading to impaired tissue perfusion and lactic acidosis. The causative agent, the parasite ligand PfEMP1, is expressed on the surface of infected red blood cells, where it binds to a broad range of different endothelial receptors. Accumulation of parasite-infected erythrocytes in the host's microvasculature has been repeatedly confirmed for cerebral malaria, but there are scarce data on the extent of sequestration in the other severe malaria syndromes. However, the absence of effective adjunctive therapies for severe malaria implies that our understanding of its pathogenesis remains incomplete. Thus, by comparing var gene expression from a large Kenyan cohort (n=372 severe cases; n=340 non-severe cases), this study addresses a critical knowledge gap regarding the role of PfEMP1 across distinct severe malaria syndromes. The substantial sample size, phenotypic stratification, and use of two complementary methods (DBLα-tag sequencing and RT-qPCR), along with data about the parasite's ability to form rosettes and antibody level assessments, provide a strong setup. Var gene expression data - either proportions of different DBLα-tags classified by the number of cysteine residues and presence of particular motifs or relative expression RT-qPCR data from a set of primer pairs targeting conserved regions of var groups or particular domains - is associated with (a) severe malaria syndromes, (b) variant expression homogeneity, (c) rosetting ability, and (d) mortality using independent linear regression models, Spearman rank correlations, or logistic regression models.

      In summary, the study confirms that A-type and DC8-containing gene expression correlate with IC, that RD is associated with rosetting, and that SMA is linked to a high variant expression homogeneity (VEH) of var-A expression, which may indicate a longer infection duration. However, some findings remain inconclusive. For example, when analyzing pure syndromes, several associations changed: DC8 expression was also found to be significantly enriched in SMA (with multiple primer pairs) and RD, not exclusively with IC. Additionally, rosetting was associated with DC8 expression but not with IC, even though IC itself is linked to DC8 expression. Overall, the findings are significant and supported by a large dataset, though the reported evidence remains largely associative rather than mechanistic.

      Strengths:

      As the authors stated themselves, one of the key unresolved questions is whether severity-causing parasites are biologically different from parasites responsible for asymptomatic infections. This study is among the first to address this question using data from a large, phenotypically stratified cohort. The use of two complementary methods (DBLα-tag sequencing and RT-qPCR), together with data on the parasites' ability to form rosettes and assessments of antibody levels, provides an excellent experimental framework.

      Weaknesses:

      Even when assessing var gene expression using two different approaches - DBLα-tag sequencing and RT-qPCR targeting pre-defined variants - only a glimpse of the parasites' actual biology is captured. Moreover, a well-known confounder in gene expression studies of P. falciparum field isolates is variation in parasite age (hours post-invasion) or synchronicity, both of which significantly influence var gene expression. The methods employed in this study, unfortunately, do not allow for controlling or correcting for these factors. In addition, the older classification system for DBLα-tag data developed by Bull et al. is certainly still valid; however, more recent advances in bioinformatic tool development now allow for a more in-depth exploration of DBLα-tag datasets. Tools such as Varia (doi: 10.1186/s12859-022-04573-6), cUPS (doi: 10.1371/journal.ppat.1012813), and upsML (doi: 10.1101/2025.05.19.654848) enable the prediction of DBLα-tag-connected PfEMP1 domains and the var group affiliations.

      As A-type var gene expression has already been associated with severity, most expression studies (including this one) have a selection bias towards A- and B/A-type var genes. Here, A- and B/A-types are covered by 8 primer pairs (gpA1, gpA2, 4x DC8, DC13, DC4), whereas highly polymorphic B-types are targeted by only 2 primer pairs (b1, DC9) and C-types only by a single primer (c2). Thus, any association with A-type expression is more likely to be observed, although evidence is accumulating that parasites preferentially express B-type var genes at the onset of blood stage infection in naïve/less immune individuals; this is also consistent with the observation of the authors that VEH is positively associated with immunity (measured as anti-IE) and negatively associated with temperature.

      I am not an expert in biostatistics, but to my understanding, independently performed regressions should be corrected for multiple testing.
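
      One common way to apply such a correction is the Benjamini-Hochberg false discovery rate procedure. The sketch below is a plain-Python illustration of that procedure with hypothetical p-values; it is not taken from the manuscript, and other corrections (for example Bonferroni) may be equally appropriate.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from five independently fitted regressions.
raw_p = [0.001, 0.02, 0.03, 0.04, 0.30]
adjusted_p = benjamini_hochberg(raw_p)
```

      With these hypothetical inputs, three of the four nominally significant tests end up with adjusted values at about 0.05, which shows how associations that look significant in isolation can become borderline once the number of tests is taken into account.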

      Overall, the authors largely achieved their aims, identifying specific var groups associated with different severity syndromes. However, due to the complexity of var gene data and the interdependence of parameters, the resulting picture is not entirely clear. Some opposite results between different analyses may also be difficult for the reader to interpret. Nevertheless, this study can be considered a pioneering effort, providing valuable insights into the complex interplay of var gene expression across different severity syndromes and offering useful data for the field. Follow-up studies will be important to validate these findings and further dissect the mechanisms linking parasite gene expression to clinical outcomes.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents results of a study using two complementary approaches (RT-qPCR and DBLα-tag sequencing) to analyze the putative relationship between var gene transcription (and hence, PfEMP1 expression) and clinical presentation among Kenyan children with Plasmodium falciparum malaria. Binary rosetting (yes/no) data are used in a similar way. The study includes samples collected over a period of almost 20 years from about 700 children presenting with either severe (impaired consciousness [IC], respiratory distress [RD], severe anemia [SA]) or non-severe malaria. During the study period, the study area experienced a remarkable drop in P. falciparum transmission intensity.

      Strengths:

      The study stands on the shoulders of many similar studies of this kind, both by the authors and by other research teams, and the inferences made largely confirm those made previously. The current study has analytical rigor and a large sample size. Disentangling the multiple parameters of the above-mentioned relationship is of obvious and crucial importance to an improved understanding of P. falciparum malaria pathogenesis and of the targets and mechanisms of protective immunity to the disease. The present study is a valuable effort towards that. The study is well-structured, and the figures are clear.

      Weaknesses:

      It is somewhat unclear to this reviewer to what extent the samples and data analyzed and reported here are new (i.e., not used/analyzed in previous studies). If there is substantial overlap with earlier studies, this is a weakness because of the risk of circular inferences. The Discussion section would benefit from less repetition of the results section and a more in-depth discussion of the findings obtained relative to the existing literature. Better inclusion of key primary references is recommended.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Ndugwa et al. attempt to link specific severe malaria manifestations with particular var gene expression patterns. This is an important question, and the dataset the authors have assembled over decades is impressive. However, greater clarity in the descriptions and statistics would, in my view, help this reviewer, and readers in general, develop a more precise understanding of the significance of the findings.

      Strengths:

      The study addresses a critically important question in malaria pathogenesis, and the dataset is extensive and represents a significant long-term effort by the authors.

      Weaknesses:

      The Results section often lacks clarity: clinical group definitions (NS, non-IC, non-SMA, mild vs. moderate) are sometimes ambiguous, and key methodological details, including the VEH index calculation, RT-qPCR quantification, antibody detection methods, and rosetting assays, are either missing from the results text or poorly explained in the figure legends. Additionally, figure presentation requires improvement, with inconsistent reporting of sample sizes, undefined colors, and p-values that overlap with data points rather than being clearly displayed above them.

    1. eLife Assessment

      This important study presents a novel immunotherapy strategy for cancer. The authors develop a whole-tumor cell vaccine comprised of senescent tumor cells and a COX2 inhibitor in a hydrogel matrix. They present convincing evidence of the efficacy of this approach in preclinical models, demonstrating that inhibiting prostaglandin E2 (PGE2) production shifts the senescence-associated secretory phenotype (SASP) toward an immunostimulatory state, although more mechanistic/functional work would strengthen their conclusions. This work is timely and will be of interest to immunologists and others interested in the development of novel cancer therapies.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to overcome the limitations of whole-tumor-cell vaccines, specifically the weak immunogenicity and rapid clearance often associated with them. They leveraged the unique properties of senescent tumor cells (STCs), which remain metabolically active and secrete chemokines, as a source of antigens. However, to counteract the secretion of the immunosuppressive lipid prostaglandin E2 (PGE2), which is part of the senescence-associated secretory phenotype (SASP), they engineered a hydrogel vaccine formulation (STCs+CLX-Lipo@Gel) containing STCs and liposomal celecoxib (a COX2 inhibitor).

      Strengths:

      (1) The study is conceptually strong in its approach to leveraging the SASP to improve immunotherapy responses. By selectively inhibiting COX2/PGE2 while preserving the secretion of recruitment chemokines (like CCL2 and CCL5) in the SASP, the authors successfully turn a potentially deleterious cellular state into a therapeutic asset.

      (2) Mechanistic Insight: The manuscript provides detailed evidence regarding the mechanism of action. The authors convincingly show that the vaccine restores activity in the NK-DC axis. Specifically, they demonstrate that reducing PGE2 levels enhances NK cell activation (upregulation of NKG2D and NKp46) and promotes the secretion of CCL5 and XCL1 by NK cells, which subsequently recruits cDC1 dendritic cells.

      (3) The therapeutic potential is tested across multiple models, including a subcutaneous melanoma model, a difficult-to-treat melanoma brain metastasis model, and an orthotopic pancreatic cancer model. The consistent efficacy across these distinct physiological contexts suggests broad applicability.

      Weaknesses:

      (1) While the authors successfully inhibit PGE2, the SASP is a complex cocktail of factors. The discussion regarding the long-term presence of these "live" senescent cells is somewhat limited. Although the hydrogel retains cells locally, the potential for other chronic inflammatory factors to eventually promote tumorigenesis or tissue damage in the surrounding niche warrants careful consideration when translating this approach to patients and may require additional preclinical testing.

      (2) The study posits that STCs serve as an antigen reservoir. However, the manuscript would benefit from a clearer distinction between whether the immune system is recognizing senescence-specific neoantigens or simply shared tumor antigens that are being presented more effectively due to the adjuvant effect. The authors briefly touch upon neoantigens in the discussion, but the experimental data primarily measure general anti-tumor responses.

      Impact:

      This work bridges material science and immunology, offering a practical solution to the immunosuppressive barriers of cell-based vaccines. It provides a platform that could potentially be adapted for various solid tumors.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. examined an engineered whole-tumor-cell vaccine based on senescent tumor cells co-encapsulated with liposomal celecoxib in a chitosan hydrogel. The authors propose that prolonged persistence of senescent cells, combined with COX2/PGE2 inhibition, restores NK-DC crosstalk, enhances cDC1 recruitment, and ultimately drives robust CD8⁺ T-cell-mediated antitumor immunity. The study is nicely executed and clearly presented, with extensive in vitro and in vivo validation across multiple tumor models, including melanoma brain metastases and orthotopic PDAC. While the overall concept is timely and of potential interest, several mechanistic conclusions are based primarily on correlative evidence and would benefit from additional functional experiments to strengthen causal interpretation and translational relevance.

      Strengths:

      (1) Strong conceptual framework

      (2) Impressive breadth of in vivo models.

      (3) Clear immunological readouts.

      (4) Innovative combination of senescence biology and biomaterials.

      Weaknesses:

      (1) Mechanistic conclusions rely heavily on correlation.

      (2) Lack of functional immune cell depletion studies.

      (3) Limited exploration of long-term safety and antigenic specificity.

      Major Critiques:

      (1) The authors emphasize the expansion and activation of cDC1 as a key mechanism linking innate and adaptive immunity, yet they do not directly test whether cDC1 is required for the observed CD8⁺ T-cell responses and tumor control.

      The authors should perform experiments using Batf3-deficient mice or any other cDC1-depletion strategies to provide important mechanistic validation. If such experiments are not feasible, this limitation should be more clearly acknowledged and discussed.

      (2) The authors note that senescence may generate neoantigens distinct from those present in proliferating tumor cells, but the extent to which STC-induced immunity cross-reacts with non-senescent tumor cells is not fully addressed. While it is appreciated that tumor challenge experiments are included, the authors should perform a more explicit analysis of antigenic overlap, which would strengthen the translational relevance of the approach. For example, they can compare senescence induced by different stimuli or directly assess immune recognition of non-senescent tumor targets, which would help clarify whether the vaccine primarily exploits senescence-specific antigens or broadly shared tumor antigens.

      (3) Hydrogel encapsulation clearly extends STC persistence in vivo; however, the study provides limited information on the eventual clearance of these cells and the potential implications of prolonged SASP exposure. Given general concerns regarding chronic inflammation associated with senescent cells, additional discussion of long-term local and systemic responses would be helpful. If extended safety analyses are beyond the scope of the current study, the authors should acknowledge the limitation.

      (4) The immunological effects are attributed to COX2/PGE2 inhibition, but it remains unclear whether these effects are specific to celecoxib or could reflect formulation-dependent or off-target mechanisms. The authors may perform additional experiments employing an alternative COX2 inhibitor, genetic COX2 suppression, or PGE2 rescue, which could further support the specificity of the COX2/PGE2-dependent mechanism.

    1. eLife Assessment

      This important work describes systematic computational and experimental approaches to turn a moderately stable α-helical bundle into a very stable fold. The authors advance our understanding of α-helix stabilization providing a convenient framework that has general implications for the protein design field. The main claims have convincing support through a sound methodology, with strong specific conclusions for designing mechanically, thermally, and chemically stable α-helical bundles.

    2. Reviewer #1 (Public review):

      Summary:

      In the work from Qiu et al. a workflow aimed at obtaining the stabilization of a simple small protein against mechanical and chemical stressors is presented.

      Strengths:

      The workflow makes use of state-of-the-art AI-driven structure generation and couples it with more classical computational and experimental characterizations in order to measure its efficacy.

      The work is well presented and results are thorough and convincing.

      The Methods description is quite precise, and some important details were added during review.

      Weaknesses:

      The pulling velocity is quite high but in accordance with this observation the results were only used for comparative analyses.

      Following the review process the authors have shown that the minimum distance between each protein from its periodic images was consistently above 1 nm, yet towards the end of some simulations the value crosses the non-bonded interaction cut-off distance.

      Comments on revisions:

      The authors did a good job in addressing the reviews.

    3. Reviewer #2 (Public review):

      Summary:

      Qiu, Jun et al. developed and validated a computational pipeline aimed at stabilizing α-helical bundles into very stable folds. The computational pipeline is a hierarchical computational methodology tasked to generate and filter a pool of candidates, ultimately producing a manageable number of high-confidence candidates for experimental evaluation. The pipeline is split into two stages. In stage I, a large pool of candidate designs is generated by RFdiffusion and ProteinMPNN, filtered down by a series of filters (hydropathy score, foldability assessed by ESMFold and AlphaFold). The final set is chosen by running a series of steered MD simulations. This stage reached unfolding forces above 100 pN. In stage II, targeted tweaks are introduced - such as salt bridges and metal ion coordination - to further enhance the stability of the α-helical bundle. The constructs undergo validation through a series of biophysical experiments. Thermal stability is assessed by CD, chemical stability by chemical denaturation, and mechanical stability by AFM.

      Strengths:

      A hierarchical computational approach that begins with high-throughput generation of candidates, followed by a series of filters based on specific goal-oriented constraints, is a powerful approach for a rapid exploration of the sequence space. This type of approach breaks down the multi-objective optimization into manageable chunks and has been successfully applied for protein design purposes (e.g., the design of protein binders). Here, the authors nicely demonstrate how this design strategy can be applied to successfully redesign a moderately stable α-helical bundle into an ultrastable fold. This approach is highly modular, allowing the filtering methods to be easily swapped based on the specific optimization goals or the desired level of filtering.

      Weaknesses:

      Assessing the change in stability relative to the WT α-helical bundle is challenging because an additional helix has been introduced, resulting in a comparison between a three-helix bundle and a four-helix bundle. Consequently, the appropriate reference point for comparison is unclear. A more direct and informative approach would have been to redesign the sequence of the original α-helical bundle of the human spectrin repeat R15, allowing for a more straightforward stability comparison.

      The three constructs chosen are 60-70% identical to each other, either suggesting over-constrained optimization of the sequence, or a physical constraint inherent to designing ultrastable α-helical bundles. It would be interesting to explore whether choosing a different combination of filters would enable ultrastable α-helical bundles constructs with a more varied sequence content.

      While the use of steered MD is an elegant approach to picking the top N most stable designs, its computational cost may become prohibitive as the number of designs increases or as the protein size grows, especially since it requires simulating a water box that can accommodate a fully denatured protein.

      Comments on revisions:

      The authors have done a good job of addressing the comments.

    4. Reviewer #3 (Public review):

      Summary:

      Qiu et al. present a hierarchical framework that combines AI and molecular dynamics simulation to design α-helical proteins with enhanced thermal, chemical, and mechanical stability. Strategic chemical modification, by incorporating an additional α-helix, site-specific salt bridges, and metal coordination, further enhanced the stability. The experimental validation using single-molecule force spectroscopy and CD melting measurements provides fundamental physicochemical insights into the stabilization of α-helices. Together with the group's prior work on super-stable β strands (https://www.nature.com/articles/s41557-025-01998-3), this research provides a comprehensive toolkit for protein stabilization. This framework has broad implications for designing stable proteins capable of functioning under extreme conditions.

      Strengths:

      The study represents a complete framework for stabilizing the fundamental protein elements, α-helices. A key strength of this work is the integration of AI tools with chemical knowledge of protein stability.

      The experimental validation in this study is exceptional. The single-molecule AFM analysis provided a high-resolution look at the energy landscape of these designed scaffolds. This approach allows for the direct observation of mechanical unfolding forces (exceeding 200 pN) and the precise contribution of individual chemical modifications to global stability. These measurements offer new, fundamental insights into the physicochemical principles that govern α-helix stabilization.

      Weaknesses:

      (1) While the initial manuscript lacked a detailed explanation for the stabilizing effect of the additional helix, the revised version now includes a clear structural basis for this improvement. The authors successfully attribute the increased unfolding force threshold to the reinforcement of the hydrophobic core and enhanced cooperative interactions, supported by relevant literature correlations between helix bundle size and stability.

      (2) The authors analyzed both thermal stability and mechanical stability. It would be helpful for the authors to discuss the relationship between these two parameters in the context of their design, since thermal melting probes equilibrium stability (ΔG), while mechanical stability probes the unfolding energy barriers along the pulling coordinate. While the integrative design approach successfully improved both stability types, a deeper exploration of how the specific structural modifications influence the unfolding energy barrier relative to the overall equilibrium stability would further strengthen the mechanistic impact of the work.

      (3) While the current study demonstrates a dramatic increase in global stability, the analysis focuses almost exclusively on the unfolding (melting) process. However, thermodynamic stability is a function of both folding (kf) and unfolding (ku) rates. The authors have clarified that the observed ultrastability likely originates from a significantly reduced unfolding rate, a hypothesis consistent with the unfolding force. Direct measurements of the kinetics would provide deeper insights.

      (4) The authors chose the spectrin repeat R15 as the starting scaffold for their design. R15 is a well-established model known for its "ultra-fast" folding kinetics, with folding rates (kf ~10<sup>5</sup> s<sup>–1</sup>) nearly three orders of magnitude faster than those of its homologues like R17 (Scott et al., Journal of Molecular Biology 344.1 (2004): 195-205). Measuring the folding rates of newly designed proteins would provide additional insights into the design.

      Comments on revisions:

      I think the authors have addressed the comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the work from Qiu et al., a workflow aimed at obtaining the stabilization of a simple small protein against mechanical and chemical stressors is presented.

      Strengths:

      The workflow makes use of state-of-the-art AI-driven structure generation and couples it with more classical computational and experimental characterizations in order to measure its efficacy. The work is well presented, and the results are thorough and convincing.

      We are grateful to this reviewer for his/her thoughtful assessment and supportive feedback. In response, we have addressed each comment and incorporated the necessary revisions into the manuscript.

      Weaknesses:

      I will comment mostly on the MD results due to my expertise.

      The Methods description is quite precise, but is missing some important details:

      (1) Version of GROMACS used.

      We used GROMACS version 2023.2 (single-precision). All subsequent MD simulation procedures mentioned below have been consolidated and described in detail in the Supporting Information (SI).

      (2) The barostat used.

      Pressure coupling was applied using the C-rescale barostat (τ<sub>p</sub> = 5.0 ps, ref<sub>p</sub> = 1.0 bar).

      (3) pH at which the system is simulated.

      No explicit pH was defined during system construction. Proteins were modeled using standard protonation states as assigned by GROMACS preprocessing tools, corresponding to physiological, near-neutral pH (~ 7.0).

      (4) The pulling is quite fast (but maybe it is not a problem)

      The relatively high pulling velocity (1 nm/ns) was selected to enable efficient screening across a large number of designed proteins (211 candidates), while maintaining reasonable computational cost/time. Given the intrinsic orders-of-magnitude difference between simulation and experimental pulling rates, SMD results were used as a comparative screening tool, rather than for direct quantitative comparison with AFM data.

      (5) What was the value for the harmonic restraint potential? 1000 is mentioned for the pulling potential, but it is not clear if the same value is used for the restraint, too, during pulling.

      All positional restraints used in the simulations, including those applied during equilibration as well as the harmonic restraint on the N-terminus and the pulling umbrella restraint during SMD, employed the same force constant (k = 1000 kJ·mol<sup>–1</sup>·nm<sup>–2</sup>). We have clarified this point in the revised Methods section.

      (6) The box dimensions.

      Rectangular simulation boxes were used throughout. For equilibrium MD simulations, the box dimensions in each direction were set based on the maximum extent of the protein along that axis, with a minimum distance of 1.2 nm between the protein surface and the box boundary on all sides. For SMD simulations, the same box dimensions were applied in the x and y directions. Along the pulling (z) direction, the box length was extended to accommodate the theoretical stretching length, defined as the initial N–C terminal distance plus 0.36 nm per stretched residue, while maintaining a 1.2 nm buffer at both ends (2.4 nm total). These details have now been clarified in the revised Supporting Information.
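The box-sizing rule described above can be written out as a short calculation. The following is a minimal sketch, assuming the rule is applied exactly as stated (protein extent plus a 1.2 nm buffer on each side, and a z length equal to the initial N–C distance plus 0.36 nm per stretched residue plus the two buffers). The function names and the example protein dimensions are hypothetical and are not taken from the manuscript.

```python
def equilibrium_box(extent_nm, buffer_nm=1.2):
    """Box edge lengths (nm) for equilibrium MD: protein extent plus a buffer on both sides."""
    return tuple(e + 2.0 * buffer_nm for e in extent_nm)


def smd_box(extent_nm, initial_nc_distance_nm, n_stretched_residues,
            per_residue_nm=0.36, buffer_nm=1.2):
    """Box for steered MD: x and y as in equilibrium MD, z extended for full stretching."""
    x, y, _ = equilibrium_box(extent_nm, buffer_nm)
    # Theoretical stretched length along the pulling axis, as described in the response.
    stretched_length = initial_nc_distance_nm + per_residue_nm * n_stretched_residues
    z = stretched_length + 2.0 * buffer_nm
    return (x, y, z)


# Hypothetical example: a 3 x 3 x 5 nm protein, 4.0 nm N-C distance, 100 stretched residues.
example_box = smd_box((3.0, 3.0, 5.0), initial_nc_distance_nm=4.0, n_stretched_residues=100)
print(example_box)
```

The z length grows linearly with the number of stretched residues, which is why the solvated volume, and hence the cost of each SMD run, rises quickly with protein size.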

      From this last point, a possible criticism arises: Do the unfolded proteins really still stay far enough away from themselves to not influence the result?

      We analyzed the minimum atomic distance between each protein and its periodic images to assess potential artifacts from periodic boundary conditions. For all simulation stages used in screening and statistical analysis, the minimum protein–image separation remained above 1.0 nm for the majority of the simulation time, exceeding the nonbonded interaction cutoff and minimizing cross-boundary interactions. As shown in Author response image 1 for SpecAI89 (left), this separation during SMD simulations is consistently well above the threshold, indicating that the chosen box dimensions are appropriate. In the very late stages of annealing MD, highly unstable proteins may exhibit large conformational fluctuations and transient boundary proximity (right); however, these regimes are associated with large RMSD deviations and are excluded from analysis. Notably, the mechanically relevant unfolding events occur near the center of the simulation box and proceed along the pulling axis in SMD simulations, making boundary effects unlikely to influence the unfolding process or the relative mechanostability ranking.

      Author response image 1.

      Analysis of the minimum atomic distance between the protein and its periodic images under periodic boundary conditions. Left: SpecAI89 during SMD simulations, showing that the minimum protein–image distance remains above 1.0 nm for the majority of the simulation time. Right: WT during AMD simulations, where transient proximity to the periodic boundary is observed at very late stages due to large conformational fluctuations.
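The quantity discussed here, the minimum distance between a protein and its periodic images, can be illustrated with a brute-force calculation. This is only a sketch, assuming a rectangular box and a protein that does not span more than one box length, so that only the 26 neighboring images need checking. It is not the authors' analysis script, and the function name and coordinates are hypothetical. Trajectory tools such as `gmx mindist` with its `-pi` option report the same kind of quantity for real trajectories.

```python
import itertools
import math


def min_image_distance(coords, box):
    """Smallest distance (nm) between any atom and any periodic image of any atom,
    including its own image, in a rectangular box. Brute force over 26 neighbor cells."""
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]
    best = math.inf
    for ri in coords:
        for rj in coords:
            for s in shifts:
                # Distance from atom i to the image of atom j displaced by shift s.
                d = math.sqrt(sum((ri[k] - rj[k] - s[k] * box[k]) ** 2 for k in range(3)))
                best = min(best, d)
    return best


# Hypothetical two-atom "protein" stretched along x in a 3 x 4 x 5 nm box.
atoms = [(0.1, 0.0, 0.0), (2.9, 0.0, 0.0)]
distance = min_image_distance(atoms, (3.0, 4.0, 5.0))
print(distance, distance > 1.0)
```

In this toy case the two ends of the chain nearly touch across the boundary, so the check against a 1.0 nm cutoff fails. That is the situation the box-size criterion is meant to exclude.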

      Additionally, no time series are shown for the equilibration phases (e.g., RMSD evolution over time), which would empower the reader to judge the equilibration of the system before either steered MD or annealing MD is performed.

      We thank the reviewer for this suggestion. To assess equilibration, we analyzed the backbone RMSD evolution during the equilibration phase. Using SpecAI89 as a representative example (Author response image 2), the protein backbone RMSD converges rapidly and reaches a stable plateau within approximately 5 ps. The subsequent 125 ps equilibration period therefore sufficiently demonstrates that the system is well equilibrated prior to both steered MD and annealing MD simulations.

      Author response image 2.

      The backbone RMSD of SpecAI89 over time during simulation

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure S2, only one copy (or the average of the three copies; it is not clear from the caption) is shown; it would be better to show the individual traces for each repeat. Additionally, only the plot for the forces is shown, and not, similarly to the AMD, the RMSD plot. This could be a stylistic choice, but it just reports on how much force was applied and not on how the protein responded to the force. Moreover, horizontal lines at the maximum value reached by the force could be added in order to directly see the difference in force applied, since it is then remarked on.

      Figure S2 originally shows a representative single SMD trajectory, as the force–extension peak positions vary between independent simulations and averaging the force traces would obscure the characteristic force peaks. In the revised Supplementary Information, we have now added the force–extension traces from the other two independent SMD repeats for each construct (New Figure S2). In addition, horizontal lines indicating the maximum force reached in each trajectory have been included to facilitate direct comparison of force differences between designs.

      (2) In Figure S3 the plots have different y-axes. Maybe it could be valuable to modify it so that in figures b, c, and d the spectrin result is in the background (perhaps in gray) so that the y-axis is not changed to retain the information included in this plot, but one could still compare directly to the spectrin result. With a 0 to 1 nm y-axis part of the spectrin run will be hidden, but in any case, plot a can be used to see the full behavior. Similarly to S2, the repeats (if any) could be shown.

      We have revised Figure S3 as suggested. The y-axis is now unified to 0–1.2 nm across all panels. For panels b–d, the natural spectrin trajectory is displayed in light gray in the background for direct comparison. Additionally, three independent MD replicates are now presented for each construct to demonstrate reproducibility.

      Finally, minor remarks that could nevertheless improve the paper:

      (3) In Figure S7, a bimodal distribution model for the number of events could be used to fit the data better.

      We thank the reviewer for the detailed suggestion. Following this advice, we explored a bimodal Gaussian distribution model for fitting the force-event data in Figure S7. Indeed, our analysis showed that a bimodal fit describes Figure S7, panel f, better (as shown in Author response image 3). The two peaks were centered at F<sub>1</sub> = 190 ± 4 pN and F<sub>2</sub> = 380 ± 6 pN. Interestingly, the force of the first, major peak is the same as the previously fitted value. The second peak lies at twice that force, which we speculate may arise from two molecules being stretched simultaneously, for reasons that remain unclear. Considering the very small number of events in the second peak and the unchanged force value of the first peak (190 pN), we decided not to change the unfolding force value in the manuscript. We thank the reviewer for this insightful comment.

      Author response image 3.

      The bimodal fit for the unfolding force of SpecAI88-49E102K-6H149H shows the same 190 pN unfolding force for the first peak as the previous fit.
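To make the idea of a bimodal fit concrete, the sketch below fits a two-component Gaussian mixture to a list of forces by expectation-maximization. This is a stand-in technique: the response describes fitting a bimodal Gaussian to the force histogram, whereas this sketch works on the raw values, and it is not the authors' fitting procedure. The force values are synthetic, constructed only so that a large cluster sits near 190 pN and a small one near 380 pN. They are not experimental data, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, n_iter=200, min_sigma=1.0):
    """Fit a two-component Gaussian mixture by expectation-maximization.
    Returns (weights, means, sigmas), with components sorted by mean."""
    mean = sum(data) / len(data)
    spread = max(math.sqrt(sum((x - mean) ** 2 for x in data) / len(data)), min_sigma)
    w, mu, sigma = [0.5, 0.5], [min(data), max(data)], [spread, spread]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: update weights, means, and widths from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)  # floor avoids a collapsed component
    order = sorted(range(2), key=lambda k: mu[k])
    return [w[k] for k in order], [mu[k] for k in order], [sigma[k] for k in order]


# Synthetic forces in pN (illustrative only): 20 events near 190 pN, 3 events near 380 pN.
forces = [170, 180, 190, 200, 210] * 4 + [365, 380, 395]
weights, means, sigmas = fit_two_gaussians(forces)
print(weights, means, sigmas)
```

With so few events in the upper cluster, its fitted weight is small, which mirrors the reasoning in the response for leaving the reported unfolding force unchanged.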

      (4) The colors in the video are not very intuitive, as the spectrin is shown initially in light blue, but becomes grey in the variants, where light blue is reserved for the additional helix. A counter of elapsed time and/or force/temperature applied could help the readers orient. Maybe it could be useful to produce a video with spectrin and the three variants all shown together?

      We thank the reviewer for this comment. The videos have been revised to improve clarity and consistency accordingly. In all cases, the original protein scaffold is now shown in gray, while the additional helix in the designed variants is highlighted in blue. Real-time annotations have been added to aid interpretation: the instantaneous temperature is displayed during AMD simulations, and time is shown during SMD simulations. In addition, for ease of comparison, the AMD and SMD results of all four proteins are each compiled into a single combined video, allowing their behaviors to be viewed side by side.

      Reviewer #2 (Public review):

      Qiu, Jun et al. developed and validated a computational pipeline aimed at stabilizing α-helical bundles into very stable folds. The computational pipeline is a hierarchical computational methodology tasked to generate and filter a pool of candidates, ultimately producing a manageable number of high-confidence candidates for experimental evaluation. The pipeline is split into two stages. In stage I, a large pool of candidate designs is generated by RFdiffusion and ProteinMPNN, filtered down by a series of filters (hydropathy score, foldability assessed by ESMFold and AlphaFold). The final set is chosen by running a series of steered MD simulations. This stage reached unfolding forces above 100 pN. In stage II, targeted tweaks are introduced - such as salt bridges and metal ion coordination - to further enhance the stability of the α-helical bundle. The constructs undergo validation through a series of biophysical experiments. Thermal stability is assessed by CD, chemical stability by chemical denaturation, and mechanical stability by AFM.

      Strengths:

      A hierarchical computational approach that begins with high-throughput generation of candidates, followed by a series of filters based on specific goal-oriented constraints, is a powerful approach for a rapid exploration of the sequence space. This type of approach breaks down the multi-objective optimization into manageable chunks and has been successfully applied for protein design purposes (e.g., the design of protein binders). Here, the authors nicely demonstrate how this design strategy can be applied to successfully redesign a moderately stable α-helical bundle into an ultrastable fold. This approach is highly modular, allowing the filtering methods to be easily swapped based on the specific optimization goals or the desired level of filtering.

      We are thankful for the reviewer’s diligent evaluation and positive remarks. His/her concluding remarks, which encourage our future work at the intersection of AI-protein design and AFM-SMSF, are especially appreciated. All comments have been incorporated into our revisions.

      Weaknesses:

      Assessing the change in stability relative to the WT α-helical bundle is challenging because an additional helix has been introduced, resulting in a comparison between a three-helix bundle and a four-helix bundle. Consequently, the appropriate reference point for comparison is unclear. A more direct and informative approach would have been to redesign the original α-helical bundle of the human spectrin repeat R15, allowing for a more straightforward stability comparison.

      This is an insightful comment. Indeed, a direct comparison between three-helix bundles of the same structure would be the most straightforward, with a clear reference point. We will take this advice and try it in our future work.

      In our case, a substantial fraction of the hydrophobic region is relatively shallow and partially solvent-exposed in the wild-type R15 α-helical bundle. So, the added fourth helix provides a new hydrophobic packing interface, increasing core burial, packing density, and strengthening the internal load-bearing network. Consistent with this design rationale, rSASA analysis shows that the designed proteins exhibit a higher degree of hydrophobic core burial compared to the wild-type R15. Specifically, the fraction of residues with rSASA < 0.2 exceeds 30% in the designs, compared to 23% in the natural spectrin repeat.

      While the authors have shown experimentally that stage II constructs have increased the mechanical stability by AFM, they did not show that these same constructs have increased the thermal and chemical stabilities. Since the effects of salt bridges on stability are highly context dependent (orientation, local environment, exposed vs buried, etc.), it is difficult to assess the magnitude of the effect that this change had on other types of stabilities.

      We agree that the effects of salt bridges are highly context-dependent and that different dimensions of stability do not always correlate. Following your suggestion, we evaluated the thermal and chemical stabilities of the Stage II constructs. The experimental results (now added as Figure S9) show that the Stage II designs retain high thermal stability and resistance to chemical denaturation, although to different extents: the thermal stability remains as high as that of the Stage I designs, whereas the resistance to chemical denaturation is slightly reduced. We have added this result to the manuscript accordingly.

      The three constructs chosen are 60-70% identical to each other, either suggesting overconstrained optimization of the sequence or a physical constraint inherent to designing ultrastable α-helical bundles. It would be interesting to explore these possible design principles further.

      Yes, the observed sequence convergence likely arises from a combination of intrinsic physical constraints of the protein architecture and the applied design and screening criteria. In particular, the tightly packed hydrophobic core imposes strong constraints on side-chain size, packing complementarity, and the alignment of heptad-like motifs reminiscent of coiled-coil organization, which collectively reduce the accessible sequence space. In addition, the strong selection pressure imposed by foldability and stability filters further promotes convergence toward similar solutions. And we agree with the reviewer that this represents an important direction for future work.

      While the use of steered MD is an elegant approach to picking the top N most stable designs, its computational cost may become prohibitive as the number of designs increases or as the protein size grows, especially since it requires simulating a water box that can accommodate a fully denatured protein.

      Yes, steered MD can become computationally expensive, particularly as the number of designs increases or as protein size grows. Considering the vast pool created by AI, SMD in this work was applied to a relatively small, high-confidence subset of candidates after multiple rounds of rapid prescreening, keeping the overall computational cost manageable. In future applications, this step could be further accelerated by integrating machine-learning–based predictors to improve scalability.

      Reviewer #2 (Recommendations for the authors):

      I am not convinced that the difference in rSASA between the designs and the natural spectrin repeat is meaningful. It would be helpful to report confidence intervals for the rSASA values of the designs to clarify whether any differences are statistically robust. Even if such differences prove statistically significant, it is not clear that they are large enough to be practically meaningful.

      In our analysis, rSASA values were calculated from equilibrated MD conformations and were consistently higher for all designed proteins that passed the simulation-based screening compared to the wild-type spectrin repeat. However, rSASA was used only as a supportive structural descriptor to indicate a trend toward a more compact and better-buried hydrophobic core, rather than as a standalone or decisive metric of stability.

      Protein stability is indeed influenced by multiple factors, including hydrogen bonding, salt bridges, metal coordination, and topology-dependent load-bearing interactions, none of which are captured by rSASA alone. Therefore, we agree with the reviewer that differences in rSASA alone should not be overinterpreted as a quantitative measure of protein stability. For this reason, rSASA was not used as a ranking criterion or a predictor of stability, but only as complementary evidence consistent with the overall design rationale and with the experimentally observed stability enhancements.

      The claim "The strong agreement between computational rankings and experimental measurements validates this approach for prioritizing designs based on relative mechanostability, offering a practical pipeline to bridge the gap between in silico design and experimental validation." should be substantiated by a citation or a figure. Since the authors have the experimental AFM data and steered MD data, I suggest adding a Spearman correlation plot of the two.

      Following this comment, we examined the Spearman rank correlation between SMD-derived unfolding forces and experimentally measured AFM forces (Author response image 4). The resulting correlation was modest (ρ = 0.4, p = 0.6), which is not unexpected given (i) the large difference in force and timescales between high-speed SMD simulations and single-molecule AFM experiments, and (ii) the limited number of designs and simulation repeats available.

      Nevertheless, qualitatively, the difference between the first point (wild-type spectrin) and the three SpecAI designs is clear. Given the large computational cost, we performed only three simulations per design to balance accuracy against cost and time. To avoid overinterpretation, we therefore did not include the correlation analysis in the main text and revised the manuscript to soften claims of strong agreement, emphasizing instead the qualitative and comparative role of SMD in the design pipeline.

      Author response image 4.

      Spearman correlation between SMD and AFM unfolding forces for natural spectrin and SpecAI designs. SMD force (x-axis) versus experimental AFM force (y-axis); each point represents one protein.
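
      To make the small-sample caveat concrete, the following is a minimal sketch of a Spearman rank correlation with an exact permutation p-value, written in plain Python. The function names and the four-point input are hypothetical placeholders in arbitrary units, not the measured SMD or AFM forces, and the exact permutation test is a substitute for whatever routine was actually used, so its p-value need not match the one quoted above.

```python
from itertools import permutations


def ranks(values):
    """Return ranks 1..n for the values (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from rank differences (valid when there are no ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def exact_p_value(x, y):
    """Two-sided exact p-value: fraction of rank permutations with |rho| at
    least as large as the observed |rho|."""
    observed = abs(spearman_rho(x, y))
    rx = ranks(x)
    count = total = 0
    for perm in permutations(range(1, len(x) + 1)):
        total += 1
        if abs(spearman_rho(rx, perm)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical placeholder values (arbitrary units), one entry per protein.
smd_forces = [1.0, 2.0, 3.0, 4.0]
afm_forces = [1.0, 3.0, 4.0, 2.0]
rho = spearman_rho(smd_forces, afm_forces)
p = exact_p_value(smd_forces, afm_forces)
```

      With only four proteins there are 24 possible rank orderings, so even a perfect monotonic relationship cannot yield a two-sided exact p-value below 2/24. This is one way to see why a rank correlation on so few points is hard to interpret.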

      Reviewer #3 (Public review):

      Summary:

      Qiu et al. present a hierarchical framework that combines AI and molecular dynamics simulation to design an α-helical protein with enhanced thermal, chemical, and mechanical stability. Strategically, chemical modification by incorporating additional α-helix, site-specific salt bridges, and metal coordination further enhanced the stability. The experimental validation using single-molecule force spectroscopy and CD melting measurements provides fundamental physical chemical insights into the stabilization of α-helices. Together with the group's prior work on super-stable β strands (https://www.nature.com/articles/s41557-025-01998-3), this research provides a comprehensive toolkit for protein stabilization. This framework has broad implications for designing stable proteins capable of functioning under extreme conditions.

      Strengths:

      The study represents a complete framework for stabilizing the fundamental protein elements, α-helices. A key strength of this work is the integration of AI tools with chemical knowledge of protein stability.

      The experimental validation in this study is exceptional. The single-molecule AFM analysis provided a high-resolution look at the energy landscape of these designed scaffolds. This approach allows for the direct observation of mechanical unfolding forces (exceeding 200 pN) and the precise contribution of individual chemical modifications to global stability. These measurements offer new, fundamental insights into the physicochemical principles that govern α-helix stabilization.

      We appreciate the positive assessment of our manuscript from this reviewer and his/her support. We have answered all the comments as follows and modified the manuscript accordingly.

      Weaknesses:

      (1) The authors report that appending an additional helix increases the overall stability of the α-helical protein. Could the author provide a more detailed structural explanation for this? Why does the mechanical stability increase as the number of helices increases? Is there a reported correlation between the number of helices (or the extent of the hydrophobic core) and the stability?

      In multi-helix bundle proteins, tight interhelical packing leads to the formation of a dense hydrophobic core, which substantially enhances overall structural stability. The introduction of an additional helix does not merely increase helix count, but expands the buried hydrophobic interface, improving packing density and cooperative side-chain interactions in the core. This, in turn, strengthens the internal load-bearing network that resists force-induced unfolding.

      From a mechanical perspective, adding a helix also increases topological interlocking among secondary-structure elements, which raises the energetic barrier for unfolding and shifts the unfolding pathway toward more cooperative rupture events, thereby increasing the unfolding force threshold. Consistent with this design principle, pioneering studies have reported a positive correlation between the number of helices (or the extent of the hydrophobic core) in helix bundles and their stability (Lim et al., Structure, 2008, 16:449; Minin et al., J. Am. Chem. Soc., 2017, 139, 16168; Bergues-Pupo et al., Phys. Chem. Chem. Phys., 2018, 20, 29105). Inspired by these works, our AI-protein design study uses the appended helix to reinforce the hydrophobic core rather than simply increasing secondary-structure content.

      (2) The author analyzed both thermal stability and mechanical stability. It would be helpful for the author to discuss the relationship between these two parameters in the context of their design, since thermal melting probes equilibrium stability (ΔG), whereas mechanical stability probes the unfolding energy barriers along the pulling coordinate.

      We agree this is a crucial distinction. Thermal and chemical stabilities report on the equilibrium free energy (ΔG), while mechanical stability probes the kinetic unfolding barrier (ΔG‡) along a force-dependent pathway. Their inherent difference makes concurrent improvement in all parameters a non-trivial task, which highlights the importance and success of our integrative design approach.

      (3) While the current study demonstrates a dramatic increase in global stability, the analysis focuses almost exclusively on the unfolding (melting) process. However, thermodynamic stability is a function of both folding (k<sub>f</sub>) and unfolding (k<sub>u</sub>) rates. It remains unclear whether the observed ultrastability is primarily driven by a drastic decrease in the unfolding rate (k<sub>u</sub>) or if the design also maintains or improves the folding rate (k<sub>f</sub>)?

      We agree with the reviewer that thermodynamic stability is determined by both the folding rate (k<sub>f</sub>) and the unfolding rate (k<sub>u</sub>). In the present study, we did not directly measure folding kinetics, and therefore cannot quantitatively deconvolute the respective contributions of k<sub>f</sub> and k<sub>u</sub> to the observed ultrastability. Based on the design strategy and the experimental observations, we propose that the enhanced stability primarily originates from a substantial reduction in the unfolding rate (k<sub>u</sub>), corresponding to an increased unfolding energy barrier. The reinforcement of the hydrophobic core, the introduction of stabilizing interactions such as salt bridges and metal coordination, and the additional helix that increases topological and packing constraints all raise the energetic cost of disrupting key interactions in the folded state.

      This interpretation is consistent with the high mechanical unfolding forces observed in both AFM experiments and SMD simulations. In contrast, these stabilizing features are not necessarily expected to accelerate folding and may even modestly increase folding complexity. Addressing folding kinetics explicitly would require dedicated kinetic experiments or simulations, which are beyond the scope of the present work but represent an interesting direction for future studies.
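
      The link between the two rates and equilibrium stability can be sketched for the simplest case. The snippet below assumes an idealized two-state folder, for which the unfolding free energy is ΔG = RT ln(k<sub>f</sub>/k<sub>u</sub>); the function name and the rate values are illustrative assumptions, not quantities measured in this study.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def unfolding_free_energy(k_f, k_u, temperature=298.15):
    """Unfolding free energy (kJ/mol) of an idealized two-state folder.

    k_f and k_u are the folding and unfolding rate constants in the same
    units (e.g. 1/s); only their ratio matters.
    """
    return R_KJ_PER_MOL_K * temperature * math.log(k_f / k_u)


# Hypothetical rates: slowing unfolding tenfold at a fixed folding rate
# adds RT*ln(10) to the stability, whatever the starting values are.
baseline = unfolding_free_energy(k_f=1.0e3, k_u=1.0)
slower_unfolding = unfolding_free_energy(k_f=1.0e3, k_u=0.1)
gain = slower_unfolding - baseline
```

      Under this assumption, a given stability gain cannot be attributed to k<sub>u</sub> or k<sub>f</sub> from equilibrium data alone, because the same ΔG results from any pair of rates with the same ratio.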

      (4) The authors chose the spectrin repeat R15 as the starting scaffold for their design. R15 is a well-established model known for its "ultra-fast" folding kinetics, with folding rates (k<sub>f</sub> ~10<sup>5</sup> s<sup>-1</sup>) nearly three orders of magnitude faster than its homologues like R17 (Scott et al., Journal of Molecular Biology 344.1 (2004): 195-205). Does the newly designed protein, with its additional fourth helix and site-specific chemical modifications, retain the exceptionally high folding rate of the parent R15?

      We did not directly measure the folding kinetics of the newly designed proteins, and therefore cannot determine whether they retain the exceptionally fast folding rate reported for the parent spectrin repeat R15. While R15 is known for its ultrafast folding behavior, the introduction of an additional fourth helix and site-specific chemical modifications, although beneficial for enhancing stability, may increase the complexity of the folding landscape and do not necessarily guarantee that the folding rate (k<sub>f</sub>) remains comparable to that of R15.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify the used Gaussian function to fit the unfolding force distribution (Figure 3-4). In Figure S8, the Bell-Evans model is used to analyze unfolding force. The authors should explain the choice of fitting methods and ensure consistency.

      The Gaussian fitting used in Figures 3–4 is intended as a descriptive statistical analysis to summarize the unfolding force distributions and to facilitate direct comparison between different designs. This approach provides a robust estimate of the most probable unfolding force and the distribution width, without invoking a specific physical unfolding model, and is commonly used in single-molecule force spectroscopy for comparative purposes.

      In contrast, the Bell-Evans model applied in Figure S8 is a kinetic framework that explicitly accounts for force-loading-rate dependence and is used to extract mechanistic insights into the unfolding process. Therefore, the two fitting approaches serve complementary roles: Gaussian fitting for quantitative comparison and ranking of mechanostability, and Bell-Evans analysis for mechanistic interpretation. We have clarified this distinction and the rationale for using both methods in the revised Supplementary Information to ensure consistency and transparency.
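
      For readers unfamiliar with the kinetic framework mentioned above, here is a minimal sketch of the standard Bell-Evans expression for the most probable unfolding force, F* = (k<sub>B</sub>T/Δx) ln(rΔx / (k<sub>0</sub>k<sub>B</sub>T)). The function name and all parameter values are hypothetical; they are not the fitted values from Figure S8.

```python
import math

KB_T_PN_NM = 4.114  # thermal energy k_B*T at about 298 K, in pN*nm


def bell_evans_most_probable_force(loading_rate, k0, dx, kbt=KB_T_PN_NM):
    """Most probable unfolding force (pN) in the Bell-Evans model.

    loading_rate: force loading rate r in pN/s
    k0:           spontaneous (zero-force) unfolding rate in 1/s
    dx:           distance to the transition state in nm
    """
    return (kbt / dx) * math.log(loading_rate * dx / (k0 * kbt))


# Hypothetical parameters, for illustration only.
k0, dx = 1.0e-3, 0.5
force_slow = bell_evans_most_probable_force(1.0e3, k0, dx)
force_fast = bell_evans_most_probable_force(1.0e4, k0, dx)
# Each tenfold increase in loading rate adds (k_B*T/dx)*ln(10) to the force.
increment_per_decade = force_fast - force_slow
```

      In this model the force grows linearly with the logarithm of the loading rate, which is why a descriptive Gaussian summary at a single pulling speed and a loading-rate-dependent kinetic fit answer different questions. Whether the same linear regime extends to the much higher pulling speeds of simulations is a separate assumption that this sketch does not address.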

      (2) The authors utilized steered MD simulation to analyze the mechanical properties via ForceGen (Ni et al., 2024, Sci. Adv. 10, eadl4000). However, the significant discrepancy between the predicted unfolding force (~600 pN) and the experimental value (~50 pN for spectrin, line 376) requires further justification. Please clarify how the accuracy of these predictions can be established. Specifically, do the MD simulations successfully capture the relative ranking or trends in stability across the different designed variants?

      We agree with the reviewer that there is a substantial discrepancy between the absolute unfolding forces predicted by SMD simulations (~600 pN) and those measured experimentally by AFM (~50 pN for spectrin). This difference primarily arises from the orders-of-magnitude mismatch in loading rates between simulations and experiments. In our SMD simulations, the pulling velocity (~10<sup>9</sup> nm/s) is several orders of magnitude higher than that used in AFM experiments (~10<sup>3</sup> nm/s), which is expected to systematically elevate the apparent unfolding force. In addition to loading-rate effects, limitations in force-field accuracy, finite system size, and restricted conformational sampling further contribute to deviations in absolute force values. As a result, the unfolding forces obtained from SMD are not intended to provide quantitative agreement with experimental measurements or absolute mechanical stability.

      Instead, SMD is employed here as a comparative screening tool to assess relative mechanostability across different designed variants under identical simulation conditions. Despite the limited number of repeats imposed by computational cost, the simulations consistently distinguish candidates with markedly different mechanical responses. Importantly, the variants identified by SMD as more mechanically stable were subsequently confirmed experimentally to exhibit enhanced mechanostability relative to the wild-type spectrin repeat. Therefore, while SMD does not yield quantitatively accurate unfolding forces, it successfully captures relative stability trends and provides a practical and effective means for prioritizing designs prior to experimental validation.

    1. eLife Assessment

      This is an important study showing that movement vigor is not solely an individual property but emerges through interaction when two people are physically linked. The evidence is convincing, supported by a well-controlled experimental design and modeling that closely match the observed behavior. While the authors provided a helpful comparison of several candidate models of human-human interaction dynamics, the statistical power remains limited.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or, vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements.

      The authors adequately addressed several concerns that I raised in my initial review of the work, including clarity regarding analyses of movement vigor and inclusion of additional analyses of reaction time. The results are supported by both parametric and non-parametric statistical methods.

      The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts. This work answers several new, important questions about control of vigor during volitional movements, and in doing so it motivates future research into the topic.

      Weaknesses:

      My chief concern about the study is the relatively low number of dyad data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). However, it is important to note that most of the effects upon which the conclusions rest are associated with relatively large effect sizes.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner's vigor rather than by the faster partner's, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner's vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      Weaknesses:

      The revised manuscript now clearly explains why the proposed computational model successfully accounts for the observed dyadic behavior. In particular, the mechanisms by which uncertainty associated with the slower partner and time-related costs of the faster partner jointly shape dyadic vigor are now clear. I have no further comments to add.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling.

      The authors have addressed all of my previous comments. I appreciate the clarification of abbreviations, terminology, and key concepts, the expansion of the discussion, and the adjustments to some of the statistical analyses in response to both my earlier comments and those of Reviewer 1.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive and precise comments, which have helped us improve the consistency and clarity of our manuscript. Below, we provide a point-by-point response to each comment. In summary, the main changes introduced in the revised version are as follows:

      (1) We replaced all the statistical analyses with their non-parametric equivalents to ensure compliance with test assumptions and consistency of the results;

      (2) We compared the participants’ reaction times before and during connected practice, revealing a significant reduction in reaction times of both partners when connected;

      (3) We added, in the supplementary materials, a table reporting the vigor scores of each participant in each experimental condition, facilitating the assessment of individual and dyadic behaviors;

      (4) We have reviewed and refined the terminology throughout the manuscript and reduced the number of abbreviations to improve clarity.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of the movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics were assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements. The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts.

      We thank the reviewer for their in-depth analysis and constructive assessment of our manuscript.

      Weaknesses:

      (1) My chief concern about the study as it currently stands is the relatively low number of data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). Some of these analyses would benefit greatly, in terms of power, from the addition of more data points.

      We understand and appreciate the reviewer’s concern regarding the effective sample size at the dyad level (n=10). While our primary analyses focus on dyad-specific interactions, we note that the reported effects are consistent across multiple dynamic conditions and are associated with large effect sizes. To provide a conservative assessment, the Cohen’s D values reported correspond to the smallest effect size observed across the relevant statistical tests, thereby limiting the risk of false positives or overinterpretation. In addition, to ensure robustness given the sample size and distribution properties of the data, we have replaced all parametric tests with their non-parametric counterparts, as some analyses violated ANOVA assumptions. Friedman and Kruskal-Wallis tests are now used for paired and unpaired main effects, respectively, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons, respectively. Note that these changes did not alter the conclusions of the study.
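
      As an illustration of what a paired non-parametric comparison involves at this sample size, the sketch below implements an exact two-sided Wilcoxon signed-rank test by enumerating every sign assignment, which is feasible for ten dyads (1024 assignments). The function names and the example data are hypothetical, and this is not necessarily the routine used for the reported analyses.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Return (W+, two-sided exact p-value) for paired samples x and y.

    Zero differences are dropped; the null distribution is built by
    enumerating all sign assignments, so keep n small (about 20 or fewer).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical paired measurements (arbitrary units), five pairs.
solo = [5, 7, 9, 11, 13]
connected = [4, 5, 6, 7, 8]
w_plus, p_value = wilcoxon_signed_rank_exact(solo, connected)
```

      Because the exact null distribution is discrete, the smallest attainable two-sided p-value with n untied, nonzero pairs is 2/2<sup>n</sup>, which gives a sense of the resolution available with ten dyads.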

      (a) The distribution of delta-vigor (Fast group vs Slow group) is highly skewed (see Figures 3D, S6D), with over half of the dyads exhibiting delta-vigor less than 0.2 (i.e., less than 20% of unit vigor). Given the relatively low number of dyads, it would be helpful for the authors to provide explicit listings of VigorFast, VigorSlow, and VigorCombined for each of the 10 separate dyads or pairings.

      We agree with this comment. However, we note that the distribution of vigor scores within a population is typically centered around 1, with large deviations observed only for the fastest and slowest participants [1]. As a result, the distribution of ∆-vigor is inherently skewed. Correcting for this skewness would (i) require pairing participants based on their vigor, which is logistically difficult, and (ii) lead to an atypical sampling of dyads, with an overrepresentation of pairs exhibiting very large vigor differences. The distributions of vigor scores for the fast and slow groups before and after the interaction are reported in Supplementary Fig. S21. In addition, as suggested by the reviewer, we have now included Table S.1 in the supplementary materials, listing the values VigorFast, VigorSlow, and VigorCombined for each of the 10 dyads. This table provides a complete view of the evolution of participants’ vigor throughout the experiment.

      (b) The authors concluded that the interactive adaptation hypothesis provided the best summary of the combined movement dynamics in the study. If this is indeed the case, then the relative degree of difference in vigor between the fast and slow participants in a dyad should matter. How well did the interactive adaptation model explain variance in the dyads with relatively low delta-vigor (e.g., less than 0.2) vs relatively high delta-vigor?

      We initially expected the magnitude of difference in individual vigor within a dyad to play a significant role. However, our analysis did not reveal any systematic effect of ∆-vigor on either the interaction force or the resulting dyadic vigor, as shown by the LMM analysis. Importantly, the interactive adaptation hypothesis does not per se imply that the magnitude of vigor differences between the two partners should matter, only that their respective roles in selecting the adapted behavior are different. Although the model includes several free parameters, we did not attempt to fit it to individual dyads, as would in principle be possible. Instead, we performed a sensitivity analysis to assess how variations in the difference in vigor between the partners influence model predictions. For this purpose, we simulated increasing values of µ and variations in the fast partner’s cost of time. In addition, we demonstrated that uncertainty in the estimated behavior of the slow partner, which is a priori specific to each individual, has a substantial impact on the optimal movement duration of the dyad. Overall, this analysis shows that the model captures the full range of qualitative trends observed in the experimental data. When applied to predict the behavior of the average dyad, the resulting movement time prediction error remains small, as detailed in the Results section.

      (2) The authors shared the results of one analysis of reaction time, showing that the reaction times of the slow partners and the fast partners did not differ during the initial passive block. Did the authors observe any changes in RT of either the slow or fast partner during the combined (primary task) blocks (KL, KH, etc.)? If the pairs of participants did indeed employ a form of interactive adaptation, then it is certainly plausible that this interaction would manifest in the initial movement planning phase (i.e., RT) in addition to the vigor and smoothness of the movements themselves.

      We thank the reviewer for this interesting question, which prompted us to extend our analysis of reaction times to the connected conditions. This additional analysis revealed a significant main effect of the condition on the reaction time for both the fast and slow groups (in both cases: W<sub>2</sub> > 0.39, p < 0.02). Post-hoc comparisons showed a significant reduction in reaction time between the initial null-field block (NF1) and the KH condition for the slow group (p = 0.03, D = 1.46), and a similar trend for the fast group (p = 0.06, D = 1.03). However, the reaction times remained comparable between the two groups, with no significant difference between them. We have incorporated these observations in the Results section (p.4, l.100–109) and expanded the Discussion (p.11, l.341–348) to address their implications for interactive adaptation in human-human and human-robot physical interactions.

      Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner’s vigor rather than by the faster partner’s, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner’s vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      We thank the reviewer for their thorough analysis of our manuscript and their constructive feedback.

      Weaknesses:

      (1) A key conceptual issue concerns the apparent asymmetry between partners in the computational framework. While dyadic vigor is empirically better predicted by the slower partner’s vigor, the model formulation appears to emphasize the faster partner’s time-related cost and interaction forces. Although the cost function includes an uncertainty-related component associated with the slower partner, it remains unclear from the current formulation and description how dyadic vigor is formally derived from the slower partner’s control policy within the same modeling framework. This raises an important question regarding whether the model offers a symmetric account of dyadic vigor formation for both partners or whether it is effectively anchored to the faster partner’s control architecture.

      We have modified our phrasing to clarify the principles according to which the computational framework was designed (p.7, l.226–231 and p.9, l.260–264). As stated in the Results section, the model is indeed asymmetric by design, which corresponds to the different roles of the fast and slow partners exhibited in the data. In that context, the uncertainty-related term associated with the slow partner should be understood as an overarching constraint that conditions the strategy of the dyad, while the fast partner’s cost of time acts as a contributor to the expected dyad strategy. Both conceptually and numerically (as reported in the sensitivity analysis), this asymmetry corresponds to the role of the slow partners in setting the vigor ranking among the dyads and the role of the fast partners in setting the average dyadic behavior.

      (2) A second conceptual issue concerns the interpretation of the term "motor plan." It remains unclear whether this term refers primarily to movement-related characteristics such as speed or duration, or more broadly to the underlying optimization structure that governs these variables. This distinction is theoretically important, as it determines whether the reported interaction effects should be understood as adjustments in movement characteristics or as changes in the structure of the control policy itself.

      We agree with the reviewer that this terminology required clarification. In this paper, the term “motor plan” refers to the time series of control inputs planned by the CNS, rather than solely to kinematic descriptors such as speed or duration. These planned control signals are a direct consequence of the underlying optimization structure and cost functions that govern trajectory generation. We have clarified this definition in the Introduction (p.1, l.23–24).

      Reviewer #3 (Public review):

      Strengths:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.

      Thank you for this analysis and the insightful feedback.

      Major Comments:

      (1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figure 2.B and 3.E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners’ individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?

      We thank the reviewer for this thoughtful comment. We deliberately used different statistical approaches throughout the paper in order to address different types of questions. Note that the statistical tests were converted to their non-parametric equivalents for consistency (see answer to Reviewer 1).

      - Friedman tests were used in a limited number of cases to assess population- or group-level effects, such as differences in movement time, smoothness, or accuracy across the solo, connected, and after-effects conditions. Such tests provide a straightforward framework for these descriptive, condition-level comparisons.

      - The stability of individual and dyadic vigor scores across conditions was assessed using Pearson correlations across all condition pairs, which we consider the most direct and interpretable approach for evaluating consistency across sessions.

      - LMMs were employed to examine how dyadic vigor relates to the partners’ individual vigor measured in the solo conditions, which revealed the critical contribution of the slow partner.

      Rather than applying a single statistical framework throughout, we selected the method best suited to each question. While LMMs are well suited for modeling participant-specific variability when linking individual and dyadic measures, their systematic use in all analyses would be less intuitive and would not directly address several of the population-level comparisons central to this study.

      (2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by interindividual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.

      We thank the reviewer for this interesting question. We first note that the notion of reward in motor control is quite broad. Although our task did not include explicit external (e.g. monetary) rewards, we assumed that participants attribute an implicit value to completing the task in accordance with the experimenter’s instructions. This assumption has been shown to be appropriate for characterising baseline behavior in previous studies [2–5].

      As discussed in the Introduction, vigor is generally understood to emerge from a tradeoff between effort, accuracy, and time. The reviewer is correct in noting that inter-individual differences in vigor may reflect differences in reward sensitivity or in its discounting [3,6], given that time and reward are intrinsically coupled. Differences in vigor may also arise from inter-individual variability in sensitivity to effort or perceived task difficulty. Because these factors are intertwined (for example, increasing accuracy through co-contraction typically incurs greater effort [7]), it is challenging to disentangle their respective contributions based solely on behavioral data.

      In the present study, our inverse optimal control procedure to identify the cost of time (and thus predict individuals’ vigor) relies on a predefined effort-accuracy tradeoff under fixed final time across multiple movement amplitudes [8]. As a result, the model does not allow us to independently estimate individual sensitivities to effort, accuracy, and time. Such characterization of computational "phenotypes" would likely require experimental paradigms in which each of these factors is systematically manipulated while the others are held constant, which is beyond the scope of the current dataset. In practice, the main value of behavioral modeling lies in revealing the relative weighting of these criteria by the CNS during motor planning [5]. We have expanded the Discussion to clarify these limitations and considerations (see Discussion p.12, l.396–401 & l.407–412).

      Finally, we chose not to emphasize these broader issues in the present manuscript because (i) they are peripheral to our primary research question on how individual vigor influences human-human interaction, and (ii) although we do not yet have definitive and consensual answers, they have been addressed in multiple studies reviewed elsewhere [9,10].

      (3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.

      We also find that the preservation of accuracy and smoothness despite changes in vigor is an interesting result, and we therefore chose to report these measures in the Supplementary Materials. However, we believe it is preferable not to include them in the main figures for the following reasons:

      - We avoid framing our results in terms of a speed-accuracy trade-off, as Fitts’ work was initially designed to study fast movements [11], whereas our work focuses on self-paced movements. As outlined in the Introduction, vigor is more appropriately interpreted as reflecting a tradeoff between effort (related to movement speed), accuracy, and time. From this perspective, the reported changes of vigor already capture a shift in the underlying trade-off selected by the CNS, using a framework better suited to our experimental paradigm.

      - The manuscript is technically dense and reports multiple analyses that are essential to establish (i) the existence and definition of dyadic vigor, and (ii) how it emerges from interaction between partners. Although the observed preservation of accuracy and improvements in smoothness are informative, they are not central to these two primary questions and would risk diverting attention from the core contributions of the paper. In addition, accuracy is not a feature predicted by our deterministic modeling, and extensions would be needed to capture this aspect. Here we only attempted to replicate average behaviors.

      (4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.

      We thank the reviewer for this comment, which prompted us to verify the assumptions underlying our ANOVAs. We found that a few distributions in the original analysis, as well as in some of the new tests, did not meet these assumptions. To ensure consistency, all statistical analyses have now been replaced with non-parametric tests: Friedman and Kruskal-Wallis tests for paired and unpaired main effects, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons. The updated results do not change any of the conclusions. The only minor change concerns accuracy, which appeared slightly improved in a restricted number of connected conditions and now appears mostly unaffected.
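      As an illustration of the paired main-effect test named above, the following minimal sketch computes the Friedman statistic by hand. It assumes no tied values within a participant (no tie correction is applied), and the movement times, the condition order, and the function name are hypothetical; it is not the analysis code used for the manuscript.

```python
def friedman_statistic(data):
    """Friedman chi-square statistic for a participants x conditions table.

    Assumes no tied values within a participant (no tie correction).
    """
    n = len(data)        # number of participants (blocks)
    k = len(data[0])     # number of paired conditions
    rank_sums = [0.0] * k
    for row in data:
        # Rank the k conditions within this participant (1 = smallest value).
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Hypothetical movement times (s): rows = participants,
# columns = solo, connected, after-effect.
movement_times = [
    [0.92, 0.71, 0.80],
    [1.10, 0.85, 0.95],
    [0.78, 0.66, 0.70],
    [1.25, 0.90, 1.02],
]
q = friedman_statistic(movement_times)
```

      Because every hypothetical participant orders the three conditions identically, the statistic reaches its maximum of n(k - 1). In practice the statistic would be compared against a chi-square distribution with k - 1 degrees of freedom.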

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) Lines 146-147. The authors state, "Whereas the fast partners maintained a similar duration". Figures S6H,I suggest that fast partners made slower movements during the paired task relative to the solo task, not movements with a similar duration.

      We agree that Fig. S.6H,I suggest slightly slower movements for the fast partners, though not significantly so. We have modified the sentence to be less assertive than in the previous version (see p.6, l.155).

      (2) In the Discussion (Lines 318-319), the authors state that their findings confirm and extend the "benefits of dyadic control in collaborative actions". What benefits are they referring to here, relative to individual control? It would be helpful if the authors would elaborate on this claim.

      We have modified this sentence to clarify that the benefits of dyadic control refer to previously reported advantages over individual control, namely reduced movement time [12] and improved tracking accuracy [13,14] (see p.11, l.336–337).

      (3) On Lines 87-89, the authors reference a decomposition of variance of vigor scores across the NF1, VL, and VH conditions; however, I did not see an explanation of how this decomposition was performed. The method used to estimate variance explained by inter-individual vs intra-individual differences in vigor should be outlined for the reader.

      Thank you for pointing out this missing information. We now explain in the statistical analysis section (see p.14, l.504–507) that the percentage of inter-individual variability in vigor is estimated from sums of squares, which serve as estimates of inter- and intra-individual variability.
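      One way such a sum-of-squares decomposition could look is sketched below. This is an assumption about the general form, not the exact procedure described in the Methods: it treats each participant's mean across conditions as the inter-individual part and the deviations around that mean as the intra-individual part. The vigor scores and the function name are hypothetical.

```python
def inter_individual_share(scores):
    """Fraction of the total sum of squares due to differences between participants.

    scores: one list per participant, holding one vigor score per condition.
    """
    all_values = [v for row in scores for v in row]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0  # inter-individual sum of squares
    ss_within = 0.0   # intra-individual sum of squares
    for row in scores:
        row_mean = sum(row) / len(row)
        ss_between += len(row) * (row_mean - grand_mean) ** 2
        ss_within += sum((v - row_mean) ** 2 for v in row)
    return ss_between / (ss_between + ss_within)


# Hypothetical vigor scores: rows = participants, columns = three solo conditions.
vigor = [
    [1.3, 1.2, 1.1],
    [1.0, 0.9, 0.8],
    [0.7, 0.6, 0.5],
]
share = inter_individual_share(vigor)
```

      Note that in this one-way split, any systematic effect of condition is counted as intra-individual variability; a two-way decomposition would separate it out.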

      (4) How was the absolute interaction torque for a paired movement calculated? Was it an integral of the temporal profile of torque for some portion of the combined movement? The method for calculating the absolute interaction torque needs to be specified.

      We have now clarified in the Methods (see p.14, l.490–491) that the reported average interaction effort was computed as the absolute value of the interaction torque as a function of time averaged over the entire movement.
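      The computation described here can be sketched as follows, assuming uniformly sampled torque so that the time average reduces to a plain mean over samples. The torque values and the function name are hypothetical.

```python
def mean_absolute_torque(torque):
    """Average of |torque| over all samples of one movement (uniform sampling assumed)."""
    return sum(abs(t) for t in torque) / len(torque)


# Hypothetical interaction torque samples (N·m) over one movement.
interaction_torque = [0.0, 0.10, -0.25, -0.15, 0.05, 0.0]
effort = mean_absolute_torque(interaction_torque)

# For comparison: the signed mean lets opposite-sign torques cancel.
signed_mean = sum(interaction_torque) / len(interaction_torque)
```

      Taking the absolute value before averaging prevents pushing and pulling phases from cancelling, which is why the signed mean is smaller in magnitude here.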

      (5) Lines 123-124: "... interaction torque showed no significant correlation with differences in individual vigor within dyads." This statement should be supported by appropriate statistical measures.

      This result is now supported by reporting the corresponding Pearson correlation analyses. No significant correlations were found between interaction torque and differences in individual vigor within dyads (KL conditions: |r| < 0.43, p > 0.22; KH conditions: |r| < 0.18, p > 0.61, see p.5, l.132–133).

      (6) For the analysis, presented in Figure 3C, and specified on lines 116-123, the text mentions the main effects of both condition and target. There doesn’t appear to be much of an effect of the target for the KH data. Should these results not be reported as an interaction effect between the two factors instead?

      We agree with the reviewer and have corrected our presentation of these results (see p.4, l.126–128). Consistent with the reviewer’s observation, no significant effect of the target is found in the KH condition.

      (7) Figures 3E and S6B. What is the purpose of including the averaged data for each pair in addition to both individuals’ data from each pair? It would be useful to distinguish the individual data from the average data for each pair. Frankly, the number of data points shown on this sub-figure is excessive.

      There may have been a misunderstanding. Because the partners of a dyad are connected by a virtual elastic band (rather than a rigid bar), they do not execute identical movements. Figs. 3E and S6B therefore display the movement times of all individual participants, together with the corresponding 20 individual regression lines, as in Fig. 2B. The solid black line represents the average across all individuals, and the averaged behaviors of dyads are not included. We have clarified this point by revising the caption of Fig. 3E (see p.5).

      Noted mis-spellings:

      Figure S.3A caption: "trials towards this target."

      Page 10 Line 313: "Importantly, these findings show ...".

      These mis-spellings have been corrected at supplementary p.2 and main text p.11, l.331. Thank you!

      Reviewer #2 (Recommendations for the authors):

      (1) To illustrate the contribution of the three components used to calibrate the overall cost function, it would be informative to include simulation analyses in which each component is selectively removed (i.e., ablation analyses).

      We did not perform ablation analyses, as selectively removing components of the model can lead to instability or ill-suited control inputs, making the resulting simulations difficult to interpret. Instead, we conducted a sensitivity analysis of the key parameters shaping the overall cost function, including the estimated mean and deviation of the slow partner’s movement duration, the weight associated with uncertain torque minimization (Figs. S.18,S.19), and the fast partner’s cost of time (Fig. S20). This analysis reveals the predominant roles of the estimated slow partner movement patterns in determining the model predictions, in agreement with our experimental observations.

      (2) Although the authors refer to the motor-off condition as "passive," participants actively generated the movements in the absence of external forces. Thus, this condition corresponds to active, unassisted movement. A different term may therefore reduce potential confusion for readers.

      We agree that the term “passive” was not well chosen given the context of the paper; we have therefore replaced it with “null-field”. Consequently, the P1 and P2 blocks are now referred to as NF1 and NF2.

      (3) Please clarify the instructions given to participants. Were they informed in advance that their movements would physically interact with those of their partner?

      Thank you for pointing out this missing clarification. We have now specified in the Methods (p.14, l.465–469) that participants were not informed prior to any condition that they would interact with a human partner; they were only told that the robot would provide assistance. When debriefed at the end of the experiment, only one out of the 20 participants reported having realized that they were connected to another human. Most participants believed they were interacting either with a version of themselves or with a robot with some randomness.

      (4) Line 475. Should "Fig. 2D" be "Fig. 2B"?

      Thank you for catching this error. The reference has been corrected to Fig. 2B (see p.15, l.522).

      Reviewer #3 (Recommendations for the authors):

      (1) The analysis of reaction times shows no difference between groups in the passive block, which challenges the assumption that movement vigor covaries with decision speed or action initiation speed. It may be worth discussing this in the context of recent literature.

      We agree that the initial analysis and discussion of reaction times were too superficial. In the revised manuscript, we now report that dyadic interaction leads to significantly shorter reaction times (p.4, l.100–109), concomitantly with improved movement velocity. We have also expanded the Discussion on the relationship between decision and action speeds/durations (p.11, l.340–348).

      (2) Many abbreviations are unusual for a non-expert. I would recommend using the full terms instead. At least initially, I found it difficult to follow the results because the abbreviations were not immediately clear (at least to me).

      We agree that the paper had too many abbreviations. We have therefore removed the abbreviated names of the models and, where possible without impairing readability, used the full names of the conditions.

      (3) Relatedly, the notation in Figure 1 may be confusing. The labels "S" and "F" (slow and fast) correspond to different concepts than "F" and "L" (follower and leader), so the same participant could be labeled "F" as fast but not "F" as a leader.

      Thank you for pointing out this potential source of confusion. We have therefore modified Fig. 1A (p.2) to avoid any potential confusion by using the full model names rather than abbreviations. In the remainder of the manuscript, "S" and "F" exclusively denote the slower and faster partners within a dyad, and we do not use abbreviations for "leader" or "follower" in the text.

      (4) In figures like 2.C and 3.I, keeping the same scales on the x and y axes and adding a diagonal reference line would make it easier to see shifts across conditions.

      As explained in the Methods, vigor scores in the low- and high-viscosity conditions were computed using the average movement durations from the NF1 condition as a reference. Consequently, because movements are slower in these conditions, the corresponding vigor values are lower than those in NF1. For this reason, using identical scales on the x- and y-axes and adding a 45° reference line could mislead the reader into thinking that the vigor scores are expected to be identical, and would reduce the readability of the figure.

      (5) Multiple hypotheses about dyadic regulation of vigor are nicely explained; it could help to indicate if any of these were a priori favored based on prior literature.

      Previous literature provides mixed evidence regarding how vigor might be regulated in dyadic interaction. For instance, Takagi et al. (2016) [15] reported that mechanically connected partners may rely on independent motor plans, which corresponds to the co-activity hypothesis considered here. However, in that study, movement duration was prescribed. We therefore expected that removing this constraint on movement duration could allow coordination strategies to emerge, particularly in view of findings on haptic communication during tracking of random targets while connected via an elastic band [13,14].

      At the same time, a large body of work on human–human and human–robot interaction has interpreted coordination through a leader–follower framework. In our context, vigor is understood as the outcome of a tradeoff between effort and elapsed time, with time being associated with a decaying reward. Based on this framework, we hypothesized a priori that a leader–follower scheme would emerge, in which the fast partner—being more sensitive to time costs and/or less sensitive to effort—would tend to drive the interaction, even at the expense of increased effort. For these reasons, the leader–follower hypothesis was formulated as the expected outcome throughout the manuscript.

      (6) In the introduction, statements such as "relative vigor of an individual is remarkably stable" appear true only in the solo condition. The same is true in the discussion, where it is said that vigor is a stable trait. The whole study shows that an individual can shift his/her vigor to match that of another individual, so in such conditions it does not appear stable to me, but adaptable.

      Let us first clarify that when we describe vigor as “remarkably stable”, we do not imply that individuals do not adjust their movement timing in response to changes in external dynamics. For example, movement durations increase in visco-resistive conditions even during solo performance; nevertheless, individuals who move faster in the absence of resistance will remain faster relative to others when resistance is introduced. In this sense, stability refers to the preservation of relative rankings across conditions, rather than invariance of absolute movement timing. Because interaction with another individual constitutes a substantial change in task dynamics, an effect on individual pace is therefore expected.
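      The distinction between stable rankings and changing absolute durations can be made concrete with a small sketch. Spearman rank correlation is used here purely for illustration (the stability analysis reported above relies on Pearson correlations), ties are assumed absent, and the durations are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical mean movement durations (s) of four participants.
solo_null_field = [0.60, 0.75, 0.90, 1.10]
solo_viscous = [0.85, 0.95, 1.30, 1.45]  # everyone slows down under resistance

rho = spearman_rho(solo_null_field, solo_viscous)
everyone_slower = all(v > n for v, n in zip(solo_viscous, solo_null_field))
```

      In this toy case every participant slows down, yet the rank correlation is perfect: absolute timing changes while the relative ordering is preserved.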

      That said, and as pointed out by the reviewer, our results show that (i) dyadic interactions lead to the emergence of a dyadic vigor characterized by average movement durations close to those of the fast partners, while the ranking across dyads is largely imposed by the slow partners; and (ii) these adaptations persist after the interaction phase. Importantly, the observed vigor adaptations appear to last longer in our physical interaction task than in previous attempts to manipulate vigor using visual feedback [16]. To account for this adaptability of vigor, we have (i) clarified claims in the Introduction regarding the stability of vigor (see p.1, l.18–20), and (ii) expanded the Discussion to more explicitly address vigor adaptability and the possible resulting consequences for the concept of vigor (see p.12, l.407–412).

      References

      (1) O. Labaune, T. Deroche, C. Teulier, and B. Berret, “Vigor of reaching, walking, and gazing movements: on the consistency of interindividual differences,” Journal of Neurophysiology, vol. 123, pp. 234–242, Jan. 2020.

      (2) L. Rigoux and E. Guigon, “A model of reward-and effort-based optimal decision making and motor control,” PLoS Computational Biology, vol. 8, pp. 1–13, Jan. 2012.

      (3) R. Shadmehr, J. J. O. de Xivry, M. Xu-Wilson, and T.-Y. Shih, “Temporal discounting of reward and the cost of time in motor control,” Journal of Neuroscience, vol. 30, pp. 10507–10516, Aug. 2010.

      (4) B. Berret and G. Baud-Bovy, “Evidence for a cost of time in the invigoration of isometric reaching movements,” Journal of Neurophysiology, vol. 127, pp. 689–701, Feb. 2022.

      (5) D. Verdel, O. Bruneau, G. Sahm, N. Vignais, and B. Berret, “The value of time in the invigoration of human movements when interacting with a robotic exoskeleton,” Science Advances, vol. 9, Sept. 2023.

      (6) K. Jimura, J. Myerson, J. Hilgard, T. S. Braver, and L. Green, “Are people really more patient than other animals? Evidence from human discounting of real liquid rewards,” Psychonomic Bulletin & Review, vol. 16, pp. 1071–1075, Dec. 2009.

      (7) P. L. Gribble, L. I. Mullin, N. Cothros, and A. Mattar, “Role of cocontraction in arm movement accuracy,” Journal of Neurophysiology, vol. 89, pp. 2396–2405, May 2003.

      (8) B. Berret and F. Jean, “Why Don’t We Move Slower? The Value of Time in the Neural Control of Action,” Journal of Neuroscience, vol. 36, pp. 1056–1070, Jan. 2016.

      (9) R. Shadmehr and A. A. Ahmed, Vigor: Neuroeconomics of Movement Control. The MIT Press, 2020.

      (10) D. Thura, A. M. Haith, G. Derosiere, and J. Duque, “The integrated control of decision and movement vigor,” Trends in Cognitive Sciences, vol. 29, pp. 1146–1157, Dec. 2025.

      (11) P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement,” Journal of Experimental Psychology, vol. 47, pp. 381–391, June 1954.

      (12) K. B. Reed and M. A. Peshkin, “Physical collaboration of human-human and human-robot teams,” IEEE Transactions on Haptics, vol. 1, pp. 108–120, July 2008.

      (13) G. Gowrishankar, A. Takagi, R. Osu, T. Yoshioka, M. Kawato, and E. Burdet, “Two is better than one: physical interactions improve motor performance in humans,” Scientific Reports, vol. 4, Jan. 2014.

      (14) A. Takagi, G. Ganesh, T. Yoshioka, M. Kawato, and E. Burdet, “Physically interacting individuals estimate the partner’s goal to enhance their movements,” Nature Human Behaviour, vol. 1, pp. 1–6, Mar. 2017.

      (15) A. Takagi, N. Beckers, and E. Burdet, “Motion plan changes predictably in dyadic reaching,” PLOS ONE, vol. 11, p. e0167314, Dec. 2016.

      (16) P. Mazzoni, B. Shabbott, and J. C. Cortes, “Motor control abnormalities in Parkinson’s disease,” Cold Spring Harbor Perspectives in Medicine, vol. 2, pp. a009282–a009282, Mar. 2012.

    1. eLife Assessment

      This valuable work extends a previously published regression framework for trial-aligned photometry data incorporating functional variables. However, the evidence is generally incomplete, due to the way that within-trial changes in variables have been incorporated into an inherently cross-trial analysis framework, which will limit general adoption. The ideas in this work will be of interest to researchers analyzing photometry signals.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to extend a fiber photometry analysis framework they previously developed by adding the ability to determine instantaneous, within-trial relationships between the photometry signal and continuously changing variables. They present solid evidence, via simulations and example use cases from published datasets, that their approach can capture instantaneous relationships. Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation remains challenging enough for many experimentalists that it may limit widespread adoption by the community.

      Strengths:

      This work builds on prior efforts to analyze photometry signals in a less biased and more statistically sound way. This work incorporates a very important aspect by avoiding the need to summarize individual trials with singular behavioral variables and instead allows for interactions with continuously changing variables to be investigated. The knowledge and expertise of the authors and the presentation provide strong validity and strength to the work. Examples from prior studies in the field are a necessary and important component of the work.

      Weaknesses:

      While use cases are provided from prior data, a clearer presentation of how common analyses in the field (e.g., GLMs) are performed, and of how one could use the cFLMM approach instead, would help. Otherwise, most researchers may continue to rely on common approaches such as Pearson correlations and GLMs.

    3. Reviewer #2 (Public review):

      The paper presents a regression-based approach for analysing fiber photometry data termed Concurrent Functional Mixed Models (cFLMMs). The approach works by fitting linear mixed effect models separately to each time point in trial-aligned data, then applying smoothing to the model coefficients (betas), and computing confidence intervals. The method extends the authors' previous work on using FLMMs for photometry data analysis by allowing for the inclusion of predictors whose value changes across timepoints within a trial, rather than just from trial to trial. As fiber photometry is a rapidly expanding field, developing principled methods to analyse photometry data is valuable, particularly as the authors have released an R package that implements their method to facilitate its use by other groups. The basic FLMM approach for using mixed effects models to analyse trial-aligned photometry data, detailed by the authors in their previous manuscript (Loewinger et al. 2025, doi: 10.7554/eLife.95802), appears valuable. The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.

      In the original FLMM approach, where predictors change only from trial to trial, fitting separate regressions at each timepoint generates a timeseries of betas for each predictor, indicating when and how the predictor explained variance across the trial. This makes a lot of sense and is widely used in neuroscience data analysis. In extending this approach to incorporate variables that change within trial, the authors have used the same method of fitting separate regression models at each timepoint to obtain a timeseries of betas for each predictor. It is less clear that this approach makes sense for variables that change within trial. This is because the resulting betas only capture how variation in the predictor across trials at a given timepoint explains variance in the signal, and do not capture effects of variation in the predictor across timepoints within trials. This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modelled, and a within-trial component whose effect on the signal is not, is artificial in many experimental designs, and may yield hard-to-interpret results.
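      To make the per-timepoint fitting scheme concrete, here is a deliberately simplified sketch. It assumes ordinary least squares with a single predictor in place of a mixed model (no random effects) and a plain moving average in place of the smoother actually used; it is not the API of the authors' R package, and all names and data are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def pointwise_betas(predictor, signal):
    """One slope per timepoint, each fitted across trials only.

    predictor, signal: lists of trials, each a list over timepoints.
    """
    n_time = len(signal[0])
    return [
        ols_slope([trial[t] for trial in predictor],
                  [trial[t] for trial in signal])
        for t in range(n_time)
    ]


def moving_average(values, width=3):
    """Centered moving average; windows shrink at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical data: 3 trials x 4 timepoints.
x = [[0.0, 1.0, 2.0, 3.0],
     [1.0, 2.0, 3.0, 4.0],
     [2.0, 3.0, 4.0, 5.0]]
true_beta = [0.0, 1.0, 2.0, 1.0]  # time-varying coefficient
y = [[true_beta[t] * trial[t] for t in range(4)] for trial in x]

betas = pointwise_betas(x, y)       # raw per-timepoint slopes
smoothed = moving_average(betas)    # smoothed coefficient timeseries
```

      Each slope uses only the spread of the predictor across trials at that timepoint; how the predictor evolves over time within a trial never enters any individual fit.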

      Consider e.g. the experimental condition considered in Figure 3, taken from Machen et al. 2025 (doi: 10.1101/2025.03.10.642469) in which mice ran down a linear track to collect rewards. In analysing such data, one might want to know how neural activity covaried with the animal's position, but as this variable changes strongly within trial but will have a similar time-course across trials, the cFLMM analysis approach will not work to quantify these effects. This is because variance attributed to position would not capture how neural activity covaried with changes in the animal's position within trial, but rather how neural activity covaried with changes in the animal's position from trial-to-trial at a given timepoint, which could occur due to e.g. trial-to-trial differences in latency to start moving or running speed. As such, although significant effects of 'position' might be observed, they would not capture covariation between position and neural activity in a straightforwardly interpretable way.

      It is therefore not obvious to me that incorporating variables that change within trial into an analysis framework that runs separate regressions at each timepoint in trial aligned data is likely to be widely useful. If scientific questions require understanding how neural activity covaries as a function of variables that change both within and across trials, an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.
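
      The alternative described above, a single regression over all timepoints with event regressors convolved with temporal basis functions, can be illustrated by how its design matrix is built. The sketch below is a minimal example under stated assumptions: the two boxcar kernels, the event train, and the function names are hypothetical, and the regression fit itself is omitted.

```python
# Illustrative sketch only: build the design matrix for a single regression
# across all timepoints by convolving a discrete event train with a small set
# of temporal basis kernels. Kernels and events are hypothetical.

def convolve_event(events, kernel):
    """Causal discrete convolution of a 0/1 event train with one kernel."""
    n = len(events)
    out = [0.0] * n
    for t, e in enumerate(events):
        if e:
            for k, w in enumerate(kernel):
                if t + k < n:          # drop kernel samples past the recording
                    out[t + k] += w
    return out


def design_matrix(events, basis):
    """One column per basis kernel, one row per timepoint."""
    cols = [convolve_event(events, kernel) for kernel in basis]
    return [list(row) for row in zip(*cols)]


# Two boxcar kernels covering lags 0-1 and lags 2-3 after each event.
basis = [[1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0]]
events = [0, 1, 0, 0, 0, 0, 1, 0]   # events at timepoints 1 and 6

X = design_matrix(events, basis)
```

      Each fitted coefficient in such a model would scale one kernel, so the estimated response to an event is a weighted sum of kernels rather than a separate coefficient per timepoint.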

      One way that cFLMM is used in the manuscript is to handle variable timing of trial events in trial aligned data. In the Machen et al. data, the time when the animal reaches the reward varies from trial to trial, and this is represented in the cFLMM analysis by a binary variable which changes value at this timepoint. From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward, rather than on the start of the trial, allowing e.g. the effect of reward type to be visualised as a function of time relative to reward delivery, and hence to see the differential effects during approach vs consumption. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials. It is not obvious that using cFLMM with binary indicator variables that indicate when task states changed will yield a clearer picture of neural activity than these methods.
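
      Re-aligning trials on the reward time, as suggested above, amounts to extracting a fixed window around a trial-specific event index. The following is a minimal sketch; the function name, the window sizes, and the toy traces are hypothetical.

```python
# Illustrative sketch only: re-align each trial's signal on the sample at which
# the reward was obtained. Samples outside the recorded trace become None.

def align_to_event(trace, event_index, before, after):
    """Return samples from event_index - before to event_index + after."""
    window = []
    for t in range(event_index - before, event_index + after + 1):
        window.append(trace[t] if 0 <= t < len(trace) else None)
    return window


# Hypothetical signals for two trials with different reward latencies.
trials = [[0, 0, 1, 5, 2, 0],
          [0, 1, 6, 3, 0, 0]]
reward_idx = [3, 2]   # sample index of reward in each trial

aligned = [align_to_event(tr, idx, before=2, after=2)
           for tr, idx in zip(trials, reward_idx)]
```

      After alignment, the reward falls at the same position (index 2) in every window, so a trial-aligned analysis can be run relative to reward delivery.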

      It may be that I am missing some key strengths of cFLMM relative to the other approaches I have outlined, or that there are applications where this approach to implementing within-trial variable changes is a natural formalism. However, my impression is that while cFLMM represents a technical advance, it is not clear how widely useful the model formalism will be.

    4. Reviewer #3 (Public review):

      Summary:

      This work is an extension of their previous study (Loewinger et al 2025) describing a statistical framework for the analysis of photometry data using functional linear mixed models with joint confidence intervals, together with an open-source tool implemented in R. The present study extends it by adding the possibility of using 'concurrent' variables (variables that change within a trial) as regressors, for example, capturing the change of speed at each timepoint in the trial. The main claim is that using 'concurrent' regressors can identify associations between signal and behavior that could not be captured by 'non-concurrent' regressors (the value for a regressor on a specific trial is the same for each timepoint), which could lead to misleading conclusions. While the motivation for using time-varying covariates is useful and supported by previous literature (using fixed-effects models, although not cited in this manuscript), the reanalysis of previous studies does not clearly prove the benefit of using concurrent regressors as opposed to non-concurrent, and some of the results are difficult to interpret.

      Strengths:

      • The motivation for using time-varying covariates is well supported by previous literature using them on fixed-effects models, and here the authors are extending it to mixed-effects models.
      • The authors have included this new functionality in their previous open-source R package.

      Weaknesses:

      • The main weakness of this study is that it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other, especially in the reanalysis of Machen et al. (2025), where the choice of regressors is confusing. In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary 'reward zone vs corridor' (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.
      • Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.
      • From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.
      • The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.
      • This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

    5. Author response:

      Common responses:

      We thank the editors for considering our paper and the reviewers for their thoughtful and detailed feedback. Based on the comments, we will revise our manuscript to better describe how our approach differs from modeling strategies that are common in the field. We also aim to elaborate on the advantages of fastFMM and what scientific questions it is designed to answer. Finally, we will provide more background on our example analyses and the interpretation of the results.

      Within this response, “within-trial timepoints”, “time-varying predictors/behaviors”, and “signal magnitude” are used as specific examples of the general concepts of “functional domain”, “functional covariates”, and “functional outcome”, respectively. To make statements or examples more concrete, we may use the former neuroscience-specific terms when making general claims about functional models.

      - ncFLMM, cFLMM: non-concurrent or concurrent functional linear mixed models.

      - FUI: fast univariate inference. An approximation strategy to perform FLMM (Cui et al., 2022).

      - fastFMM: the R package that implements FUI.

      - CI: confidence interval.

      Before specific line-by-line responses, we provide a brief comparison between cFLMM and fixed effects encoding models. All three reviewers suggested that fixed effects models could be an existing alternative to cFLMM (Reviewer 1 (1B), Reviewer 2 (2C), Reviewer 3 (3A)). Their shared comments highlight that our revision should articulate the advantages and applications of cFLMM relative to existing analysis strategies.

      Functional regression methods like cFLMM produce functional coefficient estimates that quantify how the magnitude of predictor-signal associations evolves across an ordered functional domain such as within-trial timepoints. Standard scalar outcome regression methods, like the GLMs specified in Engelhard et al. (2019), model these associations and their corresponding coefficients as fixed across the functional domain. While GLM encoding models may include time-varying predictors, these analysis strategies do not model the predictor–signal association as changing over the functional domain.

      Moreover, encoding models are less suited to hypothesis testing in clustered or longitudinal settings (e.g., repeated-measures datasets) and yield regression coefficient estimates that are only interpretable with respect to the units of the basis functions. In contrast, cFLMM provides time-varying coefficient estimates that are interpretable as statistical contrasts in terms of the original variables and produces hypothesis tests in clustered settings. cFLMM can be applied to datasets that define covariates in terms of the same flexible representations of covariates used in encoding models; this is a modeling choice rather than a methodological characteristic.

      The remainder of this provisional author response will respond to reviewers’ concerns line-by-line, approximately in the order they appear.

      Reviewer #1 (Public review):

      We thank Reviewer 1 for their comments, especially their efforts to provide first-hand experience with loading and applying fastFMM. We hope that recent improvements to fastFMM’s public release and vignettes address Reviewer 1’s concerns about ease-of-use.

      (1A) Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation for many experimentalists remains challenging enough and may limit widespread adoption by the community.

      We believe the reviewer may have experimented with an old version of fastFMM, so their experience may not reflect recent rewrites and improvements. fastFMM v1.0.0+ is now stable, validated on CRAN, and contains new example data and step-by-step tutorials. We designed fastFMM’s model-fitting code to be similar to common GLM packages in R to reduce the learning curve for new users.

      (1B) …a clearer presentation of how common implementations in the field are performed (i.e. GLM) and how one could alternatively use the cFLMM approach would help.

      We will provide a clearer description of existing methods in the revised manuscript. Briefly, inference with fastFMM can accommodate large datasets that contain clustered data, repeated measures, or complex hierarchical effects, e.g., experiments with multiple animals and multiple trials per animal. When encoding models are fit to each cluster (e.g., animal, neuron) separately, we are not aware of a principled method to pool these cluster-specific models together to quantify uncertainty or yield an appropriate global hypothesis test.

      Reviewer #2 (Public review):

      Reviewer 2’s thoughtful feedback helped structure our points in the common response above, which we will refer to when applicable. In our response, we aim to clarify the problems that cFLMM solves and characterize the advantages in interpretability.

      (2A) The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.

      We hope that the common response addresses these concerns. We were motivated to provide a concurrent extension of fastFMM based on our experience with statistical consulting in neuroscience research. Questions that benefit from a functional approach are common and often not adequately modeled with a non-concurrent approach, such as the variable trial length analysis we describe below.

      (2B) It is less clear that this approach makes sense for variables that change within trial…This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modeled, and a within-trial component whose effect on the signal is not, is artificial in many experiment designs, and may yield hard to interpret results.

      We thank Reviewer 2 for highlighting a point that we did not adequately explain and that we will address further in the revision. The pointwise and joint CIs estimated by fastFMM account for uncertainty in the coefficient estimates due to variation in the predictors across within-trial timepoints. cFLMM targets a statistical quantity, or estimand, that is defined by trial timepoint specific effects, so the first step of our estimation strategy fits separate pointwise mixed models. However, models from every within-trial timepoint are then combined to calculate uncertainty and smooth the coefficient estimates. Thus, the widths of the pointwise and joint CIs depend on the estimated between-timepoint covariance and a smoothing penalty. Loewinger et al. (2025a) provides further details in Appendices 2 and 3, describing the covariance structure and detailing the power improvements of FUI compared to multiple-comparisons corrections.

      Other functional regression estimation strategies jointly fit the entire model with a single regression, e.g., functional generalized estimating equations (Loewinger et al., 2025b). However, these methods use basis expansions of the coefficients. In contrast, the encoding models mentioned in 2C below and Reviewer 3 (3A) apply basis expansions of the covariates, and the resulting model does not capture how signal–covariate associations evolve across some functional domain. Although the first stage in the fastFMM approach fits pointwise linear models, this is only one of three steps in the estimation strategy. fastFMM yields coefficient estimates comparable to those that would be obtained from functional regression estimation strategies that jointly estimate the functional coefficients in a single regression. We mention this to distinguish between the target statistical quantity (functional coefficients) and the estimation strategy (pointwise vs. joint).

      (2C) …an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.

      Our understanding is that the suggested approach aims to quantify the association between the outcome and within-trial patterns in covariates. This is a great question and we will incorporate a discussion of this into the revision. However, temporal basis functions convolved with the covariate time series cannot directly characterize these relationships. Encoding models can detect the contribution of predictors to neural signals while remaining agnostic to the precise relationship, but this flexibility can come at the cost of interpretability. The coefficients of the convolutions may not be translatable into a clear statistical contrast in terms of the original covariates.

      In our paper, we provide examples of cFLMM models with simple signal-covariate relationships. The coefficient estimates quantify the expected change in signal given a one unit change in the original predictors. Let 𝑌(𝑠) be the outcome and 𝑋(𝑠) be some covariate at within-trial timepoint 𝑠. For brevity, we will suppress subject/trial indices and random effects in the following notation. The coefficient at time point 𝑠 can be captured by the generic mean model

      𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 1] − 𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 0].

      In contrast, the change in signal associated with patterns in within-trial covariates can be written as

      𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 1] − 𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 0]

      for all pairs of timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. While simple lagged or offset outcome-predictor associations can be incorporated as covariates in cFLMM, the approach does not capture all within-trial timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. Encoding models also do not target the above estimand. Instead, a full function-on-function regression could estimate the above. This topic can be incorporated into our revision and may be a future line of inquiry.
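
      As an illustration of the first (concurrent) contrast above, the sketch below estimates 𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 1] − 𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 0] for a binary covariate by a difference in group means at each timepoint 𝑠. It is a simplified example under stated assumptions: random effects, smoothing, and confidence intervals are ignored, and the function name and toy data are hypothetical.

```python
# Illustrative sketch only: the concurrent contrast for a binary covariate,
# estimated separately at each within-trial timepoint s by a difference in
# group means. For a binary predictor this equals the pointwise OLS slope.

def pointwise_contrast(x, y):
    """x, y: lists of trials; x[i][s] is 0 or 1, y[i][s] is the signal."""
    n_time = len(x[0])
    contrasts = []
    for s in range(n_time):
        y1 = [yi[s] for xi, yi in zip(x, y) if xi[s] == 1]
        y0 = [yi[s] for xi, yi in zip(x, y) if xi[s] == 0]
        if y1 and y0:
            contrasts.append(sum(y1) / len(y1) - sum(y0) / len(y0))
        else:
            contrasts.append(None)   # contrast undefined: one group is empty
    return contrasts


# Hypothetical data: X(s) switches to 1 at a trial-specific timepoint.
x = [[0, 0, 1, 1],
     [0, 1, 1, 1],
     [0, 0, 0, 1]]
y = [[1.0, 1.0, 3.0, 3.0],
     [1.0, 2.0, 3.0, 4.0],
     [1.0, 1.0, 1.0, 5.0]]

contrast = pointwise_contrast(x, y)
```

      The contrast is only defined at timepoints where both covariate values occur across trials, and it never relates the signal at one timepoint to the covariate at another, which is the distinction drawn by the second expression.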

      (2D) In the Machen et al. data…From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials.

      In this experiment, mice waited in a trigger zone, ran through a linear corridor, then received a reward of either water or strawberry milkshake in the reward delivery zone (Machen et al., 2026). Mice received different rewards between sessions but the same reward within all trials of a given session. This design complicated the analysis, as the reward type produced prominent differences in average latency (water: 3.3 seconds, milkshake: 2.0 seconds). The authors wanted to disentangle whether mean differences in the signal across reward types reflected differences in motivation to obtain the reward or differences in reaction to reward receipt.

      We agree that performing a reward-aligned analysis would be an intuitive approach to visualize the differences in average signal for mice that received milkshake compared to water. In fact, we provide a ncFLMM reward-aligned analysis in Figure S1 of Machen et al. (2025). We will add this analysis to the revision and thank the reviewer for the suggestion. We emphasize, however, that this method answers a different question. It does not identify how the signal change associated with receiving the milkshake evolves with respect to latency, especially if the relationship is non-linear. Time warping faces similar obstacles in this setting, especially since sufficiently flexible curve registration can induce similarity due purely to noise. Generally, time warping does not lend itself to hypothesis testing as it is unclear how to propagate uncertainty from the time warping model into final hypothesis tests.

      We believe cFLMM is an appropriate choice for the specific question, and we will revise the manuscript to better reflect its advantages. The functional coefficient estimates in Figures 3C-iii and 3C-iv provide insights that are not possible to derive from the proposed alternatives. For example, we can infer that for short latencies, we do not see a significant difference in signal magnitude for mice receiving water and mice receiving the milkshake. However, for latencies longer than around 2 seconds, receiving the milkshake is associated with an additional positive change in signal. We agree that we should make Figure 3C and the accompanying discussion clearer and thank Reviewer 2 for their feedback on interpretation.

      Reviewer 3 (Public review):

      (3A) …it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other…

      We assume Reviewer 3 is referencing “Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons” (Engelhard et al., 2019). We hope that the Common responses sufficiently contrast the settings where each approach can be applied. Because these models have different goals and assumptions, they are appropriate for answering different questions.

      (3B) In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary “reward zone vs corridor” (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.

      Thank you for pointing out that we were not clear. This was mentioned by multiple reviewers and highlights the need to elaborate on our motivation in the revision. In this example, we wanted to investigate the change in signal-reward association as a function of within-trial timepoints, not the association between instantaneous velocity and the signal. “Slow” or “fast” means “mouse with below or above average latency”. Please refer to our response to Reviewer 2 (2D), where we discuss why event alignment is an insufficient correction.

      The functional coefficient estimates in Figure 3C are interpreted as contrasts because the fixed effect coefficients capture the difference in expected signal between strawberry milkshake and water along the functional domain. An advantage of cFLMM is that it is easy to specify models in which the coefficients correspond to interpretable contrasts of the signal across conditions. The coefficient estimate shown in Figure 3B-ii also corresponds to a contrast because the estimates capture the difference in mean signal from strawberry milkshake and water. Equations (7) and (8) in the section “Materials and methods” and sub-section “Variable trial length analysis” provide additional details on the fixed effect coefficients. Based on this confusion, we will convert the two 1 x 4 sub-plots of 3B and 3C into two 2 x 2 sub-plots to avoid unintended direct comparisons.

      To contextualize how we “acknowledge the interpretational difficulties of [our] analysis”, we stated that a non-concurrent FLMM attempting to control for a time-based covariate is difficult to interpret. The concurrent FLMM provides a straightforward interpretation directly related to the question of interest, which we discuss above in Reviewer 2 (2D).

      (3C) Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.

      Thank you for this suggestion. All three reviewers raised this topic (see Reviewer 1 (1B), Reviewer 2 (2C), and the Common responses), and we will incorporate our response in the revision.

      (3D) From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.

      This is an important point that we mentioned implicitly. In our cFLMM specification of the Jeong et al. (2022) model, “we incorporated trial-specific covariates for trial number and session, modeling these as increasing numerical values rather than identical categorical variables”, which are also plotted in Appendix 3. In Box 1, “if the functional covariate of interest is a scalar constant across the domain, the models fit by the concurrent and non-concurrent procedure are identical”. We will explicitly point out that cFLMM can perform inference on combinations of functional and constant covariates.
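
      The statement from Box 1 quoted above can be checked on a toy example: a trial-level covariate repeated across every within-trial timepoint gives the same pointwise fits as the non-concurrent specification. The sketch below is illustrative only; it uses ordinary least squares in place of mixed models, and the helper name and data are hypothetical.

```python
# Illustrative sketch only: a trial-level (non-concurrent) covariate can be
# expressed as a functional covariate that is constant across the domain.
# With per-timepoint OLS standing in for the mixed model, both give the same
# slopes.

def slope(xs, ys):
    """Simple-regression slope of ys on xs (across trials)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    sxy = sum((xv - mx) * (yv - my) for xv, yv in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: 3 trials x 2 timepoints, one value of z per trial.
z = [0.0, 1.0, 2.0]
y = [[0.0, 1.0],
     [1.0, 1.0],
     [2.0, 4.0]]
n_time = len(y[0])

# Non-concurrent: the same trial-level vector is used at every timepoint.
non_concurrent = [slope(z, [yi[t] for yi in y]) for t in range(n_time)]

# Concurrent: expand z to a trial x timepoint array that is constant over t.
z_fun = [[zi] * n_time for zi in z]
concurrent = [slope([zf[t] for zf in z_fun], [yi[t] for yi in y])
              for t in range(n_time)]
```

      Because the expanded array carries the same values at every timepoint, the two lists of slopes coincide, so constant and time-varying covariates can share one trial × timepoint representation.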

      (3E) The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.

      Prior to our work described in this Research Advance, it was not obvious that the existing approximation approach in fastFMM could be generalized to cFLMM. During the writing of the article, a fastFMM user reached out for help with producing pseudo-concurrent FLMMs by duplicating rows in a non-concurrent model, which underscores both the unmet need for cFLMMs and the difficulty of fitting them with available tools.

      The “under-the-hood” differences are described in Appendix 4. Concurrent FLMM with fast univariate inference was theoretically possible as early as Cui et al. (2022). The univariate step was straightforward, but guaranteeing “fast” and “inference” was not. We needed to verify, for example, that the method-of-moments estimation of the random effects covariance matrix generalized to cFLMM, which is not a trivial step. Characterizing whether the method achieved asymptotic coverage required extensive simulation studies (Figure 4, Appendix 2). Future work may focus on fully characterizing the asymptotic convergence in high noise or high complexity regimes.

      (3F) This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

      We hope that the Common responses clarify how cFLMM compares to existing approaches and fills a gap in the data analysis landscape for neuroscience. The fastFMM R package vignettes contain example analyses, and we intend for these files to work in tandem with the manuscript. To provide more guidance for interested analysts, we can explicitly reference these tutorials within the revision.

      Planned revisions

      The following summary is not exhaustive.

      Writing additions:

      Per 1B, 2C and 3A, the Common responses will be incorporated in the revision.

      Per 2B, we will discuss function-on-function regression and explore how to estimate statistical contrasts for complex within-trial relationships. Relatedly, we will clarify that the CIs in fastFMM are constructed using an estimate of the within-trial covariance of the predictors, and clarify the definition of pointwise and joint CIs.

      Per 3D, we will explicitly state that concurrent FLMMs can include covariates that are constant over within-trial timepoints.

      Though we cannot prescribe a universally correct model selection procedure, we will mention that AIC, BIC, and other summary statistics can inform the specification of the random effects.

      Analysis modifications:

      Parts of Appendix 3 may be included in Figure 2 to directly address the question investigated by Jeong et al. (2022) and Loewinger et al. (2024).

      When discussing Machen et al. (2025) data, the supplementary analysis with reward-aligned ncFLMM models might be added to clarify the ncFLMM/cFLMM difference.

      Per Reviewer 2's comment on encoding models, the additional analysis aimed at disentangling latency and reward in Machen et al.’s variable trial length data may be incorporated as an additional sub-figure in Figure 3.

      Aesthetic changes:

      Figure 3 will be reorganized to avoid unintended direct comparisons between the coefficients of the non-concurrent and concurrent model.

      Citations for Machen et al. (2026) will be updated to reflect publication of the preprint.

      The version number for fastFMM will be updated.

      References

      Cui E, Leroux A, Smirnova E, Crainiceanu CM. Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics. 2022; 31(1):219–230. https://doi.org/10.1080/10618600.2021.1950006, doi: 10.1080/10618600.2021.1950006, PMID: 35712524.

      Engelhard B, Finkelstein J, Cox J, Fleming W, Jang HJ, Ornelas S, Koay SA, Thiberge SY, Daw ND, Tank DW, Witten IB. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature. 2019 Jun; 570(7762):509–513. https://www.nature.com/articles/s41586-019-1261-9, doi: 10.1038/s41586-019-1261-9.

      Jeong H, Taylor A, Floeder JR, Lohmann M, Mihalas S, Wu B, Zhou M, Burke DA, Namboodiri VMK. Mesolimbic dopamine release conveys causal associations. Science. 2022; 378(6626):eabq6740. https://www.science.org/doi/abs/10.1126/science.abq6740, doi: 10.1126/science.abq6740.

      Loewinger G, Cui E, Lovinger D, Pereira F. A statistical framework for analysis of trial-level temporal dynamics in fiber photometry experiments. eLife. 2025 Mar; 13:RP95802. doi: 10.7554/eLife.95802.

      Loewinger G, Levis AW, Cui E, Pereira F. Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets. ArXiv. 2025 Jun; p. arXiv:2506.20437v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC12306803/.

      Machen B, Miller SN, Xin A, Lampert C, Assaf L, Tucker J, Herrell S, Pereira F, Loewinger G, Beas S. The encoding of interoceptive-based predictions by the paraventricular nucleus of the thalamus D2R+ neurons. iScience. 2026 Jan; 29(1):114390. doi: 10.1016/j.isci.2025.114390.

    1. eLife Assessment

      This manuscript explores the dynamic behaviors of Pol II and Pol III puncta that encompass the SL1 and 5S genes, following up on the authors' prior studies on ATTF-6. The authors show that ATTF-6 is required for RNA Pol II but not RNA Pol III foci, demonstrating that within the gene cluster, the regulation of RNA Pol II and RNA Pol III remain distinct from each other. The study is useful for analyzing understudied gene families, but it is incomplete and needs additional edits and experiments.

    2. Reviewer #1 (Public review):

      This study examines how two types of RNA polymerases organize themselves within the nucleus of C. elegans cells, building directly on the same group's prior publication and largely functioning as a companion to that earlier work. While the observation that the two polymerases occupy distinct but neighboring locations at the same genomic region adds nuance to our understanding of gene cluster regulation, the manuscript would benefit from more clearly delineating which findings are new versus continuations of previously published work. Protein localization, gene expression effects, and genomic mapping data appear to overlap substantially with the earlier paper.

      The condensate claims would also benefit from additional experimental support. Demonstrating fusion events and concentration-dependent assembly are now standard expectations in the field. Additionally, one measurement reported appears inconsistent with a condensate model, warranting further discussion.

      Some findings would benefit from more interpretive context. Why does polymerase clustering fluctuate with the cell cycle? What are the functional implications of ATTF-6 being required for one polymerase's foci but not the other's?

      The elevated-temperature experiments are intriguing but difficult to interpret, as the temperature used is well-established as a broad stress trigger in this organism. Acknowledging this and considering additional controls would help clarify whether the observed effects are specific to foci behavior.

      Finally, the manuscript would be strengthened by adding quantification to some figures and revising the model diagram to better reflect what the current data support.

    3. Reviewer #2 (Public review):

      Summary:

      The researchers analyzed GFP-tagged RNA Pol II and RNA Pol III catalytic subunits RPB-1 and RPC-1, and showed that they form foci in early embryo nuclei that overlap with the 5S rDNA loci and with foci formed by ATTF-6-RFP. They showed that the foci are round, dissolve upon hexanediol incubation, and are detected during S phase, lost during mitosis, and re-established afterwards. The researchers performed FRAP and showed fast exchange of polymerases, unlike ATTF-6. They show that, unlike RNA Pol III foci, RNA Pol II foci are dependent on ATTF-6 and are temperature sensitive. The researchers propose that the two polymerases form distinct foci with different biochemical dependencies. This study shows that, although the genes are closely located within a gene cluster, the regulation of RNA Pol II and RNA Pol III is independent.

      Strengths:

      The researchers provide high-quality images that support the main results. The auxin-inducible and RNAi depletions are validated in the same embryos by fluorescence analysis of the target protein.

      Weaknesses:

      Although the researchers propose the hypothesis that the RNA Pol II and RNA Pol III form distinct condensates, alternative hypotheses are not presented, and the criteria by which the other possibilities are ruled out are not discussed.

    4. Reviewer #3 (Public review):

      Wang et al demonstrate that RNA polymerase II and RNA polymerase III form distinct nuclear foci at the 5S rDNA-SL1 gene cluster in C. elegans. By ChIP, Pol II is highly enriched at the SL1 gene, whereas Pol III is enriched at the 5S rRNA gene. Both polymerase foci are spherical, show rapid exchange in FRAP experiments, and assemble in a cell-cycle-dependent manner, predominantly during S phase. The transcription factors ATTF-6 and SNPC-4 are required for the formation of Pol II foci but are dispensable for Pol III foci. Pol II foci, but not Pol III foci, are temperature-sensitive and dissolve upon heat stress; dissolution correlates with a strong reduction of SL1 transcription, whereas 5S rRNA levels remain largely unaffected.

      Overall, this is a clean, well-organized, and well-controlled study, and I only have two comments.

      (1) Roundness measurements, FRAP, and sensitivity to 1,6-hexanediol are indicative but not sufficient to show that these foci are condensates. They could, for example, also be scaffolded/chromatin-anchored assemblies (see https://pubmed.ncbi.nlm.nih.gov/36526633/). Please either provide better evidence or rephrase/tone down the condensate statements.

      (2) Image quantification is only provided for Figure 5, but should also be reported for Figures 6 and 7. In addition to foci number, other measures such as intensity over background (similar to a partition coefficient) should be quantified.
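
      As a minimal sketch of the intensity-over-background measure suggested in point (2), assuming pixel intensities have already been extracted for a focus and for the surrounding nucleoplasm: the function name `enrichment_ratio` and the `camera_offset` parameter are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def enrichment_ratio(focus_pixels, background_pixels, camera_offset=0.0):
    """Mean focus intensity divided by mean background intensity.

    focus_pixels      -- intensities inside the segmented focus
    background_pixels -- intensities from the surrounding nucleoplasm
    camera_offset     -- assumed constant detector offset subtracted from both
    """
    focus = mean(focus_pixels) - camera_offset
    background = mean(background_pixels) - camera_offset
    if background <= 0:
        # A ratio is meaningless without a positive background signal.
        raise ValueError("background intensity must exceed the camera offset")
    return focus / background
```

      A ratio near 1 would indicate no enrichment over the nucleoplasm; how the focus and background regions are segmented is left open here and would need to be defined by the authors.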

    5. Author response:

      Reviewer #1:

      We appreciate the reviewer’s suggestions. In the revision, we will clarify which results are new and better position this work relative to our earlier publication. We will also expand the discussion of the functional implications of polymerase clustering and its cell-cycle dynamics.

      Regarding the condensate interpretation, we agree that the current evidence is suggestive but not definitive. In the revised manuscript, we will clarify how our measurements relate to commonly used criteria for condensate assemblies and revise the text to avoid overstating this interpretation. We will also add quantification to additional figures and revise the model diagram to more accurately reflect the conclusions supported by the data.

      Reviewer #2:

      We thank the reviewer for the positive assessment of the imaging quality. We agree that the manuscript would benefit from a broader discussion of possible models for the observed polymerase foci. In the revision, we will expand the discussion to include alternative interpretations, such as scaffolded assemblies, as suggested by Reviewer #3, and further clarify the properties of the RNA Pol II and RNA Pol III foci.

      Reviewer #3:

      We thank the reviewer for the positive evaluation of the study and the helpful suggestions. We agree that the current evidence is indicative but not sufficient to definitively demonstrate condensate formation. In the revision, we will revise the language and discuss alternative interpretations, including scaffolded assemblies. We will also provide additional quantifications for the relevant figures.

      Overall, we appreciate the reviewers’ suggestions and believe that the planned revisions will improve the clarity and impact of the manuscript.

    1. eLife Assessment

      This fundamental work uncovers an unexpected lysosomal function for NINJ2 and links it to ferroptosis and cancer biology. The evidence supporting the conclusions appears to be convincing. Additional mechanistic clarification, particularly around the NINJ2-LAMP1 interaction and ferroptosis specificity, will further strengthen the manuscript. This work will be of general interest to the community of ferroptosis and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      This study reports a novel and potentially impactful role for NINJ2 in maintaining lysosomal integrity and regulating cellular susceptibility to ferroptosis. The authors demonstrate that NINJ2 localizes to lysosomes and interacts with LAMP1, a key lysosomal membrane glycoprotein involved in sensing lysosomal stress. Loss of NINJ2 increases lysosomal membrane permeabilization (LMP), resulting in selective leakage of lysosomal contents, including labile iron, into the cytosol. The authors further show that NINJ2 deficiency reduces the expression of ferritin storage proteins, thereby sensitizing cells to ferroptosis induced by RSL3 and erastin. Collectively, the work proposes a mechanistic link between NINJ2-mediated control of LMP, iron homeostasis, and ferroptotic vulnerability, with potential relevance to cancer biology.

      Strengths:

      This study identifies a novel role for NINJ2 in regulating lysosomal integrity and ferroptosis and establishes a mechanistic link between lysosomal membrane permeabilization, iron homeostasis, and ferroptotic sensitivity, with potential translational relevance in cancer.

      Weaknesses:

      The results overall support the authors' conclusions and provide a plausible mechanistic framework; however, additional quantification of Western blot data and further discussion of mechanistic questions would strengthen the study.

      The findings are likely to have a broad impact by linking lysosomal integrity to ferroptosis and iron homeostasis, both of which are relevant to cancer biology and therapeutic targeting.

    3. Reviewer #2 (Public review):

      This manuscript, "Nerve Injury-Induced Protein 2 preserves lysosomal membrane integrity to suppress ferroptosis", identifies a previously unrecognized function of NINJ2 as a regulator of lysosomal membrane integrity and iron homeostasis, thereby suppressing ferroptosis. The authors demonstrate that NINJ2 localizes to lysosomes, interacts with LAMP1, limits lysosomal membrane permeabilization (LMP), stabilizes ferritin, and protects cells from ferroptotic cell death. They further extend these mechanistic findings to human cancer datasets, showing co-overexpression and positive correlation of NINJ2 with ferritin genes in iron-addicted cancers.

      Overall, the study is conceptually interesting, technically solid, and integrates cell biology, iron metabolism, and ferroptosis in a coherent framework. The work expands the functional repertoire of the Ninjurin family beyond plasma membrane rupture and inflammation, which will be of interest to researchers in cell death, lysosome biology, and cancer metabolism.

      Strengths:

      (1) The identification of NINJ2 as a lysosome-associated protein that suppresses ferroptosis represents a meaningful advance beyond its previously described roles in inflammation, pyroptosis, and tumorigenesis.

      (2) The work distinguishes NINJ2 functionally from NINJ1, reinforcing the idea that structurally related Ninjurins have divergent membrane-related roles.

      (3) The study presents a logically connected pathway:
      NINJ2 loss → LMP → labile iron increase → ferritin degradation → ferroptosis sensitization, which is well supported by the data.

      (4) The link between LAMP1, ferritin turnover, and ferroptosis is particularly compelling and timely given recent interest in lysosomal contributions to ferroptotic signaling.

      (5) The authors use confocal microscopy, proximity ligation assays, biochemical IPs, iron measurements, protein half-life analyses, ferroptosis assays, and TCGA-based analyses, providing convergent evidence for their model.

      (6) Use of two distinct cell lines (MCF7 and Molt4) strengthens generalizability.

      (7) The integration of cancer expression datasets linking NINJ2 with ferritin expression in hepatocellular and breast carcinomas enhances translational relevance.

      (8) Assigning NINJ2 a lysosomal protective function, distinct from NINJ1-mediated plasma membrane rupture, is novel.

      (9) Linking NINJ2 to ferroptosis regulation via lysosomal iron handling, rather than canonical GPX4 or system Xc⁻ pathways, is also novel, along with proposing a NINJ2-LAMP1-ferritin axis as a buffering mechanism against iron-driven lipid peroxidation.

      (10) These insights are not incremental; they reframe how NINJ2 may function at the intersection of membrane biology, iron metabolism, and regulated cell death.

      Areas for improvement:

      While the study is strong, several issues should be addressed for mechanistic depth and general relevance.

      (1) Although NINJ2 is shown to interact with LAMP1 and LAMP1 knockdown rescues ferritin levels, it remains unclear whether the NINJ2-LAMP1 interaction is required for lysosomal protection. The authors could:
      a) Map the NINJ2 domain required for LAMP1 interaction and test whether an interaction-deficient mutant fails to protect against LMP and ferroptosis.
      b) Rescue NINJ2 KO cells with wild-type versus mutant NINJ2 to establish causality.

      (2) The conclusion that NINJ2 suppresses ferroptosis relies primarily on RSL3 and erastin sensitivity. A direct assessment of ferroptosis would enhance the study. The authors could:
      a) Include ferroptosis rescue experiments using ferrostatin 1 or liproxstatin 1.
      b) Assess lipid peroxidation directly (e.g., C11 BODIPY staining) to strengthen the ferroptosis claim.

      (3) The manuscript discusses lysosomal ferritin degradation but does not directly examine NCOA4, a central mediator of ferritinophagy. It would be good to test whether NCOA4 knockdown rescues ferritin loss and ferroptosis sensitivity in NINJ2 KO cells. This would clarify whether NINJ2 acts upstream of canonical ferritinophagy pathways or via an alternative mechanism.

      (4) The study is entirely cell-based, despite references to inflammatory and tumor phenotypes in Ninj2-deficient mice. While not strictly required, even limited in vivo validation (e.g., ferroptosis markers or iron accumulation in existing Ninj2 KO tissues) would substantially strengthen the manuscript.

      (5) Finally, most imaging data (e.g., Galectin 3/LAMP1 colocalization, PLA signals) and immunoblot data are presented qualitatively. The authors should provide quantification of the Western blots and other measurements.

    4. Author response:

      Reviewer #1:

      We appreciate the reviewer’s insightful suggestions. In the revised manuscript, we will provide quantitative analysis of Western blot data throughout the study to improve data robustness and reproducibility. In addition, we will expand the Discussion section to address the following points raised by Reviewer #1: (1) Potential mechanisms underlying the regulation of LAMP1 transcript levels by NINJ2; (2) Whether Ninjurin1 may play a similar role in regulating lysosomal membrane permeabilization (LMP); (3) The potential clinical implications of our findings, particularly in relation to cancer progression and therapeutic targeting.

      Reviewer #2:

      We thank the reviewer for the insightful and constructive suggestions, which would further deepen the mechanistic understanding of the NINJ2-LAMP1 pathway and its role in ferroptosis regulation. To address the reviewer’s concerns, we will clarify the interpretation of our findings, add quantitative analyses where appropriate, and expand the Discussion to acknowledge these important mechanistic questions and future research directions. Specifically, we will revise the Statistical Analysis section to clearly describe the statistical methods used, including whether corrections for multiple comparisons were applied where appropriate. We will further discuss the potential interaction domain(s) between NINJ2 and LAMP1. We will also discuss the potential role of NCOA4, a central mediator of ferritinophagy, in the NINJ2-FTH1-LAMP1 pathway. Finally, we will include a schematic model summarizing the proposed NINJ2-LAMP1-iron-ferroptosis axis to better illustrate the working model of our study.

    1. eLife Assessment

      This important study addresses the long-debated hypothesis that humans preferentially choose partners with dissimilar immune genes, using data from a small-scale society that allows comparison between arranged and self-chosen partnerships. Across multiple analyses controlling for genome-wide relatedness and examining functional immune diversity, the authors find no evidence of HLA/MHC-based (dis)assortative mating, suggesting that immune gene variation has limited influence on mate choice in this relatively homogeneous population and that the observed patterns instead reflect selection acting directly on immune loci. While the strength of the evidence is compelling for this population, several conclusions rely on indirect reconstruction methods and imputed data for a very complex region of the genome, which may limit how firmly some claims can be supported.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size: HLA allele calls from 366 individuals (102 of them unrelated), and 219 individuals who had HLA data and whole-genome SNP data and were involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

    3. Reviewer #2 (Public review):

      Summary:

      Obtaining evidence for the influence of MHC on mate choice in humans is challenging, as social structures and norms often confound studies of populations. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine if MHC diversity is increased when individuals choose their own partners. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding is that there is no heterozygous dissimilarity difference between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least it may mask other selection forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that, between populations and species, different pressures have driven not only MHC evolution but also mate choice.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

    4. Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, l. 78), see the paper by Croy et al. (2020), which used the largest sample to date in this research area.

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      (4) How many pairs were excluded due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      References:

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    5. Author response:

      Reviewer 1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size: HLA allele calls from 366 individuals (102 of them unrelated), and 219 individuals who had HLA data and whole-genome SNP data and were involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      We appreciate the reviewer's concern, but would like to clarify two important misunderstandings in this assessment.

      First, the reviewer suggests that our SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, and that IBD inference may therefore be biased toward haplotypes common between the Himba and Yoruba. This is not the case. Our SNP genotype data were generated from the H3Africa and MEGAex genotyping arrays, which incorporated diverse reference variation to minimize ascertainment bias in non-European ancestries. No read mapping to a Yoruba reference genome was involved in SNP discovery or genotyping. The Yoruba 1000 Genomes data were used solely to provide an ancestry-matched recombination map for phasing and IBD calling; this would not bias IBD inference toward common Yoruba haplotypes. The reviewer's concern about imputation-driven inflation of IBD sharing for common haplotypes should not be relevant in our case.

      Second, regarding HLA haplotype resolution: we trained a bespoke HIBAG model directly on the Himba SNP array genotype data paired with ground-truth HLA allele calls from our own targeted HLA capture sequencing. This Himba-specific model was then used to impute HLA alleles from pseudo-homozygous genotypes derived by extracting phased SNP-based haplotypes across the HLA region for the same individuals. In this way we resolved the phase of the HLA allele calls. To our knowledge, this paired-data approach to individual-level HLA haplotype resolution is novel; existing HLA haplotype resolution tools generally provide only population-level haplotype frequency estimates rather than individual-level phase assignments. We are confident in the reliability of the haplotypes we report. Resolved haplotypes were required to match the known targeted-sequencing HLA allele calls at a minimum of the first field for at least one allele, and both haplotypes could not be assigned to the same allele unless the individual's HLA allele calls were homozygous. Of 722 total haplotypes, 698 were successfully resolved under these criteria. We report results only on these confidently resolved haplotypes.
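      The acceptance criteria in the preceding paragraph can be sketched for a single locus as follows. This is only one possible reading of the stated rules, not the authors' pipeline: the function names are hypothetical, allele strings are assumed to follow the usual `LOCUS*field1:field2` form, and treating both haplotypes as unresolved when they collide on the same allele in a heterozygous individual is an assumption.

```python
def first_field(allele: str) -> str:
    """Return the locus plus first field, e.g. 'A*02:01:01' -> 'A*02'."""
    return allele.split(":")[0]


def resolve_locus(imputed_pair, sequenced_pair):
    """Check two imputed haplotype alleles against the sequenced allele calls.

    Returns a tuple of two booleans, one per haplotype, that is True when the
    haplotype is accepted as resolved under the (assumed) criteria.
    """
    known = [first_field(a) for a in sequenced_pair]
    imputed = [first_field(a) for a in imputed_pair]

    # Rule 1: an imputed allele must match a sequenced call at the first field.
    accepted = [allele in known for allele in imputed]

    # Rule 2: both haplotypes may carry the same allele only if the
    # sequenced calls are themselves homozygous at the first field.
    sequenced_homozygous = known[0] == known[1]
    if imputed[0] == imputed[1] and not sequenced_homozygous:
        # Assumption: neither haplotype is counted as resolved in this case.
        accepted = [False, False]

    return tuple(accepted)
```

      For example, an individual sequenced as `A*02:01` / `A*03:01` whose two phased haplotypes both impute to `A*02:01` would be left unresolved under this reading.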

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      We thank the reviewer for highlighting the difficulty in modeling selection at the HLA, a problem that deserves considerable attention. We acknowledge that demographic processes such as the documented Himba population bottleneck can result in elevated IBD sharing (Swinford et al. 2023, PNAS). However, our comparison of HLA IBD sharing rates against a genome-wide baseline is designed to address this: demographic processes affect all regions of the genome, so if the HLA region maintains elevated IBD sharing significantly above the genome-wide threshold, this provides meaningful evidence for a locus-specific effect beyond demographic history alone.

      We agree with the reviewer that the recombination landscape of the HLA region is complex, but this complexity itself is consistent with the region being a frequent target of selection. Previous HLA analyses have found that at the allele level, frequencies are consistent with balancing selection, while multi-locus haplotype frequencies are consistent with purifying selection and positive frequency-dependent selection (Alter et al., 2017), patterns that contribute to the complex recombination rate heterogeneity observed in the region. Recombination rate can be both a cause of extended haplotypes and a consequence of selection against combinations of alleles.

      As Alter et al. note, the high levels of linkage disequilibrium observed among HLA alleles serve to limit the amount of diversity within HLA haplotypes, but balancing selection at the allelic level maintains multiple HLA haplotypes at high frequency across populations over long periods of time — so-called "conserved extended haplotypes" as we observe (Supplementary Figures 1 and 9). Regarding the specific selective mechanism, our results are not equally consistent with all forms of balancing selection. Albrechtsen et al. (2010) explicitly modeled overdominant balancing selection and demonstrated that equilibrium overdominance does not produce elevated IBD sharing as we observe — our results are therefore inconsistent with this mechanism. Instead, Albrechtsen et al. conclude that allele frequency change is required to generate elevated IBD, consistent with bouts of directional selection such as negative frequency-dependent or fluctuating positive selection. We will make explicit that while our findings do not support overdominance, they are consistent with these temporally dynamic forms of selection driving periodic allele frequency change at the HLA locus. We will also incorporate local recombination rate into Figure 4 to provide a comparison of local recombination rate across chromosome 6 with the observed areas of elevated IBD sharing.

      Alter, I., Gragert, L., Fingerson, S., Maiers, M., & Louzoun, Y. (2017). HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes. PLoS Computational Biology, 13(8), e1005693.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

      We will clarify the presentation of partnership counts and sample sizes throughout the manuscript and improve the scaling and annotation of the flagged figures. Regarding DRB copy number variation, we will add explicit discussion of our analytical choices and their potential limitations. As described in our responses to the main concerns above, we will also provide more nuanced framing of the selective mechanisms consistent with our IBD results, avoiding conclusions that go beyond what our analyses directly support.

      Reviewer #2 (Public review):

      Summary:

      Evidence for the influence of MHC on mate choice in humans is challenging to obtain, as social structures and norms often confound population studies. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine whether MHC diversity is increased when partners are freely chosen. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding is that there is no difference in heterozygous dissimilarity between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least may mask other selective forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that between populations and species, there are different pressures that have driven the MHC evolution, but also mate choice.

      We will improve the framing of our project within the broader non-human MHC mate choice literature in our discussion.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

      We would like to clarify that we did assess the unique pathogen peptides bound across all HLA class I and class II genes by each population's common haplotypes (Figures S12–S13). We acknowledge the reviewer's point that non-pathogenic peptides are also important — for example, binding with self-produced proteins. However, binding with self-produced proteins is more relevant to autoimmune risk, and the selective pressures involved are outside the scope of our current work, which focuses on pathogen-induced fluctuating directional selection and heterozygote advantage. Furthermore, selection on non-pathogenic peptide binding repertoires likely operates in the opposite direction to pathogen repertoire; whereas broader pathogen peptide binding is advantageous, broader self-peptide binding risks excessive immune activation.

      Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      We thank the reviewer for this important clarification. Our claim was intended to be more specific: to our knowledge, this is the first study to investigate HLA-based mate preferences in a non-European small-scale society while explicitly controlling for genome-wide relatedness. Hedrick and Black (1997) did not include genome-wide relatedness controls, which is a critical distinction given that ancestry-assortative mating can produce spurious patterns of HLA similarity or dissimilarity in the absence of such correction. We will make this qualification explicit in the revised manuscript.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, l. 78), see the paper by Croy et al. (2020), which used the largest sample to date in this research area.

      We thank the reviewer for this reference. In our revision, we will incorporate Croy et al. (2020) into our discussion and use it as a reference for comparing the Himba’s probability of highly homozygous offspring given population allele frequencies. This comparison will help support our claim that background HLA diversity in the Himba is sufficiently high so that any unrelated partner is already likely to yield adequately dissimilar offspring—a scenario that would reduce the selective benefit of active HLA-based mate choice and could mask any such preference even if it exists.

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      The reviewer is correct that individuals appear multiple times in the dataset: some individuals are members of multiple known partnerships, and all individuals are additionally included many times across the full set of possible random heterosexual pairings that meet our age and relatedness criteria. This non-independence is explicitly addressed in our dyadic linear mixed models by including female ID and male ID as random effects, which account for each individual's unique contribution to their similarity scores across all pairings, both real and random. We explain this in the Statistical Models section (n) of the Methods.

      Regarding discovered partnerships: we grouped these with reported informal partnerships in the current analyses due to modest sample sizes. We agree this is worth examining more carefully and will test, in our revision, whether treating discovered partnerships as a separate category, or excluding them entirely, meaningfully affects our results. We will report these analyses as a sensitivity check.

      (4) How many pairs were due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      This information is reported in the Statistical Models section (n) of the Methods. No pairs were more closely related than 3rd degree. No arranged marriages were related at 3rd degree or closer; 1 love match marriage and 2 informal partnerships discovered through pedigree analysis were found to be 3rd degree relatives.

      Regarding the difference in relatedness thresholds: we used a 4th degree cutoff to define the unrelated set of individuals for allele and haplotype frequency analyses (n=102), as even 3rd degree relatives would inflate allele frequency estimates. In contrast, we permitted 3rd degree relatives in the background distribution for the partnership analyses to reflect the stated cultural preference for cousin marriages in arranged unions—excluding them would have made the background distribution less representative of the actual mating pool. We explain both decisions in Methods sections (d) and (n).

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      While HIV prevalence is indeed high in Namibia generally, the Himba are a relatively isolated population and, based on personal communication with Dr. Ashley Hazel—who has extensive field experience studying sexually transmitted infections in the Himba (see references 36, 52, 53, and 54)—there is no evidence of HIV transmission within this population. Dr. Hazel's expertise on this question was the basis for our exclusion of HIV from the pathogen list.

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      We will clarify this in our revision. We restricted random couples to an age gap within the range observed in actual, known partnerships (the woman is at most 16 years older than the man and at most 53 years younger than the man). We included this criterion to ensure that random couples were the best approximation of realistic background partners. Our age gap criterion was quite permissive because of the large range observed in our actual pairs, and we do not expect it to have significantly affected our results.
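      As a minimal sketch of how such an age filter could be applied when enumerating background pairs: the bounds come from the response above, while the function names, the ID-to-age dictionaries, and the omission of the relatedness criterion are assumptions made for illustration.

```python
from itertools import product

# Bounds stated in the response: the woman may be up to 16 years older
# or up to 53 years younger than the man.
MAX_WOMAN_OLDER = 16
MAX_WOMAN_YOUNGER = 53


def age_gap_ok(female_age: float, male_age: float) -> bool:
    """Return True if the pair falls inside the observed age-gap range."""
    gap = male_age - female_age  # positive when the woman is younger
    return -MAX_WOMAN_OLDER <= gap <= MAX_WOMAN_YOUNGER


def candidate_pairs(females: dict, males: dict) -> list:
    """Enumerate all female-male pairs that pass the age-gap filter.

    `females` and `males` map hypothetical individual IDs to ages.
    The relatedness criterion is deliberately not modelled here.
    """
    return [
        (f_id, m_id)
        for (f_id, f_age), (m_id, m_age) in product(females.items(), males.items())
        if age_gap_ok(f_age, m_age)
    ]
```

      Dropping the filter, as the reviewer asks about, would amount to returning every element of the product.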

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      We would like to clarify that for each analysis we explicitly report both the effects of chosen and arranged partnerships relative to the background distribution intercept, and the pairwise contrast between chosen and arranged partnerships. The intercept of each model is derived from the full background distribution of random opposite-sex pairings meeting our age and relatedness criteria, providing a null expectation under random mating. A non-significant effect for both partnership types therefore indicates that neither arranged nor chosen partnerships differ from random mating with respect to the metric in question. We describe this explicitly in the Statistical Models section of the Methods, but we will ensure this interpretation is stated more prominently in the Results section of the revised manuscript to avoid any confusion.
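      The model structure described in this response can be written schematically as below. The symbols are illustrative notation rather than the manuscript's, additional covariates are omitted, and the normal distributions are the usual linear mixed model assumption rather than something stated in the response.

```latex
% y_{ij}: similarity metric for the pairing of female i with male j
% A_{ij}, C_{ij}: indicators for arranged and chosen partnerships
%                 (both are 0 for random background pairs)
\begin{aligned}
y_{ij} &= \beta_0 + \beta_A A_{ij} + \beta_C C_{ij} + u_i + v_j + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma_f^2), \qquad
v_j \sim \mathcal{N}(0, \sigma_m^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      Under this notation, the intercept \beta_0 is the expectation for random pairs, \beta_A and \beta_C measure how arranged and chosen partnerships depart from random mating, and the pairwise contrast is \beta_C - \beta_A.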

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      We can incorporate separate HLA similarity/log odds of homozygous offspring analyses for class I and class II in our revision.

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      We will expand our discussion in the revision to provide a more detailed comparison with previous studies, including Croy et al. (2020), and will add an explicit limitations section incorporating suggestions from multiple reviewers on more careful framing of optimality and specific selective mechanisms. Regarding sample size, we acknowledge this as a genuine limitation given the extensive polymorphism of the MHC region. However, the unrelated sample size used for our allelic diversity estimates is comparable to previous studies in African populations (Figure 1), and our dataset is uniquely comprehensive in combining HLA class I, class II, genome-wide SNP data, and partnership data within the same individuals, a combination that enables the genome-wide relatedness correction that distinguishes our study from much of the prior literature.

      References

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    1. eLife Assessment

      This study presents a valuable finding on the condition dependence of autophagy-mediated lifespan regulation in C. elegans. The evidence is solid, as the data broadly support the main claims, although variability between biological replicates and limited mechanistic exploration leave some conclusions less firmly established. The work will be of interest to researchers studying autophagy, ageing, and intracellular trafficking.

    2. Reviewer #1 (Public review):

      Summary:

      Hsiung et al. investigated whether the effects of autophagy gene knockdown on the lifespan of long-lived C. elegans mutants depend on experimental conditions. The authors first compiled published data on autophagy-dependent lifespan regulation in daf-2 and wild-type backgrounds, highlighting that prior results are notably inconsistent and likely context-dependent. They then systematically tested the lifespan effects of RNAi knockdown of six autophagy genes (atg-2, atg-4.1, atg-9, atg-13, atg-18, and bec-1) in wild-type (N2), daf-2 (reduced insulin/IGF-1 signalling), and glp-1 (germlineless) animals, while varying temperature, daf-2 allele, FUDR concentration, and bacterial infection status.

      The key findings are as follows. In wild-type animals, lifespan suppression by most autophagy gene knockdowns was more pronounced at 20°C than at 25°C, where little or no effect was observed. In daf-2 mutants, stronger lifespan suppression was seen in the weaker daf-2(e1368) allele at 20°C, but not in the stronger daf-2(e1370) allele, and effects were largely absent at 25°C. In glp-1 mutants, four of six gene knockdowns suppressed lifespan to a greater extent than in N2, though again in a temperature-dependent manner. FUDR at a high concentration (800 µM) abolished the life-shortening effects of most knockdowns and, in the case of atg-9 and atg-13, led to lifespan extension. Kanamycin treatment to eliminate bacterial proliferation did not fully account for the lifespan effects, suggesting that increased susceptibility to infection is not the primary mechanism. The authors also tested the programmed aging hypothesis that autophagy promotes lifespan reduction through biomass repurposing, but found no changes in vitellogenin levels upon knockdown of any of the six genes.

      Altogether, among all genes tested, atg-18 knockdown produced the strongest and most consistent lifespan suppression across nearly all conditions, including both daf-2 and glp-1 backgrounds. The authors probed whether atg-18 acts through the FOXO transcription factor DAF-16 by examining dauer formation and ftn-1 expression, but found no evidence for this, suggesting a DAF-16-independent mechanism.

      Strengths:

      The primary strength of this work lies in its systematic and comprehensive approach to dissecting how experimental variables influence the outcome of autophagy-lifespan epistasis tests. The compilation of prior data alongside the authors' own multi-condition dataset is a genuinely useful resource for the field. The study raises a timely and important point about condition selection bias, which is relevant not only to autophagy research but to C. elegans aging studies more broadly. The finding that atg-18 behaves distinctly from other autophagy genes across all conditions is noteworthy and opens avenues for future mechanistic work.

      Weaknesses:

      Despite its breadth, the study has several weaknesses that limit the strength of some conclusions.

      (1) Variability in control lifespan data. The N2 lifespan values under ostensibly identical conditions (e.g., GFP RNAi at 20°C) differ substantially across experiments (compare Tables S2, S5, S6, S7, and S9). Since N2 serves as the baseline for calculating whether the effect is greater in long-lived mutants via Cox proportional hazard (CPH) analysis, this variability in controls directly affects the reliability of those comparisons.

      (2) Limited biological replication. Most experiments were performed with only two biological replicates. In several cases, the two replicates yield contradictory outcomes: one showing significant lifespan suppression and the other showing no effect or even extension. The authors combine these into cumulative datasets for analysis, which, while not incorrect in principle, may obscure genuine irreproducibility. Given that the central message of the paper concerns variability and condition dependence, additional replication would have substantially strengthened confidence in the reported results.

      (3) Low sample sizes in individual trials. A number of lifespan assays were conducted with only 40-50 worms per replicate, and in some cases, as few as 30. Such sample sizes are below the standard commonly used in the C. elegans aging field and are likely to contribute to the variability observed.

      (4) RNAi efficacy measured only in N2 at 20°C. The authors demonstrated that atg-2 and atg-4.1 RNAi did not significantly reduce target mRNA levels, which may explain their weaker lifespan effects. However, these same RNAi treatments significantly affected lifespan in several other conditions (e.g., daf-2(e1368) at 20°C, glp-1 at 20°C and 25°C, and N2 with 15 µM FUDR). Measuring RNAi efficacy across different genetic backgrounds and conditions would be needed to properly interpret these variable results.

      (5) Incomplete mechanistic exploration. The investigation of why atg-18 knockdown has uniquely strong effects was limited to DAF-16. Given published evidence that atg-18 may regulate HLH-30/TFEB, a master transcriptional regulator of autophagy and lysosomal biogenesis, testing whether atg-18 specifically affects HLH-30 nuclear localisation or activity could have provided valuable mechanistic insight and would distinguish atg-18 from the other genes tested.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how genes involved in cellular recycling (autophagy) influence lifespan under different experimental conditions. The findings help clarify why previous studies have reported conflicting results about whether blocking autophagy shortens or extends lifespan. The work will be of interest to researchers studying aging and cellular stress responses, particularly those using model organisms.

      Strengths:

      The findings are valuable, as they help resolve inconsistencies within a specific subfield of aging research. The evidence presented is solid, as the data broadly support the primary claims of the study. In addition, the discussion is thorough and thoughtfully integrates the findings within the broader context of the field.

      Weaknesses:

      Additional functional validation would further strengthen the conclusions.

    1. eLife Assessment

      This study establishes a methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. It has been difficult to study social interactions using artificial stimuli rather than genuine interactions between unrestrained animals. This study makes a fundamental contribution to social neuroscience research in a laboratory setting. The results convincingly show that the study of unrestrained social interactions is possible with detailed quantification of position and gaze. The methodology presented here is relevant to research in social neuroscience, neuroethology, and primatology.

    2. Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus, the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Comments on revisions:

      I thank the authors for their careful revisions of the manuscript. It has addressed all of my comments.

      One final suggestion would be to add a scale bar in Supplemental Figure 2A so the size of the video/image stimuli is clear (in cm of monitor size) and also to report a range for how far away was the marmoset in viewing these stimuli (in cm). This will enable calculation of the rough accuracy in visual degrees.
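      The conversion the reviewer has in mind is the standard visual-angle formula, sketched below. The function name and the example numbers are hypothetical and are not taken from the manuscript.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an object of a given size
    viewed head-on from a given distance: 2 * atan(size / (2 * distance))."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Hypothetical example: a 10 cm wide stimulus viewed from the near and far
# ends of an assumed 30-60 cm range of viewing distances.
near = visual_angle_deg(10.0, 30.0)
far = visual_angle_deg(10.0, 60.0)
print(f"stimulus subtends {far:.1f} to {near:.1f} degrees")
```

      With a scale bar and a reported distance range, the same function would bound how many visual degrees a given gaze-landing error corresponds to.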

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze, and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic about how the authors approach the cone of uncertainty in gaze, which can be both due to some error in head orientation measurements as well as variable eye position.

      Weaknesses:

      While there remains some degree of uncertainty in the precise accuracy of the gaze measure, the authors have done an excellent job accounting for these as well as they can, and appropriately acknowledge the limitations of their approach.

      Comments on revisions:

      I have no further recommendations. The authors addressed my previous suggestions or acknowledged them as topics for future investigation. This is excellent work.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Weaknesses:

      One weakness that should be easily addressed is that no data is provided to directly assess how accurate the estimated head gaze is based on calibrations of the animals, for example, when they are looking at discrete locations like faces or video on a monitor. This would be useful to get an upper bound on how accurate the 3D gaze vector is estimated to be, for planned use in other studies. Although the accuracy appears sufficient for the current results, it would be difficult to know if it could be applied in other contexts where more precision might be necessary.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic with how the authors approach the cone of uncertainty in gaze, which can be both due to some error in head orientation measurements as well as variable eye position.

      Weaknesses:

      There are a few technical points in need of clarification, both in terms of the robustness of the gaze estimate, and possible confounds by gaze to non-face targets which may have relevance but are not discussed. These are relatively minor, and more suggestions than anything else.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) It appears that the accuracy of the estimated gaze angle must be well under the size of the gaze cone (+/- 10 degrees), but I can't find any direct estimate of the accuracy even if it is just a ballpark figure. On Lines 219-233 is where performance is described for viewing images and video on a monitor, where it should be possible to reconstruct the point of gaze on the monitor while images and video are shown, in order to evaluate the accuracy of the system for where the marmoset is looking? Would you see eye position traces that would show fixation clusters around those images or videos with stationary points on the monitor much like that seen for head-fixed animals looking at faces on a screen (Mitchell et al, 2014)? If so, what is the typical spread of those clusters during fixations on an image, both in terms of the precision by RMS error during a fixation epoch and the spread around the images at different locations (accuracy of projection)? For example, if gaze clusters were always above the displayed images one would have an idea that the face plane is slightly offset above the true gaze direction. It is not completely clear how well the face plane and corresponding gaze cone do in describing gaze direction in space, but the monitor stimuli could be used as an initial validation of it.

      We thank the reviewer for this important suggestion regarding the quantitative validation of gaze accuracy. We agree that, when animals view stimuli presented on a monitor, the estimated gaze direction can be evaluated by examining the spatial distribution of gaze–monitor intersection points relative to stimulus locations.

      To address this, we generated a new figure (Fig. S2A) analyzing gaze behavior following the onset of video stimuli presented at different locations on the monitor. Specifically, we selected video clips in which human annotators verified that the marmosets were looking at the monitor. Consistent with prior work in head-fixed marmosets (Mitchell et al., 2014), we observe clustering of gaze–monitor intersection centers within and around the corresponding stimulus locations after stimulus onset. These clusters provide an empirical validation that the estimated gaze direction aligns with stimulus position in space.

      Importantly, unlike the head-fixed preparation used in Mitchell et al. (2014), marmosets in our study were freely moving. As a result, they do not exhibit prolonged, stationary fixations on the monitor, and fixation clusters are therefore more diffuse. This increased spread reflects natural head and body motion rather than limitations of the gaze estimation method itself. Despite this, gaze intersection points remain spatially localized to the vicinity of the presented stimuli across different monitor locations.

      We did observe small offsets in some gaze clusters relative to stimulus centers; however, these offsets were not systematic across stimulus locations or animals. Crucially, there was no consistent bias (e.g., clusters appearing uniformly above or below stimuli) that would indicate a systematic misalignment of the face plane or gaze cone relative to true gaze direction. Together, these observations support the conclusion that the face-plane-based gaze cone provides an accurate estimate of gaze direction in space, with precision well within the ±10° aperture of the gaze cone.

      While the freely moving component of the behavior precludes direct estimation of fixation RMS error comparable to head-fixed paradigms, the observed stimulus-locked clustering serves as an initial validation of both the accuracy and practical utility of our approach under naturalistic conditions.

      (2) A second major comment is about clarity in the writing of the results and discussion. At the end of the manuscript, a major takeaway is the difference between familiar and unfamiliar dyads, that males show more interest in viewing females including unfamiliar females, but for familiar females, this distinction is also associated with being likely to look at them if they look at the male, and then to engage in joint gaze with them after looking at them, which indicates more of a social interaction than simply monitoring them when they are unfamiliar. Those aspects of the results could be emphasized more in the topic sentences of paragraphs presenting data to support those features of the gaze data (at present is buried at the ends of results paragraphs and back in the discussion).

      We thank the reviewer for this insightful suggestion. We have restructured the Results and Discussion sections to lead with the primary social takeaways rather than technical descriptions (Tracked changes in Word). Specifically, we now emphasize the distinction between "social monitoring" (characteristic of unfamiliar dyads) and "active social coordination" (characteristic of familiar dyads).

      (1) Topic Sentences: We revised the topic sentences of all Results paragraphs to immediately highlight the findings regarding male interest and the influence of familiarity on reciprocation.

      (2) Conceptual Framework: We added a conceptual distinction in the Discussion, explaining that while unfamiliar marmosets maintain high social attention through "peripheral monitoring" and proximity-dependent joint gaze, familiar pairs exhibit sophisticated, distance-independent coordination and gaze reciprocation.

      (3) Clarification of Male Interest: We explicitly stated that while male interest in females is high regardless of familiarity, it manifests as persistent monitoring in unfamiliar pairs versus a more aware, reciprocal state in familiar pairs.

      Minor comments:

      (1) Methods:

      a) Lines 522-539: The 200 continuous frames used for validation of the model containing two marmosets are sufficient to test how well it generalizes to other animals outside the training set? The RMSE reported, does it vary for animals inside vs outside the training set? To what extent does the RMSE, in image pixels, translate into accuracy in estimating the gaze direction, for example, as assessed by estimating error when marmosets look at images or video on the monitor?

      To address the reviewer’s concern regarding generalization and the translation of pixel RMSE to angular accuracy, we emphasize that the six facial features selected are prominent, high-contrast features across the species. Consequently, we observed that the RMSE remained consistent for marmosets both inside and outside the training set. To quantify how pixel-level tracking error translates into gaze estimation accuracy, we performed a sensitivity analysis. We simulated landmark (i.e., feature) jitter by sampling perturbations from circular distributions based on our empirical data (2.4 pixels for eyes; 2.1 pixels for the central blaze). Our results, illustrated in Author response image 1, show that 90% of the resulting head gaze deviations fall within 10°, which is consistent with the angular threshold used for our gaze cone model. This confirms that the reported RMSE provides sufficient precision for reliable gaze estimation.

      Author response image 1.

      Probability distribution of gaze angular deviation under circular perturbation. The histogram (blue) represents the change in reconstructed gaze angle (degrees) following stochastic perturbation of facial features. To simulate real-world variance, noise was sampled from circular distributions with radii of 2.4 pixels (eyes) and 2.1 pixels (central blaze). The red curve represents an exponential fit to the empirical data (y = ae^(bx); a = 0.9591, b = 0.1813). Approximately 90% of the reconstructed gaze deviations remain below 10°, indicating the model’s localised stability under pixel-level coordinate jitter.
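
      A perturbation analysis of this kind can be sketched roughly as follows. This is not the authors' code: the landmark coordinates, the function names, and the choice to apply the jitter isotropically in 3D (rather than in each camera image before triangulation) are all assumptions made for illustration, so the resulting numbers depend on the assumed face geometry and should not be compared with the reported 90% figure.

```python
import numpy as np

def face_normal(left_eye, right_eye, blaze):
    """Unit normal of the plane through the two eyes and the central blaze."""
    n = np.cross(right_eye - left_eye, blaze - left_eye)
    return n / np.linalg.norm(n)

def sample_in_ball(rng, radius):
    """Draw a point uniformly from a ball of the given radius (assumed noise model)."""
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    return v * radius * rng.uniform() ** (1.0 / 3.0)

def simulate_deviation(landmarks, radii, n_trials=2000, seed=0):
    """Angular change (degrees) of the face normal under random landmark jitter."""
    rng = np.random.default_rng(seed)
    n_ref = face_normal(*landmarks)
    devs = np.empty(n_trials)
    for i in range(n_trials):
        jittered = [p + sample_in_ball(rng, r) for p, r in zip(landmarks, radii)]
        # abs() ignores sign flips of the normal; polarity is handled separately
        c = np.clip(abs(np.dot(n_ref, face_normal(*jittered))), 0.0, 1.0)
        devs[i] = np.degrees(np.arccos(c))
    return devs

# Hypothetical landmark layout in pixel-like units (not taken from the manuscript)
example_landmarks = [
    np.array([-20.0, 0.0, 0.0]),  # left eye
    np.array([20.0, 0.0, 0.0]),   # right eye
    np.array([0.0, 25.0, 0.0]),   # central blaze
]
example_devs = simulate_deviation(example_landmarks, [2.4, 2.4, 2.1])
print("fraction of deviations below 10 degrees:", np.mean(example_devs < 10.0))
```

      The fraction printed at the end plays the role of the 90% summary statistic; with real landmark geometry and camera calibration, the jitter would instead be applied in image coordinates before triangulation.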

      b) Line 542-43: Is there any difference between a rigid model fit to the six facial points, versus using the plane defined by the two eyes and central blaze in terms of direction accuracy (in the ground truth validation)? How does the "semi-rigid" set of six points (mentioned also in lines 201-203) constrain the fit of the three points (two eyes and central blaze) that define the normal plane for the gaze cone?

      We thank the reviewer for the opportunity to clarify our geometric model. The plane used to define the gaze cone's origin was indeed determined by the two eyes and the central blaze. However, a plane defined by only three points was insufficient to determine a unique gaze direction, as the normal vector was ambiguous (it could point forward through the face or backward through the head).

      To resolve this, we utilized the relative positions of the two ear tufts. Because the tufts are anatomically situated behind the eyes and blaze, these additional points provide the necessary spatial context to orient the gaze vector correctly. In our validation, we found that including the mouth does not alter the angular accuracy compared to a 3-point fit, supporting that the facial features are correctly identified.

      We use the term 'semi-rigid' to describe the six-point constellation because their relative spatial configurations remain stable across individuals and expressions, imposing a biological constraint on the model. This prevents unphysical warping of the face frame during 3D reconstruction and ensures the gaze cone remains anchored to the animal's true midline.
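
      The polarity step described above can be illustrated with a short sketch. It assumes the five landmarks are already available as 3D points; the function name and the use of landmark centroids as the forward hint are illustrative choices, not the authors' implementation.

```python
import numpy as np

def oriented_face_normal(left_eye, right_eye, blaze, left_tuft, right_tuft):
    """Unit normal of the eye-eye-blaze plane, flipped so it points out of the face."""
    left_eye, right_eye, blaze, left_tuft, right_tuft = (
        np.asarray(p, dtype=float)
        for p in (left_eye, right_eye, blaze, left_tuft, right_tuft)
    )
    # Three points fix the plane but not the sign of its normal
    n = np.cross(right_eye - left_eye, blaze - left_eye)
    n /= np.linalg.norm(n)
    # The ear tufts sit behind the eyes and blaze, so "forward" runs
    # from the tuft midpoint toward the centroid of the facial plane
    face_center = (left_eye + right_eye + blaze) / 3.0
    tuft_center = (left_tuft + right_tuft) / 2.0
    if np.dot(n, face_center - tuft_center) < 0:
        n = -n
    return n
```

      With this convention the result does not depend on the order in which the two eyes are passed, since a swapped order only flips the raw cross product and the tuft check flips it back.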

      (2) Results:

      a) Lines 203-205: What is the distinction between gaze orientation (defined by facial plane, 3D vector) and gaze direction (defined by ear tufts) ... is gaze direction in the 2D x-y plane? Why are two measures needed or different? It does not appear gaze orientation is used further in the manuscript and perhaps could be omitted.

      We appreciate the reviewer’s comment regarding the terminology. We have replaced all instances of ‘gaze orientation’ with ‘gaze direction’ to ensure consistency throughout the manuscript.

      To clarify, both terms referred to the same 3D unit vector. The ear tufts were not used to define a separate 2D measure; rather, they served as posterior anatomical anchors to resolve the 3D polarity of the normal vector (ensuring the vector points 'forward' from the face rather than 'backward'). Gaze direction was calculated in 3D space and was not restricted to a 2D x-y plane. We have clarified this in the revised Methods section (Lines 203–205) to avoid further ambiguity.

      b) Line 215-216: why is head-gaze velocity put in normalized units instead of degrees visual angle per second? How was the normalization performed (lines 549-557)? It would be simpler to see velocity as an angular speed (degrees angle per second) rather than a change in norms.

      We thank the reviewer for this suggestion. We agree that the expression is misleading.

      (1) We have replaced "face norm" with "face normal vector" (N) throughout the manuscript to clarify that we are referring to the 3D unit vector perpendicular to the facial plane.

      (2) Lines 224-225 and the corresponding Methods section (Lines 599-609) have been updated to reflect this change in units and terminology.

      We chose to use the change in the face normal vector in normalized units for our primary calculations because it allows for efficient spatiotemporal smoothing and is computationally robust at the very low thresholds required for our stability analysis. However, to address the reviewer's concern regarding interpretability, we have verified that our threshold of 0.05 normalized units corresponds to an angular change of 2.87 degrees per frame (frame duration: 33 ms). Since we are operating at very small angular changes, the Euclidean distance between unit vectors is a near-linear proxy for the angular displacement in radians.
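
      The correspondence between the threshold and the angle can be checked with a few lines. For two unit vectors separated by an angle θ, the Euclidean distance between them is 2·sin(θ/2), which approaches θ (in radians) for small angles. The function name below is illustrative, and the per-second conversion assumes the 33 ms frame duration quoted above.

```python
import math

def chord_to_angle_deg(distance):
    """Angle (degrees) between two unit vectors whose Euclidean distance is `distance`."""
    # |n1 - n2| = 2 * sin(theta / 2) for unit vectors n1, n2
    return math.degrees(2.0 * math.asin(distance / 2.0))

threshold = 0.05       # normalized units per frame
frame_s = 0.033        # frame duration in seconds

exact_deg = chord_to_angle_deg(threshold)      # exact conversion
small_angle_deg = math.degrees(threshold)      # treats the distance as radians
deg_per_second = exact_deg / frame_s

print(f"{exact_deg:.2f} deg/frame (exact), {small_angle_deg:.4f} deg/frame (small-angle)")
print(f"{deg_per_second:.1f} deg/s at the assumed frame duration")
```

      The exact and small-angle values differ by well under a thousandth of a degree at this threshold, which is why the normalized distance can be read as an angular displacement.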

      c) Lines 215-216: How do raw gaze traces appear over time ... are there gaze saccades and then stable fixations, or does it vary continuously? A plot of the gaze trace might be useful besides just showing velocity with a threshold, to evaluate to what extent stable fixation vs shifts are distinct.

      Author response image 2.

      Time course of gaze angular velocity and stability thresholding. The plot illustrates the temporal dynamics of the face normal vector velocity used to define stable gaze states. The blue trace represents the raw gaze velocity calculated in normalised units. The red dashed line denotes the empirical cutoff threshold of 0.05 units per frame.

      To clarify the temporal dynamics of marmoset head movements, we have provided a representative time course of head gaze velocity as shown in Author response image 2. The data clearly show a "saccade-and-fixate" pattern: large, distinct spikes in velocity (representing rapid head redirections) are separated by periods of relative stability.

      While minor high-frequency fluctuations in the raw trace (blue) may be attributed to facial feature detection noise, they remain significantly below our stability threshold (red dashed line). By applying this threshold, we successfully isolated biologically relevant "stable fixations" from "head saccades," ensuring that our subsequent social gaze analysis is based on periods of intentional head gaze direction.

      d) Lines 237-286: The writing in this section does not emphasize the main results. There seem to be three takeaway points that could be emphasized better in the topic sentences of each of the paragraphs: i) Marmosets tended to spend most of their time on either end of the elongated box, not in the middle, ii) Males spent more time near the front of the box near the other animal than females, iii) Familiar pairs spent more time closer to each other.

      To address this comment, we have reorganized this section to lead with the three key behavioral findings:

      (1) We now state clearly in the topic sentence that marmosets preferred the ends of the arena over the middle.

      (2) We have highlighted the finding that males spend significantly more time near the inner edge (closer to the partner) than females, irrespective of familiarity.

      (3) We emphasized that familiar pairs maintain closer and more dynamic social distances over time, whereas unfamiliar pairs tend to move further apart as a session progresses.

      e) Line 303: It would be useful to see time traces of head velocity of each member of the pair and categorization over time of the gaze event types. A stable epoch must be brief on the order of 100-200ms. It is unclear how distinct the stable fixation epochs are from the moments when the gaze is shifting. Also, the state transition analysis treats each stable epoch like one event, and then following a gaze movement by either of the pair, the state is defined again, is that correct?

      We defined stable epochs as continuous periods where the face normal vector velocity remained below 0.05 normalized units for both animals. This ensures that a "gaze state" is only categorized when both marmosets have relatively fixed head orientations. As shown in the time traces (Author response image 2), the velocity profile is characterized by sharp peaks (head saccades) and clearly defined troughs (fixations). Further, we generated a probability histogram of stable head-gaze epoch durations (Author response image 3). The median duration of these stable epochs is 200 ms, which aligns with biological expectations for fixation durations in primates and confirms that these states are distinct from the high-velocity shifts.

      The reviewer’s interpretation is correct. Our Markov chain model treats each stable epoch as a single event. A transition occurs when at least one animal moves (exceeding the velocity threshold), resulting in a new stable epoch where the relative gaze state is re-evaluated. This approach allows us to model the sequence of social interactions as a series of discrete behavioral decisions.
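
      The epoch segmentation and transition counting described here can be sketched as below. The 0.05 threshold, 33 ms frame duration, and 100 ms minimum duration come from this response; the function names, the integer coding of gaze states, and the row-normalized count matrix are assumptions for illustration rather than the authors' code.

```python
import numpy as np

def stable_epochs(speed_a, speed_b, threshold=0.05, frame_ms=33.0, min_ms=100.0):
    """Return (start, end) frame index pairs (end exclusive) where both animals are stable."""
    stable = (np.asarray(speed_a) < threshold) & (np.asarray(speed_b) < threshold)
    epochs, start = [], None
    for i, is_stable in enumerate(stable):
        if is_stable and start is None:
            start = i                                  # epoch begins
        elif not is_stable and start is not None:
            if (i - start) * frame_ms >= min_ms:       # drop transient epochs
                epochs.append((start, i))
            start = None
    if start is not None and (len(stable) - start) * frame_ms >= min_ms:
        epochs.append((start, len(stable)))            # epoch running to the end
    return epochs

def transition_matrix(states, n_states):
    """Row-normalized transition probabilities between consecutive epoch states."""
    counts = np.zeros((n_states, n_states))
    for current, following in zip(states[:-1], states[1:]):
        counts[current, following] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions are left as zeros
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

      Each stable epoch would be assigned one gaze state (for example, partner gaze or joint gaze), and the resulting state sequence passed to `transition_matrix` to estimate the probabilities shown on the edges of a state-transition diagram.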

      Author response image 3.

      Temporal characteristics of stable head-gaze epochs. The histogram illustrates the probability distribution of the duration (ms) of stable gaze epochs. A minimum duration threshold of 100 ms was applied to exclude transient, non-purposeful head gazes.

      f) Lines 316-326: Some general summarizing statements to lead this paragraph would be useful. It seems that familiar pairs are more likely to participate in joint gaze, especially when close to each other, and perhaps, that males tended to gaze at females more than the reverse. Is there any notion that males were following the gaze of females?

      We thank the reviewer for these suggestions. We have revised the topic sentences of this section to lead with a summary of the social takeaways, specifically highlighting the higher level of male interest and the shift toward reciprocal coordination in familiar pairs.

      The reviewer correctly identified an important dynamic. Our transition analysis (Fig. 4D) confirms that males in both familiar and unfamiliar dyads frequently follow the female's gaze. This is evidenced by a robust transition probability (~17%) from "Male-to-Female Partner Gaze" (blue node) to "Joint Gaze" (green node). We found that this gaze-following behavior was a general feature of the dyads and did not differ significantly by familiarity, which is why it was not previously emphasized. However, we have now added a statement to the Results (Lines 358-365) to explicitly describe this male-led gaze-following behavior.

      g) Lines 328-337: Can these findings in this paragraph be summarized more generally? It seems males view unfamiliar females longer, whereas for familiar females they are more likely to reciprocate viewing if being viewed by them and then to join in joint gaze with them. Would that event, viewing a female and then a transition to joint gaze, not be categorized as a gaze-following event?

      We have now summarized the paragraph to emphasize the transition from vigilant monitoring in unfamiliar pairs to reciprocal awareness in familiar pairs.

      Regarding "longer" viewing: We have clarified the text to specify that males' interest in unfamiliar females is persistent and robust rather than simply "longer" in a single duration. The high recurrence probability signifies that males consistently re-orient their gaze back to the unfamiliar female even if the interaction is briefly interrupted by movement.

      Regarding gaze following and joint gaze: The reviewer asks if the transition from viewing a female to joint gaze constitutes gaze following. We agree that a transition from "male-to-female gaze" to "joint gaze" is indeed a gaze-following event (as noted in our previous response regarding Fig. 4D). However, the specific transition discussed in this paragraph (female-to-male gaze to male-to-female gaze) is different: it describes a "reciprocal" event where the male responded to being looked at by looking back at the female, while the female simultaneously shifted her gaze away. Since the two gaze cones did not intersect on an external object or on each other's faces simultaneously at the end of this transition, it was not categorized as joint gaze or gaze following.

      h) Lines 339-351: It is not clear why gazing at the region surrounding a female's face (as opposed to the face itself) reflects "gaze monitoring tied to increased social attention" (Dal Monte et al., 2022). This hypothesis could be expanded to make the prediction clear in this paragraph.

      We thank the reviewer for identifying the need to clarify the hypothesis regarding the region surrounding the face. We have expanded this paragraph to explain why gazing at the peripheral facial region reflects social monitoring.

      In many primate species, direct and sustained eye contact is often interpreted as a threat or a challenge, particularly between unfamiliar individuals. Peripheral monitoring (looking at the area immediately surrounding the face) can strategically allow an animal to stay highly attentive to the partner's head orientation, gaze direction, and facial expressions (all critical for anticipating future actions) while minimizing the risk of social conflict. By demonstrating that unfamiliar marmosets utilize this peripheral strategy significantly more than familiar ones, we provide evidence that social attention in novel dyads is characterized by a social monitoring strategy that balances the need for information with social caution.

      i) Lines 354-373: This section seems to suggest again that in a familiar male/female pair, the male is more likely to follow the female gaze and establish a joint gaze, and this occurs less with the unfamiliar pair only when closer in distance. Some summary sentences to begin the paragraph could help frame what to expect from the results.

      We have added summarizing topic sentences to this section to clarify the relationship between familiarity and the spatial distribution of joint gaze.

      (3) Discussion:

      Lines 380-463: This section reads more clearly than most of the results, where it is often hard to connect the data plots to their significance for behavior. Overall, I believe the manuscript could be improved by setting up a hypothesis before presenting results in the paragraphs demonstrating the data. Some of the main findings appear in text from lines 413-419 (somewhat hidden even in discussion).

      We sincerely appreciate the reviewer’s positive feedback on the clarity of the latter sections of our Discussion. We have taken the suggestion to heart and have performed a comprehensive restructuring of the Results and Discussion sections.

      (1) We have moved the key takeaways, specifically the distinction between vigilant monitoring in unfamiliar pairs and reciprocal coordination in familiar pairs, from the end of the Discussion to the topic sentences of the relevant Results paragraphs.

      (2) We established a unified framework throughout the manuscript that connects pixel-level tracking stability to the biological "saccade-and-fixate" movement pattern, and ultimately to the social dimensions of sex and familiarity.

      (4) A couple of additional questions to address in the discussion:

      a) Can you speculate why in this behavioral context the marmosets do not engage in reciprocal gaze where both are simultaneously looking at each other (lines 297-301)? How low is the incidence of this event, numerically, in comparison to the other events (1 in 1000 events, etc)?

      We appreciate the reviewer’s interest in the lack of reciprocal gaze (mutual eye contact).

      Numerically, reciprocal gaze events occurred with a frequency of approximately 1 in 500 social gaze events (comprising less than 0.2% of our social dataset). Given this extreme scarcity, we felt that any statistical comparisons across sex or familiarity would be underpowered and potentially misleading, leading to our decision to focus on partner and joint gaze states.

      We speculate that the rarity of reciprocal gaze is primarily due to our task-free experimental setup. Unlike directed cooperation tasks where animals must look at each other to coordinate actions for a reward (e.g., Miss & Burkart, 2018), our study focused on task-free interactions. In a free-moving context without a common goal, marmosets may prioritize monitoring the environment or the partner’s actions (joint or partner gaze) over direct, sustained mutual eye contact, which can sometimes be perceived as a confrontational or high-arousal signal in primate social hierarchies.

      b) Does a transition from a marmoset viewing their partner, to a joint gaze, count as a gaze-following event? It appears the authors are reluctant to use that terminology. What are the potential concerns in that terminology? Is there a concern that both animals orient to the same object that is salient to them without it being due to their gaze?

      A transition from a partner-directed gaze to a joint gaze is indeed a gaze-following event. We distinguish these events from a transition between partner-directed gazes (e.g., male-to-female to female-to-male). In these "reciprocation" cases, once the second animal looked at the first, the first animal shifted their gaze away. Because the two gaze cones did not intersect on a common object at the end of the transition, we classified such events as a social exchange of attention rather than a coordinated gaze-following event.

      Reviewer #2 (Recommendations for the authors):

      I do have a few questions/points for clarification:

      (1) While your approach appears to be able to track head orientation when the face is occluded or turned away from the primary cameras, how was the accuracy of this validated? Since you have multiple cameras, it should be possible to make the estimate using the occluded cameras and then validate using the non-occluded ones.

      We appreciate the reviewer's comment regarding the validation of our tracking during partial occlusions.

      We wish to clarify that our system does not utilize "primary" vs "auxiliary" cameras. Rather, any two or more cameras that capture facial features with high confidence are used to triangulate the points into 3D space. Thus, the "primary" cameras are dynamically determined frame-by-frame based on the animal's orientation.

      To validate the accuracy of our 3D reconstruction during occlusions, we utilized a "projection-validation" approach. As demonstrated in Figure 2B (left panel), when the face is turned away from a specific camera, leaving only the back of the head visible, we used the facial features triangulated from the other non-occluded cameras and projected them onto the image plane of the occluded camera. The fact that these projected points aligned precisely with the expected (but hidden) anatomical landmarks confirms the global accuracy of our 3D model.

      We previously benchmarked this approach using a three-camera system where we triangulated coordinates via two cameras and successfully projected them onto the third camera's image plane with high accuracy. This ensures that even when a camera is "blind" to the face, the 3D position estimated by the rest of the array remains robust.
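
      The projection check can be illustrated with a standard pinhole camera model. The sketch below assumes each camera is described by a 3×4 projection matrix obtained from calibration; the function names are hypothetical and lens distortion is ignored.

```python
import numpy as np

def project(P, point_3d):
    """Project a 3D point into pixel coordinates with a 3x4 projection matrix P."""
    homogeneous = np.append(np.asarray(point_3d, dtype=float), 1.0)
    u, v, w = P @ homogeneous
    return np.array([u / w, v / w])

def reprojection_error(P, point_3d, observed_px):
    """Pixel distance between a projected 3D landmark and its annotated image location."""
    return float(np.linalg.norm(project(P, point_3d) - np.asarray(observed_px, dtype=float)))
```

      In a held-out-camera check, a landmark triangulated from the other cameras would be passed through `project` with the held-out camera's matrix and compared against the expected landmark position in that camera's image.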

      (2) Marmosets, like other non-human primates, also look at other body postures for their social communication, though admittedly marmosets are far more likely to look others in the face than larger primates. The tail-raised genital displays come to mind. While the paper primarily focuses on shared vs deviant gaze, and I believe tracks not only the angle of viewing towards the target but also the distance from the face (please clarify if I am wrong), it would also be useful to know how often marmosets are looking at each other beyond just the face. This is particularly interesting if the gaze towards the partner varies depending on whether that partner was generally oriented towards the gazer, or not. For the joint gaze, were there conditions in which the two were looking at the same target, but had body postures that were not oriented toward one another (i.e. looking at a distant target beyond one of the animals, like looking over someone else's shoulder)?

      We thank the reviewer for highlighting the importance of body postures and non-facial social signals (e.g., genital displays) in marmoset communication.

      At the inception of this project, we explored tracking multiple body parts. However, due to the marmoset's dense fur and the lack of distinct skeletal markers under naturalistic lighting, human annotators and early automated tools struggled to achieve the precision required for high-resolution 3D kinematics. While recent advances in whole-body tracking now make these questions approachable, we chose to focus on the face normal vector because it provided the most robust and high-confidence signal for social orientation in our current dataset.

      Regarding the "looking over the shoulder" scenario, we utilized a hierarchical classification system to prevent wrong categorization. Intersection with the partner’s face always took priority. If one animal’s gaze cone contained the other’s face, the state was classified as "Partner Gaze", even if the two gaze cones happened to intersect at a distant point in space. This ensures that "Joint Gaze" specifically captures instances where both animals ignore one another’s face regions to focus on a shared external target.
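
      The priority rule can be written as a small decision function. This sketch assumes the ±10° gaze cone described in this response and takes the cone-intersection test as a precomputed boolean; the function names and the "no social gaze" label are hypothetical.

```python
import numpy as np

def in_cone(origin, direction, target, half_angle_deg=10.0):
    """True if `target` lies within the gaze cone leaving `origin` along `direction`."""
    to_target = np.asarray(target, dtype=float) - np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    cos_angle = np.dot(to_target, direction) / (
        np.linalg.norm(to_target) * np.linalg.norm(direction)
    )
    return bool(cos_angle >= np.cos(np.radians(half_angle_deg)))

def classify_gaze_state(a_on_b_face, b_on_a_face, cones_intersect):
    """Face-directed gaze takes priority over any intersection of the two cones."""
    if a_on_b_face and b_on_a_face:
        return "reciprocal gaze"
    if a_on_b_face or b_on_a_face:
        return "partner gaze"
    if cones_intersect:
        return "joint gaze"
    return "no social gaze"
```

      Because the face checks come first, a frame in which one animal looks at the other's face is labeled partner gaze even when the two cones also happen to cross at a distant point.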

      We agree that the relationship between body posture and head gaze is a fascinating area for future research. In our current setup, while "Joint Gaze" requires the head-gaze cones to intersect, the animals' bodies could indeed be oriented in different directions (e.g., looking at a distant target behind the partner). We have added a note to the Discussion acknowledging that incorporating whole-body gestures would further deepen the understanding of marmoset social ethology.

      (3) In the introduction, (line 70), you raise the question of ecological relevance, using rhesus in laboratory settings. This could use a little more expansion/explanation of the limitations of current/past approaches.

      We thank the reviewer for the suggestion to expand upon the ecological limitations of traditional laboratory paradigms.

      We have substantially revised the Introduction (Lines 70–82) to provide a more detailed critique of past approaches. Specifically, we now highlight how traditional head-fixed or screen-based paradigms decouple eye movements from natural head-body dynamics and lack the reciprocal, multi-agent complexity found in real-world social environments (e.g., Land, 2006; Shepherd, 2010). By contrasting these constraints with the spatially and socially embedded nature of marmoset interactions, we clarify why a more naturalistic, quantitative approach is necessary to understand the true dynamics of social gaze. These additions provide a stronger theoretical foundation for our move toward a free-moving experimental model.

    1. eLife Assessment

      This important work examines the effects of side-wall confinement on chemotaxis of swimming bacteria in a shallow microfluidic channel. The authors present convincing experimental evidence, combined with geometric analysis and numerical simulations of simplified models, showing that chemotaxis is enhanced when the distance between the side walls is comparable to the intrinsic radius of chiral circular swimming near open surfaces. This study should be of interest to scientists specializing in bacteria-surface interactions.

    2. Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.