10,000 Matching Annotations
  1. Sep 2025
    1. eLife Assessment

      This valuable work explores the timely idea that aperiodic activity in human electrophysiology recordings is dynamically modulated in response to task events in a manner that may be relevant for behavioral performance. Moreover, the authors present solid evidence that, in some circumstances, these aperiodic changes might be misinterpreted as oscillatory changes.

    2. Reviewer #1 (Public review):

      Summary:

      Frelih et al. investigated both periodic and aperiodic activity in EEG during working memory tasks. In terms of periodic activity, they found post-stimulus decreases in alpha and beta activity, while in terms of aperiodic activity, they found a bi-phasic post-stimulus steepening of the power spectrum, which was weakly predictive of performance. They conclude that it is crucial to properly distinguish between aperiodic and periodic activity in event-related designs as the former could confound the latter. They also add to the growing body of research highlighting the functional relevance of aperiodic activity in the brain.

      Strengths:

      This is a well-written, timely paper that could be of interest to the field of cognitive neuroscience, especially to researchers investigating the functional role of aperiodic activity. The authors describe a well-designed study that looked at both the oscillatory and non-oscillatory aspects of brain activity during a working memory task. The analytic approach is appropriate, as a state-of-the-art toolbox is used to separate these two types of activity. The results support the basic claim of the paper that it is crucial to properly distinguish between aperiodic and periodic activity in event-related designs as the former could confound the latter. They also add to the growing body of research highlighting the functional relevance of aperiodic activity in the brain. Commendably, the authors include replications of their key findings on multiple independent data sets.

      Comments on the previous version:

      The authors have addressed several of the weaknesses I noted in my original review, specifically, they softened their claims regarding the theta findings, while simultaneously strengthening these findings with additional analyses (using simulations as well as a new measure of rhythmicity, the phase autocorrelation function, pACF). Most of the other suggested control analyses were also implemented. While I believe the fact that the participants in the main sample were not young adults could be made even more explicit, and the potential interaction between age and aperiodic changes could be unpacked a little in the discussion, the age of the sample is definitely addressed upfront.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Frelih et al, investigate the relationship between aperiodic neural activity, as measured by EEG, and working memory performance, and compares this to the more commonly analyzed periodic, and in particular theta, measures that are often associated with such tasks. To do so, they analyze a primary dataset of 57 participants engaging in an n-back task, as well as a replication dataset, and use spectral parameterization to measure periodic and aperiodic features of the data, across time. In the revision, the authors have clarified some key points, and added a series of additional analyses and controls, including the use of an additional method, that helps to complement the original analyses and further corroborates their claims. In doing so, they find both periodic and aperiodic features that relate to the task dynamics, but importantly, the aperiodic component appears to explain away what otherwise looks like theta activity in a more traditional analysis. This study therefore helps to establish that aperiodic activity is a task-relevant dynamic feature in working memory tasks and may be the underlying change in many other studies that reported 'theta' changes, but did not use methods that could differentiate periodic and aperiodic features.

      Strengths:

      Key strengths of this paper include that it addresses an important question - that of properly adjudicating which features of EEG recordings relate to working memory tasks - and in doing so provides a compelling answer, with important implications for considering prior work and contributing to understanding the neural underpinnings of working memory. The revision is improved by showing this using an additional analysis method. I do not find any significant faults or error with the design, analysis, and main interpretations as presented by this paper, and as such, find the approach taken to be a valid and well-enacted. The use of multiple variants of the working memory task, as well as a replication dataset significantly strengthens this manuscript, by demonstrating a degree of replicability and generalizability. This manuscript is also an important contribution to motivating best practices for analyzing neuro-electrophysiological data, including in relation to using baselining procedures. I think the updates in the revision have helped to clarify the findings and impact of this study.

      Weaknesses:

      Overall, I do not find any obvious weaknesses with this manuscript and it's analyses that challenge the key results and conclusions. Updates through the revision have addressed my previous points about adding some additional notes on the methods and conclusions.

    4. Reviewer #3 (Public review):

      Summary:

      Using a specparam (1/f) analysis of task-evoked activity, the authors propose that "substantial changes traditionally attributed to theta oscillations in working memory tasks are, in fact, due to shifts in the spectral slope of aperiodic activity." This is a very bold and ambitious statement, and the field of event-related EEG would benefit from more critical assessments of the role of aperiodic changes during task events. Unfortunately, the data shown here does not support the main conclusion advanced by the authors.

      Strengths:

      The field of event-related EEG would benefit from more critical assessments of the role of aperiodic changes during task events. The authors perform a number of additional control analyses, including different types of baseline correction, ERP subtraction, as well as replication of the experiment with two additional datasets.

      Comments on previous revisions:

      The authors have completed a substantial revision based on the comments from all of the reviewers. Overall, the major claims of the initial report have been profoundly tempered.

      [Editors' note: We determined that this revised version appropriately tempers some of the prior claims and addresses the concerns raised by the reviewers through two rounds of review.]

    1. eLife Assessment

      To evaluate phenotypic correlations between complex traits, this study aimed to measure the genetic overlap of traits by evaluating GWAS signals assisted by eQTL signals. They suggested an improved version of the previous Sherlock to integrate SNP-level signals into gene-level signals. Then they compared 59 human traits to identify known and novel genetic distance relationships. This work is valuable to the field, but still needs substantial improvement because many parts of the paper are incomplete.

    2. Reviewer #1 (Public review):

      The authors tried to quantify the difference between human complex traits by calculating genetic overlap scores between a pair of traits. Sherlock-II was devised to integrate GWAS with eQTL signals. The authors claim that Sherlock-II is superior to the previous version (robustness, accuracy, etc). It appears that their framework provides a reasonable solution to this important question, although the study needs further clarification and improvements.

      (1) Sherlock-II incorporates GWAS and eQTL signals to better quantify genetic signals for a given complex trait. However, this approach is based on the hypothesis that "all GWAS signals confer association to complex trait via eQTL", which is not true (PMID: 37857933). This should be acknowledged (through mentioning in the text) and incorporated into the current setup (through differential analysis - for example, with or without eQTL signals, or with strong colocalization only).

      (2) When incorporating eQTL, why did the authors use the top p-value tissues for eQTL? This approach seems simpler and probably more robust. But many eQTLs are tissue-specific. Therefore, it would also be important to know if eQTLS from appropriate tissues were incorporated instead.

      (3) One of the main examples is the novel association between Alzheimer's disease and breast cancer. Although the authors provided a molecular clue underlying the association, it is still hard to comprehend the association easily, as the two diseases are generally known to be exclusive to each other. This is probably because breast cancer GWAS is performed for germline variants and does not consider the contribution of somatic variants.

      (4) It would help readers understand the story better if a summary figure of the entire process were provided. The current Figure 1 does not fulfil that role.

      (5) Figure 2 is not very informative. The readers would want to know more quantitative information rather than a heatmap-style display. Is there directionality to the relationship, or is it always unidirectional?

      (6) In Figure 3, readers may want to know more specific information. For example, what gene signals are really driving the hypoxia signal in Alzheimer's disease vs breast cancer? And what SNP signals are driving these gene-level signals?

    3. Reviewer #2 (Public review):

      Summary:

      The authors introduce a gene-level framework to detect shared genetic architecture between complex traits by integrating GWAS summary statistics with eQTL data via a new algorithm, Sherlock-II, which aggregates signals from multiple (cis/trans) eSNPs to produce gene-phenotype p-values. Shared pathways are identified with Partial-Pearson-Correlation Analysis (PPCA).

      Strengths:

      The authors show the gene-based approach is complementary and often more sensitive than SNP-level methods, and discuss limitations (in terms of no directionality, dependence on eQTL coverage).

      Weaknesses:

      (1) How do the authors explain data where missing tissues or sparse eQTL mapping are available? Would that bias as to which genes/traits can be linked and may produce false negatives or tissue-specific false positives?

      (2) Aggregating SNP-level signals into gene scores can be confounded by LD; for example, a nearby causal variant for a different gene or non-expression mechanism may drive a gene's score, producing spurious gene-trait links. How do the authors prevent this?

      (3) How the SNPs are assigned to genes would affect results, this is because different choices can change which genes appear shared between traits. The authors can expand on these.

      (4) Many reported novel trait links remain speculative without functional or orthogonal validation (e.g., colocalization, perturbation data). Thus, the manuscript's claims are inconclusive and speculative.

      (5) It would be best to run LD-aware colocalization and power-matched simulations to check for robustness.

    4. Author response:

      Reviewer #1 (Public review):

      The authors tried to quantify the difference between human complex traits by calculating genetic overlap scores between a pair of traits. Sherlock-II was devised to integrate GWAS with eQTL signals. The authors claim that Sherlock-II is superior to the previous version (robustness, accuracy, etc). It appears that their framework provides a reasonable solution to this important question, although the study needs further clarification and improvements.

      (1) Sherlock-II incorporates GWAS and eQTL signals to better quantify genetic signals for a given complex trait. However, this approach is based on the hypothesis that "all GWAS signals confer association to complex trait via eQTL", which is not true (PMID: 37857933). This should be acknowledged (through mentioning in the text) and incorporated into the current setup (through differential analysis - for example, with or without eQTL signals, or with strong colocalization only). 

      The reviewer is correct that in this version of the tool, we focused on SNPs with effect on gene expression, as the majority of the SNPs identified by GWASs are non-coding SNPs. In the future improvement, we should also include coding SNPs that change the amino acid sequence of genes. We will discuss this point more in the revised manuscript.

      (2) When incorporating eQTL, why did the authors use the top p-value tissues for eQTL? This approach seems simpler and probably more robust. But many eQTLs are tissue-specific. Therefore, it would also be important to know if eQTLS from appropriate tissues were incorporated instead. 

      This is a simple scheme to incorporate eQTL data from multiple tissues, assuming that the tissue that gives the strongest association is most relevant, or mainly mediates the effect from the SNP to the phenotype. This is a reasonable approach given that the tissues of origin for most of the phenotypes are unknown. In the future improvement, we should incorporate eQTL data from the appropriate tissue(s) if that is known.

      (3) One of the main examples is the novel association between Alzheimer's disease and breast cancer. Although the authors provided a molecular clue underlying the association, it is still hard to comprehend the association easily, as the two diseases are generally known to be exclusive to each other. This is probably because breast cancer GWAS is performed for germline variants and does not consider the contribution of somatic variants. 

      This is due to one of the limitations of the current algorithm: no direction of association is predicted explicitly. It could be that increasing the expression of a gene reduced the risk of one disease but increase the risk of another. Currently we have to analyze the details of the SNPs to infer direction once overlapping genes are found. This needs improvement in the future.  

      (4) It would help readers understand the story better if a summary figure of the entire process were provided. The current Figure 1 does not fulfil that role. 

      We plan to incorporate reviewer's suggestion in the revised manuscript.

      (5) Figure 2 is not very informative. The readers would want to know more quantitative information rather than a heatmap-style display. Is there directionality to the relationship, or is it always unidirectional? 

      We will consider a different presentation in the revised manuscript.

      (6) In Figure 3, readers may want to know more specific information. For example, what gene signals are really driving the hypoxia signal in Alzheimer's disease vs breast cancer? And what SNP signals are driving these gene-level signals? 

      We will add these information in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors introduce a gene-level framework to detect shared genetic architecture between complex traits by integrating GWAS summary statistics with eQTL data via a new algorithm, Sherlock-II, which aggregates signals from multiple (cis/trans) eSNPs to produce gene-phenotype p-values. Shared pathways are identified with Partial-Pearson-Correlation Analysis (PPCA).

      Strengths:

      The authors show the gene-based approach is complementary and often more sensitive than SNP-level methods, and discuss limitations (in terms of no directionality, dependence on eQTL coverage).

      Weaknesses:

      (1) How do the authors explain data where missing tissues or sparse eQTL mapping are available? Would that bias as to which genes/traits can be linked and may produce false negatives or tissue-specific false positives?

      Missing tissues or sparse eQTL certainly can produce false negatives as the signals linking the two phenotypes are simply not captured in the data. It is less likely to produce false positives as long as the statistical test is well controlled.   

      (2) Aggregating SNP-level signals into gene scores can be confounded by LD; for example, a nearby causal variant for a different gene or non-expression mechanism may drive a gene's score, producing spurious gene-trait links. How do the authors prevent this? 

      When there are multiple SNPs in LD with multiple genes nearby, it is generally difficult to map the causal SNP and the causal gene it affected, and thus there will be spurious gene-trait links. When we calculate the global similarity based on the gene-trait association profiles,  we tried to control this by simulating with random GWASs that have the same power as the real GWAS and preserve the LD structure, as the spurious links will also be present in the simulated data (but may appear in different loci) that are used to calibrate the statistical significance. 

      (3) How the SNPs are assigned to genes would affect results, this is because different choices can change which genes appear shared between traits. The authors can expand on these. 

      We assign SNPs to genes based on their strongest eQTL association from the available data. Improvement can be made if the relevant tissues for a trait are known (see response to Reviewer 1 above).

      (4) Many reported novel trait links remain speculative without functional or orthogonal validation (e.g., colocalization, perturbation data). Thus, the manuscript's claims are inconclusive and speculative. 

      We agree with the reviewer that the reported trait links are speculative, and they should be treated as hypotheses generated from the computational analyses. To truly validate some of these proposed relationships, deeper functional analyses and experimental tests are needed.

      (5) It would be best to run LD-aware colocalization and power-matched simulations to check for robustness. 

      We agree more control on LD and power-matched simulations will be important for testing the robustness of the predictions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding

      (5) Figure 2 appears very complex and broad.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

      This review focuses on the whole extravasation journey of leukocyte and highlights involvement of extracellular matrix (ECM) in multiple phases of the process. ECM may exert their roles either as a collective structure or as individual components. In the revision, for those functions involving specific matrix components, we will emphasize the matrix components and incorporate this information to subheadings as suggested. The parts of macrophage phenotype (Section 10-11) are included for its pivotal roles on deciding the tissue fate following inflammation (ie. to resolve / to regenerate damages incurred or to sustain inflammation), which is an important aspect of this review. ECM could modify macrophage phenotypes either directly (section 10) or indirectly via modulations of tissue stiffness or other cell types like fibroblasts (section 9). However, as pointed out by other reviewers as well, we acknowledge that Section 11 does not integrate well enough to the rest of the review. We plan to reorganize this part and to emphasize its link to ECM during the revision for better integration. We will reformat Table 1 for easier comprehension. We will consider restructuring Figure 2, which outlines various events influencing tissue decision of resolution/inflammation, perhaps by breaking up into two separate figures, to better focus the message. We will also check the language to improve readability.

      Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      (4) Limited discussion of translational implications and therapeutic strategies.

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

      We will add a transition paragraph between Section 6 and Section 7 to provide a narrative that the extravasation processes affect downstream leukocyte functions. While lymphocytes follow a similar extravasation principle, their in-tissue activities differ from innate leukocytes. We will thus include discussion of lymphocyte-ECM crosstalk to Section 8 and/or 9 in the revision. We will restructure Section 11 and Figure 3 to better integrate to the rest of the review: In the current manuscript, we merely describe the capability of the MIKA framework to describe identity of any tissue macrophages and thus the framework could serve as a roadmap to facilitate identity normalization of pathological macrophages. We plan, in the revision, by employing the MIKA framework, to discuss and demonstrate linkage between macrophage identities and expression/production of modulators to functional ECM effectors described in Section 8-9. Regarding the comment of limited discussion of translational implications / therapeutic strategies, we will try to enrich this aspect throughout the manuscript where appropriate, in addition to the existing ones (eg. line 293-297; 388-391; 460-463; 512-517) We will also revise figure structure in general to avoid too dense information and to improve clarity. We will consider to provide a glossary explaining specialized terms to expand readership accessibility.

      Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development. Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

      We agree with and appreciate the specific and helpful suggestions by the reviewer. During the revision, we will provide the requested background description of MIKA to enhance accessibility of general readership. As pointed out by other reviewers, since this part (Section 11) is less well-integrated to the rest of the review, we will restructure this part by linking tissue macrophage identities under MIKA framework to modulation of functional ECM effectors described in previous sections (Section 8-9). We acknowledge the current figure organization might be overly information-dense and will consider breaking down the contents to multiple figures. The size and color-coding issues will also be addressed.

    2. eLife Assessment

      This Review Article takes an original angle and covers several aspects of the leukocytes extravasation process with a focus on the role of ECM proteins. It is a timely piece with an original viewpoint. The current manuscript would benefit from improvement in writing and organization.

    3. Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding

      (5) Figure 2 appears very complex and broad.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

    4. Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      (4) Limited discussion of translational implications and therapeutic strategies.

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

    5. Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development. Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed atthe protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABAreceptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, such that readers will not know until which fold change/stats cut-off data are reliable.

      Reviewer #2 (Public review):

      Summary

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia, cortical forebrain assembloids, and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and rescue using exogenous ADM.

      Strengths:

      The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and as well, establishing exciting models for future studies. The authors use sufficient iPSC lines including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript.

      Weaknesses:

      I have a few comments and suggestions for the authors. See below.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their  conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids

      recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the revised version of the manuscript we will use the single-cell RNA sequencing data and immunostainings to provide this information. Based on previous analyses from Birey et al (Cell Stem Cell, 2022), we expect interneurons within assembloids to express mostly calbindin (CALB2) and somatostatin (SST) at this in vitro stage of development; parvalbumin subtype appears later based on data from Birey et al (Nature, 2017) and more recently from Varela et al, (bioRxiv, 2025).

      In parallel, we will analyze available scRNA-seq data from developing human primary brain tissue a similar age as the one used in the manuscript, and check whether these subtypes of interneurons are similar to the ones within assembloids.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewers’ thoughtful suggestion. In addition to the bulk RNA-seq analysis, we did validate several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, VEGFA (Figure S1). 

      We go agree that in-depth investigation of specific cut-offs would be interesting, however, this could be the focus of a different manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer this important comment about inflammation. Indeed, hypoxia has been shown to activate the inflammatory response pathways. In various studies, it was found that HIF-1a can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Figure 2D), and to the reviewers’ point, we identified enrichment of inflammatory signaling response following the hypoxic exposure. Since hSO at the time of analyses do contain astrocytes, we think these glia contribute to the observed pro-inflammatory changes. Based on these results and because ADM is known to have strong anti-inflammatory properties, the effects of ADM on hypoxic astrocytes should be investigated in future studies focused on hypoxia-induced inflammation. In the revision, we will address this comment in the discussion section and cite the appropriate papers.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the included experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about the possible different mechanisms in ventral versus dorsal areas, in the revision we will plot and include in the figures the data about the cell-type expression of ADM and its receptors in hCOs.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewers’ insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, and in general in brain cells. The data from our study brings the first information about the cell-type specific expression of ADM in baseline and hypoxic conditions and is one of the key novelties of our study.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data along with our CREB-targeted findings in hypoxic interneurons, suggest ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons, including: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy Syndrom, thus further providing evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might always be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We will revise the manuscript by incorporating a paragraph about this in the Discussion section.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hCOs was sufficient to capture essential changes like ADM upregulation. The in-depth exploration on differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed these samples separately. That is why they separate like this. In the revision, we will add data about ADM expression and its receptors’ expression in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we fully agree it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g. cellular complexity, vasculature, immune response, etc), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We will expand our discussion to include more details and the need to validate these findings using in vivo models, while also acknowledging that different species (e.g. rodents versus non-human primates versus humans) might have different responses to hypoxia.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNAseq data was performed during the acute stages of hypoxia/reoxygenation and we think a maturation phenotype might be difficult to capture at this point and would require analysis at later in vitro assembloid maturation stages.

      Our speculation about a possible maturation defect is based on data from previous studies from developmental biology that showed failure of interneurons to reach their final cortical location within a specified developmental window will impair their integration within the neuronal network, and thus lead to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we suggest these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other processes during cortical development. In the revised manuscript, we will include citations about the effects of hypoxia on interneuron proliferation, maturation and circuit integration as available, and also expand to other cell types known to be affected.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error and we will correct it in our revision.

    2. eLife Assessment

      In this manuscript, the authors investigate the migration of human cortical interneurons under hypoxic conditions using forebrain assembloids and developing human brain tissue, and probe the underlying mechanisms. The study provides the first direct evidence that hypoxia delays interneuron migration and identifies adrenomedullin (ADM) as a potential therapeutic intervention. The findings are important, and the conclusions are convincingly supported by experimental evidence.

    3. Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed atthe protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABAreceptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, such that readers will not know until which fold change/stats cut-off data are reliable.

    4. Reviewer #2 (Public review):

      Summary

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia, cortical forebrain assembloids, and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and rescue using exogenous ADM. The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and as well, establishing exciting models for future studies. The authors use sufficient iPSC line,s including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript. I have a few comments and suggestions for the authors.

      Strengths and Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might always be known, but was not discussed.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

    5. Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

    1. eLife Assessment

      This study provides novel and fundamental insights into the long-term use of DREADDs to modulate neuronal activity in nonhuman primates. The exceptional evidence demonstrates the peak dynamics and the subsequent stability of chemogenetic effects for 1.5 years, informing the experimental designs and the interpretation of highly impactful chemogenetic studies in macaques. The protocols, data, and outcomes can serve as guidelines for future experiments. Therefore, the findings will be of significant interest to the field of chemogenetics and may also be of broader interest to researchers and clinicians who seek to utilize viral vectors and/or related genetic technologies.

    2. Reviewer #1 (Public review):

      Summary:

      Inhibitory hM4Di and excitatory hM3Dq DREADDs are currently the most commonly utilized chemogenetic tools in the field of nonhuman primate research, but there is a lack of available information regarding the temporal aspects of virally-mediated DREADD expression and function. Nagai et al. investigated the longitudinal expression and efficacy of DREADDs to modulate neuronal activity in the macaque model. The authors demonstrate that both hM4Di and hM3Dq DREADDs reach peak expression levels after approximately 60 days and that stable expression was maintained for up to two years for hM4Di and at least one year for hM3Dq DREADDs. During this period, DREADDs effectively modulated neuronal activity, as evidenced by a variety of measures, including behavioural testing, functional imaging, and/or electrophysiological recording. Notably, some of the data suggest that DREADD expression may decline after two-three years. This is a novel finding and has important implications for the utilization of this technology for long-term studies, as well as its potential therapeutic applications. Lastly, the authors highlight that peak DREADD expression may be significantly influenced by the presence of fused or co-expressed protein tags, emphasizing the importance of careful design and selection of viral constructs for neuroscientific research. This study represents a critical step in the field of chemogenetics, setting the scene for future development and optimization of this technology.

      Strengths:

      The longitudinal approach of this study provides important preliminary insights into the long-term utility of chemogenetics, which has not yet been thoroughly explored.

      The data presented are novel and inclusive, relying on well-established in vivo imaging methods as well as behavioral and immunohistochemical techniques. The conclusions made by the authors are generally supported by a combination of these techniques. In particular, the utilization of in vivo imaging as a non-invasive method is translationally relevant and likely to make an impact in the field of chemogenetics, such that other researchers may adopt this method of longitudinal assessment in their own experiments. Rigorous standards have been applied to the datasets, and the appropriate controls have been included where possible.

      The number of macaque subjects (20) from which data was available is also notable. Behavioral testing was performed in 11 subjects, FDG-PET in 5, electrophysiology in 1, and [11C]DCZ-PET in 15. This is an impressive accumulation of work that will surely be appreciated by the growing community of researchers using chemogenetics in nonhuman primates.

      The implication that chemogenetic effects can be maintained for up to 1.5-2 years, followed by a gradual decline beyond this period, is an important development in knowledge. The limited duration of DREADD expression may present an obstacle in the translation of chemogenetic technology as a potential therapeutic tool, and it will be of interest for researchers to explore whether this limitation can be overcome. This study therefore represents a key starting point upon which future research can build.

      Weaknesses:

      None.

    3. Reviewer #2 (Public review):

      Summary:

      This paper reports histological, PET imaging, functional and behavioural data evaluating the longevity of AAV2 infection in multiple brain areas of macaques in the context of DREADD experiments. The central aim is to provide unprecedented information about how long the expression of HM4di or HM3dq receptors are expressed and efficient in modulating brain functions after vector injections. The data show peak expression after 40 to 60 days of vector injection, and stable expressions for up to 1.5 years for hM4di, and that hM3dq remained mostly at 75% of peak after a year, declining to 50% after 2 years. DREADDs effectively modulated neuronal activity and behaviour for approximately two years, evaluated with behavioural testings, neural recordings or FDG-PET. A statistical evaluation revealed that vector titers, DREADD type and tags contribute to the measured peak level of DREADD expression.

      The article present a thorough discussion of the limitations and specificities of chemogenetic approaches in monkeys.

      Strength:

      These are unique data, in non-human primate (NHP), an animal model that not only features physiological and immunological characteristics similar to humans, but also contributes to neurobiological functional studies over long timescales with experiments spanning months or years. This evaluation of long-term efficacy of DREADDs will be very important for all laboratories using chemogenetics in NHP but also for future use of such approach in experimental therapies. The longevity estimates are based on multiple approaches including behavioural and neurophysiological, thus providing information on functional efficacy of DREADD expression.

      Performing such evaluation requires specific tools like PET imaging that very few monkey labs have access to. This study was done by the laboratory that has developed the radiotracer c11-DCZ, used here, a radiotracer binding selectively to DREADDs and providing, using PET, quantitative in vivo measures of DREADD expression. This study and its data should thus be a reference in the field, providing estimates to plan future chemogenetic experiments.

      Publishing databases of experimental outcomes in NHP DREADD experiments is crucial for the community because such experiments are rare, expensive and long. It contributes to refining experiments and reducing the number of animals overall used in the domain.

      Weaknesses:

      This study is a meta-analysis of several experiments performed in one lab. The good side is that it combined a large amount of data that might not have been published individually; the down side is that all things where not planned and equated, creating a lot of unexplained variances in the data. However, this was judiciously used by the authors to provide very relevant information. One might think that organized multi-centric experiments planned using the knowledge acquired here, will provide help testing more parameters, including some related to inter-individual variability, and particular genetic constructs.

    4. Reviewer #3 (Public review):

      Summary

      This manuscript, from the developers of the novel DREADD-selective agonist DCZ (Nagai et al., 2020), utilizes a unique dataset where multiple PET scans in a large number of monkeys, including baseline scans before AAV injection, 30-120 days post-injection, and then periodically over the course of the prolonged experiments, were performed to access short- and long-term dynamics of DREADD expression in vivo, and to associate DREADD expression with the efficacy of manipulating the neuronal activity or behavior. The goal was to provide critical insights into practicality and design of multi-year studies using chemogenetics, and to elucidate factors affecting expression stability.

      Strengths are systematic quantitative assessment of the effects of both excitatory and inhibitory DREADDs, quantification of both the short-term and longer-term dynamics, a wide range of functional assessment approaches (behavior, electrophysiology, imaging), and assessment of factors affecting DREADD expression levels, such as serotype, promoter, titer (concentration), tag, and DREADD type.

      These finding will undoubtedly have a very significant impact on the rapidly growing, but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

    1. eLife Assessment

      This study presents an important finding on the role of GATA4 in aging- and OA-associated cartilage pathology. The conclusions are well supported by compelling in vitro and in vivo evidence. This work will be of broad interest to both cell biologists and orthopedic/skeletal health clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the differences between young and aged chondrocytes. Through transcriptomic analysis and further assessments in chondrocytes, GATA4 was found to be increased in aged chondrocyte donors compared to young. Subsequent mechanistic analysis with lentiviral vectors, siRNAs, and a small molecule were used to study the role of GATA4 in young and old chondrocytes. Lastly, an in vivo study was used to assess the effect of GATA4 expression on osteoarthritis progression in a DMM mouse model.

      Strengths:

      This work linked the over expression of GATA4 to NF-kB signaling pathway activation, alterations to the TGF-b signaling pathway, and found that GATA4 increased the progression of OA compared to the DMM control group. Indicating that GATA4 contributes to the onset and progression of OA in aged individuals.

      Comments on revised version:

      Great work! All my concerns have been well addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This study elucidated the impact of GATA4 on aging- and injury-induced cartilage degradation and osteoarthritis (OA) progression, based on the team's finding that GATA expression is positively correlated with aging in human chondrocytes. By integrating cell culture of human chondrocytes, gene manipulation tools (siRNA, lentivirus), biological/biochemical analyses and murine models of post-traumatic OA, the team found that increasing GATA4 levels reduced anabolism and increased catabolism of chondrocytes from young donors, likely through upregulation of the BMP pathway, and that this impact is not correlated with TGF-β stimulation. Conversely, silencing GATA4 by siRNA attenuated catabolism and elevated aggrecan/collagen II biosynthesis of chondrocytes from old donors. The physiological relevance of GATA4 was further validated by the accelerated OA progression observed in lentivirus-infected mice in the DMM model.

      Strengths:

      This is a highly significant and innovative study that provides new molecular insights into cartilage homeostasis and pathology in the context of aging and disease. The experiments were performed in a comprehensive and rigorous manner. The data were interpreted thoroughly in the context of the current literature.

      Weaknesses:

      The only aspect that would benefit from further clarification is a more detailed discussion of aging-associated ECM changes in the context of prior literature.

    4. Reviewer #3 (Public review):

      Summary:

      This is an exciting, comprehensive paper that demonstrates the role of GATA4 on OA-like changes in chondrocytes. The authors present elegant reverse translational experiments that justify this mechanism and demonstrate the sufficiency of GATA4 in a mouse model of osteoarthritis (DMM), where GATA4 drove cartilage degeneration and pain in a manner that was significantly worse than DMM alone. This could pave the way for new therapies for OA that account for both structural changes and pain.

      Strengths:

      (1) GATA4 was identified from human chondrocytes.

      (2) IHC and sequencing confirmed GATA4 presence.

      (3) Activation of SMADs is clearly shown in vitro with GATA4 overexpression.

      (4) The role of GATA4 was functionally assessed in vivo using the mouse DMM model, where the authors uncovered that GATA4 worsens OA structure and hyperalgesia in male mice.

      (5) It is interesting that GATA4 is largely known to be found in cardiac cells and to have a role in cardiac repair, metabolism, and inflammation, among other things listed by the authors in the discussion (in liver, lung, pancreas). What could this new knowledge of GATA4 mean for OA as a potentially systemically mediated disease, where cardiac disease and metabolic syndrome are often co-morbid?

      Weaknesses:

      I do not have further comments. Thank you for addressing the previously mentioned concerns.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      The only aspect that would benefit from further clarification is a more detailed discussion of aging-associated ECM changes in the context of prior literature. 

      Thank you. Please refer to the new section (Lines 604-617)

      Reviewer #3 (Public review):

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed. 

      Thank you. Please refer to Lines 530-537.  

      “Of note, Hypoxia-Inducible Factor 1α (HIF1 α) was the most differentially expressed gene predicted to regulate chondrocyte aging. The connection between HIF1 α and aging has been previously reported.[32] Furthermore, additional studies have investigated HIF1 in association with OA and assessed its use as a therapeutic target.[33,34] Therefore, we decided to focus on GATA4, which was less studied in chondrocytes but highly associated with cellular senescence, an aging hallmark. However, our selection did not dampen the importance of HIF1α and other molecules listed in Figure 1D in chondrocyte aging. They can be further studied in the future using the same strategy employed in the current work.”

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes. 

      In the current study, we focused on the DMM control and DMM Gata4 virus groups so we did not include a sham control group. We recognized this was a limitation of this study.  

      (3) While there appear to be GATA4 small-molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study.  

      We agree with this comment that the results are still preliminary, which was the reason that we put it in the supplementary materials. However, we felt like the result is informative, which will support the potential of GATA4 as a therapeutic target and inspire the development of more specific inhibitors. Therefore, we would still keep the results in the current study.

    1. eLife Assessment

      This is a useful tool for code-less analysis of patterns in cell migratory behaviours in vivo using intravital microscopy data and allows correlation with spatial features of the tumour microenvironment. There is a clear need for these tools to make quantitative analysis, comparison and interpretation of complex cell tracking data more accessible and solid evidence is provided of its applicability to tracks generated by both proprietary and open tracking software.

    2. Reviewer #1 (Public review):

      In this work, Rios-Jimenez and Zomer et al have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of Intravital imaging (IVM) data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module' ) and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). A key strength is that it is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. In addition, demo datasets are available on the authors GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge) which was used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation.<br /> While the analysis provides only preliminary evidence in support of the authors conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment, conclusions are appropriately tempered in the absence of additional experiments and controls.

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches.

      While the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking) which will help ensure its accessibility to a broader range of researchers. Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment.

      When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'.

    3. Reviewer #2 (Public review):

      Summary:

      The authors produce a new tool, BEHAV3D to analyse tracking data and to integrate these analyses with large and small scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data, however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.

      Strengths:

      The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers.

      The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used.

    4. Reviewer #3 (Public review):

      The manuscript by Rios-Jimenez developed a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users.

      Strengths:

      An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment.

      Weaknesses:

      Motility is the main tumor cell feature analyzed in the study together with some other tumor-intrinsic features, such as morphology. However, these features are insufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important non-tumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on analysis of tumor-alone features, and cannot be applied to analyze important cell-cell interaction dynamics in 3D.

    1. eLife Assessment

      This study uses steered molecular dynamics simulations to interrogate force transmission in the mechanosensitive NOMPC channel, which plays roles including soft-touch perception, auditory function, and locomotion. The valuable finding that the ankyrin spring transmits force through torsional rather than compression forces may help understand the entire TRP channel family. The evidence is considered to be solid, although full opening of the channel is not seen, and it has been noted that experimental validation of reduced mechanosensitivity through mutagenesis of proposed ankyrin/TRP domain coupling interactions would help substantiate the findings.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses molecular dynamics simulations to understand how forces felt by the intracellular domain are coupled to opening of the mechanosensitive ion channel NOMPC. The concept is interesting - as the only clearly defined example of an ion channel that opens due to forces on a tethered domain, the mechanism by which this occur are yet to be fully elucidated. The main finding is that twisting of the transmembrane portion of the protein - specifically via the TRP domain that is conserved within the broad family of channels- is required to open the pore. That this could be a common mechanism utilised by a wide range of channels in the family, not just mechanically gated ones, makes the result significant. It is intriguing to consider how different activating stimuli can produce a similar activating motion within this family. While the authors do not see full opening of the channel, only an initial dilation, this motion is consistent with partial opening of structurally characterized members of this family.

      Strengths:

      Demonstrating that rotation of the TRP domain is the essential requirement for channel opening would have significant implcaitions for other members of this channel family.

      Weaknesses:

      The manuscript centres around 3 main computational experiments. In the first, a compression force is applied on a truncated intracellular domain and it is shown that this creates both a membrane normal (compression) and membrane parallel (twisting) force on the TRP domain. This is a point that was demonstrated in the authors prior eLife paper - so the point here is to quantify these forces for the second experiment.

      The second experiment is the most important in the manuscript. In this, forces are applied directly to two residues on the TRP domain with either a membrane normal (compression) or membrane parallel (twisting) direction, with the magnitude and directions chosen to match that found in the first experiment. Only the twisting force is seen to widen the pore in the triplicate simulations, suggesting that twisting, but not compression can open the pore. This result is intriguing and there appears to be a significant difference between the dilation of pore with the two force directions. When the forces are made of similar magnitude, twisting still has a larger effect than forces along the membrane normal.

      The second important consideration is that the study never sees full pore opening, rather a widening that is less than that seen in open state structures of other TRP channels and insufficient for rapid ion currents. This is something the authors acknowledge in their prior manuscript Twist may be the key to get this dilation, but we don't know if it is the key to full pore opening. Structural comparison to open state TRP channels supports that this represents partial opening along the expected pathway of channel gating.

      Experiment three considers the intracellular domain and determines the link between compression and twisting of the intracellular AR domain. In this case, the end of the domain is twisted and it is shown that the domain compresses, the converse to the similar study previously done by the authors in which compression of the domain was shown to generate torque.

    3. Reviewer #2 (Public review):

      This study uses all atom MD simulation to explore the mechanics of channel opening for the NOMPC mechanosensitive channel. Previously the authors used MD to show that external forces directed along the long-axis of the protein (normal to the membrane) results in AR domain compression and channel opening. This force causes two changes to the key TRP domains adjacent to the channel gate: 1) a compressive force pushes the TRP domain along the membrane normal, while 2) a twisting torque induces a clock-wise rotation on the TRP domain helix when viewing the bottom of the channel from the cytoplasm. Here, the authors wanted to understand which of those two changes are responsible for increasing the inner pore radius, and they show that it is the torque. The simulations in Figure 2 probe this question with different forces, and we can see the pore open with parallel forces in the membrane, but not with the membrane-normal forces. I believe this result as it is reproducible, the timescales are reaching 1 microsecond, and the gate is clearly increasing diameter to about 4 Å. This seems to be the most important finding in the paper, but the impact is limited since the authors already shows how forces lead to channel opening, and this is further teasing apart the forces and motions that are actually the ones that cause the opening.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Duan and Song interrogates the gating mechanisms and specifically force transmission in mechanosensitive NOMPC channels using steered molecular dynamics simulations. They propose that the ankyrin spring can transmit force to the gate through torsional forces adding molecular detail to the force transduction pathways in this channel.

      Strengths:

      Detailed, rigorous simulations coupled with a novel model for force transduction.

      Weaknesses:

      Experimental validation of reduced mechanosensitivity through mutagenesis of proposed ankyrin/TRP domain coupling interactions would greatly enhance the manuscript.

    1. eLife Assessment

      This important study examined the complexity of emergent dynamics of large-scale neural network models after perturbation (perturbational complexity index, PCI) and used it as a measurement of consciousness to account for previous recordings of humans at various anesthetized levels. The evidence supporting the conclusion is convincing and constitutes a unified framework for different observations related to consciousness. There are many fields that would be interested in this study, including cognitive neuroscience, psychology, complex systems, neural networks, and neural dynamics.

    2. Reviewer #1 (Public review):

      Summary:

      This paper attempts to measure the complex changes of consciousness in the human brain as a whole. Inspired by the perturbational complexity index (PCI) from classic research, authors introduce simulation PCI (𝑠𝑃𝐶𝐼) of a time series of brain activity as a measure of consciousness. They first use large-scale brain network modeling to explore its relationship with the network coupling and input noise. Then the authors verify the measure with empirical data collected in previous research.

      Strengths:

      The conceptual idea of the work is novel. The authors measure the complexity of brain activity from the perspective of dynamical systems. They provide a comparison of the proposed measure with four other indexes. The text of this paper is very concise, supported by experimental data and theoretical model analysis.

      Comments on revisions:

      The manuscript is in good shape after revision. I would suggest that the author open-source the code and data in this study.

    3. Reviewer #2 (Public review):

      Summary:

      Breyton and colleagues analysed the emergent dynamics from a neural mass model, characterised the resultant complexity of the dynamics, and then related these signatures of complexity to datasets in which individuals had been anaesthetised with different pharmacological agents. The results provide a coherent explanation for observations associated with different time series metrics, and further help to reinforce the importance of modelling when integrating across scientific studies.

      Strengths:

      * The modelling approach was clear, well-reasoned and explicit, allowing for direct comparison to other work and potential elaboration in future studies through the augmentation with richer neurobiological detail.

      * The results serve to provide a potential mechanistic basis for the observation that Perturbational Complexity Index changes as a function of consciousness state.

      Weaknesses:

      * Coactivation cascades were visually identified, rather than observed through an algorithmic lens. Given that there are numerous tools for quantifying the presence/absence of cascades from neuroimaging data, the authors may benefit from formalising this notion.

      * It was difficult to tell, graphically, where the model's operating regime lay. Visual clarity here will greatly benefit the reader.

      Comments on revisions:

      The authors have addressed my concerns.

    1. eLife Assessment

      This useful study examines the contribution of synaptotagmin 1 and synaptotagmin 7 to metabolite antigen presentation to mucosal-associated invariant T (MAIT) cells; it begins to address a critical gap in our understanding of the antigen presentation mechanisms to these cells. Strengths of the study include the use of Mtb to study the dynamics of antigen presentation to MAIT cells instead of a synthetic antigen. However, the strength of the evidence to support the conclusion is currently incomplete. The conclusions could be enhanced by additional dissection of some of the cell biological events that lead to antigen presentation by MR1.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Synaptotagmin 1 and Synaptotagmin 7 promote MR1-mediated presentation of Mycobacterium tuberculosis antigens", authored by Kim et al., showed that the calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 specifically promote (are critical for) MAIT cell activation in response to Mtb-infected bronchial epithelial cell line BEAS-2B (Fig. 1) and monocyte-like cell line THP-1 (Figure 3) . This work also showed co-localization of Syt1 and Syt7 with Rab7a and Lamp1, but not with Rab5a (Figure 5). Loss of Syt1 and Syt7 resulted in a larger area of MR1 vesicles (Figure 6f) and an increased number of MR1 vesicles in close proximity to an Auxotrophic Mtb-containing vacuoles during infection (Figure 7ab). Moreover, flow organellometry was used to separate phagosomes from other subcellular fractions and identify enrichment of auxotrophic Mtb-containing vacuoles in fractions 42-50, which were enriched with Lamp1+ vacuoles or phagosomes (Figures 7e-f).

      Strengths:

      This work nicely associated Syt1 and Syt7 with late endocytic compartments and Mtb+ vacuoles. Gene editing of Syt1 and Syt7 loci of bronchial epithelial and monocyte-like cells supported Syt1 and Syt7 facilitated maintaining a normal level of antigen presentation for MAIT cell activation in Mtb infection. Imaging analyses further supported that Syt1 and Syt7 mutants enhanced the overlaps of MR1 with Mtb fluorescence, and the MR1 proximity with Mtb-infected vacuoles, suggesting that Syt1 and Syt7 proteins help antigen presentation in Mtb infection for MAIT activation.

      Weaknesses:

      Additional data are needed to support the conclusion, "identify a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles" and some pieces of other evidence may be seen by some to contradict this conclusion.

    3. Reviewer #2 (Public review):

      Summary:

      The study demonstrates that calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 are involved in the efficient presentation of mycobacterial antigens by MR1 during M. tuberculosis infection.

      This is achieved by creating antigen-presenting cells in which the Syt1 and Syt7 genes are knocked out. These mutated cell lines show significantly reduced stimulation of MAIT cells, while their stimulation of HLA class I-restricted T cells remains unchanged. Syt1 and Syt7 co-localize in a late endo-lysosomal compartment where MR1 molecules are also located, near M. tuberculosis-containing vacuoles.

      Strengths:

      This work uncovers a new aspect of how mycobacterial antigens generated during infection are presented. The finding that Syt1 and Syt7 are relevant for final MR1 surface expression and presentation to MR1-restricted T cells is novel and adds valuable information to this process.

      The experiments include all necessary controls and convincingly validate the role of Syt1 and Syt7.

      Another key point is that these proteins are essential during infection, but they are not significant when an exogenous synthetic antigen is used in the experiments. This emphasizes the importance of studying infection as a physiological context for antigen presentation to MAIT cells.

      An additional relevant aspect is that the study reveals the existence of different MR1 antigen presentation pathways, which differ from the endoplasmic reticulum or endosomal pathways that are typical for MHC-presented peptides.

      Weaknesses:

      The reduced MAIT cell response observed with Syt1 and Syt7-deficient cell lines is statistically significant but not completely abolished. This may suggest that only some MR1-loaded molecules depend on these two Syt proteins. Further research is needed to determine whether, during persistent M. tuberculosis infection, enough MR1-loaded molecules are produced and transported to the plasma membrane to sufficiently stimulate MAIT cells.

      The study proposes that other Syt proteins might also play a role, as outlined by the authors. However, exploring potential redundant mechanisms that facilitate MR1 loading with antigens remains a challenging task.

    4. Reviewer #3 (Public review):

      Summary:

      In the submitted manuscript, the authors investigate the role of Synaptotagmins (Syt1) and (Syt7) in MR1 presentation of MtB.

      Strengths:

      In the first series of experiments, the authors determined that knocking down Syt1 and Sy7 in antigen-presenting cells decreases IFN-γ production following cellular infection with Mtb. These experiments are well performed and controlled.

      Weaknesses:

      Next, they aim to mechanistically investigate how Syt1 and Syt7 affect MtB presentation. In particular, they focus on MR1, a non-classical MHC-I molecule known to present endogenous and exogenous metabolites, including MtB metabolites.

      Results from these next series of experiments are less clear. Firstly, they show that knocking down Syt1 and Sy7 does not change MtB phagocytosis as well as MR1 ER-plasma membrane translocation. Based on this, they suggest that Syt1 and Syt7 may affect MR1 trafficking in endosomal compartments. However, neither subcellular compartment analysis nor flow organelleometry clearly establishes the role of Syt1 and Syt7 in MtB trafficking.

      Altogether, the notion that Synaptotagmins facilitate MR1 interaction with Mtb-containing compartments and its vesicular transport was already known. As such, the manuscript should add additional insight on where/how the interaction occurs. The reviewer is left with the notion that Syt1 and Sy7 may affect MR1 presentation, facilitating the trafficking of MR1 vesicles from endosomal compartments to either the cell surface or other endosomal compartments. The analysis is observational and additional data or discussion could address what the insight gained beyond what is already known from the literature.

    1. eLife Assessment

      This important study shows how hunger alters avoidance of harmful heat in C. elegans by reconfiguring the activity of key sensory neurons. The evidence is convincing, with well-designed behavioural, genetic, and imaging experiments that support the main conclusions. The work will be of interest to neuroscientists studying how internal states shape sensory processing and behaviour across species.

    2. Reviewer #1 (Public review):

      This study by Thapliyal and Glauser investigates the neural mechanisms that contribute to the progressive suppression of thermonociceptive behavior that is induced under conditions of starvation. Several previous studies have demonstrated that when starved, C. elegans alters its preferences for a variety of sensory cues, including CO2, temperature, and odors, in order to prioritize food seeking over other behavioral drives. The varied mechanisms that underlie the ability of internal states to alter behavioral responses are not fully understood, however there is growing evidence for a role by neuropeptidergic signaling as well as capacity for functionally distinct microcircuits, formed by distinct internal states, to trigger similar behavior outcomes.

      Within the physiological range of C. elegans (~15-25C), starvation triggers a profound reduction in temperature-driven thermotaxis behaviors. This reduction involves the recruitment of the amphid sensory neuron pair AWC. The AWC neurons primarily act to sense appetitive chemosensory cues, however under starvation conditions begin to display temperature responses that previous studies have linked to the reduction in thermotaxis navigation. Here, Thapliyal and Glauser investigate the impact of starvation on thermonociceptive responses, innate escape behaviors that are triggered by exposure to noxious temperatures above 26C or rapid thermal stimuli below 26C. They compare the strength of thermonociceptive behaviors, specifically heat-triggered reversals, in worms experiencing either early food deprivation (1 hour off food) or prolonged starvation (6 hours off food). Their experiments demonstrate a progressive loss of heat-triggered reversals that is mediated by AWC and ASI neurons, as well as both glutamateric and neuropeptidergic signaling.

      At the level of neural activity, this study reports that the transition from early food deprivation to prolonged starvation reconfigures the temperature-driven activity of AWC neurons from largely deterministic to stochastic. This finding is interesting in light of previous work that reported the opposite transition (from stochastic to deterministic) in temperature-driven AWC responses when comparing well-fed worms to those kept from food for 3 hours. This study also identifies neural and genetic mechanisms that contribute to differences in thermonociceptive responses at +1 versus +6 hours starvation; confusingly, these mechanisms are partially distinct from those that contribute to differences in negative thermotaxis behaviors in well-fed and +3 hours starvation worms (Takeishi et al 2020). A limitation of this manuscript is that these differences are not particularly acknowledged or addressed, other than the hypothesis that independent mechanisms underlie negative thermotaxis versus thermonociceptive stimuli. However, this suggestion is not experimentally verified. Multiple additional aspects of this study make the results difficult to synthesize with existing knowledge, including 1) differences in - and insufficient discussion of - the magnitude and kinetics of thermal stimuli; 2) this study's use of "heating power" rather than temperature values when presenting behavioral results; 3) the use of +1 hours starvation as a baseline instead of well-fed worms. Indeed, this last point reflects a noticeable experimental result that differs from previous studies, namely that at room temperature the basal movements of well-fed and starved worms are not different. Such a surprisingly result warrants further quantification of worm mobility in general and could have prompted a set of experiments directly testing previously published thermal conditions, to demonstrate that the new effects reported arise specifically from the use of thermonociceptive stimuli, as hypothesized. Finally, a previous report (Yeon et al 2021) demonstrated differences in the impact of chronic versus acute neural silencing on starvation-dependent plasticity in the context of negative thermotaxis. We therefore wonder whether similar developmental compensation impacts the neural circuits that contribute to starvation-dependent plasticity in the thermonociceptive responses.

      A weakness of this manuscript is that the introduction is insufficiently scholarly in terms of citations and the description of current knowledge surrounding the impact of internal state on sensory behavior, particularly given previous work on the impact of feeding state on thermosensory behavioral plasticity (Takeshi et al 2020, Yeon et al 2021) and chemosensory valence (Banerjee et al 2023, Rengarajan et al 2019, etc). Similarly, the authors commanding knowledge of the distinction between thermotaxis navigation (especially negative thermotaxis) and thermonociceptive behaviors could be communicated in more depth and clarity to the readers, in order to contextualize this study's new findings within the previous literature.

      Nevertheless, this study represents a solid addition to the growing evidence that C. elegans sensory behaviors are strongly impacted by internal states, and that neuropeptigergic signaling plays a key role in mediating behavioral plasticity. To that end, the authors have provided solid evidence of their claims.

    3. Reviewer #2 (Public review):

      In this work Thapliyal and Glauser tried to provide mechanistic understanding by which animals modulate their neural circuit responses to control nociceptive behavior on the basis of the dynamic internal feeding state. It is an important study that adds to growing body of evidences coming from multiple model systems. They have used elegant genetics, behavioral and Ca-imaging experiments to demonstrate how the auxiliary thermosensory neuron pair, AWC and one of the internal state sensing interneuron pair, ASI, respond to dynamic internal starvation-state to modulate behavioral response to noxious heat. Interestingly, these neuron pairs use distinct molecular mechanisms along with some other unidentified neurons to suppress heat-indued reversal response under short-term and prolonged starvations. The experiments are well performed that support most of the claims and provide important framework for future studies.

      I have some queries that if answered, will certainly enhance the study,

      (1) The results suggests that ASI is one of the primary drivers for the starvation-evoked behavioral plasticity, which regulates AWC activity under prolonged starvation. It raises many important questions including, a) how starvation modulates ASI response to heat? b) under prolonged starvation, whether ASI also promotes other, non-AWC, glutamatergic inhibitory neurons to suppress heat-induced reversal and how?

      (2) How does ASI regulate AWC activity? In the proposed model (figure 8) authors suggested an independent, unknown signal, other than INS-32 and NLP-18, from ASI to regulate AWC activity. However, from the results the existence of another signal is not very clear.

      (3) Previously, Takeishi et. al., showed that ins-1 dynamically modulates AWC-AIA mediated thermotaxis behavior based on the feeding state of the animal. It raises questions whether ins-1 also contributes to noxious heat-induced reversal behavior.

      (4) Experiments with AWC fate conversion mutants (nsy-1 and nsy-7) were very good ideas, however the results obtained were confusing. flp-6 mutant data suggests AWCoff would be essential for heat induced reversal, especially at the low intensity stimulus level. However, nsy-1 mutant forming two AWCon neurons showed complete rescue at the low heat level, which is quite opposite. Similarly, although less prominent, eat-4 rescue experiments suggested both nsy-1 and nsy-7 should behave normally at high heat condition, which was not the result observed.

    4. Reviewer #3 (Public review):

      Summary:

      Thapliyal and Glauser show that hunger alters how C. elegans respond to noxious thermal stimuli. Using targeted neural ablation, mutant analysis, and live-cell functional imaging the authors demonstrate that hunger changes the properties of AWC sensory neurons, which sense noxious heat. The authors further show that effects of hunger on nociception require ASI neurons, which are known to respond to hunger and mediate effects of food deprivation on behavior. Finally, the study uses mutant analysis to implicate glutamate and specific neuropeptides in thermal nociception and in modulation of nociceptors by hunger-responsive neurons.

      Strengths:

      The study clearly shows a strong effect of hunger on nociception and documents a striking effect of hunger on the intrinsic properties of AWC sensory neurons, which respond to noxious heat. The study also clearly and compellingly demonstrates that ablation of hunger-responsive ASI neurons blocks effects of hunger on nociceptive AWCs. These data, which constitute the kernel of the manuscript, are striking and exciting.

      Weaknesses:

      The study has some weaknesses that the authors should address.

      (1) Ablation of AWC neurons alters the basal sensitivity to noxious heat stimuli. This should be clearly noted in the description of the result and warrants some discussion.

      (2) Throughout the study it seems that data are replotted in multiple figure panels. The authors should clearly indicate in figure legends when this occurs. Also, the authors should ensure that statistical tests requiring multiple comparisons are correctly implemented and reflect the number of times experimental data are compared to a single set of control data.

      (3) How ASIs modulate AWCs remains unclear. The authors find that loss of INS-6, an insulin-like peptide provided by ASIs, partially recapitulates the effect of ASI ablation. This is observation is not further developed and instead the authors characterize other secreted factors that seem to mediate sensitization of animals to noxious heat stimuli. While it is interesting that there are multiple opposing inputs into the nociceptor circuit, the essential connection between ASIs and AWCs that underlies the foundational observations in figures 1 and 2 is not sufficiently characterized.

      (4) The assertion that 'starvation reshapes AWC responses from deterministic to stochastic' is not clearly supported by the data. AWC neurons seem capable of showing different responses to thermal stimuli, and the probabilities associated with these responses change after fasting. The different kinds of responses are seen under basal and fasted conditions.

    1. eLife Assessment

      This study presents a valuable quantitative framework for analyzing transcription dynamics data for enhancers and genes expressed in the early Drosophila embryo. By analyzing existing data across both synthetic reporters and an endogenous gene (eve), this work provides evidence that spatial gene expression patterns within the embryo are largely determined by "activity time" - the time during which a gene is bursting. The methods and evidence are solid and should be of broad interest to researchers in developmental biology and quantitative gene regulation, but the study would be significantly enhanced by clarifying the novelty of the findings relative to prior work and presenting a rigorous benchmarking of their algorithm against previously used algorithms.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, the authors develop a method to re-analyze published data measuring the transcription dynamics of developmental genes within Drosophila embryos. Using a simple framework, they identify periods of transcriptional activity from traces of MS2 signal and analyze several parameters of these traces. In the five data sets they analyzed, the authors find that each transcriptional "burst" has a largely invariant duration, both across spatial positions in the embryo and across different enhancers and genes, while the time between transcriptional bursts varies more. However, they find that the best predictor of the mean transcription levels at different spatial positions in the embryo is the "activity time" -- the total time from the first to the last transcriptional burst in the observed cell cycle.

      Strengths:

      (1) The algorithm for analyzing the MS2 transcriptional traces is clearly described and appropriate for the data.

      (2) The analysis of the four transcriptional parameters -- the transcriptional burst duration, the time between bursts, the activity time, and the polymerase loading rate is clearly done and logically explained, allowing the reader to observe the different distributions of these values and the relationship between each of these parameters and the overall expression output in each cell. The authors make a convincing case that the activity time is the best predictor of a cell's expression output.

      (3) The figures are clearly presented and easy to follow.

      Weaknesses:

      (1) The strength of the relationship between the different transcriptional parameters and the mean expression output is displayed visually in Figures 5 and 7, but is not formally quantified. Given that the tau_off times seem more correlated to mean activity for some enhancers (e.g., rho) than others (e.g., sna SE), the quantification might be useful.

      (2) There are some mechanistic details that are not discussed in depth. For example, the authors observe that the accumulation and degradation of the MS2 signal have similar slopes. However, given that the accumulation represents the transcription of MS2 loops, while the degradation represents diffusion of nascent transcripts away from the site of transcription, there is no mechanistic expectation for this. The degradation of signal seems likely to be a property of the mRNA itself, which shouldn't vary between cells or enhancer reporters, but the accumulation rate may be cell- or enhancer-specific. Similarly, the activity time depends both on the time of transcription onset and the time of transcription cessation. These two processes may be controlled by different transcription factor properties or levels and may be interesting to disentangle.

      (3) There are previous analyses of the eve stripe dynamics, which the authors cite, but do not compare the results of their work to the previous work in depth.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Nieto et al. investigate how spatial gene expression patterns in the early Drosophila embryo are regulated at the level of transcriptional bursting. Using live-cell MS2 imaging data of four reporter constructs and the endogenous eve gene, the authors extract temporal dynamics of nascent transcription at single-cell resolution. They implement a novel, simplified algorithm to infer promoter ON/OFF states based on fluorescence slope dynamics and use this to quantify burst duration (Ton), inter-burst duration (Toff), and total activity time across space.

      The key finding is that while Ton and Toff remain relatively constant across space, the activity time-the window between first and last burst-is spatially modulated and best explains mean expression differences across the embryo. This uncovers a general strategy where early embryonic patterning genes modulate the duration of their transcriptionally permissive states, rather than the frequency or strength of bursting itself. The manuscript also shows that different enhancers of the same gene (e.g., sna proximal vs. shadow) can differentially modulate Toff and activity time, providing mechanistic insight into enhancer function.

      Strengths:

      The manuscript introduces activity time as a major, previously underappreciated determinant of spatial gene expression, distinct from Ton and Toff, providing an intuitive mechanistic link between temporal bursting and spatial patterning.

      The authors develop a tractable inference algorithm based on linear accumulation/decay rates of MS2 fluorescence, allowing efficient burst state segmentation across thousands of trajectories.

      Analysis across multiple biological replicates and different genes/enhancers lends confidence to the reproducibility and generalizability of the findings.

      By analyzing both synthetic reporter constructs and an endogenous gene (eve), the work provides a coherent view of how enhancer architecture and spatial regulation are intertwined with transcriptional kinetics.

      The supplementary information extends the biological findings with a gene expression noise model that accounts for non-exponential dwell times and illustrates how low-variability Ton buffers stochasticity in transcript levels.

      Weaknesses:

      The manuscript does not clearly delineate how this analysis extends beyond the prior landmark study (citation #40: Fukaya et al., 2016). While the current manuscript offers new modeling and statistics, more explicit clarification of what is novel in terms of biological conclusions and methodological advancement would help position the work.

      While the methods are explained in detail in the Supplementary Information, the manuscript would benefit from including a diagrammatic model and explicitly clarifying whether the model is descriptive or predictive in scope.

      The interpretation that fluorescence decay reflects RNA degradation could be confounded by polymerase runoff or transcript diffusion from the transcription site. These potential limitations are not thoroughly discussed.

      The so-called loading rate is used as an empirical parameter in fitting fluorescence traces, but is not convincingly linked to distinct biological processes. The manuscript would benefit from a more precise definition or reframing of this term.

      Impact and Utility:

      The study provides a general and scalable framework for dissecting transcriptional kinetics in developing embryos, with implications for understanding enhancer logic and developmental robustness. The algorithm is suitable for adaptation to other live-imaging datasets and could be useful across systems where temporal transcriptional variability is being quantified. By highlighting activity time as a key regulatory axis, the work shifts attention to transcriptionally permissive windows as a primary developmental control layer.

      This work will be of interest to: developmental biologists investigating spatial gene expression, researchers studying transcriptional regulation and noise, quantitative biologists developing models for transcriptional dynamics, and imaging and computational biologists working with live single-cell data.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors developed a simple algorithm to analyse live imaging transcription data (MS2) and infer various kinetic parameters. They then applied it to analyse data from previous publications on Drosophila that measured the dynamics of reporter genes driven by various enhancers alone (sna, Kr, rho), or in an endogenous context (eve).

      The authors find that the main correlate with mean gene expression levels is the activity time, that is, the time during which the gene is bursting. They also find a correlation with the variation of the off time.

      Strengths:

      (1) The findings are very clearly presented.

      (2) The simplicity of the algorithm is nice, and the comparative analysis among the various enhancers can be helpful for the field.

      Weaknesses:

      (1) The algorithm is not benchmarked against previously used algorithms in the field to infer ON and OFF times, for example, those based on Hidden Markov models. A comparison would help strengthen the support for this algorithm (if it really works well) or show at which point one must be careful when interpreting this data.

      (2) More broadly, the novelty of the findings and how those fit within the knowledge of the field is not super clear. A better account of previous findings that have already quantified ON, OFF times and so on, and how the current findings fit within those, would help better appreciate the significance of the work.

    5. Author response:

      Reviewer #1 (Public review):

      (1) The strength of the relationship between the different transcriptional parameters and the mean expression output is displayed visually in Figures 5 and 7, but is not formally quantified. Given that the tau_off times seem more correlated to mean activity for some enhancers (e.g., rho) than others (e.g., sna SE), the quantification might be useful.

      We re-plot Figure 5 and Figure 7 to present the correlation between the studied burst parameters. As the reviewer suggested, after quantifying the correlation we can better study the correlation between the cells averaged tau-off and the cell-averaged fluorescence signal in some of the selected enhancers. As a result of these findings we decide to change our message and instead of claiming that the burst statistics are homogeneous over the embryo domain, to claim that these statistics have weak but significant correlations with the cell-averaged mean gene fluorescence.  

      (2) There are some mechanistic details that are not discussed in depth. For example, the authors observe that the accumulation and degradation of the MS2 signal have similar slopes. However, given that the accumulation represents the transcription of MS2 loops, while the degradation represents diffusion of nascent transcripts away from the site of transcription, there is no mechanistic expectation for this. The degradation of signal seems likely to be a property of the mRNA itself, which shouldn't vary between cells or enhancer reporters, but the accumulation rate may be cell- or enhancer-specific. Similarly, the activity time depends both on the time of transcription onset and the time of transcription cessation. These two processes may be controlled by different transcription factor properties or levels and may be interesting to disentangle.

      The accumulation slope represents the rate of nascent transcript production, which depends on transcription initiation frequency and RNA polymerase elongation rate. While transcription initiation rates can vary between enhancers, our results show that the loading rates are relatively comparable across different enhancer sequences (Figure 5D). Instead, the primary difference observed was in activity time and burst frequency, consistent with previous findings that enhancers predominantly modulate burst frequency (Fukaya et al., 2016). The degradation slope represents the diffusion of completed transcripts away from the transcription site, which should be an intrinsic property of the mRNA molecule and therefore independent of the regulatory sequences driving transcription.

      (3) There are previous analyses of the eve stripe dynamics, which the authors cite, but do not compare the results of their work to the previous work in depth.

      The goal of this manuscript is to compare transcriptional bursting properties across different enhancers, rather than to provide an in-depth analysis of eve stripe dynamics specifically. We analyzed four transgenic constructs with different enhancers alongside an endogenous eve construct, focusing on comparative bursting parameters rather than detailed eve expression patterns. Additionally, the previously published eve stripe dynamics data came from BAC constructs, whereas our data comes from the endogenous eve locus. This methodological difference makes direct comparison of stripe dynamics less straightforward and less relevant to our central research question about enhancer-driven bursting variability.

      Reviewer #2 (Public review):

      (1) The manuscript does not clearly delineate how this analysis extends beyond the prior landmark study (citation #40: Fukaya et al., 2016). While the current manuscript offers new modeling and statistics, more explicit clarification of what is novel in terms of biological conclusions and methodological advancement would help position the work.

      The prior study (Fukaya et al., 2016) characterized transcriptional bursting qualitatively, focusing on average burst properties per nucleus without systematic mathematical modeling or statistical analysis of burst-to-burst variability. While they demonstrated that enhancer strength correlates with burst frequency, no quantitative framework was developed to dissect the molecular mechanisms underlying these differences or to connect burst dynamics to spatial gene expression patterns.

      (1) We developed an explicit mathematical model with rigorous inference algorithms to quantify transcriptional states from fluorescence trajectories; (2) We performed comprehensive statistical analysis of burst timing distributions, revealing that inter-burst intervals follow exponential distributions while burst durations are hypo-exponentially distributed; (3) Most importantly, we discovered that burst kinetics (τON, τOFF) remain remarkably consistent across different genes and spatial locations, while spatial expression gradients arise primarily through modulation of activity time - the temporal window during which bursting occurs. This mechanistic insight reveals that enhancers regulate spatial patterning not by changing intrinsic burst properties, but by controlling the duration of transcriptionally permissive periods.

      (2) While the methods are explained in detail in the Supplementary Information, the manuscript would benefit from including a diagrammatic model and explicitly clarifying whether the model is descriptive or predictive in scope.

      We plan to prepare the diagrammatic model in the formal response. 

      (3) The interpretation that fluorescence decay reflects RNA degradation could be confounded by polymerase runoff or transcript diffusion from the transcription site. These potential limitations are not thoroughly discussed. (Write few lines in the discussion)

      This concern, related to the interpretation of the predictive model will be addressed in a future work. The decay in the fluorescence signal can be biologically related to the transcription termination, polymerase detachment, and diffusion. A key limitation of the approach is that the model is phenomenological and does not these capture processes that can be addressed with a more mechanistic model.

      (4) The so-called loading rate is used as an empirical parameter in fitting fluorescence traces, but is not convincingly linked to distinct biological processes. The manuscript would benefit from a more precise definition or reframing of this term.

      We modify the language of our definition of loading rate as follows: Loading rate is defined as the rate of increase of fluorescence signal following promoter activation. This quantity is a proxy measurement for the rate of RNA Polymerase II transcription initiation.” The full transcription process has multiple mechanisms including chromatin dynamics, 3D enhancer-promoter interactions, transcription factor binding, mRNA polymerase pausing, and interactions between developmental promoter motifs and associated proteins. We did not have access to specific measurements of these mechanisms and therefore cannot provide a solid biological meaning of the model behind the inference algorithm. However, the fact that we have reproducible results in biological replicas can support the robustness of our method at predicting the promoter state in the studied datasets. In the formal response we will compare the performance of our method with other available ones.

      Reviewer #3 (Public review):

      (1)The algorithm is not benchmarked against previously used algorithms in the field to infer ON and OFF times, for example, those based on Hidden Markov models. A comparison would help strengthen the support for this algorithm (if it really works well) or show at which point one must be careful when interpreting this data.

      We are implementing a benchmarking protocol to compare our results with the proposed and already published models. We expect to present this comparison in the formal response.

      (2) More broadly, the novelty of the findings and how those fit within the knowledge of the field is not super clear. A better account of previous findings that have already quantified ON, OFF times and so on, and how the current findings fit within those, would help better appreciate the significance of the work.

      To have a better clarity of the new findings we modified the title from “Regulation of Transcriptional Bursting and Spatial Patterning in Early Drosophila Embryo Development” to “Temporal Duration of Gene Activity is the main Regulator of Spatial Expression Patterns in Early Drosophila Embryos”.

      In short, (1) We developed an explicit mathematical model with rigorous inference algorithms to quantify transcriptional states from fluorescence trajectories; (2) We performed comprehensive statistical analysis of burst timing distributions, revealing that inter-burst intervals follow exponential distributions while burst durations are hypo-exponentially distributed; (3) Most importantly, we discovered that burst kinetics (τON, τOFF) remain remarkably consistent across different genes and spatial locations, while spatial expression gradients arise primarily through modulation of activity time - the temporal window during which bursting occurs. This mechanistic insight reveals that enhancers regulate spatial patterning not by changing intrinsic burst properties, but by controlling the duration of transcriptionally permissive periods.

    1. eLife Assessment

      There is a growing interest in understanding the individuality of animal behaviours. In this important article, the authors build and use an impressive array of high throughput phenotyping paradigms to examine the 'stability' (consistency) of behavioural characteristics in a range of contexts and over time. The results show that certain behaviours are individualistic and persist robustly across external stimuli while others are less robust to these changing parameters. The data supporting their findings is extensive and convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23{degree sign}C and 32{degree sign}C are correlated by 0.263, which corresponds to an R2 of 0.069 i.e. just 7% of the 32{degree sign}C variance is predictable by the 23{degree sign}C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R2) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer or open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fit lme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors repeated measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used pearson correlations across time/situations to estimate individual variation; however far more sophisticated and elegant methods exist. The problem is that pearson correlation coefficients can be anti-conservative and additionally, the authors have thus had to perform many many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc) and instead just report a single repeatability for e.g. the time spent walking among the different strip patterns (eg. figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit? What exactly was the response? what exactly were the predictors (I just don't understand what Line404 means "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not change, 1 if it was) and the resulting individual rank differences as the response"). So were different models run for each scenario? for different behaviors? Across scenarios? what exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):  

      Summary:  

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:  

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.  

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:  

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?  

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".  

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23{degree sign}C and 32{degree sign}C are correlated by 0.263, which corresponds to an R2 of 0.069 i.e. just 7% of the 32{degree sign}C variance is predictable by the 23{degree sign}C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R2) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer or open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about interindividual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of interindividual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.  

      Comments on revisions:  

      I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications including GLM analysis, however, some areas still require clarification for the benefit of an audience that includes geneticists.  

      (1) GLM Analysis Explanation (Figure 9)  

      While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:

      The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other nonstatistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.

      The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.

      The relationship between the GLM findings and their original correlationbased conclusions needs better integration and connection, leading the reader through your reasoning.

      We thank the reviewer for highlighting this important point. We have revised the Results section in the reviseed manuscript to include a more detailed explanation of the GLM analysis. Specifically, we now clarify the interpretation of the model coefficients, including the direction and statistical significance, in relation to the hypothesized effects. We also outline the criteria we used to assess how well the GLM supports our original correlation-based conclusions—namely, whether the sign and significance of the coefficients align with the expected relationships derived from our prior analysis. Finally, we explicitly describe how the GLM results confirm or extend the patterns observed in the correlation-based analysis, to guide readers through our reasoning and the integration of both approaches.

      (2) Documentation of Changes  

      One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of revisions made. The authors should provide a more comprehensive summary of the specific changes implemented, particularly regarding:

      We thank the reviewer for bringing this to our attention. We were equally confused to learn that the tracked-changes version was not visible, despite having submitted one to eLife as part of our revision. 

      Upon contacting the editorial office, they confirmed that we did submit a trackedchanges version, but clarified that it did not contain embedded figures (as they were added manually to the clean version).  The editorial response said in detail: “Regarding the tracked-changes file: it appears the version with markup lacked figures, while the figure-complete PDF had markup removed, which likely caused the confusion mentioned by the reviewers.” We hope this answer from eLife clarifies the reviewers’ concern.

      (2)  Statistical Method Selection  

      The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:

      Why ridge regression was selected as the optimal method  

      How the regularization parameter (λ) was determined  

      How this choice affects the interpretation of environmental parameters' influence on individuality

      We appreciate the reviewer’s thoughtful question regarding our choice of statistical method. In response, we have expanded the Methods section in the revised manuscript to provide a more detailed justification for the use of a GLM, including ridge regression. Specifically, we explain that ridge regression was selected to address collinearity and to control for overfitting.

      We now also describe how the regularization parameter (λ) was selected: we used 5-fold cross-validation over a log-spaced grid (10<sup>⁻⁶</sup> - 10<sup>⁶</sup) to identify the optimal value that minimized the mean squared error (MSE).

      Finally, we clarify in both the Methods and Results sections how this modeling choice affects the interpretation of our findings. 

      Reviewer #2 (Public review):  

      Summary:  

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:  

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:  

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.  

      I think the authors are missing an opportunity to use much more robust statistical methods It appears as though the authors used pearson correlations across time/situations to estimate individual variation; however far more sophisticated and elegant methods exist. The problem is that pearson correlation coefficients can be anti-conservative and additionally, the authors have thus had to perform many many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc) and instead just report a single repeatability for e.g. the time spent walking among the different strip patterns (eg. figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit? What exactly was the response? what exactly were the predictors (I just don't understand what Line404 means "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response"). So were different models run for each scenario? for different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation

      Reviewer #3 (Public review):  

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days.  

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested fail to remain stable over spatially varying environment (arena shape).

      (4) and only angular velocity (a read out of attention) remains stable across varying internal states (walking and flying)

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals are large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, different temperature among others.  

      Comments on revisions:'  

      The authors have addressed my previous concerns.  

      We thank the reviewer for the positive feedback and are glad our revisions have satisfactorily addressed the previous concerns. We appreciate the thoughtful input that helped us improve the clarity and rigor of the manuscript.

      Reviewer #1 (Recommendations for the authors):  

      Comment on Revised Manuscript  

      Recommendations for Improvement  

      (1) Expand the Results section for Figure 9 with a more detailed interpretation of the GLM coefficients and their biological significance

      (2) Provide explicit criteria (or at least explain in detail) for how the GLM results confirm or undermine their original hypothesis about environmental context hierarchy

      While the claims are interesting, the additional statistical analysis appears promising. However, clearer explanation of these new results would strengthen the paper and ensure that readers from diverse backgrounds can fully understand how the evidence supports the authors' conclusions about individuality across environmental contexts. 

      We thank the reviewer for these constructive suggestions. In response to these suggestions, we have expanded both the Methods and Results sections to provide a more detailed explanation of the GLM coefficients, including their interpretation and how they relate to our original correlation-based findings.

      We now clarify how the direction, magnitude, and statistical significance of specific coefficients reflect the influence of different environmental factors on the persistence of individual behavioral traits. To make this accessible to readers from diverse backgrounds, we explicitly outline the criteria we used to evaluate whether the GLM results support our hypothesis about the hierarchical influence of environmental context, namely, whether the structure and strength of effects align with the patterns predicted from our prior correlation analysis.

      These additions improve clarity and help readers understand how the new statistical results reinforce our conclusions about the context-dependence of behavioral individuality.

      Reviewer #2 (Recommendations for the authors):  

      Thanks for the revision of the paper! I updated my review to try and provide a little more guidance by what I mean about updating your analyses. I really think this is a super cool data set and I genuinely wish this were MY dataset so that way I could really dig into it to partition the variance. These variance partitioning methods are standard in my particular subfield (study of individual behavioral variation in ecology and evolution) and so I think employing them is 1) going to offer a MUCH more elegant and holistic view of the behavioral variation (e.g. you can report a single repeatability estimate for each behavior rather than 3 different correlations) and 2) improve the impact and readership for your paper as now you'll be using methods that a whole community of researchers are very familiar with. It's just a suggestion, but I hope you consider it!

      We sincerely thank the reviewer for the insightful and encouraging feedback and for introducing us to this modeling approach. In response to this suggestion, we have incorporated a hierarchical linear mixed-effects model into our analysis (now presented in Figure 10), accompanied by a new supplementary table (Table T3). We also updated the Methods, Results, and Discussion sections to describe the rationale, implementation, and implications of the mixed-model analysis.

      We agree with the reviewer that this approach provides a more elegant way to quantify behavioral variation and individual consistency across contexts. In particular, the ability to estimate repeatability directly aligns well with the core questions of our study. It facilitates improved communication of our findings to ecology, evolution, and behavior researchers. We greatly appreciate the suggestion; it has significantly strengthened both the analytical framework and the interpretability of the manuscript.

    1. eLife Assessment

      This valuable study analyzes aging-related chromatin changes through the lens of intra-chromosomal gene correlation length, which is a novel computational metric that captures spatial correlations in gene expression along the chromosome. The authors propose that this metric reflects chromatin structure and can serve as a proxy for its changes during aging. While currently the strength of evidence is somewhat incomplete, if revised with further supporting data, this work will provide a systems-level understanding of aging and genome regulation, which is predicted to have a substantive impact on the field.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Mahajan et.al introduce two innovative macroscopic measures-intrachromosomal gene correlation length (𝓁∗) and transition energy barrier-to investigate chromatin structural dynamics associated with aging and age-related syndromes such as Hutchinson-Gilford Progeria Syndrome (HGPS) and Werner Syndrome (WRN). The authors propose a compelling systems-level approach that complements traditional biomarker-driven analyses, offering a more holistic and quantitative framework to assess genome-wide dysregulation. The concept of 𝓁∗ as a spatial correlation metric to capture chromatin disorganization is novel and well-motivated. The use of autocorrelation on distance-binned gene expression adds depth to the interpretation of chromatin state shifts. The energy landscape framework for gene state transitions is an elegant abstraction, with the notion of "irreversibility" providing a thermodynamic interpretation of transcriptional dysregulation. The application to multiple datasets (Fleischer, Line-1) and pathological states adds robustness to the analysis. The consistency of chromosome 6 (and to some extent chromosomes 16 and X) emerging as hotspots aligns well with known histone cluster localization and disease-relevant pathways. The manuscript does an excellent job of integrating transcriptomic trends with known epigenetic hallmarks of aging, and the proposed metrics can be used in place of traditional techniques like PCA in capturing structural transcriptome features. However, a direct correlation with ATACseq/ HiC data with the present analysis will be more informative.

      Strengths:

      Novel inclusion of statistical metrics that can help in systems-level studies in aging and chromatin biology.

      Weaknesses:

      (1) In the manuscript, the authors mention "While it may be intuitive to assume that highly expressed genes originate from euchromatin, this cannot be conclusively stated as a complete representation of euchromatin genes, nor can LAT be definitively linked to heterochromatin". What percentage of LAT can be linked to heterochromatin? What is the distribution of LAT and HAT in the euchromatin?

      (2) In Figure 2, the authors observe "that the signal from the HAT class is the stronger between two and the signal from the LAT class, being mostly uniform, can be constituted as background noise." Is this biologically relevant? Are low-abundance transcripts constitutively expressed? The authors should discuss this in the Results section.

      (3) The authors make a very interesting observation from Figure 3: that ASO-treated LINE-1 appears to be more effective in restoring HGPS cell lines closer to wild-type compared to WRN.. This can be explained by the difference in the basal activity of L1 elements in the HGPS vs WRN cell types. The authors should comment on this.

      (4) The authors report that "from the results on Fleicher dataset is the magnitude of the difference in similarity distance is more pronounced in 𝓁∗ than in gene expression." Does this mean that the alterations in gene distance and chromatin organization do not result in gene expression change during aging?

      (5) "In Fleischer dataset, as evident in Figure 4a, although changes in the heterochromatin are not identical for all chromosomes shown by the different degrees of variation of 𝓁∗ in each age group." The authors should present a comprehensive map of each chromosome change in gene distance to better explain the above statement.

      (6) While trends in 𝓁∗ are discussed at both global and chromosome-specific levels, stronger statistical testing (e.g., permutation tests, bootstrapping) would lend greater confidence, especially when differences between age groups or treatment states are modest.

      (7) While the transition energy barrier is an insightful conceptual addition, further clarification on the mathematical formulation and its physical assumptions (e.g., energy normalization, symmetry conditions) would improve interpretability. Also, in between Figures 7 and 8, the authors first compare the energy barrier of Chromosome 1 and then for all other chromosomes. What is the rationale for only analyzing chromosome 1? How many HAT or LAT are present there?

    3. Reviewer #2 (Public review):

      The authors report that intra-chromosomal gene correlation length (spatial correlations in gene expressions along the chromosome) serves as a proxy of chromatin structure and hence gene expression. They further explore changes in these metrics with aging. These are interesting and important findings. However, there are fundamental problems at this time.

      (1) The basic method lacks validation. There is no validation of the method by approaches that directly measure chromatin structure, for example ATAC-seq, ChIP-seq, or CUT n RUN.

      (2) There is no validation by interventions that directly probe chromatin structure, such as HDAC inhibitors. The authors employ datasets with knockdown of LINE-1 for validation. However, this is not a specific chromatin intervention.

      (3) There is no statistical analysis, e.g., in Figures 4 and 5.

      (4) The authors state, "in Figure 4a changes in the heterochromatin are not identical for all chromosomes shown...." I do not see the data for individual chromosomes.

      (5) In comparisons of WT vs HGPS NT or HGPS SCR (Figure S6), is this a fair comparison? The WT and HGPS are presumably from different human donors, so they have genetic and epigenetic differences unrelated to HGPS.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Mahajan et. al. introduce two innovative macroscopic measures-intrachromosomal gene correlation length (𝓁∗) and transition energy barrier-to investigate chromatin structural dynamics associated with aging and age-related syndromes such as Hutchinson-Gilford Progeria Syndrome (HGPS) and Werner Syndrome (WRN). The authors propose a compelling systems-level approach that complements traditional biomarker-driven analyses, offering a more holistic and quantitative framework to assess genome-wide dysregulation. The concept of 𝓁∗ as a spatial correlation metric to capture chromatin disorganization is novel and well-motivated. The use of autocorrelation on distance-binned gene expression adds depth to the interpretation of chromatin state shifts. The energy landscape framework for gene state transitions is an elegant abstraction, with the notion of "irreversibility" providing a thermodynamic interpretation of transcriptional dysregulation. The application to multiple datasets (Fleischer, Line-1) and pathological states adds robustness to the analysis. The consistency of chromosome 6 (and to some extent chromosomes 16 and X) emerging as hotspots aligns well with known histone cluster localization and disease-relevant pathways. The manuscript does an excellent job of integrating transcriptomic trends with known epigenetic hallmarks of aging, and the proposed metrics can be used in place of traditional techniques like PCA in capturing structural transcriptome features. However, a direct correlation with ATACseq/HiC data with the present analysis will be more informative.

      (1) In the manuscript, the authors mention "While it may be intuitive to assume that highly expressed genes originate from euchromatin, this cannot be conclusively stated as a complete representation of euchromatin genes, nor can LAT be definitively linked to heterochromatin". What percentage of LAT can be linked to heterochromatin? What is the distribution of LAT and HAT in the euchromatin?

      Thank you for this insightful question. In the revision we will add chromatin state annotations using ChromHMM to identify overlap between HAT/LAT and corresponding chromatin state. This should provide the specific percentages and distributions you requested.

      We would like to take this opportunity to clarify that based on the plots Fig S1, and differential gene expressions, HAT is most likely a subset of euchromatin and LAT may contain both euchromatin and heterochromatin. The HAT/LAT cutoff occurs around the knee point in the log-log plot (Figure S1), where the linear portion indicates scale-invariant behavior with similar relative changes across expression ranks. The non-linear portion represents departure from power-law scaling, where low-expression genes exhibit sharper decline than expected. This suggests potential biological mechanisms such as chromatin silencing, detection limits, or technical artifacts related to sequencing depth.

      We will provide detailed chromatin state analysis in the revision. For reference, HAT gene lists per chromosome are available in our GitHub repository at: https://github.com/altoslabs/papers-2025-rnaseq-chrom-aging/tree/main/data/Preprocessed_dat a under /<dataset>/chromosome_{}/data_hi.

      (2) In Figure 2, the authors observe "that the signal from the HAT class is the stronger between two and the signal from the LAT class, being mostly uniform, can be constituted as background noise." Is this biologically relevant? Are low-abundance transcripts constitutively expressed? The authors should discuss this in the Results section.

      We apologize for the confusion arising from the usage of the term “background noise”. We agree that the distinction between high-abundance transcripts (HATs) and low-abundance transcripts (LATs) deserves more explicit discussion in the Results.

      Our intention is to say that HAT has a higher signal-to-noise ratio (SNR) compared to LAT. This is coming from the power law graph of FigS1.  Our intention is to state that the HAT class provides a strong, robust signal, consistent across chromosomes and the LAT class exhibits lower SNR and a more uniform background-like distribution in the context of the problem we are solving and not rather a generic biological statement. The experiment result that led to this statement is presented in FigS3. This does not imply that low-abundance transcripts lack biological relevance, but rather that they contribute less to the spatial organization patterns we measure.

      (3) The authors make a very interesting observation from Figure 3: that ASO-treated LINE-1 appears to be more effective in restoring HGPS cell lines closer to wild-type compared to WRN.. This can be explained by the difference in the basal activity of L1 elements in the HGPS vs WRN cell types. The authors should comment on this.

      We thank the reviewer for this incisive biological observation. While the differential effectiveness of ASO-treated LINE-1 in HGPS versus WRN cell lines is indeed an interesting phenomenon that may relate to basal L1 activity differences, this biological mechanism falls outside the scope of our current study.

      Our paper focuses on demonstrating that the 𝓁∗ metric can sensitively detect chromatin structural changes that have been independently validated. We utilize the Della Valle et al. (2022) dataset specifically because it provides experimentally confirmed chromatin structural differences (Progeroid vs wild-type vs ASO-treated Progeriod), allowing us to validate that 𝓁∗ correlates with these established changes.

      For detailed discussion of the biological mechanisms underlying differential LINE-1 ASO effectiveness between progeroid syndromes, we would direct readers to Della Valle et al. (2022) and related LINE-1 biology literature. Our contribution lies in demonstrating that 𝓁∗ can capture these chromatin organizational changes with enhanced sensitivity compared to traditional expression-based approaches. We are reluctant, without further experimentation, to venture into over-interpreting these results from a biology perspective.  

      (4) The authors report that "from the results on Fleischer dataset is the magnitude of the difference in similarity distance is more pronounced in 𝓁∗ than in gene expression." Does this mean that the alterations in gene distance and chromatin organization do not result in gene expression change during aging?

      Thank you for this important clarification request. This observation, illustrated in Figure 3, highlights two key points: (1) 𝓁∗ shows similar trends to PCA analysis, and (2) 𝓁∗ demonstrates higher sensitivity than traditional gene expression analysis.

      This enhanced sensitivity enables better discrimination between aging states, particularly in the Fleischer dataset representing natural aging where changes are more gradual. The higher sensitivity stems from 𝓁∗'s ability to capture transcriptional spatial organization through spatial autocorrelation, which can detect subtle organizational changes that may precede or accompany expression changes rather than replacing them.

      We will clarify in the revision that chromatin organizational changes and gene expression changes are complementary rather than mutually exclusive phenomena during aging.

      (5) "In Fleischer dataset, as evident in Figure 4a, although changes in the heterochromatin are not identical for all chromosomes shown by the different degrees of variation of 𝓁∗ in each age group." The authors should present a comprehensive map of each chromosome change in gene distance to better explain the above statement.

      Thank you for the feedback. If we understand your comment correctly, we need to provide a chromosome-wise distribution for Fig3c. We will update the paper and the supplementary.

      (6) While trends in 𝓁∗ are discussed at both global and chromosome-specific levels, stronger statistical testing (e.g., permutation tests, bootstrapping) would lend greater confidence, especially when differences between age groups or treatment states are modest.

      Thank you for the helpful suggestion. In the revision, we will incorporate permutation-based significance testing by shuffling the gene annotation and count table to generate a null distribution for our 𝓁∗ calculation. This will allow us to more rigorously assess whether the observed differences across age groups or treatment states deviate from chance expectations and thereby lend greater statistical confidence to our findings.

      (7) While the transition energy barrier is an insightful conceptual addition, further clarification on the mathematical formulation and its physical assumptions (e.g., energy normalization, symmetry conditions) would improve interpretability. Also, in between Figures 7 and 8, the authors first compare the energy barrier of Chromosome 1 and then for all other chromosomes.

      What is the rationale for only analyzing chromosome 1? How many HAT or LAT are present there?

      Regarding chromosome 1 focus: we initially presented chromosome 1 as a representative example, but we will include energy landscape analysis for all chromosomes in the supplementary materials

      We use the same HATs that were extracted during 𝓁∗ for the energy landscape as well. The HAT details are present in the github repo, the link provided in response to 1st feedback.

      The normalization of the energy barrier ensures comparability across chromosomes of different sizes and across samples with different absolute expression scales. Specifically, we normalize with respect to the total area under the two-dimensional energy landscape while using the thermal energy (k_B T) as a scaling factor to place transition energy barriers on the scale of thermal fluctuations. This is formally expressed as in Eq. (1). 

      The physical consequences of symmetry in the energy landscape are discussed in lines 472-491 of the manuscript, where we also introduce the concept of irreversibility. In brief, the chromatin energy landscape (Figure 8) is constructed by quantifying the energy contributions of genes that are upregulated (lower triangular matrix) and downregulated (upper triangular matrix) between two states. If the integrated energy contributions of upregulated and downregulated genes are equal, the landscape is symmetric, representing a thermodynamically reversible process, for example, nucleosome repositioning between euchromatic and heterochromatic regions without net gain or loss of nucleosomes. However, in cases where epigenetic modifications alter nucleosome density (e.g., disease states that reduce nucleosome numbers), the integrated energies are unequal, reflecting an irreversible energy cost. In this case, restoring chromatin requires additional energy input (e.g., to replace “missing” nucleosomes), which manifests as asymmetry in the landscape.

      Reviewer #2 (Public review):

      The authors report that intra-chromosomal gene correlation length (spatial correlations in gene expressions along the chromosome) serves as a proxy of chromatin structure and hence gene expression. They further explore changes in these metrics with aging. These are interesting and important findings. However, there are fundamental problems at this time.

      (1) The basic method lacks validation. There is no validation of the method by approaches that directly measure chromatin structure, for example ATAC-seq, ChIP-seq, or CUT n RUN.

      We appreciate the reviewer’s point that direct measurements such as ATAC-seq and ChIP-seq remain the gold standard for characterizing chromatin structure. Our method is designed to complement, not replace, these approaches by leveraging RNA-seq data to detect large-scale transcriptional patterns that correlate with chromatin dynamics.

      We agree that integrating datasets with paired RNA-seq and chromatin accessibility assays would strengthen the manuscript and plan to include one such dataset in the revision.

      Based on this feedback, we will also take the opportunity during revision to clarify and soften certain statements. Specifically, we will reposition ℓ∗ as a sensitive, computational proxy for detecting transcriptional signatures that are suggestive of chromatin structural changes. In other words, ℓ∗ provides an indirect window into chromatin dynamics through transcriptional spatial organization, allowing detection of patterns that may precede or accompany structural changes. Direct assays such as ATAC-seq or ChIP-seq remain essential for confirming the underlying physical modifications. To make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text.  

      We would like to take this opportunity to clarify why our initial version focused on the Della Valle and Fleischer datasets rather than including new paired datasets with direct chromatin measurements. The primary objective of our paper is to introduce two macroscopic RNA-seq–based measures, ℓ∗ and the energy landscape, that are designed to detect transcriptional signatures suggestive of chromatin structural changes in the context of aging and age-related diseases. These measures explicitly model transcriptional spatial organization and provide a sensitive, scalable way to analyze RNA-seq data in domains where direct chromatin assays may not be readily available.

      The datasets we used (Della Valle et al., Fleischer et al.) have been rigorously validated and independently demonstrated differences in chromatin structure between conditions. Our goal was to show that ℓ∗ and the energy landscape align with and extend these established findings, offering a more sensitive measure of transcriptional spatial organization. Specifically, in the Della Valle dataset, chromatin structural differences between progeroid and healthy donors — and their partial rescue by LINE-1 ASO treatment — were experimentally confirmed, providing a strong foundation for testing whether our metrics reflect these known changes. Similarly, the Fleischer dataset captures natural, in vivo aging, which has also been linked to chromatin alterations in prior studies.

      Thus, our approach builds on this well-established biological context rather than attempting to re-demonstrate these chromatin differences from scratch. Finally, we emphasize that our current focus is aging and age-related diseases. While the framework could potentially be applied to other chromatin modification contexts, we have not tested it outside this domain and do not claim general applicability at this stage.

      (2) There is no validation by interventions that directly probe chromatin structure, such as HDAC inhibitors. The authors employ datasets with knockdown of LINE-1 for validation. However, this is not a specific chromatin intervention.

      We request the reviewer to refer to our response to (1) as it includes the rationale behind the selection of LINE-1 and Fleischer dataset. We would also like to state that while the focus of Della Valle et al. was LINE-1 treated ASO to show rescue of progeroid samples, it also contains data for non-treated as well as healthy samples. Importantly, untreated progeroid samples show distinctly different chromatin structure compared to healthy samples, with substantial differences detectable by both PCA and our 𝓁∗ metric.

      Our 𝓁∗ method provides additional interpretability by capturing transcriptional spatial organization, resulting in shorter correlation lengths for healthy patients and longer lengths for progeroid patients.

      But as mentioned in our response to (1) we will try to add an additional dataset with paired rna-seq and one of ATAC, ChIP-seq or CUT n RUN in the revision

      (3) There is no statistical analysis, e.g., in Figures 4 and 5.

      We have provided statistical analysis for Fig 4 (lines 237-241). We will do a similar analysis for Fig. 5. 

      (4) The authors state, "in Figure 4a changes in the heterochromatin are not identical for all chromosomes shown...." I do not see the data for individual chromosomes.

      The data for individual chromosomes is available in supplementary Fig. S11 – references at line 425. We will make this cross-reference clearer in the main text and consider whether some of this chromosome-specific information should be elevated to the main figures for better accessibility.

      (5) In comparisons of WT vs HGPS NT or HGPS SCR (Figure S6), is this a fair comparison? The WT and HGPS are presumably from different human donors, so they have genetic and epigenetic differences unrelated to HGPS.

      Figure S6 demonstrates that 𝓁∗ analysis identifies chromosome 6 as most affected, consistent with differential gene expression patterns.

      Regarding donor differences in WT vs HGPS comparisons, we defer to the experimental design of Della Valle et al., which follows standard practices in progeroid research. Our review of the literature indicates that progeroid studies typically use either parent/child samples or different donor comparisons (as individuals cannot simultaneously represent both WT and HGPS states).

      Importantly, the LINE-1 ASO treatment comparisons use the same cell lines, eliminating donor variability concerns. This experimental design allows us to validate that 𝓁∗ can detect rescue effects within genetically identical samples, supporting the method's sensitivity to chromatin structural changes  

      Reviewing Editor Comments:

      You'll note that both reviewers were very thoughtful in their comments, and in principle are supportive and excited by the work. However, their evaluation of the strength of evidence diverged substantially. I'm inclined to suggest that finding a way to support the novel method with an alternative approach would greatly improve the impact of this work. I encourage you to consider a revision that provides such data, in the context of technology currently available to the field.

      We sincerely thank the editor for their thoughtful and encouraging assessment of our work. We are grateful for their recognition of the novelty of our macroscopic measures (ℓ∗ and the transition energy barrier) and their potential to provide a systems-level understanding of chromatin structural dynamics in aging and age-related syndromes. In response to the editor’s suggestion for direct validation with chromatin accessibility data, we plan to integrate an additional dataset containing paired RNA-seq and ATAC-seq or related measurements in our revision. This will help strengthen the link between our RNA-seq–based metrics and direct chromatin assays. We have also clarified and softened the manuscript text to ensure it is clear that ℓ∗ serves as a complementary, computational proxy, not a replacement, for direct experimental approaches. Very specifically, to make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text. We thank the editor for the feedback. We have provided additional details in response to specific comments made by the reviewers.

    1. eLife Assessment

      This study presents a new toolbox for Representational Similarity Analysis, representing a valuable contribution to the neuroscience community. The authors offer a well-integrated platform that brings together a range of state-of-the-art methodological advances within a convincing framework, with strong potential to enable more rigorous and insightful analyses of neural data across multiple subfields.

    2. Reviewer #1 (Public review):

      Summary

      This manuscript presents an updated version of rsatoolbox, a Python package for performing Representational Similarity Analysis (RSA) on neural data. The authors provide a comprehensive and well-integrated framework that incorporates a range of state-of-the-art methodological advances. The updated version extends the toolbox's capabilities.

      The paper outlines a typical RSA workflow in five steps:

      (1) Importing data and estimating activity patterns.

      (2) Estimating representational geometries (computing RDMs).

      (3) Comparing RDMs.

      (4) Performing inferential model comparisons.

      (5) Handling multiple testing across space and time.

      For each step, the authors describe methodological advances and best practices implemented in the toolbox, including improved measures of representational distances, evaluators for representational models, and statistical inference methods.

      While the relative impact of the manuscript is somewhat limited to the new contributions in this update (which are nonetheless very useful), the general toolbox - here thoroughly described and discussed - remains an invaluable contribution to the field and is well-received by the cognitive and computational neuroscience communities.

      Strengths:

      A key strength of the work is the breadth and integration of the implemented methods. The updated version introduces several new features, such as additional comparators and dissimilarity estimators, that closely follow recent methodological developments in the field. These enhancements build on an already extensive set of functionalities, offering seamless support for RSA analyses across a wide variety of data sources, including deep neural networks, fMRI, EEG, and electrophysiological recordings.

      The toolbox also integrates effectively with the broader open-source ecosystem, providing compatibility with BIDS formats and outputs from widely used neuroscience software. This integration will make it easier for researchers to incorporate rsatoolbox into existing workflows. The documentation is extensive, and the scope of functionality - from dissimilarity estimation to statistical inference - is impressive.

      For researchers already familiar with RSA, rsatoolbox offers a coherent environment that can streamline analyses, promote methodological consistency, and encourage best practices.

      Weaknesses:

      While I enjoyed reading the manuscript - and even more so exploring the toolbox - I have some comments for the authors. None of these points is strictly major, and I leave it to the authors' discretion whether to act on them, but addressing them could make the manuscript an even more valuable resource for those approaching RSA.

      (1) While several estimators and comparators are implemented, Figure 4 appears to suggest that only a subset should be used in practice. This raises the question of whether the remaining options are necessary, and under what circumstances they might be preferable. Although it is likely that different measures are suited to different scenarios, this is not clearly explained in the manuscript. As presented, a reader following the manuscript's guidance might rely on only a few of the available comparators and estimators without understanding the rationale. It would be helpful if the authors could provide practical examples illustrating when one measure might be preferred over another, and how different measures behave under varying conditions-for instance, in what situations the user should choose manifold similarity versus Bures similarity?

      (2) The comparison to other RSA tools is minimal, making it challenging to place rsatoolbox in the broader landscape of available resources. Although the authors mention some existing RSA implementations, they do not provide a detailed comparison of features or performance between their toolbox and alternatives.

      (3) Finally, given the growing interest in comparing neural network models with brain data, a more detailed discussion of how the toolbox can be applied to common questions in this area would be a valuable addition.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript, "A Python Toolbox for Representational Similarity Analysis", presents an overview of the RSAToolbox, including a review of the methods it implements (some of which are more recently developed) and recommendations for constructing RSA analysis pipelines. It is encouraging to see that this toolbox, which has existed in both Python and other forms, continues to be actively developed and maintained.

      Strengths:

      The authors do a nice job reviewing the history of RSA analysis while introducing the methods within the toolbox. It is helpful that the authors discuss when and how to apply specific measures to different data types (e.g., why Euclidean or Mahalanobis distances are suboptimal for spike data). The manuscript strikes a valuable balance between theoretical background and hands-on instruction. The inclusion of decision-making aids, such as the Euler diagram for selecting similarity measures, and well-maintained demo scripts (available on GitHub), enhance the manuscript's utility as a practical guide.

      Overall, this paper will be particularly useful to researchers new to RSA and those interested in performing a rigorous analysis using this framework. The manuscript and accompanying toolbox provide everything a researcher needs to get started, provided they take the time to engage with the methodological details and references offered

      Weaknesses:

      While the links to the demos in the figure legend did not work for me, it was easy to locate the current demos online, and it's encouraging to see that they are actively maintained. One small issue is that a placeholder ("XXX") remains in the description of Figure 3b and should be corrected.

    4. Author response:

      We thank the reviewers for their valuable feedback. We will prepare a revision of the manuscript based on these suggestions and comments. We are sure these revisions will improve the paper.

      The only major point we wish to clarify is that this is the first and only manuscript describing the toolbox; it is not a version update. Although it shares a similar name with its 2015 MATLAB predecessor (Nili et al., PLoS Comput Biol), rsatoolbox was designed from scratch. Also, they have no code or structural overlap beyond implementing some similar methods.

      Developed publicly since 2019, rsatoolbox reflects a decade of research in RSA methodology across multiple labs and incorporates new dissimilarity metrics, RDM comparators, inferential procedures, and visualization methods. Importantly, although we cite several papers describing methods implemented in the toolbox, this is the first manuscript to present the toolbox as a whole, its design principles, and the unified analytical framework it offers.

      We are sorry about the forgotten placeholder and the links not working. The links work for us in the pdf at least and we will certainly fix the placeholder as soon as possible.

    1. eLife Assessment

      This important study uses advanced computational methods to elucidate how environmental dielectric properties influence the interaction strengths of tyrosine and phenylalanine in biomolecular condensates. The evidence supporting the claims of the authors is convincing, as the simulations are performed rigorously providing mechanistic insights into the origin of the differences between the two aromatic amino acids considered. This study will be of broad interest to researchers studying biomolecular phase separation.

    2. Reviewer #1 (Public review):

      This is an interesting and timely computational study using molecular dynamics simulation as well as quantum mechanical calculation to address why tyrosine (Y), as part of an intrinsically disordered protein (IDP) sequence, has been observed experimentally to be stronger than phenylalanine (F) as a promoter for biomolecular phase separation. Notably, the authors identified the aqueous nature of the condensate environment and the corresponding dielectric and hydrogen bonding effects as a key to understand the experimentally observed difference. This principle is illustrated by the difference in computed transfer free energy of Y- and F-containing pentapeptides into solvent with various degrees of polarity. The elucidation offered by this work is important. The computation appears to be carefully executed, the results are valuable, and the discussion is generally insightful. However, there is room for improvement in some parts of the presentation in terms of accuracy and clarity, including, e.g., the logic of the narrative should be clarified with additional information (and possibly additional computation), and the current effort should be better placed in the context of prior relevant theoretical and experimental works on cation-π interactions in biomolecules and dielectric properties of biomolecular condensates. Accordingly, this manuscript should be revised to address the following, with added discussion as well as inclusion of references mentioned below.

      (1) Page 2, line 61: "Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations (Das et al., 2020)". As it stands, this statement is not accurate, because the cited reference to Das et al. showed that although some coarse-grained model, namely the HPS model of Dignon et al., 2018 PLoS Comput did not capture the Arg to Lys trend, the KH model described in the same Dignon et al. paper was demonstrated by Das et al. (2020) to be capable of mimicking the greater propensity of Arg to promote phase separation than Lys. Accordingly, a possible minimal change that would correct the inaccuracy of this statement in the manuscript would be to add the word "Some" in front of "coarse-grained simulation models ...", i.e., it should read "Some coarse-grained simulation models have failed ...". In fact, a subsequent work [Wessén et al., J Phys Chem B 126: 9222-9245 (2022)] that applied the Mpipi interaction parameters (Joseph et al., 2021, already cited in the manuscript) showed that Mpipi is capable of capturing the rank ordering of phase separation propensity of Ddx4 variants, including a charge scrambled variant as well as both the Arg to Lys and the Phe to Ala variants (see Fig.11a of the above-cited Wessén et al. 2022 reference). The authors may wish to qualify their statements in the introduction to take note of these prior results. For example, they may consider adding a note immediately after the next sentence in the manuscript "However, by replacing the hydrophobicity scales ... (Das et al., 2020)" to refer to these subsequent findings in 2021-2022.

      (2) Page 8, lines 285-290 (as well as the preceding discussion under the same subheading & Fig.4): "These findings suggest that ... is not primarily driven by differences in protein-protein interaction patterns ..." The authors' logic in terms of physical explanation is somewhat problematic here. In this regard, "Protein-protein interaction patterns" appears to be a straw man, so to speak. Indeed, who (reference?) has argued that the difference in the capability of Y and F in promoting phase separation should be reflected in the pairwise amino acid interaction pattern in a condensate that contains either only Y (and G, S) and only F (and G, S) but not both Y and F? Also, this paragraph in the manuscript seems to suggest that the authors' observation of similar contact patterns in the GSY and GSF condensates is "counterintuitive" given the difference in Y-Y and F-F potentials of mean force (Joseph et al., 2021); but there is nothing particularly counterintuitive about that. The two sets of observations are not mutually exclusive. For instance, consider two different homopolymers, one with a significantly stronger monomer-monomer attraction than the other. The condensates for the two different homopolymers will have essentially the same contact pattern but very different stabilities (different critical temperatures), and there is nothing surprising about it. In other words, phase separation propensity is not "driven" by contact pattern in general, it's driven by interaction (free) energy. The relevant issue here is total interaction energy or critical point of the phase separation. If it is computationally feasible, the authors should attempt to determine the critical temperatures for the GSY condensate versus the GSF condensate to verify that the GSY condensate has a higher critical temperature than the GSF condensate. That would be the most relevant piece of information for the question at hand.

      (3) Page 9, lines 315-316: "...Our ε [relative permittivity] values ... are surprisingly close to that derived from experiment on Ddx4 condensates (45{plus minus}13) (Nott et al., 2015)". For accuracy, it should be noted here that the relative permittivity provided in the supplementary information of Nott et al. was not a direct experimental measurement but based on a fit using Flory-Huggins (FH), but FH is not the most appropriate theory for polymer with long-spatial-range Coulomb interactions. To this reviewer's knowledge, no direct measurement of relative permittivity in biomolecular condensates has been made to date. Explicit-water simulation suggests that relative permittivity of Ddx4 condensate with protein volume fraction ≈ 0.4 can have relative permittivity ≈ 35-50 (Das et al., PNAS 2020, Fig.7A), which happens to agree with the ε = 45{plus minus}13 estimate. This information should be useful to include in the authors' manuscript.

      (4) As for the dielectric environment within biomolecular condensates, coarse-grained simulation has suggested that whereas condensates formed by essentially electric neutral polymers (as in the authors' model systems) have relative permittivities intermediate between that of bulk water and that of pure protein (ε = 2-4, or at most 15), condensates formed by highly charge polymers can have relative permittivity higher than that of bulk water [Wessén et al., J Phys Chem B 125:4337-4358 (2021), Fig.14 of this reference]. In view of the role of aromatic residues (mainly Y and F) in the phase separation of IDPs such as A1-LCD and LAF-1 that contain positively and negatively charged residues (Martin et al., 2020; Schuster et al., 2020, already cited in the manuscript), it should be useful to address briefly how the relationship between the relative phase-separation promotion strength of Y vs F and dielectric environment of the condensate may or may not be change with higher relative permittivities.

      (5) The authors applied the dipole moment fluctuation formula (Eq.2 in the manuscript) to calculate relative permittivity in their model condensates. Does this formula apply only to an isotropic environment? The authors' model condensates were obtained from a "slab" approach (p.4) and thus the simulation box has a rectangular geometry. Did the authors apply their Eq.2 to the entire simulation box or only to the central part of the box with the condensate (see, e.g., Fig.3C in the manuscript). If the latter is the case, is it necessary to use a different dipole moment formula that distinguishes between the "parallel" and "perpendicular" components of the dipole moment (see, e.g., Eq.16 in the above-cited Wessén et al. 2021 paper). A brief added comments will be useful.

      (6) With regard to the general role of Y and F in the phase separation of biomolecules containing positively charged Arg and Lys residues, the relative strength of cation-π interactions (cation-Y vs cation-F) should be addressed (in view of the generality implied by the title of the manuscript), or at least discussed briefly in the authors' manuscript if a detailed study is beyond the scope of their current effort. It has long been known that in the biomolecular context, cation-Y is slightly stronger than cation-F, whereas cation-tryptophan (W) is significantly stronger than either cation-Y and cation-F [Wu & McMahon, JACS 130:12554-12555 (2008)]. Experimental data from a study of EWS (Ewing sarcoma) transactivation domains indicated that Y is a slightly stronger promoter than F for transcription, whereas W is significantly stronger than either Y or F [Song et al., PLoS Comput Biol 9:e1003239 (2013)]. In view of the subsequent general recognition that "transcription factors activate genes through the phase-separation capacity of their activation domain" [Boija et al., Cell 175:1842-1855.e16 (2018)] which is applicable to EWS in particular [Johnson et al., JACS 146:8071-8085 (2024)], the experimental data in Song et al. 2013 (see Fig.3A of this reference) suggests that cation-Y interactions are stronger than cation-F interactions in promoting phase separation, thus generalizing the authors' observations (which focus primarily on Y-Y, Y-F and F-F interactions) to most situations in which cation-Y and cation-F interactions are relevant to biomolecular condensation.

      (7) Page 9: The observation of a weaker effective F-F (and a few other nonpolar-nonpolar) interaction in a largely aqueous environment (as in an IDP condensate) than in a nonpolar environment (as in the core of a folded protein) is intimately related to (and expected from) the long-recognized distinction between "bulk" and "pair" as well as size dependence of hydrophobic effects that have been addressed in the context of protein folding [Wood & Thompson, PNAS 87:8921-8927 (1990); Shimizu & Chan, JACS 123:2083-2084 (2001); Proteins 49:560-566 (2002)]. It will be useful to add a brief pointer in the current manuscript to this body of relevant resource in protein science.

      Comments on revisions:

      The authors have largely addressed my previous concerns and the manuscript has been substantially improved. Nonetheless, it will benefit the readers more if the authors had included more of the relevant references provided in my previous review so as to afford a broader and more accurate context to the authors' effort. This deficiency is particularly pertinent for point number 6 in my previous report about cation-pi interactions. The authors have now added a brief discussion but with no references on the rank ordering of Y, F, and W interactions. I cannot see how providing additional information about a few related works could hurt. Quite the contrary, having the references will help readers establish scientific connections and contribute to conceptual advance.

    3. Reviewer #2 (Public review):

      Summary:

      In this preprint, De Sancho and López use alchemical molecular dynamics simulations and quantum mechanical calculations to elucidate the origin of the observed preference of Tyr over Phe in phase separation. The paper is well written, and the simulations conducted are rigorous and provide good insight into the origin of the differences between the two aromatic amino acids considered.

      Strengths:

      The study addresses a fundamental discrepancy in the field of phase separation where the predicted ranking of aromatic amino acids observed experimentally is different from their anticipated rankings when considering contact statistics of folded proteins. While the hypothesis that the difference in the microenvironment of the condensed phase and hydrophobic core of folded proteins underlies the different observations, this study provides a quantification of this effect. Further, the demonstration of the crossover between Phe and Tyr as a function of the dielectric is interesting and provides further support for the hypothesis that the differing microenvironments within the condensed phase and the core of folded proteins is the origin of the difference between contact statistics and experimental observations in phase separation literature. The simulations performed in this work systematically investigate several possible explanations and therefore provide depth to the paper.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This is an interesting and timely computational study using molecular dynamics simulation as well as quantum mechanical calculation to address why tyrosine (Y), as part of an intrinsically disordered protein (IDP) sequence, has been observed experimentally to be stronger than phenylalanine (F) as a promoter for biomolecular phase separation. Notably, the authors identified the aqueous nature of the condensate environment and the corresponding dielectric and hydrogen bonding effects as a key to understanding the experimentally observed difference. This principle is illustrated by the difference in computed transfer free energy of Y- and F-containing pentapeptides into a solvent with various degrees of polarity. The elucidation offered by this work is important. The computation appears to be carefully executed, the results are valuable, and the discussion is generally insightful. However, there is room for improvement in some parts of the presentation in terms of accuracy and clarity, including, e.g., the logic of the narrative should be clarified with additional information (and possibly additional computation), and the current effort should be better placed in the context of prior relevant theoretical and experimental works on cation-π interactions in biomolecules and dielectric properties of biomolecular condensates. Accordingly, this manuscript should be revised to address the following, with added discussion as well as inclusion of references mentioned below.

      We are grateful for the referee’s assessment of our work and insightful suggestions, which we address point by point below.

      (1) Page 2, line 61: "Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations (Das et al., 2020)". As it stands, this statement is not accurate, because the cited reference to Das et al. showed that although some coarse-grained models, namely the HPS model of Dignon et al., 2018 PLoS Comput did not capture the Arg to Lys trend, the KH model described in the same Dignon et al. paper was demonstrated by Das et al. (2020) to be capable of mimicking the greater propensity of Arg to promote phase separation than Lys. Accordingly, a possible minimal change that would correct the inaccuracy of this statement in the manuscript would be to add the word "Some" in front of "coarse-grained simulation models ...", i.e., it should read "Some coarse-grained simulation models have failed ...". In fact, a subsequent work [Wessén et al., J Phys Chem B 126: 9222-9245 (2022)] that applied the Mpipi interaction parameters (Joseph et al., 2021, already cited in the manuscript) showed that Mpipi is capable of capturing the rank ordering of phase separation propensity of Ddx4 variants, including a charge scrambled variant as well as both the Arg to Lys and the Phe to Ala variants (see Figure 11a of the above-cited Wessén et al. 2022 reference). The authors may wish to qualify their statements in the introduction to take note of these prior results. For example, they may consider adding a note immediately after the next sentence in the manuscript "However, by replacing the hydrophobicity scales ... (Das et al., 2020)" to refer to these subsequent findings in 2021-2022.

      We agree with the referee that the wording used in the original version was inaccurate. We did not want to expand too much on the previous results on Lys/Arg, to avoid overwhelming our readers with background information that was not directly relevant to the aromatic residues Phe and Tyr. We have now introduced some of the missing details in the hope that this will provide a more accurate account of what has been achieved with different versions of coarse-grained models. In the revised version, we say the following:

      Das and co-workers attempted to explain arginine’s greater propensity to phase separate in Ddx4 variants using coarse-grained simulations with two different energy functions (Das et al., 2020). The model was first parametrized using a hydrophobicity scale, aimed to capture the “stickiness” of different amino acids (Dignon et al., 2018), but this did not recapitulate the correct rank order in the stability of the simulated condensates (Das et al., 2020). By replacing the hydrophobicity scale with interaction energies from amino acid contact matrices —derived from a statistical analysis of the PDB (Dignon et al., 2018; Miyazawa and Jernigan, 1996; Kim and Hummer, 2008)— they recovered the correct trends (Das et al., 2020). A key to the greater propensity for LLPS in the case of Arg may derive from the pseudo-aromaticity of this residue, which results in a greater stabilization relative to the more purely cationic character of Lys (Gobbi and Frenking, 1993; Wang et al., 2018; Hong et al., 2022).

      (2) Page 8, lines 285-290 (as well as the preceding discussion under the same subheading & Figure 4): "These findings suggest that ... is not primarily driven by differences in protein-protein interaction patterns ..." The authors' logic in terms of physical explanation is somewhat problematic here. In this regard, "Protein-protein interaction patterns" appear to be a straw man, so to speak. Indeed, who (reference?) has argued that the difference in the capability of Y and F in promoting phase separation should be reflected in the pairwise amino acid interaction pattern in a condensate that contains either only Y (and G, S) and only F (and G, S) but not both Y and F? Also, this paragraph in the manuscript seems to suggest that the authors' observation of similar contact patterns in the GSY and GSF condensates is "counterintuitive" given the difference in Y-Y and F-F potentials of mean force (Joseph et al., 2021); but there is nothing particularly counterintuitive about that. The two sets of observations are not mutually exclusive. For instance, consider two different homopolymers, one with a significantly stronger monomer-monomer attraction than the other. The condensates for the two different homopolymers will have essentially the same contact pattern but very different stabilities (different critical temperatures), and there is nothing surprising about it. In other words, phase separation propensity is not "driven" by contact pattern in general, it's driven by interaction (free) energy. The relevant issue here is total interaction energy or the critical point of the phase separation. If it is computationally feasible, the authors should attempt to determine the critical temperatures for the GSY condensate versus the GSF condensate to verify that the GSY condensate has a higher critical temperature than the GSF condensate. That would be the most relevant piece of information for the question at hand.

      We are grateful for this very insightful comment by the referee. We have followed this suggestion to address whether, despite similar interaction patterns in GSY and GSF condensates, their stabilities are different. As in our previous work (De Sancho, 2022), we have run replica exchange MD simulations for both condensates and derived their phase diagrams. Our results, shown in the new Figure 5 and supplementary Figs. S6-S7, clearly indicate that the GSY condensate has a lower saturation density than the GSF condensate. This result is consistent with the trends observed in experiments on mutants of the low-complexity domain of hnRNPA1, where the relative amounts of F and Y determine the saturation concentration (Bremer et al., 2022).

      (3) Page 9, lines 315-316: "...Our ε [relative permittivity] values ... are surprisingly close to that derived from experiment on Ddx4 condensates (45{plus minus}13) (Nott et al., 2015)".  For accuracy, it should be noted here that the relative permittivity provided in the supplementary information of Nott et al. was not a direct experimental measurement but based on a fit using Flory-Huggins (FH), but FH is not the most appropriate theory for a polymer with long-spatial-range Coulomb interactions. To this reviewer's knowledge, no direct measurement of relative permittivity in biomolecular condensates has been made to date. Explicit-water simulation suggests that the relative permittivity of Ddx4 condensate with protein volume fraction ≈ 0.4 can have a relative permittivity ≈ 35-50 (Das et al., PNAS 2020, Fig.7A), which happens to agree with the ε = 45{plus minus}13 estimate. This information should be useful to include in the authors' manuscript.

      We thank the referee for this useful comment. We are aware that the estimate we mentioned is not direct. We have now clarified this point and added the additional estimate from Das et al. In the new version of the manuscript, we say:

      Our 𝜀 values for the condensates (39 ± 5 for GSY and 47 ± 3 for GSF) are surprisingly close to that derived from experiments on Ddx condensates using Flory-Huggins theory (45±13) (Nott et al., 2015) and from atomistic simulations of Ddx4 (∼35−50 at a volume fraction of 𝜙 = 0.4) (Das et al., 2020).

      (4) As for the dielectric environment within biomolecular condensates, coarse-grained simulation has suggested that whereas condensates formed by essentially electric neutral polymers (as in the authors' model systems) have relative permittivities intermediate between that of bulk water and that of pure protein (ε=2-4, or at most 15), condensates formed by highly charged polymers can have relative permittivity higher than that of bulk water [Wessén et al., J Phys Chem B 125:4337-4358 (2021), Fig.14 of this reference]. In view of the role of aromatic residues (mainly Y and F) in the phase separation of IDPs such as A1-LCD and LAF-1 that contain positively and negatively charged residues (Martin et al., 2020; Schuster et al., 2020, already cited in the manuscript), it should be useful to address briefly how the relationship between the relative phase-separation promotion strength of Y vs F and dielectric environment of the condensate may or may not be change with higher relative permittivities.

      We thank the referee for their comment regarding highly charged polymers. However, we have chosen not to address these systems in our manuscript, as they are significantly different from the GSY/GSF peptide condensates under investigation. In polyelectrolyte systems, condensate formation is primarily driven by electrostatic interactions and counterion release, while we highlight the role of transfer free energies. At high dielectric constants (and dielectrics even higher than that of water), the strength of electrostatic interactions will be greatly reduced. In our approach to estimate differences between Y and F, the transfer free energy should plateau at a value of ΔΔG=0 in water. At greater values of ε>80, it becomes difficult to predict whether additional effects might become relevant. As this lies beyond the scope of our current study, we prefer not to speculate further.

      (5) The authors applied the dipole moment fluctuation formula (Eq.2 in the manuscript) to calculate relative permittivity in their model condensates. Does this formula apply only to an isotropic environment? The authors' model condensates were obtained from a "slab" approach (page 4 and thus the simulation box has a rectangular geometry. Did the authors apply Equation 2 to the entire simulation box or only to the central part of the box with the condensate (see, e.g., Figure 3C in the manuscript). If the latter is the case, is it necessary to use a different dipole moment formula that distinguishes between the "parallel" and "perpendicular" components of the dipole moment (see, e.g., Equation 16 in the above-cited Wessén et al. 2021 paper). A brief added comment will be useful.

      We have calculated the relative permittivity from dense phases only. These dense phases were sliced from the slab geometry and then re-equilibrated. Long simulations were then run to converge the calculation of the dielectric constant. We have clarified this in the Methods section of the paper. We say:

      For the calculation of the dielectric constant of condensates, we used the simulations of isolated dense phases mentioned above.

      (6) Concerning the general role of Y and F in the phase separation of biomolecules containing positively charged Arg and Lys residues, the relative strength of cation-π interactions (cation-Y vs cation-F) should be addressed (in view of the generality implied by the title of the manuscript), or at least discussed briefly in the authors' manuscript if a detailed study is beyond the scope of their current effort. It has long been known that in the biomolecular context, cation-Y is slightly stronger than cation-F, whereas cation-tryptophan (W) is significantly stronger than either cation-Y and cation-F [Wu & McMahon, JACS 130:12554-12555 (2008)]. Experimental data from a study of EWS (Ewing sarcoma) transactivation domains indicated that Y is a slightly stronger promoter than F for transcription, whereas W is significantly stronger than either Y or F [Song et al., PLoS Comput Biol 9:e1003239 (2013)]. In view of the subsequent general recognition that "transcription factors activate genes through the phase-separation capacity of their activation domain" [Boija et al., Cell 175:1842-1855.e16 (2018)] which is applicable to EWS in particular [Johnson et al., JACS 146:8071-8085 (2024)], the experimental data in Song et al. 2013 (see Figure 3A of this reference) suggests that cation-Y interactions are stronger than cation-F interactions in promoting phase separation, thus generalizing the authors' observations (which focus primarily on Y-Y, Y-F and F-F interactions) to most situations in which cation-Y and cation-F interactions are relevant to biomolecular condensation.

      We thank our referee for this insightful comment. While we restrict our analysis to aromatic pairs in this work, the observed crossover will certainly affect other pairs where tyrosine or phenylalanine are involved. We now comment on this point in the discussions section of the revised manuscript. This topic will be explored in detail in a follow-up manuscript we are currently completing. We say:

      We note that, although we have not included in our analysis positively charged residues that form cation-π interactions with aromatics, the observed crossover will also be relevant to Arg/Lys contacts with Phe and Tyr. Following the rationale of our findings, within condensates, cation-Tyr interactions are expected to promote phase separation more strongly than cation-Phe pairs.

      (7) Page 9: The observation of weaker effective F-F (and a few other nonpolar-nonpolar) interactions in a largely aqueous environment (as in an IDP condensate) than in a nonpolar environment (as in the core of a folded protein) is intimately related to (and expected from) the long-recognized distinction between "bulk" and "pair" as well as size dependence of hydrophobic effects that have been addressed in the context of protein folding [Wood & Thompson, PNAS 87:8921-8927 (1990); Shimizu & Chan, JACS 123:2083-2084 (2001); Proteins 49:560-566 (2002)]. It will be useful to add a brief pointer in the current manuscript to this body of relevant resources in protein science.

      We thank the referee for bringing this body of work to our attention. In the revised version of our work, we briefly mention how it relates to our results. We also note that the suggested references have pointed to another of the limitations of our study, that of chain connectivity, addressed in the work by Shimizu and Chan. While we were well aware of these limitations, we had not mentioned them in our manuscript. Concerning the distinction between pair and bulk hydrophobicities, we include the following in the concluding lines of our work:

      The observed context dependence has deep roots in the concepts of “pair” and “bulk” hydrophobicity (Wood and Thompson, 1990; Shimizu and Chan, 2002). While pair hydrophobicity is connected to dimerisation equilibria (i.e. the second step in Figure 2B), bulk hydrophobicity is related to transfer processes (the first step). Our work stresses the importance of considering both the pair contribution that dominates at high solvation, and the transfer free energy contribution, which overwhelms the interaction strength at low dielectrics.

      Reviewer #2 (Public review):

      Summary:

      In this preprint, De Sancho and López use alchemical molecular dynamics simulations and quantum mechanical calculations to elucidate the origin of the observed preference of Tyr over Phe in phase separation. The paper is well written, and the simulations conducted are rigorous and provide good insight into the origin of the differences between the two aromatic amino acids considered.

      We thank the referee for his/her positive assessment of our work. Below, we address all the questions raised one by one.

      Strengths:

      The study addresses a fundamental discrepancy in the field of phase separation where the predicted ranking of aromatic amino acids observed experimentally is different from their anticipated rankings when considering contact statistics of folded proteins. While the hypothesis that the difference in the microenvironment of the condensed phase and hydrophobic core of folded proteins underlies the different observations, this study provides a quantification of this effect. Further, the demonstration of the crossover between Phe and Tyr as a function of the dielectric is interesting and provides further support for the hypothesis that the differing microenvironments within the condensed phase and the core of folded proteins is the origin of the difference between contact statistics and experimental observations in phase separation literature. The simulations performed in this work systematically investigate several possible explanations and therefore provide depth to the paper.

      Weaknesses:

      While the study is quite comprehensive and the paper well written, there are a few instances that would benefit from additional details. In the methods section, it is unclear as to whether the GGXGG peptides upon which the alchemical transforms are conducted are positioned restrained within the condensed/dilute phase or not. If they are not, how would the position of the peptides within the condensate alter the calculated free energies reported? 

      The peptides are not restrained in our simulations and can therefore diffuse out of the condensate given sufficient time. Although the GGXGG peptide can, given sufficient time, leave the peptide condensate, we did not observe any escape event in the trajectories we used to generate starting points for switching. Hence, the peptide environment captured in our calculations reflects, on average, the protein-protein and protein-solvent interactions inside the model condensate. We believe this is the right way of performing the calculation of transfer free energy differences into the condensate. We have clarified this point when we describe the equilibrium simulation results in the revised manuscript. We say:

      Also, the peptide that experiences the transformation, which is not restrained, must remain buried within the condensate for all the snapshots that we use as initial frames, to avoid averaging the work in the dilute and dense phases.

      On the referee’s second point of whether there would be differences if the peptide visited the dilute phase, the answer is that, indeed, we would. We expect that the behaviour of the peptide would approach ΔΔG=0, considering the low protein concentration in the dilute phase. For mixed trajectories with sampling in both dilute and dense phases, our expectation would be a bimodal distribution in the free energy estimates from switching (see e.g. Fig. 8 in DOI:10.1021/acs.jpcb.0c10263). Because we are exclusively interested in the transfer free energies into the condensate, we do not pursue such calculations in this work.

      It would also be interesting to see what the variation in the transfer of free energy is across multiple independent replicates of the transform to assess the convergence of the simulations. 

      Upon submission of our manuscript, we were confident that the results we had obtained would pass the test of statistical significance. We had, after all, done many more simulations than those reported, plus the comparable values of ΔΔG<sub>Transfer</sub> for both GSY and GSF pointed in the right direction. However, we acknowledge that the more thorough test of running replicates recommended by the referee is important, considering the slow diffusion within the Tyr peptide condensates due to its stickiness. Also, the non-equilibrium switching method had not been tested before for dense phases like the ones considered here.

      We have hence followed our referee's suggestion and done three different replicates, 1 μs each, of the equilibrium runs starting from independent slab configurations, for both the GSY and GSF condensates (see the new supporting figures Fig. S1, S2 and S5). We now report the errors from the three replicates as the standard error of the mean (bootstrapping errors remain for the rest of the solvents). Our results are entirely consistent with the values reported originally, confirming the validity of our estimates.

      Additionally, since the authors use a slab for the calculation of these free energies, are the transfer free energies from the dilute phase to the interface significantly different from those calculated from the dilute phase to the interior of the condensate? 

      We thank the referee for this valuable comment, as it has pointed us in the direction of a rapidly increasing body of work on condensate interfaces, for example, as mediators of aggregation, that we may consider for future study with the same methodology. However, as discussed above, we have not considered this possibility in our work, as we decided to focus on the condensate environment, rather than its interface.

      The authors mention that the contact statistics of Phe and Tyr do not show significant difference and thereby conclude that the more favorable transfer of Tyr primarily originates from the dielectric of the condensate. However, the calculation of contacts neglects the differences in the strength of interactions involving Phe vs. Tyr. Though the authors consider the calculation of energy contact formation later in the manuscript, the scope of these interactions are quite limited (Phe-Phe, Tyr-Tyr, Tyr-Amide, Phe-Amide) which is not sufficient to make a universal conclusion regarding the underlying driving forces. A more appropriate statement would be that in the context of the minimal peptide investigated the driving force seems to be the difference in dielectric. However, it is worth mentioning that the authors do a good job of mentioning some of these caveats in the discussion section.

      We thank the referee for this important comment. Indeed, the similar contact statistics and interaction patterns that we reported originally do not necessarily imply identical interaction energies. In other words, similar statistics and patterns can still result in different stabilities for the Phe and Tyr condensates if the energetics are different. Hence, we cannot conclude that the GSF and GSY condensate environments are equivalent.

      To address this point, we have run new simulations for the revised version of our paper, using the temperature-replica exchange method, as before. From the new datasets, we derive the phase diagrams for both the GSF and GSY condensates (see the new Fig. 5). We find that the tyrosine-containing condensate is more stable than that of phenylalanine, as can be inferred from the lower saturation density in the low-density branch of the phase diagram. In consequence, despite the similar contact statistics, the energetics differ, making the saturation density of the GSY slightly lower than that of GSF. This result is consistent with experimental data by Bremer et al (Nat. Chem. 2022). 

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors address the paradox of how tyrosine can act as a stronger sticker for phase separation than phenylalanine, despite phenylalanine being higher on the hydrophobicity scale and exhibiting more prominent pairwise contact statistics in folded protein structures compared to tyrosine.

      We are grateful for the referee’s favourable opinion on the paper. Below, we address all of the issues raised.

      Strengths:

      This is a fascinating problem for the protein science community with special relevance for the biophysical condensate community. Using atomistic simulations of simple model peptides and condensates as well as quantum calculations, the authors provide an explanation that relies on the dielectric constant of the medium and the hydration level that either tyrosine or phenylalanine can achieve in highly hydrophobic vs. hydrophilic media. The authors find that as the dielectric constant decreases, phenylalanine becomes a stronger sticker than tyrosine. The conclusions of the paper seem to be solid, it is well-written and it also recognises the limitations of the study. Overall, the paper represents an important contribution to the field.

      Weaknesses:

      How can the authors ensure that a condensate of GSY or GSF peptides is a representative environment of a protein condensate? First, the composition in terms of amino acids is highly limited, second the effect of peptide/protein length compared to real protein sequences is also an issue, and third, the water concentration within these condensates is really low as compared to real experimental condensates. Hence, how can we rely on the extracted conclusions from these condensates to be representative for real protein sequences with a much more complex composition and structural behaviour?

      We agree with the main weakness identified by the referee. In fact, all these limitations had already been stated in our original submission. Our ternary peptide condensates are just a minimal model system that bears reasonable analogies with condensates, but definitely is not identical to true LCR condensates. The analogies between peptide and protein condensates are, however, worth restating: 

      (1) The limited composition of the peptide condensates is inspired by LCR sequences (see Fig. 4 in Martin & Mittag, 2018).

      (2) The equilibrium phase diagram, showing a UCST, is consistent with that of LCRs from Ddx4 or hnRNPA1.

      (3) The dynamical behaviour is intermediate between liquid and solid (De Sancho, 2022). 

      (4) The contact patterns are comparable to those observed for FUS and LAF1 (Zheng et al, 2020).

      The third issue pointed out by the referee requires particular attention. Indeed, the water content in the model condensates is low (~200 mg/mL for GSY) relative to the experiment (e.g. ~600 mg/mL for FUS and LAF-1 from simulations). Considering that both interaction patterns and solvation contribute to the favorability of Tyr relative to Phe, we speculate that a greater degree of solvation in the true protein condensates will further reinforce the trends we observe.

      In any case, in the revised version of the manuscript, we have made an effort to insist on the limitations of our results, some of which we plan to address in future work.

      Reviewer #3 (Recommendations for the authors):

      (1) The fact that protein density is so high within GSY or GSF peptide condensates may significantly alter the conclusions of the paper. Can the authors show that for condensates in which the protein density is ~0.2-0.3 g/cm3, the same conclusions hold? Could the authors use a different peptide sequence that establishes a more realistic protein concentration/density inside the condensate?

      Unfortunately, recent work with a variety of peptide sequences suggests that finding peptides in the density range proposed by the referee may be very challenging. For example, Pettit and his co-workers have extensively studied the behaviour of GGXGG peptides. In a recent work, using the CHARMM36m force field and TIP3P water, they report densities of ~1.2-1.3 g/mL for capped pentapeptide condensates (Workman et al, Biophys. J. 2024; DOI: 10.1016/j.bpj.2024.05.009). Brown and Potoyan have recently run simulations of zwitterionic GXG tripeptides with the Amber99sb-ILDNQ force field and TIP3P water, starting with a homogenous distribution in cubic simulation boxes (Biophys. J. 2024, DOI: 10.1016/j.bpj.2023.12.027). In a box with an initial concentration of 0.25 g/mL, upon phase separation, the peptide ends up occupying what would seem to be ~1/3 of the box, although we could not find exact numbers. This would imply densities of ~0.75 g/mL in the dense phase, with the additional problem of many charges. Finally, Joseph and her co-workers have recently simulated a set of hexapeptide condensates with varied compositions using a combination of atomistic and coarse-grained simulations. For the atomistic simulations, the Amber03ws force field and TIP4P water were used (see BioRxiv reference 10.1101/2025.03.04.641530). They have found values of the protein density in the dense phase ranging between 0.8 and 1.2 g/mL.  The consistency in the range of densities reported in these studies suggests that short peptides, at least up to 7-residues long, tend to form quite dense condensates, akin to those investigated in our work. While the examples mentioned do not comprehensively span the full range of peptide lengths, sequences, and force fields, they nonetheless support the general behaviour we observe. A systematic exploration of all these variables would require an extensive search in parameter space, which we believe falls outside the scope of the present study.

      (2) Do the conclusions hold for phase-separating systems that mostly rely on electrostatic interactions to undergo LLPS, like protein-RNA complex coacervates? In other words, could the authors try the same calculations for a binary mixture composed of polyR-polyE, or polyK-polyE?

      This is an excellent idea that we may attempt in future work, but the remit of the current work is aromatic amino acids Phe and Tyr only. Hence, we do not include calculations or discussion on polyR-polyE systems in our revised manuscript.

      (3) One of the major approximations made by the authors is the length of the peptides within the condensates, which is not realistic, or their density. Specifically, could they double or triple the length of these peptides while maintaining their composition so it can be quantified the impact of sequence length in the transfer of free energies?

      We thank the referee for this comment and agree with the main point, which was stated as a limitation in our original submission. The suggested calculations anticipate research that we are planning but will not include in the current work. One of the advantages of our model systems is that the small size of the peptides allows for small simulation boxes and relatively rapid sampling. Longer peptide sequences would require conformational sampling beyond our current capabilities, if done systematically. An example of these limitations is the amount of data that we had to discard from the new simulations we report, which amounts to up to 200 ns of our replica exchange runs in smaller simulation boxes (i.e. >19 μs in total for the 48 replicas of the two condensates!). As stated in the answer to point 1, we have found in the literature work on peptides in the range of 1-7 residues with consistent densities. Additionally, a recent report using alchemical transformations using equilibrium techniques with tetrapeptide condensates, pointing to the role of transfer free energy as driving force for condensate formation, further supports the observations from our work.

      Minor issues:

      (1) The caption of Figure 3B is not clear. It can only be understood what is depicted there once you read the main text a couple of times. I encourage the authors to clarify the caption.

      We have rewritten the caption for greater clarity. Now it reads as follows:

      Time evolution of the density profiles calculated across the longest dimension of the simulation box (L) in the coexistence simulations. In blue we show the density of all the peptides, and in dark red that of the F/Y residue in the GGXGG peptide.

      (2) Why was the RDF from Figure 5A cut at such a short distance? Can the authors expand the figure to clearly show that it has converged?

      In the updated Figure 5 (now Fig. 6), we have extended the g(r) up to r=1.75 nm so that it clearly plateaus at a value of 1.

    1. eLife Assessment

      This valuable study reports evidence that items maintained in working memory can bias attention in an oscillatory manner, with the attentional capture effect fluctuating at theta frequency. The study provides incomplete evidence that this dynamic attentional bias is associated with oscillatory neural mechanisms, particularly in the alpha and theta bands, as measured by EEG. The study will be relevant for researchers studying attention, working memory, and neural oscillations, particularly those interested in how memory and perception interact over time.

    2. Reviewer #1 (Public review):

      Summary

      In the presented paper, Lu and colleagues focus on how items held in working memory bias someone's attention. In a series of three experiments, they utilized a similar paradigm in which subjects were asked to maintain two colored squares in memory for a short and variable time. After this delay, they either tested one of the memory items or asked subjects to perform a search task.

      In the search task, items could share colors with the memory items, and the authors were interested in how these would capture attention, using reaction time as a proxy. The behavioral data suggest that attention oscillates between the two items. At different maintenance intervals, the authors observed that items in memory captured different amounts of attention (attentional capture effect).

      This attentional bias fluctuates over time at approximately the theta frequency range of the EEG spectrum. This part of the study is a replication of Peters and colleagues (2020).

      Next, the authors used EEG recordings to better understand the neural mechanisms underlying this process. They present results suggesting that this attentional capture effect is positively correlated with the mean amplitude of alpha power. Furthermore, they show that the weighted phase lag index (wPLI) between the alpha and theta bands across different electrodes also fluctuates at the theta frequency.

      Strengths

      The authors focus on an interesting and timely topic: how items in working memory can bias our attention. This line of research could improve our understanding of the neural mechanisms underlying working memory, specifically how we maintain multiple items and how these interact with attentional processes. This approach is intriguing because it can shed light on neuronal mechanisms not only through behavioral measures but also by incorporating brain recordings, which is definitely a strength.<br /> Subjects performed several blocks of experiments, ranging from 4 to 30, over a few days depending on the experiment. This makes the results - especially those from behavioral experiments 2 and 3, which included the most repetitions - particularly robust.

      Weaknesses

      One of the main EEG results is based on the weighted phase lag index (wPLI) between oscillations in the alpha and theta bands. In my opinion, this is problematic, as wPLI measures the locking of oscillations at the same frequency. It quantifies how reliably the phase difference stays the same over time. If these oscillations have different frequencies, the phase difference cannot remain consistent. Even worse, modeling data show that even very small fluctuations in frequency between signals make wPLI artificially small (Cohen, 2015).

      In response authors stated : "Additionally, the present study referenced previous research by using the wPLI index as a measure of cross-frequency coupling strength31,64-66"<br /> Unfortunately, after checking those publications, we can see that in paper 31 there is no mention of "wPLI" or "PLV." In 64 and 65, the authors use wPLI, but only to measure same-frequency coherence, whereas cross-frequency coupling is computed by phase-amplitude coupling or cross-frequency coupling also known as n:m-PS. In 66, I cannot find any cross-frequency results, only cross-species analysis. This is very problematic, as it indicates that the authors included references in their rebuttal without verifying their relevance.<br /> 31 de Vries, I. E. J., van Driel, J., Karacaoglu, M. & Olivers, C. N. L. Priority Switches in Visual Working Memory are Supported by Frontal Delta and Posterior Alpha Interactions. Cereb Cortex 28, 4090-4104, doi:10.1093/cercor/bhy223 (2018).64 Delgado-Sallent, C. et al. Atypical, but not typical, antipsychotic drugs reduce hypersynchronized prefrontal-hippocampal circuits during psychosis-like states in mice: Contribution of 5-HT2A and 5-HT1A receptors. Cerebral Cortex 32, 870 3472-3487 (2022). 65 Siebenhühner, F. et al. Genuine cross-frequency coupling networks in human resting-state electrophysiological recordings. PLoS Biology 18, e3000685 (2020). 66 Zhang, F. et al. Cross-Species Investigation on Resting State Electroencephalogram. Brain Topogr 32, 808-824, doi:10.1007/s10548-019-00723-x (2019).

      Another result from the electrophysiology data shows that the attentional capture effect is positively correlated with the mean amplitude of alpha power. In the presented scatter plot, it seems that this result is driven by one outlier. Unfortunately, Pearson correlation is very sensitive to outliers, and the entire analysis can be driven by an extreme case. I extracted data from the plot and obtained a Pearson correlation of 0.4, similar to what the authors report. However, the Spearman correlation, which is robust against outliers, was only 0.13 (p = 0.57) indicating a non-significant relationship.

      Cohen, M. X. (2015). Effects of time lag and frequency matching on phase based connectivity. Journal of Neuroscience Methods, 250, 137-146

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Thank you very much for your recognition of our work and for pointing out the shortcomings. We have made revisions one by one and provided corresponding explanations regarding the issues you raised.

      Weaknesses:

      One of the main EEG results is based on the weighted phase lag index (wPLI) between oscillations in the alpha and theta bands. In my opinion, this is problematic, as wPLI measures the locking of oscillations at the same frequency. It quantifies how reliably the phase difference stays the same over time. If these oscillations have different frequencies, the phase difference cannot remain consistent. Even worse, modeling data show that even very small fluctuations in frequency between signals make wPLI artificially small (Cohen, 2015).

      thank you for raising the question regarding the application of wPLI between the alpha and theta bands, which indeed deserves further explanation. In our study, we referred to some relevant previous literatures and adopted their approach of using wPLI to measure cross-frequency coupling strength, as this index itself can reflect the stability of phase differences. We have also considered the point you mentioned that the phase differences of oscillations with different frequencies are difficult to remain consistent. However, in this study, the presentation times of the two memory items are the same, which is fair to both from this perspective. Moreover, the study observed that the wPLI values of these two items alternately dominate over time, and this changing pattern is consistent with the regularity of behavioral data. It seems hard to explain this as a mere coincidence. 

      The corresponding discussion has been added to the revised part of the paper:“the present study referenced previous research by using the wPLI index as a measure of cross-frequency coupling strength31,64-66 (this index quantifies the stability of phase differences), yet the phases of different oscillations inherently change over time. However, this is fair to the two memory items in the present study, as their presentation times were balanced. The study found that the wPLI values of the two items alternately dominated over time, consistent with the pattern of behavioral data, which is hardly explicable by coincidence”

      Another result from the electrophysiology data shows that the attentional capture effect is positively correlated with the mean amplitude of alpha power. In the presented scatter plot, it seems that this result is driven by one outlier. Unfortunately, Pearson correlation is very sensitive to outliers, and the entire analysis can be driven by an extreme case. I extracted data from the plot and obtained a Pearson correlation of 0.4, similar to what the authors report. However, the Spearman correlation, which is robust against outliers, was only 0.13 (p = 0.57), indicating a non-significant relationship.

      you mentioned that the correlation between the attentional capture effect and the mean amplitude of alpha power in the electrophysiological data might be influenced by an outlier, and you also compared the results of Pearson and Spearman correlation coefficients, which we fully agree with.

      It is true that the small sample size of the current study makes the results vulnerable to interference from extreme data. Regarding this point, I have already explained it in the limitations section of the discussion in the revised manuscript:“the sample size of the current study is small, which may render the results vulnerable to interference from extreme cases”

      The behavioral data are interesting, but in my opinion, they closely replicate Peters and colleagues (2020) using a different paradigm. In that study, participants memorized four spatial positions that formed the endpoints of two objects, and one object was cued. Similarly, reaction times fluctuated at theta frequency, and there was an anti-phase relationship between the two objects. The main novelty of the present study is that this bias can be transferred to an unrelated task. While the current study extends Peters and colleagues' findings to a different task context, the lack of a thorough, direct comparison with Peters et al. limits the clarity of the novel insights provided.

      thank you very much for your attention to the behavioral data and its relevance to the study by Peters et al. (2020). We have noticed that there are similarities in some results between the two studies, which also indicates the stability of the relevant phenomena from one aspect.

      However, we would also like to further explain the differences between this study and the study by Peters et al. In the study by Peters et al., participants memorized four spatial positions that formed the endpoints of two objects (one of which was cued), and their results showed that after the two objects disappeared, attention fluctuated at the theta rhythm between their original positions with an inverse correlation. In contrast, the present study explores the manner of memory maintenance indirectly by leveraging the guiding effect of working memory on attention, effectively avoiding the influence of spatial positions.

      The study by Peters et al. directly examined differences in probe positions, clearly demonstrating that attention undergoes rhythmic changes at the two spatial locations and persists after the objects vanish, but it hardly clarifies the rhythmicity of working memory performance. Whereas the present study directly investigates such performance using the attention-capture effect of working memory, revealing that when maintaining multiple memory items, their attention-capturing capabilities alternate in dominance, i.e., multiple working memory items alternately become priority templates in a rhythmic manner. This is also some new attempts in the research perspective and method of this study.

      The corresponding discussion has been added to the revised part of the paper

      “Similar to the present study, Peters et al. had participants memorize four spatial positions forming the endpoints of two objects (one cued), and their results showed that after the two objects disappeared, attention fluctuated at the theta rhythm between their original positions with an inverse correlation; in contrast, the present study explores the manner of memory maintenance indirectly by leveraging the guiding effect of working memory on attention, effectively avoiding the influence of spatial positions—while Peters et al.’s study, which directly examined differences in probe positions, clearly demonstrates that attention undergoes rhythmic changes at the two spatial locations and persists after the objects vanish, it hardly clarifies the rhythmicity of working memory performance, whereas the present study directly investigates such performance using the attention-capture effect of working memory, revealing that when maintaining multiple memory items, their attention-capturing capabilities alternate in dominance, i.e., multiple working memory items alternately become priority templates in a rhythmic manner.”

      Reviewer #2 (Public review):

      The information provided in the current version of the manuscript is not sufficient to assess the scientific significance of the study.

      thank you very much for pointing out the multiple issues in our manuscript. Due to several revisions of this work, including experimental adjustments, there have been some inconsistencies in details. We appreciate you identifying them one by one.  We have made corresponding revisions based on your comments:

      (1) In many cases, the details of the experiments or behavioral tasks described in the main text are not consistent with those provided in the Materials and Methods section. Below, I list only a few of these discrepancies as examples:

      a) For Experiment 1, the Methods section states that the detection stimulus was presented for 2000 ms (lines 494 and 498), but Figure 1 in the main text indicates a duration of 1500 ms.

      we greatly appreciate you catching this inconsistency. We have made unified revisions by referring to the final implemented experimental procedures.  Corresponding revisions have been made in the paper:

      b) For Experiment 2, not only is the range of SOAs mentioned in the Methods section inconsistent with that shown in the main text and the corresponding figure, but the task design also differs between sections.

      Thank you for bringing this discrepancy to our attention. We have made unified revisions by referring to the final implemented experimental procedures. The correct SOAs are 233:33:867 ms.

      Corresponding revisions have been made in the paper:

      c) For Experiment 3, the main text indicates that EEG recordings were conducted, but in the Methods section, the EEG recording appears to have been part of Experiment 2 (lines 538-540).

      we’re grateful for you noticing this mix-up. In fact, only Experiment 3 is an EEG experiment, and we have made corresponding corrections in the "Methods" section. Corresponding revisions have been made in the paper: “The remaining components after this process were then projected back into the channel space. We extracted data from -500 ms to 2000 ms relative to cue stimulus presentation in Experiment 3.”  

      (2) The results described in the text often do not match what is shown in the corresponding figure. For example:

      a) In lines 171-178, the SOAs at which a significant difference was found between the two conditions do not appear to match those shown in Figure 2A.

      Many thanks for spotting this error. The previous results missed one SOA time, namely 33 ms, leading to a 33 ms difference in time. We have corrected it in the revised manuscript.

      Corresponding revisions have been made in the paper:“Specifically, the capture effect of cued items was significantly greater than that of uncued items at SOAs of 267ms (t(24) = 2.72, p = 0.03, Cohen's d = 1.11), 667ms (t(24) = 2.37, p = 0.03, Cohen's d= 0.97) and 833ms (t(24) = 3.53, p = 0.002, Cohen's d = 1.44), while the capture effect of uncued items was significantly greater than that of cued items at SOAs of 333ms (t(24) = 2.97, p = 0.007, Cohen's d = 1.21), 367ms (t(24) = 2.14, p = 0.04, Cohen's d = 0.87), 433ms (t(24 )= 2.49, p = 0.02, Cohen's d = 1.02), 467ms (t(24)=2.37, p = 0.03, Cohen's d = 0.97) and 567ms (t(24)=2.72, p = 0.02, Cohen's d = 1.11). ”

      (b) In Figure 4, the figure legend (lines 225-228) does not correspond to the content shown in the figure.

      we appreciate you pointing out this oversight. When adjusting the color scheme during the revision of the manuscript, we neglected to revise the legend, which has now been corrected in the revised manuscript.

      Corresponding revisions have been made in the paper:“Figure 4. The red line represents the average across all participants of the Fourier transforms of the differences in capture effects between left and right memory items at the individual level. The gray area represents values below the group average of medians derived from 1000 permutations, with each permutation involving Fourier transforms for each participant. *: p < 0.05.”

      (c) In Figure 9, not sufficient information is provided within the figure or in the text, making it difficult to understand. Consequently, the results described in the text cannot be clearly linked to the figure.

      Thank you for drawing our attention to this issue. We have revised Figure 9 and its legend in the revised manuscript to make them clearer and easier to understand.

      Corresponding revisions have been made in the paper

      (3) Insufficient information is provided regarding the data analysis procedures, particularly the permutation tests used for the data presented in Figures 2B, 4, and 10. The results shown in these figures are critical for the main conclusions drawn in the manuscript.

      we’re thankful for you highlighting this gap. In the revised manuscript, we have provided a more detailed explanation in the "Methods" section, especially regarding the content related to frequency analysis, to make the expression clearer.

      Corresponding revisions have been made in the paper:“As shown in Figure 8, the alpha power (8-14 Hz) induced by cued and uncued items alternated in dominance during the memory retention phase. To quantify this rhythmic alternation, we conducted a spectral analysis following these steps: First, we computed the power difference between cued and uncued items within the 8-14 Hz range during the retention phase. These differences were then downsampled to 100 Hz using a 10 ms window for averaging, generating a one-dimensional time series spanning the 0-2000 ms retention period. This time series was subsequently subjected to amplitude spectrum analysis across frequencies from 1 Hz to 50 Hz using Fourier transformation.

      To assess the statistical significance of the observed spectral features, we employed a permutation test. Specifically, we randomly shuffled the temporal order of the time series of power differences between cued and uncued items—thereby preserving the amplitude distribution of the data while eliminating temporal correlations in the original sequence—and repeated the Fourier transform and spectral analysis for each shuffled time series. This permutation process was replicated 1000 times to generate a null distribution of spectral power values. A frequency component in the original data was considered statistically significant if its power ranked within the top 5% of the corresponding null distribution (p < 0.05).

      We applied the same analytical pipeline to investigate differences in the weighted phase-lag index (wPLI) between the contralateral regions of the two items and the prefrontal cortex during the retention phase. Specifically, wPLI differences (i.e., the difference between the two conditions) were computed, downsampled to 100 Hz using a 10 ms window for averaging to generate a time series spanning 0-2000 ms, and then subjected to amplitude spectrum analysis (1-50 Hz) using Fourier transformation. Significance was assessed via the identical permutation test procedure described above (randomly shuffling the temporal order of the difference time series).”

    1. eLife Assessment

      Marshall et al describe the effects of altering metabotropic glutamate receptor 5 activity on activity of D1 receptor expressing spiny projection neurons in dorsolateral striatum focusing on two states - locomotion and rest. The authors examine effects of dSPN-specific constitutive mGlu5 deletion in several motor tests to arrive at this finding. Effects of inhibiting the degradation of the endocannabinoid 2-arachidonoyl glycerol are also examined. Overall, this is a valuable study that provides solid new information of relevance to movement disorders and possibly psychosis.

    2. Joint Public Review:

      Marshall et al describe the effects of altering metabotropic glutamate receptor 5 activity on activity of D1 receptor expressing spiny projection neurons in dorsolateral striatum focusing on two states - locomotion and rest. The authors examine effects of dSPN-specific constitutive mGlu5 deletion in several motor tests to arrive at this finding. Effects of inhibiting the degradation of the endocannabinoid 2-arachidonoyl glycerol are also examined. Overall, this is a valuable study that provides solid new information of relevance to movement disorders and possibly psychosis.

      The combination of in vivo cellular calcium imaging, pharmacology, receptor knockout and movement analysis is effectively used. The main findings do not involve gross firing rates or numbers of active neurons, but rather are revealed by specialized measures involving Jaccard coefficient and an assessment of coactivity. The authors conclude that mGlu5 expressed in dSPNs contributes to movement through effects on clustered spatial coactivity of dSPNs. More specifically, reduced mGluR5 increases coactivity during rest (defined as low velocity periods) but not during locomotion periods. The authors observe a role for mGlu5 expression in dSPNs in modulating the frequency of mEPSCs, suggesting a role in presynaptic neurotransmitter release. Some data suggesting the story may be different in the other major SPN subpopulation (iSPNs) are also presented but these studies are relatively underdeveloped leaving some ambiguity as to how cell-selective the findings are. In addition, an occlusion experiment in which the pharmacological mGluR5 agents are delivered to the dSPN mGluR5 KO to clarify if other sites of action are involved beyond the proposed D1-expressing neurons is missing. Finally, the authors present a working model that sets the stage for future experimentation. Overall, this study provides an important and detailed assessment of mGluR5 contributions to striatal circuit function and behavior.

      Remaining concerns include:

      (1) To clarify that dSPNs are sole site of action, it is necessary to examine effects of the mGlu5 NAM in the dSPN mGlu5 cKO mice. If the effects of the two manipulations occluded one another this would certainly support the hypothesis that the drug effects are mediated by receptors expressed in dSPNs. A similar argument can be made for examining effects of the JNJ PAM in the cKO mice.

      (2) There is a concern that the D1 Cre line used (Ey262), which may also target cortical neurons expands the interpretation of the study beyond the striatal populations. Further discussion of this point, particularly in the interpretation of the mGluR5 cKO experiments, would provide a better understanding of the contribution of the paper.

      (3) The use of CsF-based whole-cell internal solutions has caused concern in some past studies due to possible interference with G-protein, phosphatase and channel function (https://www.sciencedirect.com/science/article/abs/pii/S1044743104000296, https://www.jneurosci.org/content/jneuro/6/10/2915.full.pdf). It is reassuring the DHPG-induced LTD was still observable with this solution. However, it might be worth examining this plasticity with a different internal to ensure that the magnitude of the agonist effect is not altered by this manipulation.

      (4) Behavioral resolution of actions at low velocity that are termed "rest" are not explored in this study. Thus, a remaining ambiguity is whether the activities in rest include only periods of immobility or other low-velocity activities such as grooming or rearing.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Can the authors offer a hypothesis as to how decreased coactivity promotes increased movement velocity.” 

      In our revision we have added an additional metric measuring how spatial coactivity changes during movement onset, the spatial correlation index, which replicates a previous finding that co-activity among proximal neurons is statistically greater surrounding movement onset. We did not find, as outlined in the revision, that mGluR5 manipulations significantly altered this relationship. Our data therefore shows, consistent with that shown previously, that ensembles of dSPNs that are co-active during movement onset, in particular ambulatory movement, are more likely to contain neurons that are closer together and the neurons are highly active. In contrast, rest ensembles contain neurons that are less active but have more highly correlated activity, across all pairwise distances. Additionally, mGluR5 inhibition, genetic or pharmacological, promotes the activation of rest ensembles but does not affect the properties of movement ensembles. Previous studies (e.g. Klaus A. et al., 2017) have shown that neurons in rest ensembles are, in general, unlikely to also be members of movement ensembles, We therefore hypothesize that corticostriatal synapses onto SPNs of rest ensembles are more likely, during spontaneous behavior, to have reduced synaptic weight due to mGluR5 signaling, potentially due to eCB mediated inhibition of neurotransmitter release. Therefore, when we inhibit mGluR5 at these synapses, we increase synaptic weight and increase the probability of activation of this coordinated rest ensemble, which suppresses movement. If, on the other hand, the synapses that govern activation of neurons in movement ensembles have a higher weight, they may be unaffected by mGluR5 inhibition. 

      The use of the Jaccard similarity index in this study is not intuitive and not fully explained by the methods or the diagram in Figure 1. 

      We have added more detail to the paper to explain the methodology of the jaccard similarity measure. The advantage of this method is that is specifically captures cells that are jointly active, as opposed to jointly inactive and is therefore useful for capturing co-activity in our sparsely active Ca<sup>2+</sup> imaging data. 

      The analysis of a possible 2-AG role in the mGlu5 mediated processes is incomplete. 

      We agree that, as an experiment to outline which endocannabinoids are involved in modulating synaptic strength through mGluR5, this experiment alone is not sufficient.

      However, our main focus in this paper is how manipulations of mGluR5 affect the spatiotemporal dynamics of dSPNs and we chose not to focus on specific mechanisms of endocannabinoid signaling, though these would certainly be interesting to investigate further in vivo.

      It would seem to be a simple experiment to examine effects of the mGlu5 NAM in the dSPN mGlu5 cKO mice. If effects of the two manipulations occluded one another this would certainly support the hypothesis that the drug effects are mediated by receptors expressed in dSPNs. A similar argument can be made for examining effects of the JNJ PAM in the cKO mice. 

      We agree that this experiment would be valuable and extend our findings presented in the paper, however, it has practically been outside the scope of the current work. 

      Reviewer #2 (Public review):

      Pharmacological and genetic manipulations of mGluR5 do not differentially/preferentially modulate the activity of proximal vs distal dSPNs, therefore, it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity as opposed to differential proximal/distal spatial relationships. 

      As in the response to reviewer 1 above, we have added additional clarification to the text explaining that our manipulations do not differentially affect the co-activity of proximal vs distal dSPNs, this is also quantified throughout the text using the spatial coordination index. However, we disagree that “it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity” as we do not observe statistically significant changes in the event rate following either pharmacological or genetic manipulations of mGluR5. Rather, we consistently observe statistically significant changes in co-activity among neurons, the extent to which activity of active neurons during either rest or movement are correlated with each other. This is the central finding of our manuscript, inhibiting or potentiating mGluR5 signaling alters behavior, not by blanket suppression or enhancement of the activity as measured using the event rate, of dSPNs, but by affecting their ensemble dynamic properties.  Co-activity during rest versus ambulatory movement is statistically greater in both proximal and distal cells and inhibiting mGluR5 increases this co-activity and decreases movement. 

      For these analyses of prox vs distal and all others, please include the detail of how many proximal vs distal cells were involved and per subject. 

      We have added a supplemental table that details the number of cells included per subject in all analyses

      Ln. 151-152: Please provide data concerning how volumes of infectivity differ between injecting AAV vs. coating the lens? If these numbers are very different, this could impact the number of Jaccard pairings and bias results. 

      While viral injection may lead to a larger volume of expression, with this one photon imaging method only those cells within ~200 microns of the edge of the lens will be able to be resolved, therefore practically, if there is an additional volume of infected tissue outside of the field of view of the lens, it would not affect the results as these neurons will not be resolved by the endoscope camera. Accordingly, the average number of cells detected per session is very similar following each approach (mean # of cells per session with coating 90.93 ± 23.69 cells, with viral injection 90.03 ± 29.29 cells)

      Is mGluR5 affecting dSPN activity in other measures beyond co-activity and rate? Does the amplitude of events change?

      We have added supplemental data for figures 2, 3, and 5 demonstrating that manipulations of mGluR5 do not affect the amplitude or length of Ca<sup>2+</sup> events included in the analysis. 

      What is the model of mGluR5 signaling in a resting state vs. movement? What other behaviors are occurring when the mouse is in a low velocity "resting state" (0-0.5 cm/s). If this includes other forms of movement (i.e. rearing, grooming) then the animal really isn't in a resting state. This is not mentioned in the open field behavior section of the methods and should be described (Ln. 486) in addition to greater explanation of what behavior measures were obtained from the video tracking software (only locomotion?)

      It would be very interesting to determine if during “rest,” when the animals is not engaged in ambulatory behavior, it may be engaged in some fine motor behavior. However, the resolution of the cameras used to measure locomotor activity in this dataset does not allow us to do this. 

      There is large variability in co-activity in proximal dSPNs when animals are "resting" (2j). Could this be explained by different behavior states within your definition of "rest"?

      We agree that if the animal is engaging in fine motor behavior that we cannot resolve with our behavior setup, this could produce some variability in coactivity. However, as shown previously (e.g. Klaus A. et al., 2017), ensembles active when the animal is not moving (our definition of “resting”), regardless of additional fine motor behaviors the animal may be engaged in when not moving, are substantially different that those ensembles that are active when the animal is moving. We therefore expect that this may limit, although potentially not eliminate, variability due to different behavioral states we may have grouped into our “resting” category. Unfortunately, as mentioned above, we are not able resolve variations in fine motor output in this behavioral data. 

      Have you performed IHC, ISH or another measure to validate D1 cell specific cKO?

      The mGluR5<sup>loxP/loxP</sup> mice used in this study were characterized previously by our lab (Xu et al., 2009), we used the same mice here with a different, but also published and characterized Cre-driver line, Drd1a-Cre Ey262 (Gerfen et al., 2013).

      Why are the "Mean Norm Co-activity" values in 5e so high in this experiment relative to figures 2-4?  

      In experiments where we treated the same animal with vehicle and a drug (i.e., experiments in Figure 2 and 3), we normalized the values for each animal in the drug treatment group to the distal bin of that animal following vehicle treatment. This allowed us to more clearly resolve the changes within each animal due to drug treatment. As comparisons in the data in figure 5 d–f are between different animals (rather than different treatments of the same animal) we could not perform this normalization procedure.  

      Reviewer #3 (Public review):

      Some D1 Cre lines have expression in the cortex. Which specific Cre line was used in this study? 

      We used, Drd1a-Cre Ey262. This is included in methods. 

      The text says JNJ treatment .... increased locomotor speed (Figure 3b) and increased the duration but not frequency of movement bouts (Figure 3c, d). However, the statistics of the figure legends say: however the change in mean velocity (3b) is not significant (p=0.060, U=3, Mann-Whitney U test), nor is the mean bout length during vehicle and JNJ (p=0.060, U=3, Mann-Whitney U test) (3d) Comparison of mean number of bouts of each animal during vehicle and JNJ (p=0.403, U=8, Mann-Whitney U test). 

      This has been corrected to indicate only the change in time spend at rest is statistically significant.

      This effect was most pronounced during periods of rest (Figure 3i, j). The decrease was only in rest? Are the colors in Figure 3J inverted? Therefore, JNJ treatment had effects that were qualitatively the inverse to the effects of fenobam on locomotion and dSPN activity. 

      We have corrected the text to state that, overall, and during periods of rest but not movement, JNJ had effects that were qualitatively the opposite of fenobam.

    1. eLife Assessment

      The important paper presents a new behavioral assay for Drosophila aggression and demonstrates that social experience influences fighting strategies, with group-housed males favoring high-intensity but low-frequency tussling over aggressive lunging observed in isolated males. The experiments are solid and the conclusions are of interest to researchers studying the impact of social isolation on aggression.

    2. Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays. This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Weakness:

      All prior concerns have been addressed in the revised manuscript. The added 'Limitations of the study' section is a welcome and important clarification. Despite these limitations, the study provides valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

    3. Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its possible biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study discusses an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in the neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflicting between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that, in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors suggested that an altered fighting strategy has effects with respect to these behaviors.

      Weaknesses:

      New experimental paradigm in Fig. 6 is quite useful, but as the authors mentioned, still the future investigations are needed to reveal a direct relationship between aggression strategies and reproductive success.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays using a food cup (Chen et al., 2002). This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Thank you for the precise summary of the manuscript and acknowledgment of the novelty and significance of the study.

      Weakness:

      Although most concerns have been addressed, the manuscript still lacks a rigorous, objective method for quantifying lunging and tussling. Because scoring appears to have been done manually and a single lunge in a 30 fps video spans only 2-3 frames, the 0.2 s cutoff seems arbitrary, and there are no objective criteria distinguishing reciprocal lunging from tussling. Despite this, the study offers valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

      Thank you for this comment. The duration of each lunge was measured by analyzing the videos frame by frame—from the frame before the initiation of the lunge to the frame after its completion—resulting in an average span of 3–5 frames. Given a frame rate of 30 fps, this corresponds to approximately 0.1–0.17 seconds. We acknowledge that there are certain limitations for manually quantifying the two types of aggressive behaviors, which has now been stated in the newly added “Limitations of the Study” section in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) and the associated change in aggression strategy are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors experimentally demonstrated that altered fighting strategy has effects with respect to these behaviors.

      Thank you for your precise summary of our study and being very positive on the novelty and significance of the study.

      Reviewer #3 (Public review):

      In this revised manuscript, Gao et al. presented a series of well-controlled behavioral data showing that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) is enhanced specifically among socially experienced and relatively old males. Moreover, results of behavioral assays led authors to suggest that increased tussling among socially experienced males may increase mating success. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, have not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days old) flies tend to tussle more often than younger (2 to 7-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Newly presented data have made several conclusions convincing. Detailed descriptions of methods to quantify behaviors help understand the basis of their claims by improving transparency. However, I remain concerned about authors' persistent attempt to link the high intensity aggression to reproductive success. The authors' effort to "tone down" the link between the two phenomena remains insufficient. There are purely correlational. I reiterate this issue because the overall value of the manuscript would not change with or without this claim.

      Thank you for acknowledging the novelty and significance of the study. Regarding the relationship you mentioned between high-intensity aggression and reproductive success, we further toned down the statement between them throughout the manuscript in the revised manuscript. We also modified the title to “Social Experience Shapes Fighting Strategies in Drosophila”. In addition, we now added a ‘Limitations of the Study’ section to clearly state the correlation between tussling and reproductive success.

      Reviewer #1 (Recommendations for the authors):

      If possible, mention the EM-connectome data showing the minimal interneuronal path from Or47b ORNs to pC1SS2 neurons (even if derived from the female connectome), which can strengthen the model of parallel sensory-central pathways.

      Thank you for this comment. According to data from the EM connectome, connecting Or47b ORNs to pC1d neurons requires at least two intermediate neurons. An example minimal pathway is: ORN_VA1v (L) → AL-AST1 (L) → PLP245 (L) → pC1d (R). We have added this point in the Discussion section of the revised manuscript.

      I'm not convinced that labeling lunges as "gentle" combat behavior works, either in the abstract or elsewhere. While lunging is indeed a lower-intensity form of aggression compared to tussling, applying anthropomorphic descriptors risks misleading readers.

      Thank you for this comment. We now use “low-intensity” instead of “gentle” to describe lunging.

      In Materials & Methods, please cross-check all figure-panel references after the recent re-numbering (e.g. "Figure 5A6A" etc.).

      Thank you for this comment. We have thoroughly verified the figure panel references in the Materials & Methods section.

      Ensure that Table S1 is clearly cited in the main text where you first describe fly genotypes.

      Thank you for this comment. We have now cited Table S1 in the main text.

      There are multiple grammatical errors and typos throughout the manuscript. Please correct them. Some examples are below, but this is not an exhaustive list:

      Line 98-102 requires rephrasing as the results are already published and not being observed by the authors.

      Thank you for this comment. We have revised the manuscript to “we occasionally observed the high-intensity boxing and tussling behavior in male flies as previously reported (Chen et al., 2002; Nilsen et al., 2004), which….”

      line 116- lower not 'lowed'.

      Corrected.

      line 942 & 945- knock-down males not 'knocking down males'.

      Corrected. Thank you very much for these comments.

      Reviewer #2 (Recommendations for the authors):

      The authors have almost completely answered the major comments I have noted on the ver.1 manuscript: (1) They clearly show changes in fighting strategy in the territory control behavior experiment in Fig. 6-figure supplements. (2) A detailed description of how aggressive behavior is measured. Thus, I am convinced by this revision.

      Thank you for these comments that make the manuscript a better version.

      Furthermore, in Fig. 5, which examined the relationship of pC1[SS2] characteristics with the function of dsx, is a novel data and very interesting. I look forward to further developments.

      Thank you. We will continue to explore this part in our future study.

      However, one point still concerns me.

      Line 192: Although the authors describe it as "usage-dependent," the trans-Tango technique is essentially a postsynaptic cell-labeling technique. It is possible that the labeling intensity in postsynaptic cells increases from the change in expression levels of the Or47b gene due to GH. However, there is no difference in the expression level of the Or47b gene labeled by GFP between SH and GH. Therefore, we cannot conclude that the expression of the Or47b gene is increased by rearing conditions.

      The original paper on trans-TANGO (Talay et al., 2017) does not discuss the usage-dependency. A review of trans-synaptic labeling techniques (Ni, Front Neural Circuits. 2021) discusses that the increase in trans-TANGO signaling with aging may be related to synaptic strength, but there is no experimental evidence for this. In my opinion, the results in Figure 3-figure supplement 2 only weakly suggest that the increase in trans-TANGO signaling may be explained by an increase in synaptic strength due to group rearing.

      We appreciate the reviewer’s insightful comment regarding the interpretation of the trans-Tango signal. Indeed, the original trans-Tango study (Talay et al., 2017) does not claim that the method is usage-dependent. The observed increase in trans-Tango labeling with age, as reported in their supplemental figures, may reflect accumulation over time, potentially influenced by synaptic maturation or increased component expression. To avoid overstating our results, we have revised the relevant statement in the manuscript to remove the term "usage-dependent" and now describe the change in trans-Tango signal more cautiously.  

      Reviewer #3 (Recommendations for the authors):

      Below are the cases where their professed attempts to "tone down the statement" appear ignored:

      Lines 27-29:

      "Our findings... suggest how social experience shapes fighting strategies to optimize reproductive success".

      We have now revised the manuscript to “Our findings… suggest that social experience may shape fighting strategies to optimize reproductive success.”

      Lines 85-86:

      "... discover that this infrequent yet intense form of combat is... crucial for territory dominance and mating competition".

      We have now revised the manuscript to “…discover that this infrequent yet intense form of combat is enhanced by social enrichment, while the low-intensity lunging is suppressed by social enrichment.” 

      Lines 335-339:

      "Here, we found that... GH males tend to... increase the high-intensity tussling, which enhances their territorial and mating competition."

      We have removed “which enhances their territorial and mating competition” in the revised manuscript.

      Lines 343-344:

      "... presenting a paradox between social experience, aggression and reproductive success. Our result resolved this paradox..."

      We have now revised the manuscript to “...Our results provide an explanation for this paradox…”

      Lines 355-358:

      "Interestingly, we found that the mating advantage gained through social enrichment can even offset the mating disadvantage associated with aging, further supporting the vital role of shifting fighting strategies in experienced, aged males."

      We have removed “further supporting the vital role of shifting fighting strategies in experienced, aged males” in the revised manuscript.

      Lines 361-362:

      "These results separate the function of the two fighting forms and rectify out understanding of how social experiences regulate aggression and reproductive success."

      We have removed this sentence in the revised manuscript.

      Some may say that a speculative statement is harmless, but I think it indeed is harmful unless it is clearly indicated as a speculation. It is regrettable that authors remain reluctant to change their claim without providing any new supporting evidence. All three reviewers raised the same concern in the first round of review.

      We apologize for not making the speculative nature of the statement clearer in the previous version. In the revised manuscript, we have now explicitly rephrased sentences to only suggest a correlation but not a causal link between tussling and reproductive success.

      I have no choice but to keep my evaluation of the manuscript as "Incomplete" unless the authors thoroughly eliminate any attempt to link these two. This must go beyond changing a few words in the lines listed above.

      Thank you for this comment. In addition to the lines listed above, we carefully checked all statements regarding the correlation between fighting strategies and reproductive success throughout the full text. Furthermore, we have also added a “Limitations of the Study” section to address the shortcomings of this study in the revised manuscript.

      I do not have the same level of concern over the interpretation of Fig. 6A-C, because this is directly linked to aggressive interactions. Even if the socially isolated males do not engage in tussling, it is not a leap to assume that a different fighting tactic of socially experienced males can give them an advantage in defending a territory. To me, this is a sufficient ethological link with the observed behavioral change.

      Thank you for this insightful comment.

      The following are relatively minor, although important, concerns.

      I beg to differ over the authors' definition of "tussling". Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunging at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases suggest that the definition of "tussling" as opposed to "lunging" has a subjective element. However, I would not delve on this matter further because it is impossible to be completely objective over behavioral classification, even by using a computational method. An important point is that the definition is applied consistently within the publication. I have no reason to doubt that this was not the case.

      Thank you for this comment. Since the analysis of tussling behavior was conducted manually, it is challenging to achieve complete objectivity. However, we made every effort to apply consistent criteria throughout the analysis. We have added a “Limitations of the Study” section in the revised manuscript to clearly state this caveat. We appreciate your understanding.

      Authors now state that "all tester flies were loaded by cold anesthesia" (lines 432-433). I would like to draw attention to the well-known fact that anesthesia, whether by ice or by CO2, are long known to affect fly's subsequent behaviors (for aggression, see Trannoy S. et al., Learn. Mem. 2015. 22: 64-68). It will be prudent to acknowledge the possibility that this handling method could have contributed to unusually high levels of spontaneous tussling, which has not been reported elsewhere before.

      Thank you for this comment. The increased tussling behavior observed in our study is unlikely due to cold anesthesia, as noted by Trannoy S. et al. (2015), cold anesthesia profoundly reduces locomotion and general aggressiveness in flies. We acknowledge that the use of cold anesthesia in behavioral experiments may have potential effects on aggression. To minimize this influence, we allowed the flies to recover and adapt for at least 30 minutes before behavioral recording. Moreover, both control and experimental groups were treated in exactly the same manner to ensure consistency.

      It is intriguing that pC1SS2 neurons are dsx+ but fru-. Authors convincingly demonstrated that these neurons are clearly distinct from the P1a neurons, a well-characterized hub for male social behaviors. It is possible that pC1SS2 neurons overlap with previously characterized dsx+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020, a point authors could have explicitly raised.

      Thank you for this comment. We have added this point into the Discussion section of the revised manuscript, as follows: “That tussling-promoting… aggression (Koganezawa et al., 2016). Moreover, the anatomical features of pC1<sup>SS2</sup> neurons are highly similar to the male-specific aggression-promoting (MAP) neurons identified by another previous study (Chiu et al., 2021).

      I acknowledge the authors' courage to initiate an investigation to a less characterized, high intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there are confusion over the distinction between lunges and tussling, authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategy is convincing. The concern I raised above is about the interpretation of the data, not about the quality of data.

      Thank you for your constructive comments to make this manuscript better.

    1. eLife Assessment

      This fundamental work provides novel insights into the blood flow-dependent mechanisms of neuronal migration and the role of Gherlin signaling in the adult brain. The authors present convincing evidence that newborn rostral migratory stream (RMS) neurons are closely situated alongside blood vessels, preferentially along arterioles, and that migratory speed is correlated with blood flow. They also provide evidence (in vitro and some in vivo) that Ghrelin from blood is involved in augmenting RMS neuron migration speed.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides compelling evidence suggesting that ghrelin, a molecule released in the surrounding of the major adult brain neurogenic niche (V-SVZ) by blood vessels with high blood flow controls the migration of newborn interneurons towards the olfactory bulbs.

      Strengths:

      This study is a tour de force as it provides a solid set of data obtained by time lapse recordings in vivo. The data demonstrate that the migration and guidance of newborn neurons relies on factors released by selective type of blood vessels.

      Weaknesses:

      Some intermediate conclusions are weak and may be reinforced by additional experiments.

      Comments on revisions: The manuscript has improved.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary: 

      This study provides compelling evidence suggesting that ghrelin, a molecule released in the surroundings of the major adult brain neurogenic niche (V-SVZ) by blood vessels with high blood flow, controls the migration of newborn interneurons towards the olfactory bulbs. 

      Strengths:

      This study is a tour de force as it provides a solid set of data obtained by time-lapse recordings in vivo. The data demonstrate that the migration and guidance of newborn neurons rely on factors released by selective types of blood vessels. 

      Weaknesses:

      Some intermediate conclusions are weak and may be reinforced by additional experiments. 

      We thank the reviewer for the thoughtful evaluation and constructive comments outlined in the “Recommendations for The Authors”. In response, we have incorporated additional data, revised relevant figures, and clarified explanations in the revised manuscript.

      Reviewer #2 (Public review)

      Summary: 

      The authors establish a close spatial relationship between RMS neurons and blood vessels. They demonstrated that high blood flow was correlated with migratory speed. In vitro, they demonstrate that Ghrelin functions as a motogen that increases migratory speed through augmentation of actin cup formation. The authors proceed to demonstrate through the knockdown of the Ghrelin receptor that fewer RMS neurons reach the OB.

      They show the opposite is true when the animal is fasted. 

      Strengths: 

      Compelling evidence of close association of RMS neurons with blood vessels (tissue clearing 3D), preferentially arterioles. Good use of 2-photon imaging to demonstrate migratory speed and its correlation with blood flow. In vitro analysis of Ghrelin administration to cultured RMS neurons, actin visualization, Ghsr1KD, is solid and compelling. 

      We sincerely thank the reviewer for the encouraging comments and helpful suggestions. As noted, our original manuscript lacked sufficient in vivo evidence connecting blood flow with ghrelin signaling. To address this, we have added new data and revised the explanations throughout the manuscript as described below.

      Weaknesses: 

      (1) Novelty of findings attenuated due to prior work, especially Li et al., Experimental Neurology 2014. Here, the authors demonstrated that Ghrelin enhances migration in adultborn neurons in the SVZ and RMS. 

      We agree with the reviewer that the idea that ghrelin enhances migration of new neurons is not entirely novel. The study by Li et al. (2014) provided critical insights that guided our investigation into ghrelin as a blood-derived factor promoting neuronal migration. However, our study expands on this by demonstrating that ghrelin directly stimulates migration via GHSR1a in cultured new neurons, and we further identified the cellular and cytoskeletal mechanisms involved. Specifically, we showed that ghrelin enhances somal translocation by activating actin dynamics at the rear of the cell soma. We have revised the Results and Discussion sections accordingly to emphasize these novel aspects as follows:

      “A previous study demonstrated that the migration of V-SVZ-derived new neurons was attenuated in ghrelin knockout mice (Li et al., 2014). In our study, we found that the migration of cultured new neurons was enhanced by the application of ghrelin to the culture medium, and this effect was abolished by Ghsr1a knockdown (KD). These findings suggest that ghrelin directly stimulates neuronal migration through its receptor, GHSR1a, on new neurons. A previous study showed that GHSR1a is expressed in various regions of the brain (Zigman et al., 2006). In our experiments, new neuron-specific KD of Ghsr1a indicated that ghrelin signaling acts in a cell-autonomous manner to regulate neuronal migration.” (Discussion, page 13, lines 10–18)

      “Furthermore, we identified the cellular and cytoskeletal mechanisms underlying this effect on migration. The results indicate that ghrelin enhances somal translocation during migration by activating actin cytoskeletal dynamics at the rear of the neuronal soma.” (Discussion, page 13, lines 24–26)

      (2) The evidence for blood delivery of Ghrelin is not very convincing. Fluorescently-labeled Ghrelin appears to be found throughout the brain parenchyma, irrespective of the distance from vessels. It is also not clear from the data whether there is a link between increased blood flow and Ghrelin delivery. 

      We agree that the correlation between blood flow and ghrelin transcytosis is not very convincing in our study. As the reviewer pointed out, Figure 3A gives the impression that fluorescent-labeled ghrelin is uniformly distributed throughout the brain parenchyma. However, high-magnification images newly added in Figure 3 show that some, but not all, vessels have particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal side of vascular endothelial cells, visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). To quantify these observations, we defined two regions: Area I (perivascular area), within 10 μm of the abluminal surface of CD31-positive endothelium; and Area II (distant area), located 10–20 μm away (Figure 3E). Of note, Area I corresponds to the perivascular region where new neurons are frequently observed (Figure 1).

      Importantly, we found strong ghrelin signals in vascular endothelial cells of endomucin-negative high-flow vessels (Figure 3C, D). This suggests that transcytosis of blood-derived ghrelin may occur more frequently in high-flow vessels due to increased endocytosis at the endothelium. To test this, we quantified signal gradients in the extra-vessel regions as fold changes (Area I / Area II), as illustrated in Figure 3E. The proportion of vessel segments with >1.5-fold increases was significantly higher in endomucin-negative vessels than in endomucin-positive ones (Figure 3F). Furthermore, vessels with >2-fold increases were observed exclusively in the endomucinnegative group (6.48% ± 1.18%). 

      These data suggest that, in high-flow vessels, blood-derived ghrelin accumulates more in the immediate perivascular region than in areas further away. This supports the possibility that elevated blood flow delivers a larger amount of ghrelin to the vascular endothelium, enhancing its transcytosis into adjacent brain parenchyma. This mechanism may underlie the preferential migration of new neurons along perivascular regions with high blood flow, as shown in Figure 1.  We have incorporated this new data in Figure 3 and corresponding explanations into the Results, Figure legend and Methods

      (3) The in vivo link between Ghsr1KD and migratory speed is not established. Given the strong work to open the study on blood flow and migratory speed and the in vitro evidence that migratory speed is augmented by Ghrelin, the paper would be much stronger with direct measurement of migration speed upon Ghsr1KD. Indeed, blood flow should also be measured in this experiment since it would address concerns in 2. If blood flow and ghrelin delivery are linked, one would expect that Ghsr1KD neurons would not exhibit increased migratory speed when associated with slow or fast blood flow vessels. 

      In Figure 3, we showed that ghrelin transcytosis occurs preferentially in high-flow vessels, suggesting a role for ghrelin in mediating the effects of blood flow on neuronal migration. However, whether this dependence is solely attributable to ghrelin signaling remains unclear. 

      To address this, we tested whether Ghsr1a-KD modifies the impact of reduced blood  flow on neuronal migration by combining Ghsr1a-KD with bilateral common carotid artery stenosis (BCAS), a chronic cerebral hypoperfusion model (Figure S9A). We found that BCAS decreased the percentage of Ghsr1a-KD new neurons reaching the OB, similar to the effect seen in control neurons (Figure S9B, see also Figure 2A–C). This suggests that blood flow influences neuronal migration even under Ghsr1a-KD conditions. 

      Furthermore, we analyzed the distribution of Ghsr1a-KD neurons with respect to vessel flow characteristics. Even under Ghsr1a-KD, a higher proportion of new neurons were located in the area of endomucin-negative (high-flow) vessels compared with endomucin-positive (low-flow) vessels (Figure S9C), indicating that Ghsr1a-KD does not abolish the preferential association of migrating neurons with high flow vessels. These findings suggest that although ghrelin signaling contributes to blood flow-dependent migration, it is not the sole factor. Other blood-derived signals may also mediate this effect. We have included these new data in Figure S9 and updated the corresponding sections in the Results

      Reviewer #1 (Recommendations for the authors) :

      Major 

      Page 6, Line 13. Please provide in the result section some explanation about how photothrombic clot is induced.  

      We added the following explanation to the Results section to clarify the method used to induce photothrombotic clot formation.

      “For clot formation, a restricted area of selected vessels was irradiated by a two-photon laser immediately after intravenous injection of rose bengal.” (Results, Page 7, lines 27–28)

      Page 6, Line 18. The authors use the marmoset as an additional experimental model. Here, V-SVZ-derived newborn neurons migrate in other brain regions as compared to rodents. Please provide a clear rationale for moving from rodents to "common marmosets" as an experiment model. And why use marmosets only for this set of experiments? 

      We clarified the rationale for using common marmosets in addition to mice as follows:

      “Because blood vessel-guided neuronal migration in the adult brain is a conserved phenomenon across species (Kishimoto et al., 2011; Akter et al., 2021; Shvedov et al., 2024), we hypothesized that blood flow may also influence neuronal migration in other brain regions of primates. The neocortex, which supports higher-order brain functions and has undergone evolutionary expansion in primates, was selected as a target region. In common marmosets, but not in mice, V-SVZ-derived new neurons migrate toward the neocortex and ventral striatum (Akter et al., 2021) (Supplemental Movies S4 and S5).” (Results, Page 6, lines 19–25)

      Figure 2B. The experimental setup is possibly problematic as the lentiviral tracing measurement does not take into consideration the rate of neurogenesis or newborn neuron survival. Can authors assess the rate of proliferation and survival in the VSVZ/RMS upon BCAS to decipher whether the reduced number of cells observed in the OB only results from migration changes? (comparable remark stands for Figure 5) 

      To evaluate whether the reduction in the number of new neurons observed in the OB after BCAS (Figure 2B, C) is due solely to impaired migration, we assessed cell proliferation and survival in the V-SVZ and RMS. Specifically, we quantified the density of Ki67+ proliferating cells and cleaved caspase-3+ apoptotic cells in the sham and BCAS groups. BCAS significantly decreased cell proliferation and increased cell death in both the V-SVZ and RMS (Figure S4), suggesting that reduced neurogenesis and/or survival may contribute to the decreased neuronal distribution in the OB. 

      Although we cannot exclude the possibility that changes in cell proliferation or survival contributed to this effect, our photothrombotic clot formation experiments are better suited to directly examine how acute reduction in blood flow affects neuronal migration. These experiments allowed us to measure the migration speed of new neurons shortly after inducing localized blood flow inhibition. We found that clot formation significantly reduced the migration speed of new neurons (Figure 2E, H), indicating that blood flow changes directly impair neuronal migration in the adult brain. 

      We have included these new data in Figure S4 and updated the corresponding text in the Results, Discussion, Figure legend, and Methods as follows:

      Figure 3. About ghrelin signaling. It is unclear whether its transcytosis occurs in endomucin-negative because of the high bloodstream flow. How can this be explained? What happens upon BCAS, is there still a close relation between ghrelin transcytosis, blood flow, and neuron migration? 

      As correctly noted, our initial explanation and data did not provide sufficient evidence that higher blood flow delivers a larger amount of ghrelin into the brain parenchyma. We found that some vessels had particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal surface of vascular endothelial cells, as visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). On the basis of our observation that strong fluorescent signals were detected in vascular endothelial cells of endomucin-negative (high-flow) vessels (Figure 3C, D), we hypothesized that ghrelin transcytosis may occur more frequently in high-flow vessels due to increased endocytosis at the vessel endothelium. 

      To test this hypothesis, we quantified signal gradients in the extra-vessel regions by calculating fold changes in fluorescent intensity between two zones: Area I (0–10 μm from the abluminal surface of the endothelium) and Area II (10–20 μm away), as illustrated in Figure 3E. Area I corresponds to the perivascular region where new neurons are frequently found (Figure 1). We found that the proportion of vessel segments with >1.5-fold signal increase in Area I relative to Area II was significantly higher in endomucin-negative vessels than endomucin-positive ones (Figure 3F). Furthermore, vessel segments with >2-fold increases were observed exclusively in the endomucin-negative group (6.48% ± 1.18%). These results support the idea that higher blood flow increases the amount of ghrelin that reaches the luminal surface of vascular endothelial cells, thereby increasing the possibility of ghrelin transcytosis into the brain parenchyma.

      We also examined whether blood flow inhibition–induced by BCAS or photothrombotic clot formation–affects the relationship between ghrelin transcytosis, blood flow, and neuronal migration. The above results suggest that blood flow reduction may decrease ghrelin transcytosis, thereby contributing to impaired neuronal migration. To further explore this, we analyzed the distribution of new neurons around high- versus low-flow vessels under BCAS conditions. In the BCAS group, we still observed a higher density of new neurons in the region of high-flow (endomucin-negative) vessels compared with in low-flow (endomucin-positive) ones (Figure S9C). This suggests that even under reduced blood flow, neuronal migration preferentially occurs near high-flow vessels. Taken together, these results suggest that ghrelin transcytosis, blood flow and neuronal migration are connected, and that this relationship persists under conditions of blood flow reduction.

      Figure 4. Is ghrelin controlling both individual Dcx+ neuron migration as well as chain migration (cells moving more together)? This should be assessed and clarified. 

      How is ghrelin controlling actin dynamics in newborn migrating neurons? Since somal translocation speed and somal stride length are both modulated by ghrelin, this factor may also control MT remodeling, could that be checked? 

      We have revised the manuscript to better explain the role of ghrelin in both modes of neuronal migration–chain and individual. Initially, we demonstrated that ghrelin enhances the migration of new neurons in V-SVZ culture (Figure 4A, B), where these neurons migrate outward as chains, indicating that ghrelin facilitates chain migration. In subsequent in vitro experiments (Figure 4C–M), we showed that ghrelin also enhances the migration of individual neurons. To examine this in vivo, we injected Ghsr1a-KD and control lentiviruses into two different anatomical regions: the V-SVZ, where chain migration originates, and the OB core, where new neurons migrate individually. These experiments enabled us to assess the role of ghrelin signaling in each mode of migration independently. We found that ghrelin enhanced both chain migration in the RMS and individual migration in the OB. These results indicate that ghrelin signaling facilitates both forms of neuronal migration. We added the following text in the Results section:

      “To assess the direct effect of ghrelin on neuronal migration, we applied recombinant ghrelin to V-SVZ cultures, in which new neurons emerge and migrate as chains (Figure 4A). Ghrelin significantly increased the migration distance of these neurons (Figure 4B), indicating enhanced chain migration. We then used super-resolution time-lapse imaging to examine individually migrating neurons with or without knockdown (KD) of growth hormone secretagogue receptor 1a (GHSR1a), a ghrelin receptor expressed in V-SVZ-derived new neurons (Li et al., 2014) (Figure 4C). Ghrelin enhanced the migration speed of control cells (lacZ-KD) cells, indicating that it also facilitates individual migration (Figure 4D).” (Results, Page 9, lines 5–12)

      “Of the total labeled Dcx+ cells, the percentage of Dcx+ cells reaching the GL was significantly lower in the Ghsr1a-KD group than in the control group (Figure 5B, C), suggesting that ghrelin enhances individual radial migration of new neurons in the OB.” (Results, Page 10, lines 5–8) “These data indicate that ghrelin signaling facilitates both individual migration in the OB and chain migration in the RMS.” (Results, Page 10, lines 17–18)

      We also added discussion on how ghrelin may regulate cytoskeletal dynamics in migrating neurons. Ghrelin signaling has been reported to control actin cytoskeletal remodeling in astrocytoma cells (Dixit et al., 2006), which led us to investigate similar effects in migrating neurons. Rac, a member of the Rho GTPase family, was shown to mediate this actin remodeling in astrocytoma migration, suggesting it may also be involved in ghrelin-induced actin cup formation in new neurons. Furthermore, because somal translocation depends not only on actin but also on microtubule dynamics (Kaneko et al., 2017), it is possible that ghrelin influences both systems. Supporting this idea, ghrelin signaling was shown to modulate microtubule behavior via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017). These findings suggest that ghrelin may enhance somal translocation through coordinated regulation of both the actin and microtubule systems. We added following text in the Results and Discussion sections:

      “Ghrelin signaling has been reported to regulate actin cytoskeletal dynamics in astrocytoma cells (Dixit et al., 2006), which led us to examine whether a similar mechanism operates in migrating neurons.”(Results, Page 9, lines 23–25)

      “Further studies are needed to elucidate how ghrelin promotes actin cup formation in migrating neurons. Given that Rac, a Rho family GTPase, mediates actin remodeling downstream of ghrelin in astrocytoma cells (Dixit et al., 2006), it is possible that Rac may also be involved in ghrelininduced cytoskeletal regulation in new neurons.” (Discussion, Page 13, lines 28–31)

      “In addition to actin remodeling, ghrelin may regulate microtubule dynamics. Ghrelin signaling was shown to modulate microtubules via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017), raising the possibility that ghrelin promotes somal translocation of new neurons through coordinated regulation of both actin and microtubule networks (Kaneko et al., 2017).” (Discussion, Page 13, line 31–Page 14, line 2)

      It would also be informative to provide immunolabeling of Ghsr1 in the V-SVZ / RMS/ OB to have a clear picture of the expression pattern of this receptor. Newborn neurons migrate along blood vessels, which are surrounded by astrocytes that have also been reported to express Ghsr1, thus could newborn neuron migration change may also arise from activation of Ghsr1 in their surrounding astrocytes? 

      A previous study reported that GHSR1a is expressed in DCX+ new neurons in the RMS and OB, and in V-SVZ neural progenitor cells (Li et al., 2014). To visualize the spatial expression pattern of Ghsr1a, we performed RNAscope in situ hybridization because specific anti-GHSR1a antibodies suitable for immunohistochemistry were not available. Consistent with the previous report, we detected Ghsr1a mRNA in DCX+ new neurons in the VSVZ, RMS, and OB (Figure S5A), indicating that new neurons directly receive ghrelin signaling. 

      Moreover, our KD experiments demonstrated that ghrelin enhanced the migration of new neurons in a cell-autonomous manner via GHSR1a (Figure 4, 5). Nevertheless, a recent study (Stark et al., 2024) showed that GHSR1a was expressed in various cell types, including glutamatergic and GABAergic neurons, suggesting that ghrelin may also exert non-cellautonomous effects on neuronal migration. Given the presence of diverse cell types, including neurons, microglia, pericytes, and astrocytes, along the migratory route, it remains possible that GHSR1a activation in these neighboring cells contributes to the overall regulation of neuronal migration. 

      Figure 5. About the in vivo knockdown of Ghsr1a. The results section (page 9, line 3) mentioned that mice were either injected with one or the other construct but Figure 5 shows coincidence of GFP and dsRed positive cells. Were control and Ghsr1a shRNAs injected together into the same mouse? Could you quantify the number of cells in green (control), red (Ghsr1a KD), and yellow (both)? Won't they mostly be yellow? Have you tried injecting control and Ghsr1a separately? If yes, do you get the same result? Such analysis would be important to separate cell autonomous from noncell autonomous effects. 

      To minimize variability in injection conditions, we initially coinjected control and Ghsr1a-KD lentiviruses into the same mice and analyzed their migration using a paired design. As the reviewer correctly noted, some cells were coinfected and expressed both EmGFP and DsRed (18.7% ± 2.86% of EmGFP+ cells and 10.8% ± 0.533% of DsRed+ cells). To ensure that this overlap did not affect our analysis, we excluded EmGFP+/DsRed+ double-positive cells and focused solely on EmGFP+/DsRed− (control) and EmGFP−/DsRed+ (Ghsr1a-KD) single-positive cells. 

      We agree with the reviewer that coinjection could lead to reciprocal interactions between control and Ghsr1a-KD cells, potentially masking cell-autonomous effects. To address this, we performed an independent experiment in which control and Ghsr1a-KD lentiviruses were injected separately into different mice (Figure S7A), as suggested. Consistent with the results of the coinjection experiment, we found that the Ghsr1a-KD cells showed significantly reduced distribution in the GL compared with that in control cells (Figure S7B). Although we cannot exclude the possibility of a non-cell-autonomous effect of ghrelin, this result supports the conclusion that ghrelin signaling enhances neuronal migration in a cell-autonomous manner. 

      Who is expressing Ghsr1a, newborn neurons, and or their progenitors? The production and survival of newborn V-ZVS cells should be assessed upon knockdown of the ghrelin receptor too. 

      To determine whether the altered distribution of new neurons observed upon Ghsr1aKD is due to impaired migration rather than decreased cell production or survival, we examined the effects of Ghsr1a-KD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014). 

      We compared the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS between the control and Ghsr1a-KD groups. There was no significant difference in the proportion of cleaved caspase-3+ cells between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not affect the survival of new neurons and their progenitors. 

      Similarly, the proportion of Ki67+ cells in the RMS did not differ significantly between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, it remains technically difficult to evaluate whether Ghsr1a-KD affects proliferation in the VSVZ, because lentivirus injection into the VSVZ may interfere with GHSR1a expression not only in new neurons and neural progenitors, but also in other cell types known to express GHSR1a (Zigman et al., 2006). A previous study reported that ghrelin signaling promoted cell proliferation in the V-SVZ (Li et al., 2014), thus we cannot exclude the possibility that Ghsr1a-KD may affect V-SVZ proliferation.

      To overcome this limitation, we assessed the effects of Ghsr1a-KD on neuronal migration using in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), both of which did not interfere with proliferation in the V-SVZ. These complementary approaches consistently demonstrated that Ghsr1a-KD reduces the migration speed of new neurons. 

      “To determine whether the altered distribution of new neurons after Ghsr1a-KD is due to impaired migration rather than changes in cell production or survival, we assessed the effects of Ghsr1aKD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014). We quantified the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS in both control and Ghsr1a-KD groups. We found no significant difference in cleaved caspase-3+ cell proportions between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not influence the survival of new neurons and their progenitors. Similarly, the percentage of Ki67+ cells in the RMS was similar between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, technical limitations prevented a reliable evaluation of proliferation in the V-SVZ, as lentivirus injection into this region may interfere with GHSR1a expression in not only neural progenitors and new neurons, but also other GHSR1aexpressing cell types (Zigman et al., 2006). Although ghrelin signaling has been reported to promote cell proliferation in the V-SVZ (Li et al., 2014), our complementary in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), which did not affect the V-SVZ, consistently demonstrated that Ghsr1a-KD reduces neuronal migration. Taken together, our results suggest that blood-derived ghrelin enhances neuronal migration in the RMS and OB by stimulating actin cytoskeleton contraction in the cell soma, rather than by altering cell proliferation or survival.” (Results, Page 10, line 19–Page 11, line 4)

      “rat anti-Ki67 (1:500, #14-5698-82, eBioscience); and rabbit anti-cleaved caspase-3 (1:200, #9661, Cell Signaling Technology)” (Methods, Page 48, lines 14–16)

      How much is ghrelin/Ghsr1 signaling conserved in marmosets? 

      How ghrelin signaling is conserved between mice and common marmosets is important to clarify. A previous study reported the existence of a ghrelin homolog in common marmoset, which shares high sequence similarity with that in mice (Takemi et al., 2016). Moreover, the GHSR1a homolog in the common marmoset (https://www.ncbi.nlm.nih.gov/protein/380748978) shares 95.36% amino acid identity with its mouse counterpart. These findings suggest that blood-derived ghrelin may similarly promote neuronal migration in the marmoset brain, as observed in mice. 

      We have added the following text in the Discussion section:

      “Our data showed that new neurons preferentially migrate along arteriole-side vessels rather than venule-side vessels in both mouse and common marmoset brains, suggesting that the mechanism of blood flow-dependent neuronal migration is conserved across rodent and primate species, as well as across brain regions. A previous study identified a ghrelin homolog in the common marmoset with high sequence similarity to the murine version (Takemi et al., 2016). In addition, the marmoset GHSR1a homolog shares 95.36% amino acid identity with that of the mouse (https://www.ncbi.nlm.nih.gov/protein/380748978). These findings suggest that bloodderived ghrelin promotes neuronal migration in the common marmoset brain in a manner similar to that in mice.” (Discussion, Page 15, lines 8–16)

      Page 9. Starvation has been shown to boost ghrelin blood levels. What is the exact protocol used in this experiment and is this indeed increasing Ghrelin release from blood vessels in the V-SVZ? What about Ghsr1 expression level in newborn neurons? 

      We have clarified the calorie restriction (CR) protocol used in our experiments. We adopted a 70% CR protocol, which was previously shown to enhance hippocampal neurogenesis when administered for 14 days (Hornsby et al., 2016). In our study, the daily food intake under ad libitum (AL) conditions was first measured, and CR mice were then fed 70% of that amount for 5 consecutive days (see Figure 5I and Figure S10A). 

      To assess whether CR enhances ghrelin transcytosis into the brain parenchyma, we performed ELISA to quantify ghrelin levels in the OB and RMS. However, ghrelin concentrations were below the detection limit in both groups, precluding a direct comparison.

      We also considered whether CR modulates the expression level of the ghrelin receptor GHSR1a. A recent study reported that fasting increased GHSR1a expression in the OB (Stark et al., 2024), raising the possibility that CR may exert a similar effect. To test this, we performed in situ hybridization and quantified Ghsr1a mRNA puncta in Dcx+ cells in the OB. No significant difference was found between the AL and CR groups (Figure S5B), suggesting that CR does not alter GHSR1a expression levels in new neurons. 

      Although we cannot exclude the possibility that CR increases GHSR1a expression in other OB cell types, our combined CR and Ghsr1a-KD experiments strongly support a cellautonomous contribution of ghrelin signaling to the enhanced neuronal migration observed under CR conditions. Corresponding data and text have been added to Figure S5 and the Results, Discussion, and the Figure legend sections as follows:

      Minor 

      Page 4 

      Line 19 In Supplemental movies 1 and 2, it is unclear where to see the GFP+ new neurons interact with BV. Can you add arrows as an indication for the readers? It will be better to add the anatomy term for orientation, caudal, or rostral in the video. (The same for Supplemental movies 3, 4, and 5).  

      To clarify the regions of interest in Supplemental Movies 1 and 2, where neuron–vessel interactions in the RMS are highlighted, we added dotted lines indicating the RMS boundaries. In addition, we created a new movie (Supplemental Movie S1′) showing a high-magnification view of Supplemental Movie S1, in which arrows mark EGFP+ new neurons interacting with blood vessels. We also added orientation indicators (e.g., caudal and rostral) and arrows to highlight new neuron–vessel interactions in Supplemental Movies S1–S5. 

      The following descriptions have been added to the Figure legends:

      “Supplemental Movie S1′ 

      High-magnification view extracted from Supplemental Movie S1. Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Page 46, lines 6–8)

      “Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Supplemental Movie S3, Page 46, lines 16–17)

      “Arrows indicate Dcx+ cells interacting with blood vessels.” (Figure legend, Supplemental Movies S4 and S5, Page 46, lines 21–22, 26–27)

      Blood vessels are labeled in the Supplemental movies 2 and 3 by employing Flt1DsRed transgenic mice instead of RITC-Dex-GMA. However, Flt1-DsRed transgenic mice are not mentioned in the results section. 

      We have now included an explanation regarding the use of Flt1-DsRed mice, in which vascular endothelial cells were labeled with DsRed.

      “To visualize blood vessels, we also used Flt1-DsRed transgenic mice, in which vascular endothelial cells were specifically labeled with DsRed (Matsumoto et al., 2012). Using DcxEGFP/Flt1-DsRed double transgenic mice, we observed close spatial relationships between new neurons and blood vessels (Supplemental Movies S2 and S3).” (Results, Page 4, lines 22– 26)

      Figure 5. Can you indicate (in the figure legend and the result section) the stage of the adult brain used for this experiment? 

      We used 6- to 12-week-old adult male mice in all experiments in this study. To specify this, we have added the age of animals to both the Results and the relevant Figure legends as follows:

      “Therefore, we first studied blood vessel-guided neuronal migration in the RMS and OB using three-dimensional imaging in 6- to 12-week-old adult mice, which enabled analysis of the in vivo spatial relationship between new neurons and blood vessels.” (Results, Page 4, lines 14–16)

      “Figure 1 New neurons migrate along blood vessels with abundant flow in the adult brain.” (Figure legend, Page 25, line 4)

      “(B, C) Three-dimensional reconstructed images of a new neuron (green) and blood vessels (red) in the rostral migratory stream (RMS) (B) and glomerular layer (GL) (C) of 6- to 12-weekold adult mice.” (Figure legend, Page 25, lines 6–8)

      “(E) Transmission electron microscopy image of a new neuron (green) in close contact with a blood vessel (red) in the GL of a 6- to 12-week-old adult mouse.” (Figure legend, Page 26, lines 4–5)

      “(F) Time-lapse images of a migrating neuron (indicated by asterisks) in the GL of a 6- to 12week-old Dcx-EGFP mouse.” (Figure legend, Page 26, lines 6–7)

      “Figure 3 Ghrelin is delivered from the bloodstream to the RMS and OB in the adult brain (A) Representative images of the OB and cortex of a fluorescent ghrelin-infused mouse (6 to 12 weeks old).” (Figure legend, Page 30, lines 1–3)

      “Lentivirus injection into the OB core (A) and the VSVZ (D) was performed in 6- to 12-week-old adult mice.” (Figure legend, Page 33, lines 3–4)

      Reviewer #2 (Recommendations for author):

      Major:

      Ghsr1KD and blood flow 2-photon experiments to directly measure migratory speed. Could also do the same with fasting with or without Ghsr1KD.  

      We thank the reviewer for the valuable suggestion to strengthen our study. As pointed out in the Public Review, we agree that direct in vivo measurement of neuronal migration speed under Ghsr1a-KD conditions is important to clarify the link between ghrelin signaling and blood flow. 

      Two-photon imaging is the most suitable method for this purpose. Although we attempted two-photon imaging of Ghsr1a-KD new neurons, the number of virus-infected cells observed in vivo was too low to yield reliable data. Therefore, we chose an alternative strategy, combining Ghsr1a-KD with blood flow reduction using the BCAS model (Figure S9A), in which migration speed can be quantified based on the percentage of labeled cells reaching the OB. As stated in the Public Review response, BCAS significantly decreased the migration speed of Ghsr1a-KD new neurons (Figure S9B), indicating that Ghsr1a-KD does not abolish the influence of blood flow reduction. These findings suggest that ghrelin signaling is involved, but is not essential, for blood flow-dependent neuronal migration. 

      As suggested by the reviewer, direct observation of migration dynamics (e.g., somal translocation, leading process extension, stationary and migratory phases) is needed, especially in calorie restriction experiments. Although our data indicate that ghrelin signaling is required for fasting-induced increases in migration speed of new neurons, calorie restriction could also change concentrations of other factors in blood (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which may independently affect behavior of migrating neurons. Given that ghrelin is not the sole factor contributing to blood flow-dependent neuronal migration, other circulating factors could affect behavior of migrating neurons in a different manner during fasting. In vivo twophoton imaging would be a powerful approach to determine whether fasting-induced neuronal migration is caused by upregulated somal translocation speed, which would further support a role for ghrelin in this process.

      We have added the following text in the Discussion:

      “Although our data indicate that ghrelin signaling is essential for fasting-induced acceleration of neuronal migration, calorie restriction may also alter the concentrations of other circulating factors (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which could independently influence the behavior of migrating neurons.” (Discussion, Page 14, lines 25–29)

      Minor: 

      (1) Show fluorescent Ghreliin in Figure 3 for all brain areas measured in Figure 1 (GL, EPL, GCL, and RMS) for direct comparison.  

      To allow for direct comparison across brain regions, we added a new Supplemental figure showing the distribution of fluorescently labeled ghrelin in the OB, including the GL, EPL, GCL and RMS. This comprehensive view highlights ghrelin localization relative to vasculature and migrating neurons in the regions analyzed in Figure 1.

      (1) Figure 1, panel I is presented in a confusing manner. High blood flow points to 0 degrees, low blood flow to 180 degrees. It implies (unintentionally, I am sure) that low blood flow results in migration away from OB. Maybe plot separately?

      We agree that the original presentation of Figure 1I could be misinterpreted as referring to anatomical orientation (i.e., toward or away from the OB). To avoid confusion, we revised the figure to categorize new neuron–vessel interactions into four groups according to (1) the angle between the migration direction and vessel axis (small or large), and (2) whether the new neuron is migrating toward or away from the direction of higher blood flow. This new presentation avoids implying a fixed anatomical direction and better reflects the relationship between local blood flow and neuronal migration behavior. The revised figure is presented as Supplemental Figure S1.

    1. eLife Assessment

      This important work begins to understand how BDNF regulates the phosphorylation and activity of LRRK2. The overall strength of evidence has been assessed as compelling, though some claims are only partially supported. The work will be of interest for those that might pursue specific LRRK2 interactions and mutational effects on these pathways as the work continues to develop.

    2. Reviewer #1 (Public review):

      Summary:

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity.

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and brain tissue from genetically modified mice. They examine a LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function.

      Strengths:

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health.

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. IN this revised manuscript, aspects are well validated e.g., drebrin binding, but there is a disconnect between these findings and alterations to LRRK2 substrates. A convincing phosphoproteomic analysis of PD mutant Knock-in mouse brain is included. Overall the links between LRRK2, LRRK2 activity, and the changes to synaptic molecules, structures, and activity are intriguing.

      Weaknesses:

      The data sets remain disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking.

      Main Conclusions of Abstract:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not pERK pAkt & pRab?

      (2) Omics Proteome remodelling of LRRK2 interactome with BDNF & different in G2019S mouse neurons.

      Supports that the phosphoproteome of G2019S is different. Drebrin interaction with LRRK2 very well supported. Link between drebrin and LRRK2 activity somewhat supported (pS935 site), but the consequence (non-specific pRab8) not supported, as there is no evidence of a change in LRRK2 substrate(s).

      (3) Golgi 1 month LKO mouse altered dendritic spines, transient at 1m not older.

      Supported but very small transient change in spines, disconnected to other results (e.g., drebrin).

      (4) iPSC-derived neurons BDNF increases mEPSC frequency (transient at 70 not 50 or 90 days) in WT not KO "which appear to bypass this regulation through developmental compensation"

      Weak, not clear what is being bypassed.

      Main Conclusions Based on Old and New Figure / Data:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not ERK Akt & Rab?

      (2) BDNF promotes LRRK2 interaction with "post-synaptic actin cytoskeleton components"

      Tone down, only one postsynaptic validated - drebrin strong BUT CONTRADICTORY; link between drebrin and LRRK2 activity (pS935 site) supported, consequence (non-specific pRab8) broken, no evidence of change in LRRK2 substrate.

      (3) LRRK2 G2019S striatal phosphoproteome is different from WT.

      It is different. Where is link to BDNF or Drebrin?

      (4) BDNF signaling is impaired in Lrrk2 knockout neurons

      TrkB changes seem higher in SHSY5Y. pAKT impaired, pERK not convincing. Primary neurons Akt slower but it and Erk mostly intact. MLi-2 did not block pAkt or pErk in WT or KO (higher in latter). Whatever is happening in KO, Mli-2 not really blocking effect in WT. If we are to assume that studying the KO was a means to understand LRRK2 function, the authors data should explain why we care if an effect is absent in LKO, if LRRK2 isn't doing the same job in WT?

      BDNF increases synaptic puncta in WT not LKO (which start higher?). Is this BDNF increase blocked by LRRK2 inhibition?

      (5) Postsynaptic structural changes in Lrrk2 knockout neurons

      Golgi impregnation shows some very small spine changes at 1m. Not sustained over age. mRNA changes are very small (10% not even a fold... very weak and should be written as so). Derbrin levels reduced clearly at 1m, but probably also at 4 & 18. Underpowered, disconnected time course from the spine changes.

      (6) An effect on "spontaneous electrical activity" at Div70

      Weak. What is so special at 70 days that means we should be confident in the differences, or be satisfied that the other time points are legitimately ignored? These are 10-11 cells from 3 cultures assayed at 3 time points but only one is presented (rest in supplement). This should be a 2 (time) or 3 way (+culture RM) ANOVA. As it stands, in WT there is a little - no activity at 50 days, little to no at 70 days, and variable to lots or none at 90. BDNF did nothing at 50 or 90 but may have at 70. In KO low activity stable at 50 & 70, tanks at 90. BDNF would seem to have a similar effect on KO at 90 as WT at 70, but as there are only 7 cells it remains inconclusive. Thus the conclusion that BDNF signalling is broken in LKO is not well supported by the ephys data, nor is the BDNF effect in WT cells (even at the 70 day time point) shown to be susceptible to LRRK2 inhibition.

    3. Reviewer #2 (Public review):

      The data show that BDNF regulates the PD-associated kinase LRRK2, they place LRRK2 within well-described BDNF pathways biochemically, and they show that LRRK2 can play a role mediating BDNF-driven synaptic outcomes at excitatory synapses. The chief strength is that the data provide a potential focal point for multiple observations that have been made across many labs. The findings will be of broad interest because LRRK2 has emerged as a protein that is likely to be part of Parkinson's pathology and its normal and pathological actions remain poorly understood.

      A major strength of the study is the multiple approaches that were used (biochemistry, bioinformatics, light and electron microscopy and electrophysiology) across different experimental models (cells, primary neurons, human neurons, mice) to identify and examine the impact of BDNF on LRRK2 signaling and functions. Noteworthy is also the employment of LRRK2KO preparations to validate outcomes and to place LRRK2 actions up or downstream.

      The demonstration that LRRK2 and drebrin interact directly is important and suggests that other interacting proteins identified biochemically and bioinformatically in the paper will be important to pursue.

      Some data from different models do not fit well with one another (like mouse and human neurons). This is likely due to inherent differences in the preparations. Since different experiments were carried out on the different preps, however, it is not possible to cross compare. The lack of this information is viewed more as an open question than a cause for concern.

    1. eLife Assessment

      This manuscript presents a valuable and insightful contribution to the understanding of how Legionella pneumophila remodels its vacuolar niche through coordinated ubiquitination mechanisms. The identification of Rab5 as a target of both canonical and phosphoribosyl ubiquitination, and the demonstration of a detergent-resistant ubiquitin "cloud" surrounding the LCV, represent significant advances in the field. The findings are supported by rigorous experimental design, robust quantitative analyses, and clear mechanistic insight, meeting a standard of evidence that is compelling and exceeds current state-of-the-art approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In the submitted manuscript, Steinbach et al describe the formation of a detergent-resistant "cloud" around the Legionella-containing vacuole (LCV) that functions as a protective barrier. The authors show that formation of the "cloud" barrier is contingent upon the phosphoribosyl-ubiquitination activity of the SidE/SdeABC effector family, and is temporally regulated, with the assembly and subsequent disassembly of the "cloud" coinciding with replication and vacuolar expansion. The authors postulate a model of "cloud" barrier formation that relies upon a wave of initial ubiquitination by the SidC effector family, after which the SidE/SdeABC family expands the ubiquitination and forms cross-links that render the ubiquitin cloud resistant to harsh detergents. Additionally, Steinbach et al. also demonstrate that Rab5 is recruited to the LCV and remains associated for a considerable period.

      Strengths:

      This manuscript is very well written, with clear justification provided for experiments that make it very easy to follow along with the experimental logic. The figures have clearly been designed with much thought and are easy to interpret. Steinbach et al have also done a commendable job of addressing the previous reviewers' comments, even though some may suggest that some of these comments could be viewed as slightly unreasonable. This work would be of interest to both the Legionella and ubiquitin fields. Legionella researchers would potentially be interested to explore the proposed barrier model as the function for the ubiquitin "cloud," whereas ubiquitin researchers may be interested in exploring the mechanisms underlying SidE's crosslinking ability.

      Weaknesses:

      While the work is important and describes the physical nature of the ubiquitin cloud on the Legionella vacuole, it is somewhat descriptive in nature and does not dig deeply into what purpose this cloud serves. This is a complicated topic that will certainly stimulate additional research in this area.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript "Canonical and phosphoribosyl ubiquitination coordinate to stabilize a proteinaceous structure surrounding the Legionella-containing vacuole" by Steinbach et al. is well written and presents strong evidence that satisfactorily supports the main hypothesis and research objectives. The authors have clearly demonstrated the presence of cloud-like, detergent-resistant GTPase Rab5 surrounding the LCV, and formation of the structure is dependent on the SidE family of effectors. The study provides insights into the relevant (associated with described phenotype) ubiquitination pathways. The findings advance our understanding of Legionella pneumophila vacuole remodeling during intracellular infection and open directions for future research to establish broader implications of this structure on Legionella pathogenesis.

      Strengths:

      The manuscript convincingly demonstrates the presence of a cloud-like, detergent-resistant GTPase Rab5 surrounding the LCV through elegant microscopy. The experimental evidence about the dependence of the observed phenotype on the SidE family of effectors is compelling and presented with strong scientific rigor. The introduction is well-written, and the discussion is thorough and satisfactory. The article is thought-provoking and shows preliminary evidence for ubiquitin-mediated protection and spatial organization of the LCV.

      Weaknesses:

      The manuscript is well-organized and detailed, and it is hard to find weaknesses under the set goals of the research. A few weaknesses are that the molecular determinants or the regulatory mechanisms that drive selective versus non-selective incorporation of host proteins into this structure are unclear, and, as the authors mentioned, further work is required to establish the precise biophysical basis of the detergent resistance and expansive morphology of the ubiquitinated GTPase "cloud". Currently, the function or purpose of the structure is completely speculative. The effects or importance of the structure on bacterial replication is also not established in the current study. Figure 2D, right panel, Western blot results, the authors suggested the signal present in all four lanes between 37 and 25 kDa is 'nonspecific', which is probably a 'too intense' signal to be called so. Mass spec analysis would be interesting in order to identify sources of such intense signals. With these few limitations, the research presented in this manuscript is experimentally rigorous and opens avenues for future research.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Mukherjee and colleagues extended earlier studies on the coordination of the SidC and SidE effector families on the generation of a unique ubiquitin layer on the surface of the vacuoles containing the bacterial pathogen Legionella pneumophila (LCV).

      Strengths:

      The main strength of the manuscript is the identification of the small GTPase Rab5 as a major "carrier" of these differently modified ubiquitin and ubiquitin chains, which was nicely quantified.

      Weaknesses:

      (1) The results are mostly descriptive, based on mechanistic studies from earlier works.

      (2) The majority of the work was dedicated to the characterization of the unique ubiquitin layer on the LCV. One important question was ignored: what is the role of Rab5 in this process? Is the GTPase activity of Rab5 required for its ubiquitination by SidC and SidE? The authors should create a Rab5 KO cell line, complement the line with different mutants of Rab5, and examine their ubiquitination and association with the LCV.

      (3) The finding that Rab5 is associated with the LCV supports the notion that the LCV has characteristics of endo- or/late endosomes. The positioning of the LCV in the endocytic pathway should be discussed in the context of earlier studies (e.g.,PMID: 38739652; PMID: 11067875; PMID: 11067875).

    1. eLife Assessment

      Sanchez-Vasquez et al establish an innovative approach to induce aneuploidy in preimplantation embryos. This important study extends the author's previous publications evaluating the consequences of aneuploidy in the mammalian embryo. In this work, the authors investigate the developmental potential of aneuploid embryos and characterize changes in gene expression profiles under normoxic and hypoxic culture conditions. Using a solid methodology they identify sensitivity to Hif1alpha loss in aneuploid embryos, and in further convincing experiments they assess how levels of DNA damage and DNA repair are altered under hypoxic and normoxic conditions.

    2. Reviewer #1 (Public review):

      Summary:

      This paper developed a model of chromosome mosaicism by using a new aneuploidy-inducing drug (AZ3146), and compared this to their previous work where they used reversine, to demonstrate the fate of aneuploid cells during murine preimplantation embryo development. They found that AZ3146 acts similarly to reversine in inducing aneuploidy in embryos, but interestingly showed that the developmental potential of embryos is higher in AZ3146-treated vs. reversine-treated embryos. This difference was associated with changes in HIF1A, p53 gene regulation, DNA damage, and fate of euploid and aneuploid cells when embryos were cultured in a hypoxic environment.

      Strengths:

      In the current study, the authors investigate the fate of aneuploid cells in the preimplantation murine embryo using a specific aneuploidy-inducing compound to generate embryos that were chimeras of euploid and aneuploid cells. The strength of the work is that they investigate the developmental potential and changes in gene expression profiles under normoxic and hypoxic culture conditions. Further, they also assessed how levels of DNA damage and DNA repair are altered in these culture conditions. They also assessed the allocation of aneuploid cells to the divergent cell lineages of the blastocyst stage embryo.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1(Public review):

      We deeply appreciate the reviewer comments on our manuscript. Following up the revisions, our manuscript has been improved thanks to their insightful remarks. We have proceeded with all the required changes.  

      Weaknesses:

      The authors have still not addressed the inconsistent/missing description for sample size, the appropriate number of * for each figure panel, and the statistical tests used.

      Description of sample size, specific P value and statistical test used has been added it both in the main text, figures and figure legends.

      The authors assign 5% oxygen as hypoxia. This is not the case as the in vivo environment is close to this value. 5% is normoxia. Clinical IVF/embryo culture occurs at 5% O2. Please adjust your narrative around this.

      We define in our manuscript “normoxia” as the standard atmospheric oxygen levels in tissue culture incubators, which range from about 20–21% oxygen. Our definition of hypoxia is 5% concentration of oxygen, taking into consideration the standard levels of oxygen in the IVF clinics. Physiological oxygen in mouse varies from ~1.5% to 8% (Alva et al 2022). Considering that these levels of oxygen are the standard levels in tissue culture practices, a paragraph has been added to the discussion and materials and methods for further clarification   

      Reviewer #2 (Public review):

      Weakness:

      Given that this is a study on the induction of aneuploidy, it would be meaningful to assess aneuploidy immediately after induction, and then again before implantation. This is also applicable to the competition experiments on page 7/8. What is shown is the competitiveness of treated cells. Because the publication centers around aneuploidy, inclusion of such data in the main figure at all relevant points would strengthen it. There is some evaluation of karyotypes only in the supplemental - why? Would be good not to rely on a single assay that the authors appear to not give much importance.

      This is an excellent point. However, due to the stochasticity of the arising of aneuploidies when embryos are treated with AZ3146 and reversine (Bolton et al 2016), every treatment is likely to generate different levels of aneuploidy. Due to this, and to the technical limitations of generating single-cell genomic DNA sequencing at the blastocyst stage, we were unable to determine the karyotype of all cells after different conditions. Nevertheless, Regin et al 2024 (eLife) showed similar results on the overall transcriptome changes of different dosages of aneuploidy: high dosage embryos overexpress p53, like reversine-treated embryos; meanwhile, low dosage embryos overexpress the hypoxic pathway, including HIF1A, similar to embryos treated with AZ3146.  

      Reviewer #1 (Recommendations for the authors):

      Corrections required before final publishing:

      Please ensure that the number of asterisks is in alignment with standard convention (* <0.05; ** <0.01; *** <0.001; **** <0.0001). If you want to describe an exact P -vale it should be presented as P = 0.0004. line 108 *** is <0.0004. line 263 * P<0.0044

      Same issue appears in lines 697, 711, 722, 753, 685

      Specific values have been added in the figures and modified in the text. 

      Line 199: "...viable E9.5 embryos" missing "Figure S1D"

      Modified in manuscript

      Line 120: "...decidua" please add "Figure S1C"

      Modified in manuscript

      Line 126-127: Please add a description for the results (morula) in Fig 1D, e.g., It appears that YH2Ax persists from 8-cell to morula when treated with Reversine but not AZ3146"

      At the morula stage, the levels of γH2A.X in reversine- and AZ3146-treated embryos are similar (Fig. 1E). However, at the blastocyst stage, high levels of γH2A.X are maintained in reversine-treated embryos and reduced in AZ3146-treated embryos, suggesting some level of DNA repair between the morula-to-blastocyst stages (Fig. S2A). In contrast, in hypoxia, the levels of γH2A.X are low in the three treatments at the morula stages, suggesting that DNA repair can be enhanced under hypoxic conditions. Similar results have been reported in somatic cells (Marti et al., 2021; Pietrzak et al., 2018).

      Line 213: PARP1 levels were also similar under all conditions; but Fig3E, top right shows PARP1 was significantly lower with Reversine treatment; also please correct me if i am wrong, but does the phrase "all conditions" cross reference yH2AX and PARP1 between Fig 3 and Fig 1 to show the impact of hypoxia? Because from my understanding Fig 1 was done in 20% oxygen, but Fig 3 was done in 5% oxygen – hypoxia.

      This is correct. Modification in the manuscript has been performed for clarification

      Line 264: extra forward dash? "Reversine/AZ3146/ aggregation"

      Modified in manuscript

      Line 644: you don't have a control for IDF treatment, so how did you differentiate between impact of aneuploid drugs vs IDF treatment alone? Would the impact observed be due to compounding effect of aneuploidy drugs + IDF?

      This is a great observation. We previously demonstrated that IDF-1174 treatments in embryos do not affect pre-implantation development (Fig. S3).

      Line 681: change their behaviour is a vague statement. Be specific.

      Modified in manuscript

      Line 676 missing bracket "E)"

      Modified in manuscript

      Line 680: "...significantly on" should be "for"

      Modified in manuscript

      Line 682-685: "...hypoxia favours the survival of reversine-induced aneuploid cells." does it? the statement before this says in Rev/AZ chimeras, AZ blastomeres contribute similarly to reversine-blastomeres to the TE and PE but significantly increase contributions to the EPI.Wouldn't this mean hypoxia favours survival of AZ aneuploid cells in EPI?

      In normoxic conditions, AZ3146 treated cells in Rev/AZ chimeras contributed mostly to the EPI and TE but not PE. In contrast, in normoxic conditions, Rev-treated cells contributed similarly to all the lineages. This result seems to be due to a better survival of Rev-treated cells under normoxic conditions (Fig. 4D-E)

      Line 720: (b) shows blastocyst staining from what group? DMSO? Rev/AZ? Or are the 3 blastocysts shown here, 3 separate examples of Reversine-treated blastocysts? Would require labelling Fig S2B, and adding a short description in the corresponding figure legend

      Figure (B) shows the expression pattern of PARP1 at the blastocyst stage. Modified in manuscript

      Figure 2, Figure S3 and Figure S6: were these experiments performed at 5% or 20% O2, please add detail.

      Modified in manuscript

      Reviewer #2 (Recommendations for the authors):

      Lines 45-46 understanding of reduction of aneuploidy should mention/discuss the paper of attrition/selection, of the kind by the Brivanlou lab for instance, or others. As well as allocation to specific lineages, including the authors' work.

      A section in the discussion has been added in response to this recommendation. Comparison between models is debatable.

      The response does not clarify whether other papers were cited instead, or the authors own work that has shown preferential allocation to TE.

    1. eLife Assessment

      This important study provides a novel approach for delineating subcortical-cortical white matter bundles. The authors provide convincing evidence by harnessing state-of-the-art methods and cross-species data. Together, this effort will be of interest to scientists across multiple subfields and accelerate progress in a biologically critical but methodologically challenging area.

    2. Reviewer #1 (Public review):

      The authors note that it is challenging to perform diffusion MRI tractography consistently in both humans and macaques, particularly when deep subcortical structures are involved. The scientific advance described in this paper is effectively an update to the tracts that the XTRACT software supports. The changes to XTRACT are soundly motivated in theory (based on anatomical tracer studies) and practice (changes in seeding/masking for tractography).

    3. Reviewer #2 (Public review):

      Summary:

      In this article, Assimopoulos et al. expand the FSL-XTRACT software to include new protocols for identifying cortical-subcortical tracts with diffusion MRI, with a focus on tracts connecting to the amygdala and striatum. They show that the amygdalofugal pathway and divisions of the striatal bundle/external capsule can be successfully reconstructed in both macaques and humans while preserving large-scale topographic features previously defined in tract tracing studies. The authors set out to create an automated subcortical tractography protocol, and they accomplish this for a subset of specific subcortical connections.

      Strengths:

      The main strength of the current study is the translation of established anatomical knowledge to a tractography protocol for delineating cortical-subcortical tracts that are difficult to reconstruct. Diffusion MRI-based tractography is highly prone to false positives; thus, constraining tractography outputs by known anatomical priors is important. The authors used existing tracing literature to create anatomical constraints for tracking specific cortical-subcortical connections and refined their protocol through an iterative process and in collaboration with multiple neuroanatomists. Key additional strengths include 1) the creation of a protocol that can be applied to both macaque and human data; 2) demonstration that the protocol can be applied to be high quality data (3 shells, > 250 directions, 1.25 mm isotropic, 55 minutes) and lower quality data (2 shells, 100 directions, 2 mm isotropic, 6.5 minutes); and 3) validation that the anatomy of cortical-subcortical tracts derived from the new method are more similar in monozygotic twins than in siblings and unrelated individuals.

      Overall Appraisal:

      This new method will accelerate research on anatomically validated cortical-subcortical white matter pathways. The work has utility for diffusion MRI researchers across fields.

      Editors' note:

      Both reviewers were satisfied with the responses to their feedback.

    1. eLife Assessment

      This valuable study presents an analysis of evolutionary conservation in intrinsically disordered regions, identified as key drivers of phase separation, leveraging a protein language model. The strength of evidence is convincing, but a clearer justification of the methods and analyses is needed to fully support the main claims.

    2. Reviewer #1 (Public review):

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. While the paper is relatively easy to read overall, my main comment is that the authors could perhaps make it clearer which observations are new, and which support previous work using related approaches. Further, while the link to phase separation is interesting, it is not completely clear which data supports the statements made, and this could also be made clearer.

      Major comments:

      (1) With respect to putting the work in a better context of what has previously been done before, this is not to say that there is not new information in it, but what the authors do is somewhat closely related to work by others. I think it would be useful to make those links more directly. Some examples:

      (1a) Alderson et al (reference 71) analysed in detail the conservation of IDRs (via pLDDT, which is itself related to conservation) to show, for example, that conserved residues fold upon binding. This analysis is very similar to the analysis used in the current study (using ESM2 as a different measure of conservation). Thus, the approach (pages 7-8) described as "This distinction allows us to classify disordered regions into two types: "flexible disordered" regions, which show high ESM2 scores and greater mutational tolerance, and "conserved disordered" regions, which display low ESM2 scores, indicating varying levels of mutational constraint despite a lack of stable folding." is fundamentally very similar to that used by Alderson et al. Thus, the result that "Given that low ESM2 scores generally reflect mutational constraint in folded proteins, the presence of region a among disordered residues suggests that certain disordered amino acids are evolutionarily conserved and likely functionally significant" is in some ways very similar to the results of that paper.

      (1b) Dasmeh et al (https://doi.org/10.1093/genetics/iyab184), Lu et al (https://doi.org/10.1371/journal.pcbi.1010238) and Ho & Huang (https://doi.org/10.1002/pro.4317) analysed conservation in IDRs, including aromatic residues and their role in phase separation

      (1c) A number of groups have performed proteomewide saturation scans using pLMs, including variants of the ESM family, including Meier (reference 89, but cited about something else) and Cagiada et al (https://doi.org/10.1101/2024.05.21.595203) that analysed variant effects in IDRs using a pLM. Thus, I think statements such as "their applicability to studying the fitness and evolutionary pressures on IDRs has yet to be established" should possibly be qualified.

      (2) On page 4, the authors write, "The conserved residues are primarily located in regions associated with phase separation." These results are presented as a central part of the work, but it is not completely clear what the evidence is.

      (3) It would be useful with an assessment of what controls the authors used to assess whether there are folded domains within their set of IDRs.

    3. Reviewer #2 (Public review):

      This manuscript uses the ESM2 language model to map the evolutionary fitness landscape of intrinsically disordered regions (IDRs). The central idea is that mutational preferences predicted by these models could be useful in understanding eventual IDR-related behavior, such as disruption of otherwise stable phases. While ESM2-type models have been applied to analyze such mutational effects in folded proteins, they have not been used or verified for studying IDRs. Here, the authors use ESM2 to study membraneless organelle formation and the related fitness landscape of IDRs.

      Through this, their key finding in this work is the identification of a subset of amino acids that exhibit mutation resistance. Their findings reveal a strong correlation between ESM2 scores and conservation scores, which if true, could be useful for understanding IDRs in general. Through their ESM2-based calculations, the authors conclude that IDRs crucial for phase separation frequently contain conserved sequence motifs composed of both so-called sticker and spacer residues. The authors note that many such motifs have been experimentally validated as essential for phase separation.

      Unfortunately, I do not believe that the results can be trusted. ESM2 has not been validated for IDRs through experiments. The authors themselves point out its little use in that context. In this study, they do not provide any further rationale for why this situation might have changed. Furthermore, they mention that experimental perturbations of the predicted motifs in in vivo studies may further elucidate their functional importance, but none of that is done here. That some of the motifs have been previously validated does not give any credibility to the use of ESM2 here, given that such systems were probably seen during the training of the model.

      I believe that the authors should revamp their whole study and come up with a rigorous, scientific protocol where they make predictions and test them using ESM2 (or any other scientific framework).

    4. Reviewer #3 (Public review):

      Summary:

      This is a very nice and interesting paper to read about motif conservation in protein sequences and mainly in IDRs regions using the ESM2 language model. The topic of the paper is timely, with strong biological significance. The paper can be of great interest to the scientific community in the field of protein phase transitions and future applications using the ESM models. The ability of ESM2 to identify conserved motifs is crucial for disease prediction, as these regions may serve as potential drug targets. Therefore, I find these findings highly significant, and the authors strongly support them throughout the paper. The work motivates the scientific community towards further motif exploration related to diseases.

      Strengths:

      (1) Revealing conserved regions in IDRs by the ESM-2 language model.

      (2) Identification of functionally significant residues within protein sequences, especially in IDRs.

      (3) Findings supported by useful analyses.

      Weaknesses:

      (1) Lack of examples demonstrating the potential biological functions of these conserved regions

      (2) Very limited discussion of potential future work and of limitations.

    1. eLife Assessment

      This important study explored a number of issues related to citations in the peer review process. An analysis of more than 37000 peer reviews at four journals found that: i) during the first round of review, reviewers were less likely to recommend acceptance if the article under review cited the reviewer's own articles; ii) during the second and subsequent rounds of review, reviewers were more likely to recommend acceptance if the article cited the reviewer's own articles; iii) during all rounds of review, reviewers who asked authors to cite the reviewer's own articles (a practice known as 'coercive citation') were less likely to recommend acceptance. However, when an author agreed to cite work by the reviewer, the reviewer was more likely to recommend acceptance of the revised article. The evidence is convincing, but article would benefit from a clearer presentation of the results and a more nuanced discussion of the motivations of reviewers.

    2. Reviewer #1 (Public Review):

      Summary:

      The work used open peer reviews and followed them through a succession of reviews and author revisions. It assessed whether a reviewer had requested the author include additional citations and references to the reviewers' work. It then assessed whether the author had followed these suggestions and what the probability of acceptance was based on the authors decision.

      Strengths and weaknesses:

      The work's strengths are the in-depth and thorough statistical analysis it contains and the very large dataset it uses. The methods are robust and reported in detail. However, this is also a weakness of the work. Such thorough analysis makes it very hard to read! It's a very interesting paper with some excellent and thought provoking references but it needs to be careful not to overstate the results and improve the readability so it can be disseminated widely. It should also discuss more alternative explanations for the findings and, where possible, dismiss them.

    3. Reviewer #2 (Public Review):

      Summary:

      This article examines reviewer coercion in the form of requesting citations to the reviewer's own work as a possible trade for acceptance and shows that, under certain conditions, this happens.

      Strengths:

      The methods are well done and the results support the conclusions that some reviewers "request" self-citations and may be making acceptance decisions based on whether an author fulfills that request.

      Weaknesses:

      The author needs to be more clear on the fact that, in some instances, requests for self-citations by reviewers is important and valuable.

    4. Reviewer #3 (Public Review):

      Summary:

      In this article, Barnett examines a pressing question regarding citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance, and how this may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested such citations be added, and whether the authors complied/how that affected the reviewer decision-making.

      Strengths:

      The author uses a clever analytical design, examining four journals that use the same open peer review system, in which the identities of the authors and reviewers are both available and linkable to structured data. Categorical information about the approval is also available as structured data. This design allows a large scale investigation of this question.

      Weaknesses:

      My concerns pertain to the interpretability of the data as presented and the overly terse writing style.

      Regarding interpretability, it is often unclear what subset of the data are being used both in the prose and figures. For example, the descriptive statistics show many more Version 1 articles than Version 2+. How are the data subset among the different possible methods?

      Likewise, the methods indicate that a matching procedure was used comparing two reviewers for the same manuscript in order to control for potential confounds. However, the number of reviews is less than double the number of Version 1 articles, making it unclear which data were used in the final analysis. The methods also state that data were stratified by version. This raises a question about which articles/reviews were included in each of the analyses. I suggest spending more space describing how the data are subset and stratified. This should include any conditional subsetting as in the analysis on the 441 reviews where the reviewer was not cited in Version 1 but requested a citation for Version 2. Each of the figures and tables, as well as statistics provided in the text should provide this information, which would make this paper much more accessible to the reader. [Note from editor: Please see "Editorial feedback" for more on this]

      Finally, I would caution against imputing motivations to the reviewers, despite the important findings provided here. This is because the data as presented suggest a more nuanced interpretation is warranted. First, the author observes similar patterns of accept/reject decisions whether the suggested citation is a citation to the reviewer or not (Figs 3 and 4). Second, much of the observed reviewer behavior disappears or has much lower effect sizes depending on whether "Accept with Reservations" is considered an Accept or a Reject. This is acknowledged in the results text, but largely left out of the discussion. The conditional analysis on the 441 reviews mentioned above does support a more cautious version of the conclusion drawn here, especially when considered alongside the specific comments left by reviewers that were mentioned in the results and information in Table S.3. However, I recommend toning the language down to match the strength of the data.

    5. Reviewer #4 (Public Review):

      Summary:

      This work investigates whether a citation to a referee made by a paper is associated with a more positive evaluation by that referee for that paper. It provides evidence supporting this hypothesis. The work also investigates the role of self citations by referees where the referee would ask authors to cite the referee's paper.

      Strengths:

      This is an important problem: referees for scientific papers must provide their impartial opinions rooted in core scientific principles. Any undue influence due to the role of citations breaks this requirement. This work studies the possible presence and extent of this.

      Barring a few issues discussed below, the methods are solid and well done. The work uses a matched pair design which controls for article-level confounding and further investigates robustness to other potential confounds.

      It is surprising that even in these investigated journals where referee names are public, there is prevalence of such citation-related behaviors.

      Weaknesses:

      Some overall claims are questionable:

      "Reviewers who were cited were more likely to approve the article, but only after version 1" It also appears that referees who were cited were less likely to approve the article in version 1. This null or slightly negative effect undermines the broad claim of citations swaying referees. The paper highlights only the positive results while not including the absence (and even reversal) of the effect in version 1 in its narrative.

      "To the best of our knowledge, this is the first analysis to use a matched design when examining reviewer citations" Does not appear to be a valid claim based on the literature reference [18]

      It will be useful to have a control group in the analysis associated to Figure 5 where the control group comprises matched reviews that did not ask for a self citation. This will help demarcate words associated with approval under self citation (as compared to when there is no self citation). The current narrative appears to suggest an association of the use of these words with self citations but without any control.

      More discussion on the recommendations will help: For the suggestion that "the reviewers initially see a version of the article with all references blinded and no reference list" the paper says "this involves more administrative work and demands more from peer reviewers". I am afraid this can also degrade the quality of peer review, given that the research cannot be contextualized properly by referees. Referees may not revert back to all their thoughts and evaluations when references are released afterwards.

    6. Author response:

      There was a common theme across the reviews to provide a more cautious interpretation and to consider the key question of whether peer reviewers who include citations are being purely self-serving or are highlighting important missing context. I will include a suggested new text analysis to cover this and will expand the discussion on this key question. Reviewers highlighted some confusion around the sample sizes for the different analyses, and I will clarify all sample sizes in the next version.

    1. eLife Assessment

      This is a useful study, describing transcriptome-based PPGL subtypes and exploring the mutations, immune correlates, and disease progression of cases in each subtype. The cohort is a reasonable size, and a second cohort is included from TCGA. The identification of driver mutations in PPGL is incomplete, and this compromises characterisation for prognostic purposes. This is a reasonable starting point from which to further elucidate PPGL subtypes.

    2. Reviewer #1 (Public review):

      This study presents an exploration of PPGL tumour bulk transcriptomics and identifies three clusters of samples (labeled as subtypes C1-C3). Each subtype is then investigated for the presence of somatic mutations, metabolism-associated pathways and inflammation correlates, and disease progression.

      The proposed subtype descriptions are presented as an exploratory study. The proposed potential biomarkers from this subtype are suitably caveated, and will require further validation in PPGL cohorts together with a mechanistic study.

      The first section uses WGCNA (a method to identify clusters of samples based on gene expression correlations) to discover three transcriptome-based clusters of PPGL tumours.

      The second section inspects a previously published snRNAseq dataset, and labels some of the published cells as subtypes C1, C2, C3 (Methods could be clarified here), among other cells labelled as immune cell types. Further details about how the previously reported single-nuclei were assigned to the newly described subtypes C1-C3 require clarification.

      The tumour samples are obtained from multiple locations in the body (Figure 1A). It will be important to see further investigation of how the sample origin is distributed among the C1-C3 clusters, and whether there is a sample-origin association with mutational drivers and disease progression.

    3. Reviewer #2 (Public review):

      Summary:

      A study that furthers the molecular definition of PPGL (where prognosis is variable) and provides a wide range of sub-experiments to back up the findings. One of the key premises of the study is that identification of driver mutations in PPGL is incomplete and that compromises characterisation for prognostic purposes. This is a reasonable starting point on which to base some characterisation based on different methods.

      Strengths:

      The cohort is a reasonable size, and a useful validation cohort in the form of TCGA is used. Whilst it would be resource-intensive (though plausible given the rarity of the tumour type) to perform RNAseq on all PPGL samples in clinical practice, some potential proxies are proposed.

      Weaknesses:

      The performance of some of the proxy markers for transcriptional subtype is not presented.

      There is limited prognostic information available.

    4. Author response:

      Reviewer #1 (Public Review):

      This study presents an exploration of PPGL tumour bulk transcriptomics and identifies three clusters of samples (labeled as subtypes C1-C3). Each subtype is then investigated for the presence of somatic mutations, metabolism-associated pathways and inflammation correlates, and disease progression. The proposed subtype descriptions are presented as an exploratory study. The proposed potential biomarkers from this subtype are suitably caveated and will require further validation in PPGL cohorts together with a mechanistic study.

      The first section uses WGCNA (a method to identify clusters of samples based on gene expression correlations) to discover three transcriptome-based clusters of PPGL tumours. The second section inspects a previously published snRNAseq dataset, and labels some of the published cells as subtypes C1, C2, C3 (Methods could be clarified here), among other cells labelled as immune cell types. Further details about how the previously reported single-nuclei were assigned to the newly described subtypes C1-C3 require clarification.

      Thank you for your valuable suggestion. In response to the reviewer’s request for further clarification on “how previously published single-nuclei data were assigned to the newly defined C1-C3 subtypes,” we have provided additional methodological details in the revised manuscript (lines 103-109). Specifically, we aggregated the single-nucleus RNA-seq data to the sample level by summing gene counts across nuclei to generate pseudo-bulk expression profiles. These profiles were then normalized for library size, log-transformed (log1p), and z-scaled across samples. Using genesets scores derived from our earlier WGCNA analysis of PPGLs, we defined transcriptional subtypes within the Magnus cohort (Supplementary Figure. 1C). We further analyzed the single-nucleus data by classifying malignant (chromaffin) nuclei as C1, C2, or C3 based on their subtype scores, while non-malignant nuclei (including immune, stromal, endothelial, and others) were annotated using canonical cell-type markers (Figure. 4A).

      The tumour samples are obtained from multiple locations in the body (Figure 1A). It will be important to see further investigation of how the sample origin is distributed among the C1-C3 clusters, and whether there is a sample-origin association with mutational drivers and disease progression.

      Thank you for your valuable suggestion. In the revised manuscript (lines 74-79), Figure. 1A, Table S1 and Supplementary Figure. 1A, we harmonized anatomic site annotations from our PPGL cohort and the TCGA cohort and analyzed the distribution of tumor origin (adrenal vs extra-adrenal) across subtypes. The site composition is essentially uniform across C1-C3—approximately 75% pheochromocytoma (PC) and 25% paraganglioma (PG)—with only minimal variation. Notably, the proportion of extra-adrenal origin (paraganglioma origin) is slightly higher in the C1 subtype (see Supplementary Figure 1A), which aligns with the biological characteristics of tumors from this anatomical site, which typically exhibit more aggressive behavior.

      Reviewer #2 (Public Review):

      A study that furthers the molecular definition of PPGL (where prognosis is variable) and provides a wide range of sub-experiments to back up the findings. One of the key premises of the study is that identification of driver mutations in PPGL is incomplete and that compromises characterisation for prognostic purposes. This is a reasonable starting point on which to base some characterisation based on different methods. The cohort is a reasonable size, and a useful validation cohort in the form of TCGA is used. Whilst it would be resource-intensive (though plausible given the rarity of the tumour type) to perform RNA-seq on all PPGL samples in clinical practice, some potential proxies are proposed.

      We sincerely thank the reviewer for their positive assessment of our study’s rationale. We fully agree that RNA sequencing for all PPGL samples remains resource-intensive in current clinical practice, and its widespread application still faces feasibility challenges. It is precisely for this reason that, after defining transcriptional subtypes, we further focused on identifying and validating practical molecular markers and exploring their detectability at the protein level.

      In this study, we validated key markers such as ANGPT2, PCSK1N, and GPX3 using immunohistochemistry (IHC), demonstrating their ability to effectively distinguish among molecular subtypes (see Figure. 5). This provides a potential tool for the clinical translation of transcriptional subtyping, similar to the transcription factor-based subtyping in small cell lung cancer where IHC enables low-cost and rapid molecular classification.

      It should be noted that the subtyping performance of these markers has so far been preliminarily validated only in our internal cohort of 87 PPGL samples. We agree with the reviewer that larger-scale, multi-center prospective studies are needed in the future to further establish the reliability and prognostic value of these markers in clinical practice.

      The performance of some of the proxy markers for transcriptional subtype is not presented.

      We agree with your comment regarding the need to further evaluate the performance of proxy markers for transcriptional subtyping. In our study, we have in fact taken this point into full consideration. To translate the transcriptional subtypes into a clinically applicable classification tool, we employed a linear regression model to compare the effect values (β values) of candidate marker genes across subtypes (Supplementary Figure. 1D-F). Genes with the most significant β values and statistical differences were selected as representative markers for each subtype.

      Ultimately, we identified ANGPT2, PCSK1N, and GPX3—each significantly overexpressed in subtypes C1, C2, and C3, respectively, and exhibiting the most pronounced β values—as robust marker genes for these subtypes (Figure. 5A and Supplementary Figure. 1D-F). These results support the utility of these markers in subtype classification and have been thoroughly validated in our analysis. 

      There is limited prognostic information available.

      Thank you for your valuable suggestion. In this exploratory revision, we present the available prognostic signal in Figure. 5C. Given the current event numbers and follow-up time, we intentionally limited inference. We are continuing longitudinal follow-up of the PPGL cohort and will periodically update and report mature time-to-event analyses in subsequent work.

    1. eLife Assessment

      This valuable study presents an analysis of evolutionary conservation in intrinsically disordered regions, identified as key drivers of phase separation, leveraging a protein language model. The strength of evidence presented is convincing overall, though the theoretical grounding could benefit from further development.

    2. Reviewer #1 (Public review):

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. This is an interesting study that supports, complements and extends previous related analyses on the conservation and mutational tolerance of disordered regions, with a particular focus on disordered regions in proteins that are found in condensates.

    3. Reviewer #2 (Public review):

      This manuscript uses the ESM2 language model to map the evolutionary fitness landscape of intrinsically disordered regions (IDRs). The central idea is that mutational preferences predicted by these models could be useful in understanding eventual IDR-related behavior, such as disruption of otherwise stable phases. While ESM2-type models have been applied to analyze such mutational effects in folded proteins, they have not been used or verified for studying IDRs. Here, the authors use ESM2 to study membraneless organelle formation and the related fitness landscape of IDRs.

      Through this, their key finding in this work is the identification of a subset of amino acids that exhibit mutation resistance. Their findings reveal a strong correlation between ESM2 scores and conservation scores, which if true, could be useful for understanding IDRs in general. Through their ESM2-based calculations, the authors conclude that IDRs crucial for phase separation frequently contain conserved sequence motifs composed of both so-called sticker and spacer residues. The authors note that many such motifs have been experimentally validated as essential for phase separation.

      Comments on revisions:

      Unfortunately my concerns about lack of theoretical grounding and validation (especially critical in lack of theoretical grounding) persist. The argument about correlation between ESM2 scores and MSA conservation is circular. Protein language models already encode residue‑level conservation, so agreement with conservation does not establish new predictive power. For IDRs, conservation is a poor surrogate for function because many functions are mediated by short, degenerate SLiMs that are frequently gained and lost. Sequence‑only predictions therefore need orthogonal (preferably experimental or at the least in silico) tests. Finally, without a family‑level holdout (e.g., cluster de‑duplication at low identity) and prospective tests, overlap with known motifs cannot rule out training‑data memorization/near‑duplicates.

    1. eLife Assessment

      This investigation presents a valuable contribution by elucidating the genetic determinants of growth and fitness across multiple clinical strains of Mycobacterium intracellulare, an understudied non-tuberculous mycobacterium. Employing transposon sequencing (Tn-seq), the authors identify a core set of 131 genes essential for bacterial viability, offering a solid foundation for anti-mycobacterial drug discovery. However, there are minor but nonetheless significant concerns about data organization, which need to be addressed for greater scientific impact.

    2. Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.

    3. Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study has conducted TnSeq experiments in reference and 8 different clinical isolates of M. intracellulare thus producing large number of datasets which itself is a rare accomplishment and will greatly benefit the research community.

      Weaknesses:

      (1) A comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as future directions in the discussion section along with testing the phenotype of individual knockout strains.<br /> (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high low read counts" is not clear.<br /> (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

    4. Reviewer #5 (Public review):

      Summary:

      In the research article, "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare" Tateshi et al focussed their research on pulmonary disease caused by Mycobacterium avium-intracellulare complex which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth/survival within host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture while the same were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes/pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to the recently clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach they dissected the similarities and differences in the genetic requirement for bacterial survival between ATCC type strains and clinical isolates. They observed that the clinical strains performed much better in terms of growth during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models and similar outcomes were observed in vivo further emphasising the relevance of hypoxic adaptation crucial for the clinical strains which could be explored as potential drug targets.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been well presented both in Figures and text. In my view this is one of the major weakness of the study.

    1. eLife Assessment

      This important study provides evidence supporting the idea that postnatal experience plays an instructive role in shaping the patterns of functional connectivity between extrastriate visual cortex and frontal regions during development, by comparing neonates, blind and sighted adults. The evidence supporting the authors' claim is solid. Nevertheless, substantial weaknesses remain in mechanistic interpretation and alignment with relevant developmental frameworks. This study will be of significant interest to neuroscientists and neuroimaging researchers focused on vision, plasticity and development.

    2. Reviewer #1 (Public review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between human extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the main conclusion regarding postnatal experience-driven shaping of visual-frontal connectivity.

      The inclusion of neonates offers a unique and valuable developmental anchor for interpreting divergence between blind and sighted adults. This is a major advance over prior studies limited to adult comparisons.

      Convergence with prior findings in the blind and sighted adult groups reinforces the reliability and external validity of the present results.

      The split-half reliability analysis in the infant data increases confidence in the robustness of the reported group differences.

      Weaknesses:

      The manuscript risks overstating a mechanistic distinction between sighted and blind development by framing visual experience as "instructive" and blindness as "reorganizing." Similarly, the binary framing of visual experience and blindness as independent may oversimplify shared plasticity mechanisms.

      The interpretation of changes in temporal correlations as altered neural communication does not adequately consider how shifts in shared variance across networks may influence these measures without reflecting true biological reorganization.

      The discussion does not substantively engage with the longstanding debate over whether sensory experience plays an instructive or permissive role in cortical development.

      The relationship between resting-state and task-based findings in blindness remains unclear.

    3. Reviewer #2 (Public review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. Here, Tian et al. explore how this organization arises over development. Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated. Some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The paper addresses very important questions about the starting state in the developing visual cortex, and how cortical networks are shaped by experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      While potential roles of experience (e.g., visual, cross-modal) are discussed in detail, little consideration is given to the role of experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. It is possible then that the sighted adult pattern may still emerge later in infancy or childhood, regardless of infant visual experience. If so, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). In short, it is not clear that birth, or the first couple weeks of life, are a clear cut "starting point" for development, after which all change can be attributed to experience.

    4. Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of infants lies between that of sighted adults (showing stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (showing stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of infants resembled those of sighted adults more than those of blind adults, but infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including, neonates/infants and blind adults, is highly original.

      Overall, the presented analyses are solid and well detailed, and the results and discussion are convincing.

      Weaknesses

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating the evolution of functional connectivity of the visual system as a function of visual experience and thus as a function of age, at least during toddlerhood given the early and intense maturation of the visual system after birth. This could be achieved by analyzing different developmental periods using open databases such as the Baby Connectome Project.

      The rationale for grouping full-term neonates and preterm infants (scanned at term-equivalent age) is not understandable when seeking to perform comparisons with adults. Even if the study results do not show differences between full-terms and preterms in terms of functional connectivity differences between regions and of connectivity patterns, preterms group had different neurodevelopment and post-natal (including visual) experiences (even a few weeks might have an impact). And actually they show reduced connectivity strength systematically for all regions compared with full-terms (Sup Fig 7). Considering a more homogeneous group of neonates would have strengthen the study design.

      The rationale for presenting results on the connectivity of secondary visual cortices before the one of primary cortices (V1) could be clarified.

      The authors acknowledge the methodological difficulties for defining regions of interest (ROIs) in infants in a similar way as adults. Since the brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing a delayed growth), this poses major problems for registration. This raises the question of whether the study findings could be biased by differences in ROI positioning across groups.

    5. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental organs of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults. 

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data. 

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the bold signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-rang connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups. 

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title). 

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewer’s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).

      We hope these control analyses help strengthen our conclusions.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including, neonates/infants and blind adults, is highly original.

      -Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the setting- establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is not understandable since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions, and the one of visual and PFC regions, do not depend on age at birth, they do not show that each connectivity pattern is not influenced by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores higher or equal to 4) would strengthen the group homogeneity by focusing on infants supposed to have a rather typical neurodevelopment. The authors quote all infants as "sighted" but this is not guaranteed as no follow-up is provided.

      Response #2: We appreciate the reviewer’s suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n =39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates the robustness of our findings to potential confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of secondary visual cortices before one of the primary cortices (V1) was not clear to understand. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not the ones to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in a similar way as adults. The reliability and the comparability of the ROIs positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. And have other cortical parcellation approaches been considered to assess the ROIs robustness (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1(Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer is raising some interesting questions about possible mechanisms and network changes. Resting state studies are indeed always subject to possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is not a clear cortical ‘third region’ candidate (Deen et al., 2015). However, some thalamic affects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting state changes in correlation between two areas do not imply changes in strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling, rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301 as follows:

      “Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018).”

      It is not clear how changes in correlation patterns among visual areas would produce the connectivity between visual areas and prefrontal areas reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas and the same is true of prefrontal corticies.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation(Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group’s visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12,line 320 of the manuscript in the Discussion section as follows:

      “The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors—such as molecular signals—on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002).”

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

      Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. While animal studies have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development(Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012). We think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touched upon the time course issue in page 11 line 289 in the Discussion section as follows:

      “The present results reveal the effects of experience on development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggests that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages.”

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

      Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056,blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (One-way ANOVA<sub>,</sub> F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We also add the relevant context in the Method section in page 16 Line 447 as follows:

      “Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al (Lage-Castellanos et al., 2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (One-way ANOVA, F (2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results.”

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

      Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons, as noted by Murphy et al (Murphy & Fox, 2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) is capble to estimate the global non-neuronal artifacts like the GSR method. Meanwhile as it estimate global non-neuronal artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks, but not the gray matter (GM), CompCor could introduce minimal unwanted bias to the GM signal.

      Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices. e.g., Tokariev et al. 2019, Li et al. 2021 elife; Zhang et al. 2022 Fronteirs in Neuroscience.

      Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported in Page 6 line 169 and Supplementary Figure S7. The connectivities of full-term infants are generally higher than those of preterm infants. However, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

      Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations, we think this is likely because the global signal was regressed out during preprocessing. This methodological choice can lead to an increase in negative connections (Murphy & Fox, 2017).

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

      Response #10: We specified in the article that Tthe occipital ROIs were defined in the current study are functional areas in people born blind identified in prior studies as regions that respond to three non-visual tasks such as language, math, or executive function, and show functional connectivity changes in blind adults in previous studies (Kanjlia et al., 2016, 2021; Lane et al., 2015). These regions respond to language, math and executivie function in the congenitally blind population (see Figure 1.) The are refered collectively as ‘secondary visual areas’ to destinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. Thus, we are using the term "secondary visual areas" to refer to these regions. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

      Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex in describing the location of the ROI and added the ventral view of ROIs in the Figure 1. The language-responsive and math -responsive ROIs covers both the lateral and ventral visual cortex, whereas executive function (response-conflict) regions cover only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see supplementary Fig S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

      Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term ‘reorganization’ to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in prior liaterature. However, we agree with the reviewer that the blind group does not reflect ‘reorganization’ intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

      Response #13: We revised this sentence to specifically as “visual experience establishes elements of the sighted-adult long-range connectivity” in tha Abstract line 17.

      The statement that the visual ROIS roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and into the intraparietal sulcus.

      Response #14: Thanks to the reviewer for the clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROI in the Methods section Page 17 line 489 as follows:

      “Each functional ROI spans multiple anatomical regions and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category specific ventral occipitotemporal cortices and dorsally, V3a and V4v.  The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extended to the intraparietal sulcus and superior parietal lobule.”

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

      Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer’s concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

      Response #18: We appreciate the reviewer’s concern and would like to clarify that the term “instructive effect” is used here derived from neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, “instructive” refers to activity-dependent mechanisms where patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

      “Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017).” (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

      “Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC.

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

      Response #21: We clarify that there is no distinction between the terms “across” and “between hemisphere connectivity”—they refer to the same concept. To ensure consistency, we have revised the text to exclusively use “between hemisphere connectivity” throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (Page 7, line 187 in the manuscript).

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

      Response #22: We have added the statistics in Figure 1-4.

      Adding the third comparison in Figure 4 would be possible in my view.

      Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers’ ability to grasp our intended message. To ensure clarity, we retained the original Figure 4 while providing the complete three-region analysis (including all statistical comparisons) in Supplementary Figure S8 to ensure completeness.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

      “The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0 – 19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37 – 45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23 – 42.71).” (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

      Response #25: We report the range of fMRI volumes in the Methods section (Page 16, Line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort’s large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (Page 16, Line 444) as follows:

      "While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort’s large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses. "

      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. eLife Assessment

      This important study highlights the novel role of RSPO mimetic SZN-043 in the activation of hepatic WNT signaling and promoting hepatocyte regeneration. The authors provide convincing evidence of SZN-043 increasing hepatocytes proliferation in various mouse models, including a humanized mouse liver model, ALD model and CCL4 fibrosis model. This study will be of interest to researchers in liver regeneration and repair mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      The work by Fisher et al describes the role of novel RSPO mimetics in the activation of WNT signaling and hepatocyte regeneration. However, the results of the experiments and weaknesses of the methods used do not support the conclusions of the authors that the new therapy can promote liver regeneration in alcohol-induced liver cirrhosis.

      Strengths:

      Similarly to its precursor, aASGR1-RSPO2-RA-IgG, SZN-043 can upregulate Wnt target genes and promote hepatocyte proliferation in the liver.

      Comments on revisions:

      The authors responded to all my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Fisher et al investigates therpauetic role for SZN-043, a hepatocyte-targeted R-spondin mimetic, for its potential role in restoring Wnt signaling and promoting liver-regeneration in alcohol-associated liver disease (ALD). Using multiple preclinical models, the compound was shown to promote hepatocyte proliferation and reduce fibrosis. This study highlights the efficacy in promoting liver regeneration while maintaining controlled signaling. Limitations include a need for further exploration of off-target effects and fibrosis mechanisms. The findings support SZN-043 as a promising candidate for ALD therapy, warranting further clinical evaluation. This is a well deigned study with thorough investigation using multiple disease models.

      Strengths:

      (1) Well-written manuscript with clear design, robust methods, and discussion.

      (2) Using multiple models strengthens the findings and expands beyond ALD.

      (3) Identification of SZN-043 as a novel potent drug for liver regeneration.

    4. Author response:

      Response to Comments from reviewer #1

      Many thanks for appreciating that SZN-043 can promote hepatocyte proliferation via the Wnt-signaling pathway.

      (1) The reviewer is concerned with using only CYP1A2 expression as an endpoint to make a conclusion about the effect of SZN-043 on Wnt activity in human ALD samples. The reviewer raises a good point as the more commonly used Wnt target gene, AXIN2, is not consistantly changed in both cohorts. We were at first also surprised by this finding. However, upon closer analysis we found that the expression of hepatocyte-specific target genes such as CYP1A2 (Figure 2), CYP2E1, OAT, LGR5, GLUL (Table 1) and ZNRF3 were mostly expressed in hepatocytes and ductal cells were all down-regulated in ALD samples. Others Wnt target genes expressed in epithelial and mesenchymal liver cell populations, such as AXIN2, CCND1 and NOTUM are indeed not consistently and significantly changed. Given that SZN-043 is not active on mesenchymal cells, this discrepancy could be best explained by the large increase in mesenchymal cells in ALD tissue samples, thereby confounding the results. We have now clarified this in the discussion. Another method to assess Wnt activity is to measure b-catenin phosphorylation and nuclear transfer. In our hands, this method was found to be better suited for tissue culture than histological sections from in vivo studies. We have also amended the manuscript title to refer to expression of Wnt target genes, rather than Wnt activity.

      (2) We have now added a supplemental figure to show the lack of Ki-67+ human hepatocytes in the cirrhotic tissue samples to confirm the absence of hepatocyte proliferation (Figure S1).

      (3) The differences in amino acid sequence between SZN-043 and its precursor, αASGR1-RSPO2-RAIgG, can be found in the material and method section. These changes in amino acid sequences improved the biophysical properties of the final clinical candidate, such as oxidation and nonspecific binding. The biochemical analysis of those differences exceeds the scope of the current manuscript. We present here the pharmacokinetic properties of SZN-043 only, as this was the only molecule advanced to clinical trial and used in the studies presented here.

      (4) The reviewer suggests to assess the effect of SZN-043 in Ctnnb1-KO mice to confirm that SZN043 acts via a canonical Wnt pathway. Indeed, there were several reports on the ability of Rspondin to act on other pathways besides the Wnt signaling pathway (for recent review, Niehrs et al, 2024, Bioessays). However, while an interesting suggestion, this line of investigation belongs to MOA studies and exceeds the scope of the current manuscript. An additional manuscript presenting MOA studies for SZN-043 was recently submitted elsewhere. Still, we have added this possibility in the discussion section.

      (5) The reviewer is asking how SZN-043 is affecting liver functions in general. Indeed, we have observed a consistent reduction in the international normalized ratio of prothrombin time using the thioacetamide (TAA)-induced fibrosis model and previously published those findings (Zhang, 2020). In our hands, the TAA is the only liver injury model that significantly increases INR. This increase is modest compared to that observed in clinical patients. Therefore, we do not report INR findings for other models. We have not seen any effects of SZN-043 on hepatocyte differentiation markers such as HNF4A (data not shown) and the hepatocyte specific ASGR1/2 as shown in Figure 5. Rather we focused on proliferation as the main potentially beneficial endpoint, to restore the parenchymal mass in injured livers. Finally, consistent with what was reported in the literature, we have observed a transient and reciprocal effect on albumin and alfa-fetoprotein expression during the proliferative phase of liver regeneration. These results are detailed in an additional manuscript presenting MOA studies for SZN-043, which was recently submitted elsewhere.

      (6) We have used females only in the ethanol-induced injury models because there are numerous reports in the literature stating that males are not as susceptible to those injuries.  

      (7) The reviewer questions the relevance of the ethanol-induced injury model used to evaluate SZN043 efficacy. Indeed, none of the disease model developed to date reproduce the severity and complexity of alcohol-associated liver diseases, although some, such as the ethanol supplemented Lieber DeCarli diet, are more commonly used than others – which is the reason why this model was selected. 

      (8) The reviewer questions the relevance of the fibrosis model used to evaluate SZN-043 efficacy. Indeed, none of the fibrosis models developed to date reproduce the severity and complexity of cirrhosis in human livers. While combining ethanol with CCl4 would lead to more severe fibrotic livers, CCl4 itself is not involved in ALD in humans. Both models are likely to result in similar pericentral fibrosis with central-to-central bridging. In this study, we were mostly interested in addressing the effects of SZN-043 in a tissue affected by fibrotic scars.  

      (9) The sex of CCl4-treated mice is male. We added this information in the methods section.

      (10) A summary of histology and fibrosis assessment data for alcohol-fed mice was added in supplemental Table S3. In our hands, the use of aging mice did not induce the presence of fibrosis, in contrast to published results.  

      (11) The rationale for using 13.5-month-old mice in the alcohol studies and scid mice in the CCl4 studies has been clarified in the results and discussion sections. 

      a. Briefly, aging mice were reported to be more susceptible to ethanol-induced injury than young mice and to include induction of fibrosis. However, we were unable to reproduce the presence of fibrosis reported in the literature.  

      b. Scid mice were used in the CCl4 studies to test whether a stronger response could be observed in the absence of a potential anti-drug antibodies response. While a modest reduction in fibrosis was observed in both B6 and scid mice following the SZN-043 treatment, the effect size did not seem affected by the mouse strain. 

      Response to Comments from reviewer #2

      Many thanks for appreciating that the use of multiple disease models to identify SZN-043 as a potential novel drug for liver regeneration.

      (1) The importance of restoring liver regeneration capacity to reduce the need for liver transplantation had been emphasized in the introduction.

      (2) There is continuous damage to the mouse hepatocytes in the FRG mice, due to the Fah mutation. They undergo repair mechanisms favoring the proliferation of human hepatocytes during the production period. Injury models that affect the human hepatocytes population have been developed in these mice. However, the primary goal of this study was to confirm that SZN043 was efficacious in inducing human hepatocytes proliferation, a feature difficult to reproduce in primary hepatocyte cultures. Given the artefactual nature of the chimeric liver in FRG mice and the high cost of these mice, further studies were not judged to be necessary.

      (3) Corrected

      (4) A figure including DAPI staining has now been included in supplemental Figure S2.

      (5) Clarification that the 8 weeks alcohol feeding used in our study design is a modification of the NIAAA model. While some ASGR1 has been reported on the surface of macrophages, additional data from MOA studies strongly suggest that the effect of SZN-043 is mediated via a hepatocytespecific mechanism (submitted manuscript).

      (6) The reviewer inquired about the potential role of macrophages in promoting an antiinflammatory state in response to SZN-043. While a direct effect is unlikely, a potential effect of macrophages in response to SZN-043 is plausible. Wnt activation is known to induce the secretion of hepatokines, such as LECT2, which in turn can influence macrophage activity. This possibility is discussed in the discussion section.

      (7) The potential off-target effects of SZN-043 such as stellate cell activation is discussed in the discussion section.

      (8) The discussion of the limitations of current models has been included in the discussion section of the manuscript.

      (9) We have now included a discussion of prior RSPO-based therapies, such as OMP-131R10. We explain why the hepatocyte-targeting of RSPO activity minimizes undesired effects.

    1. eLife Assessment

      This study presents a valuable finding that the blood-brain barrier (BBB) may be modulated through specific modes of electroacupuncture stimulation. The data were collected and analyzed using a solid and validated methodology, and can be used as a starting point for functional studies of the BBB for drug delivery across healthy and diseased states. The work will be of broad interest to scientists working in the field of drug delivery and drug development.

    2. Joint Public Review:

      This study employs single-cell RNA sequencing to investigate how electroacupuncture (EA) stimulation alters the transcriptional profiles of central nervous system cell types following blood-brain barrier (BBB) opening. The authors seek to characterize changes in gene expression and pathway activities across diverse neural cells in response to electroacupuncture (EA) stimulation using high-resolution transcriptomics. This approach has the potential to elucidate the cellular mechanisms underlying EA stimulation and their implications for therapeutic intervention. The work engages with a timely and biologically significant question regarding noninvasive stimulation methods to manipulate BBB permeability. However, no in vivo/in vitro functional assays are provided to validate the changes in BBB permeability or cytokine release in the tested models. The experimental rationale remains inadequately explained, and key details regarding the magnitude, duration, and spatial distribution of BBB opening in this system are still lacking.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The work from this paper successfully mapped transcriptional landscape and identified EA-responsive cell types (endothelial, microglia). Data suggest EA modulates BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. In fact, our team conducted a series of preliminary studies that verified this aspect, and we hace provided a more detailed introduction in the introduction section, in lines 60-71 of the manuscript.

      We are very grateful to the reviewers for pointing out the important and meaningful issue of "gender-specific BBB differences." We will make this a focal point in our future research.

      As for pericytes and neurons, we acknowledge their importance in the function of the blood-brain barrier. We acknowledge the importance of pericytes and neurons in the blood-brain barrier. However, neurons are absent because our sample processing method involves dissociation. During the dissociation procedure, neuronal axons, which are relatively long, are filtered out during the frequent cell suspension steps and cannot enter the downstream microfluidic system for analysis, so they are not present in our data. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they are not captured because their proportion in our samples is extremely low, which is why they are not present in the data. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types for study. Of course, in our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      In addition, to validate the results at the protein level, we have recently conducted some experiments. However, as several proteins are currently at a critical stage of further experimental validation, it is not appropriate to present them in the manuscript at this time. Instead, we have uploaded the relevant data as an appendix for your review. This includes a figure of several protein markers we examined, as well as a table of the antibodies used.

      This section is also further elaborated in the introduction and its references.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      This section is also further elaborated in the introduction and its references.

      Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      Considering the overall data layout and the length of the article, we ultimately decided not to make any changes to the presentation of the article's data. The images included in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view any data they are interested in.

      Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 3-7: Label treatment groups (CON vs. EA) consistently in legends.

      (2) Methods: Specify rat strain (Sprague-Dawley) in the abstract.

      (3) Clarify Limitations: Explicitly state that BBB opening is inferred, not proven.

      This section has been revised at lines 743-733, 748, 949, 754-755, and 759-760 of the manuscript.

      Revised at line 31 of the manuscript.

      Thank you for your feedback. The background information on the open evidence of BBB has been added to the introduction.

      Reviewer #2 (Recommendations for the authors):

      (1) Abstract and Introduction

      • Include specific key findings in the abstract to improve clarity and reader engagement.

      • Expand the introduction to situate this work in the context of other BBB-opening methods (e.g., ultrasound) and the known consequences of BBB disruption.

      • Clarify the rationale for choosing electroacupuncture.

      • Include information (perhaps summarized from previous studies) about the extent, timeline, and functional assessment of BBB opening in this model to help justify the single-cell RNA-seq design.

      (2) Experimental Rationale and Context

      • Reiterate experimental design and rationale in each results section, rather than relying exclusively on the Methods section.

      • Specify the time point of tissue collection relative to the EA intervention.

      • Describe the anatomical sites of acupuncture stimulation and their physiological relevance.

      (3) Data Presentation

      • Replace the human brain cartoon in Figure 1 with an anatomically appropriate rat brain schematic.

      • Reevaluate which data are presented in the main versus supplementary figures. Highlight biologically meaningful results, such as cell-cell communication and ligand-receptor interactions, in the main figures rather than supplementary data.

      (4) Interpretation and Modeling

      • More carefully link transcriptional changes (e.g., Wnt signaling in microglia) to biologically plausible mechanisms of BBB regulation-e.g., microglial signaling to endothelial cells.

      • Clarify whether the presence of granulocytes and T cells might result from a lack of perfusion prior to brain dissection.

      • Consider proposing a model (even speculative) of how EA leads to BBB opening based on observed transcriptional changes.

      First, for the sake of brevity in the abstract, we did not present specific results in this section. Second, since BBB opening via EA is a unique strategy, our previous studies have examined the opening time window and the recovery of the BBB after EA intervention (as mentioned in the introduction). We believe its characteristics differ from those of ultrasound-induced BBB opening and BBB disruption, so we did not conduct comparative discussions, but objectively presented our research findings. In further functional validation experiments, we may consider integrating other opening strategies in our studies. Additionally, the choice of electroacupuncture was based on our previous series of studies, which have already been outlined in the research background. Finally, we did indeed determine the experimental design of this study based on prior research, as described in the background section of the introduction.

      We decided not to make changes to this section in the manuscript after careful consideration. The setup of electroacupuncture intervention and controls has been thoroughly discussed in our previous studies (as referenced in the introduction), so we have not repeated it in this manuscript. Overall, building on all our previous findings, this study focuses primarily on the potential mechanisms of EA intervention. The anatomical sites of acupuncture stimulation and their physiological relevance are another key area of our research, and we are currently conducting a series of related studies. We look forward to sharing these findings with you in the future.

      We have already changed the human brain diagram in Figure 1 to a rat brain diagram, and have replaced Figure 1 in the files with the revised version. However, considering the overall data layout and the length of the article, we ultimately decided not to make changes to the data presentation in the manuscript. The images in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view the data they are interested in.

      This section has provided us with excellent suggestions for further exploration, although no changes have been made to the manuscript at this time. In the future, we may conduct more detailed transcriptomic studies focusing on sex differences and different brain regions, which will allow for a more comprehensive analysis of the biological mechanisms involved in BBB regulation.

    1. eLife Assessment

      This valuable study explores the role of the chromatin regulator ATAD2 in mouse spermatogenesis. It convincingly demonstrates that ATAD2 is essential for proper chromatin remodeling in haploid spermatids, influencing gene accessibility, H3.3-mediated transcription, and histone eviction. Using Atad2 knockout (KO) mice, the authors link ATAD2 to the DNA-replication-independent incorporation of sperm-specific proteins like protamines and histone H3.3. Although the findings highlight chromatin abnormalities and impaired in vitro fertilization in KO mice, natural fertility remains unaffected, suggesting possible in vivo compensatory mechanisms. However, in its current form, the study lacks mechanistic insight and provides only partial evidence for ATAD2's molecular role, limiting its functional conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as the ATAC sequencing, the study showed that increased levels of HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility and also delayed deposition of protamines. Sperm from the Atad2 KO mice reduces the success of in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

      Strengths:

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis.

      Weaknesses:

      (1) Some results lack quantification.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.

      Strengths:

      The MS first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation by connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.

      Weaknesses: The MS is robust and there are not big weaknesses

    4. Reviewer #3 (Public review):

      Summary:

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally.

      Strengths:

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function.

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype?

      (5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as the ATAC sequencing, the study showed that increased levels of HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility and also delayed deposition of protamines. Sperm from the Atad2 KO mice reduces the success of in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin. 

      We would like to take this opportunity to highlight that the present study builds on our previously published work, which examined the function of ATAD2 in both yeast S. pombe and mouse embryonic stem (ES) cells (Wang et al., 2021). In yeast, using genetic analysis we showed that inactivation of HIRA rescues defective cell growth caused by the absence of ATAD2. This rescue could also be achieved by reducing histone dosage, indicating that the toxicity depends on histone over-dosage, and that HIRA toxicity, in the absence of ATAD2, is linked to this imbalance.

      Furthermore, HIRA ChIP-seq performed in mouse ES cells revealed increased nucleosome-bound HIRA, particularly around transcription start sites (TSS) of active genes, along with the appearance of HIRA-bound nucleosomes within normally nucleosome-free regions (NFRs). These findings pointed to ATAD2 as a major factor responsible for unloading HIRA from nucleosomes. This unloading function may also apply to other histone chaperones, such as FACT (see Wang et al., 2021, Fig. 4C).

      In the present study, our investigations converge on the same ATAD2 function in the context of a physiologically integrated mammalian system—spermatogenesis. Indeed, in the absence of ATAD2, we observed H3.3 accumulation and enhanced H3.3-mediated gene expression. Consistent with this functional model of ATAD2— unloading chaperones from histone- and non-histone-bound chromatin—we also observed defects in histone-toprotamine replacement.

      Together, the results presented here and in Wang et al. (2021) reveal an underappreciated regulatory layer of histone chaperone activity. Previously, histone chaperones were primarily understood as factors that load histones. Our findings demonstrate that we must also consider a previously unrecognized regulatory mechanism that controls assembled histone-bound chaperones. This key point was clearly captured and emphasized by Reviewer #2 (see below).

      Strengths: 

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis. 

      Weaknesses: 

      (1) Some results lack quantification. 

      We will consider all the data and add appropriate quantifications where necessary.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin. 

      Please see our comments above.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-toprotamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility. 

      Strengths: 

      The MS first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation by connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo. 

      Weaknesses:

      The MS is robust and there are not big weaknesses 

      Reviewer #3 (Public review): 

      Summary: 

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally. 

      Strengths: 

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function. 

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis. 

      We respectfully disagree with the statement that our findings are largely superficial. Based on our investigations of this factor over the years, it has become evident that ATAD2 functions as an auxiliary factor that facilitates mechanisms controlling chromatin dynamics (see, for example, Morozumi et al., 2015). These mechanisms can still occur in the absence of ATAD2, but with reduced efficiency, which explains the mild phenotype we observed.

      This function, while not essential, is nonetheless an integral part of the cell’s molecular biology and should be studied and brought to the attention of the broader biological community, just as we study essential factors. Unfortunately, the field has tended to focus primarily on core functional actors, often overlooking auxiliary factors. As a result, our decade-long investigations into the subtle yet important roles of ATAD2 have repeatedly been met with skepticism regarding its functional significance, which has in turn influenced editorial decisions.

      We chose eLife as the venue for this work specifically to avoid such editorial barriers and to emphasize that facilitators of essential functions do exist. They deserve to be investigated, and the underlying molecular regulatory mechanisms must be understood.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.  

      (1) In the revised version, we will include Venn diagrams to illustrate the overlap in significantly differentially expressed genes between this study and previous work. However, we believe that the GSEAs presented here provide stronger evidence, as they indicate the statistical significance of this overlap (p-values). In our case, we observed p-value < 0.01 (**) and p < 0.001 (***).

      (2) Sex chromosome gene expression was analyzed and is presented in Fig. 5C.

      (3) The effect of ATAD2 loss on gene expression is shown in Fig. 4A, B, and C as histograms, with statistical significance indicated in the middle panels.

      (4) Although mapping H3.3 incorporation across the genome in wild-type and Atad2 KO cells would have been informative, the available anti-H3.3 antibody did not work for ChIP-seq, at least in our hands. The authors of Fontaine et al., 2022, who studied H3.3 during spermatogenesis in mice, must have encountered the same problem, since they tagged the endogenous H3.3 gene to perform their ChIP experiments.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.  

      Based on our understanding of ATAD2’s function—specifically its role in releasing chromatin-bound HIRA—in the absence of ATAD2 the residence time of both HIRA and H3.3 on chromatin increases. This results in the detection of H3.3 not only on sex chromosomes but across the genome. Our data provide clear evidence of this phenomenon. The reviewer is correct in suggesting that the accumulated H3.3 would carry H3.3-associated histone PTMs; however, we are unsure what additional insights could be gained by further demonstrating this point.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim. 

      Figure 7 does not suggest that pre-PRM2 processing is affected in Atad2 KO; rather, this figure—particularly Fig. 7B—specifically demonstrates that pre-PRM2 processing is impaired, as shown using an antibody that recognizes the processed portion of pre-PRM2. ELISA was used to provide a more quantitative assessment; however, in the revised manuscript we will also include a western blot image.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype? 

      These are interesting experiments that require the creation of appropriate mouse models, which are not currently available.

      (5)The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation. 

      The Reviewer is absolutely correct. In addition to the points addressed in response to Reviewer #1’s general comments (see above), it would indeed have been very interesting to test the segregase activity of ATAD2 (likely driven by its AAA ATPase activity) through in vitro experiments using the Xenopus egg extract system described by Tagami et al., 2004. This system can be applied both in the presence and absence (via immunodepletion) of ATAD2 and would also allow the use of ATAD2 mutants, particularly those with inactive AAA ATPase or bromodomains. However, such experiments go well beyond the scope of this study, which focuses on the role of ATAD2 in chromatin dynamics during spermatogenesis

      Reference

      Wang T, Perazza D, Boussouar F, Cattaneo M, Bougdour A, Chuffart F, Barral S, Vargas A, Liakopoulou A, Puthier D, Bargier L, Morozumi Y, Jamshidikia M, Garcia-Saez I, Petosa C, Rousseaux S, Verdel A, Khochbin S. ATAD2 controls chromatin-bound HIRA turnover. Life Sci Alliance. 2021 Sep 27;4(12):e202101151. doi: 10.26508/lsa.202101151. PMID: 34580178; PMCID: PMC8500222.

      Morozumi Y, Boussouar F, Tan M, Chaikuad A, Jamshidikia M, Colak G, He H, Nie L, Petosa C, de Dieuleveult M, Curtet S, Vitte AL, Rabatel C, Debernardi A, Cosset FL, Verhoeyen E, Emadali A, Schweifer N, Gianni D, Gut M, Guardiola P, Rousseaux S, Gérard M, Knapp S, Zhao Y, Khochbin S. Atad2 is a generalist facilitator of chromatin dynamics in embryonic stem cells. J Mol Cell Biol. 2016 Aug;8(4):349-62. doi: 10.1093/jmcb/mjv060. Epub 2015 Oct 12. PMID: 26459632; PMCID: PMC4991664.

      Fontaine E, Papin C, Martinez G, Le Gras S, Nahed RA, Héry P, Buchou T, Ouararhni K, Favier B, Gautier T, Sabir JSM, Gerard M, Bednar J, Arnoult C, Dimitrov S, Hamiche A. Dual role of histone variant H3.3B in spermatogenesis: positive regulation of piRNA transcription and implication in X-chromosome inactivation. Nucleic Acids Res. 2022 Jul 22;50(13):7350-7366. doi: 10.1093/nar/gkac541. PMID: 35766398; PMCID: PMC9303386.

      Tagami H, Ray-Gallet D, Almouzni G, Nakatani Y. Histone H3.1 and H3.3 complexes mediate nucleosome assembly pathways dependent or independent of DNA synthesis. Cell. 2004 Jan 9;116(1):51-61. doi:10.1016/s0092-8674(03)01064-x. PMID: 14718166.

    1. eLife Assessment

      This useful work identifies new monoclonal antibodies produced by cystic fibrosis patients against Pseudomonas aeruginosa type three secretion system. The evidence supporting authors' claim is solid. Nonetheless, the manuscript may benefit from a more in depth description of what the authors learned from their structure-based analyses of antibodies targeting PcrV.

    2. Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting protein from the Pseudomonas aeruginosa T3SS, discovered by employing single cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The supplemented biochemical analysis was crystal structures of P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrVs scaffolding function.

      The antibiotic resistance crisis requires the development of new solutions to treat infections cause by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing resulting in valuate data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs suitability for downstream application.

    3. Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed Elisa and translocation assays to identify among 34 cystic fibrosis patients which ones produced antibodies against P. aeruginosa type three secretion system (T3SS). Authors were especially interested in antibodies against PcrV and PcsF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6-PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      • Article is well written.

      • Authors used complementary assays to evaluate protective effect of candidate monoclonal antibodies.

      • Authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      • Authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      • Results shown in Fig. 6 should be initially described in the Results section and not in the Discussion section.

      • The authors should describe, in the Discussion (and also in L146-147), in more detail the gained insights into how anti-PcrV antibodies work. This is especially important given previous reports of more potent antibodies (e.g., Simonis et al.) that significantly reduces the novelty of their work. Hence, authors could explicitly highlight how their study differentiate from previous work, and what unique insights were gained (in the current version is not completely obvious).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting protein from the Pseudomonas aeruginosa T3SS, discovered by employing single cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The supplemented biochemical analysis was crystal structures of P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrVs scaffolding function.

      Strengths:

      The antibiotic resistance crisis requires the development of new solutions to treat infections caused by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report, the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing resulting in valuable data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for the characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs suitability for downstream application.

      Weaknesses:

      Although the information presented in this manuscript is important, previous reports regarding other T3SS structures complexed with antibodies, reduce the novelty of this report. Nevertheless, we provide several comments that may help to improve the report. The structural analysis of the presented mAbs is incomplete and unfortunately, the authors did not address any developability assessment. With such vital information missing, it is unclear if the proposed antibodies are suited for diagnostic or therapeutic usage. This vastly reduces the importance of the possibly great potential of the authors' findings. Moreover, the structural information does not include the interacting regions on the mAb which may impede the optimization of the mAb if it is required to improve its affinity.

      As described in the manuscript (Fig. 6), our mAbs are markedly less effective in every in vitro T3SS inhibition assay than the mAbs recently described by Simonis et al. They are therefore very unlikely to outperform these mAbs in in vivo animal models of P. aeruginosa infection. Considering the high cost of animal experiments and ethical concerns-and in accordance with the Reduction principal of the 3Rs guidelines-we chose not to pursue in vivo experiments. Instead, we focused on leveraging the new isolated mAbs to investigate the mechanisms of action and structural features of anti-PcrV mAbs.

      Following the reviewer's suggestion, we have now added mAb interaction features into the structural data presented in the manuscript. However, based on the efficiency data, the structural analysis and the mechanistic insights presented, we do not consider further therapeutic use and optimization of our mAbs to be warranted.

      Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed Elisa and translocation assays to identify among 34 cystic fibrosis patients which ones produced antibodies against P. aeruginosa type three secretion system (T3SS). The authors were especially interested in antibodies against PcrV and PcsF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      (1) The article is well written.

      (2) The authors used complementary assays to evaluate the protective effect of candidate monoclonal antibodies.

      (3) The authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      (4) The authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      The authors used a similar workflow to the one previously reported in Simonis et al. 2023 (antibodies from cystic fibrosis patients that included B cell isolation, antibody-PcrV interaction modeling, etc.) but the authors do not clearly explain how their work and findings differentiate from previous work.   

      We employed a similar mAb isolation pipeline to that used by Simonis et al., beginning with the screening of a cohort of cystic fibrosis patients chronically infected with P. aeruginosa. As in Simonis et al., we isolated specific B cells using a recombinant PcrV bait, followed by single-cell PCR amplification of immunoglobulin genes. The main differences in methodology between the two studies are as follows: i) the use of individuals from different cohorts, and therefore having different Ab repertoires; ii) the nature of the screening assays, although in both cases the screening was focused on the inhibition of T3SS function; iii) the PcrV labeling strategy, with Simonis et al. employing direct labeling, whereas we used a biotinylated tag combined with streptavidin;

      The number of specific mAbs obtained and produced was higher in Simonis et al. (47 versus 9 in our study). They sorted B cells from three individuals compared to two in our work and possibly started with a larger amount of PBMCs per donor, which may account for the higher number of specific B cells and mAbs isolated. Considering that the strategies were overall very similar, the greater number of mAbs isolated in Simonis et al. likely explains, to a large extent, why they identified mAbs targeting different epitopes compared to ours, including highly potent mAbs that we did not recover. 

      Our modeling study, unlike that of Simonis et al., which relied on an AlphaFold prediction of the multimeric structure of P. aeruginosa PcrV, was based on the experimentally determined structure of the homologous Salmonella SipD pentamer, as described in the manuscript. Furthermore, we compared our mAb P3D6 not only with 30-B8 from Simonis et al., but also with MEDI3902. Finally, in contrast to the approach of Simonis et al., we used functional assays to investigate the differences in mechanisms of action among these mAbs, which target three distinct epitopes.

      (2) Although new antibodies against P. aeruginosa T3SS expand the potential space of antibodybased therapies, it is unclear if P3D6 or P5B3 are better than previous antibodies. In fact, in the discussion section authors suggested that the 30-B8 antibody seems to be the most effective of the tested antibodies.  

      As explained above and shown in the Results section (Figure 6), the 30-B8 mAb is markedly more effective at inhibiting T3SS activity in both in vitro assays used.

      (3) The authors should explain better which of the two antibodies they have discovered would be better suited for follow-up studies. It is confusing that the authors focused the last sections of the manuscript on P3D6 despite P3D6 having a much lower ExoS-Bla inhibition effect than P5B3 and the limitation in the PcrV variant that P3D6 seems to recognize. A better description of this comparison and the criteria to select among candidate antibodies would help readers identify the main messages of the paper. 

      The P3D6 mAb shows stronger inhibitory activity than P5B3 in the two assays used, as shown in Supplementary Figure 1. An error in the table in Figure 2B was corrected and this table now reflects the results presented in Supplementary Figure 1. 

      The final sections of the manuscript focus on P3D6, which is more potent than P5B3, and for which we successfully determined a co-crystal structure with PcrV*. All parallel attempts to obtain a structure of P5B3 in complex with PcrV* failed. The P3D6-PcrV* structure was used to analyze epitope recognition and mechanisms of action in comparison to previously described mAbs. As previously mentioned, we do not consider further studies aimed at therapeutic development and optimization of our mAbs to be justified given the current data. Therefore, we believe that the main message of the paper is adequately captured in the title.

      (4) This work could strongly benefit from two additional experiments:

      (a) In vivo experiments: experiments in animal models could offer a more comprehensive picture of the potential of the identified monoclonal antibodies. Additionally, this could help to answer a naïve question: why do the patients that have the antibodies still have chronic P. aeruginosa infections? 

      As explained above, the mAbs we isolated are significantly less potent than those described by Simonis et al., and are therefore unlikely to outperform the best anti-PcrV candidates in vivo. In light of the data, and considering ethical concerns related to animal use in research and budgetary constraints, we decided not to proceed with in vivo experiments.

      There are a number of reasons that may explain why patients with anti-PcrV Abs blocking the T3SS can still be chronically infected with Pa. First these Abs may be at limiting concentration, particularly in sites where Pa replicates, and thus unable to clear infection. in addition, it has been described that the T3SS is downregulated in chronic infection in cystic fibrosis patients. This suggests that a therapeutic intervention with T3SS inhibiting Abs may be more efficient if done early in cystic fibrosis patients to prevent colonization when Pa possesses an active T3SS. Finally, T3SS is not the only virulence mechanism employed by P. aeruginosa during infection. Indeed, multiple protein adhesins and polysaccharides are important factors facilitating the formation of bacterial biofilms that are crucial for establishing chronic persistent infection. In this regard, a combination of Abs targeting different factors on the P. aeruginosa surface may be needed to treat chronic infections.  

      (b) Multi-antibody T3SS assays (i.e., a combination of two or more monoclonal antibodies evaluated with the same assays used for characterization of single ones). This could explore the synergistic effects of combinatorial therapies that could address some of the limitations of individual antibodies. 

      Given the high potency of the Simonis mAbs and the mechanisms of action highlighted by our analysis, it is unlikely that our mAbs would synergize with those described by Simonis. Additionally, since our two mAbs cross-compete for binding, synergy between them is also improbable.

      Reviewer #1 (Recommendations for the authors):

      Line 166: How was the serum-IgG purified? (e.g., protein A, protein G). 

      Protein A purification was used, as now mentioned in the manuscript. Purified Igs were thus predominantly IgG1, IgG2 and IgG4, as indicated.

      (2) Line 196: When mentioning affinities, it is preferable to present in molar units. 

      To facilitate comparisons, Ab concentrations were presented in µg/mL as in Simonis et al.

      (3) Line 206: The author states that P3D6 displays significantly reduced ExoS-Bla injection (Figure 2B), but according to the presented table, ExoS-Bla inhibition was higher for P5B3. Additionally, when using "significantly", what was the statistical test that was used to evaluate the significance? Please clarify.

      We thank the reviewer for pointing out this inconsistency. Indeed, the names of P3D6 and P5B3 were exchanged when building the table related to Figure 2B. The corrected version of this figure is now presented in the new version of the manuscript. An ANOVA was performed to evaluate the significance of the observed difference (adjusted p-values < 0.001) and it is now mentioned in the figure caption.  

      (4) Line 215: "P3B3" typo.

      This was corrected.

      (5) Figure 3B: Could the author explain the higher level of ExoS-Bla injection when using VRCO1 antibody compared to no antibody.  

      A slightly higher level of the median is observed in the case of three variants out of five. However, this difference is not statistically significant (p-value > 0.05).

      (6) Supplement Figure 1: the presented grey area is not clear (is it the 95%CI?) and how was the IC50 calculated? With what model was it projected? Are the values for IC50 beyond the 100µg/mL mark a projection? It seems that projecting such greater values (such as the IC50 of over 400µg/mL for variant 5) is prone to high error probability.

      The grey area represents the 95% confidence interval (95% CI) and it is now mentioned in the figure caption. The IC50 and 95% CI were both inferred by the dose-response drc R package based on a three-parameters log-logistic model and it is now explained in the Materials & Methods section. The p-values for IC50 beyond the 100µg/mL were below 0.05 but we agree that such extrapolation should be considered with precaution (see below our response to comment number 7).

      (7) Line 227: The author describes that P5B3 has similar IC50 values towards variants 1-4, but the  IC50 towards variant 5 is substantially higher with 400µg/mL, albeit the only difference between variant 4 and 5 is the switch position 225 Arg -> Lys which are very similar in their properties. Please provide an explanation. 

      As explained in our response to comment number 6, we agree that the comparison of IC50 that are estimated to be close or higher than the highest experimental concentration is somehow speculative. Indeed, we performed further statistical analysis that showed no significant difference between the IC50 toward the five PcrV variants of mAb P5B3. In contrast, the difference between the IC50 of mAbs P5B3 and P3D6 toward variant 1 is statistically significant. This is now explained in the manuscript.

      (8) Line 233: Pore assembly: It is not clear how the data was normalized. The authors mention the methods normalization against the wildtype strain in the absence of antibodies, but did not elaborate clearly if the mutant strain has the same base cytotoxicity as the wild type. It would be helpful to show the level of cytotoxicity of the wild type compared to the mutant in the absence of antibodies to understand the baseline of cytotoxicity of both strains.  

      In these experiments we did not use the wild-type strain. As explained, the only strain that allows the measurement of pore formation by translocators PopB/PopD is the one lacking all effectors. All the experiments were done with this strain, and all the measurements were normalized accordingly. 

      (9) Figure 4: The explanation is redundant as it is clearly stated in the results. It would be better for the caption to describe the figure and leave interpretation to the results section. Overall, this comment is relevant to all figure captions, as it will reduce redundancy. My suggestion is to keep the figure caption as a road map to understand what is shown in the figure. For example, the Figure 4 caption should include that the concentration is presented in logarithmic scale, what is the dashed line, what is the grey area (what interval does it represent?), what each circle represents, and what is the regression model used? 

      Figure captions have been improved as suggested. 

      (10) Line 432: The authors apparently misquoted the original article describing the chimeric form PcrV* by describing the fusion of amino acids 1-17 and 136-249. I quote the original article by Tabor et al. "[...] we generated a truncated PcrV fragment (PcrVfrag) comprising PcrV amino acids 1-17 fused to amino acids 149-236 [...]". Additionally, how does the absence of amino acid 21 in the variant affect the conclusion? 

      Our construct was inspired by the one described in Tabor et al. but was not identical. We have therefore replaced "was constructed based on a construct by Tabor et al." for "whose design was inspired by the construct described in Tabor et al."

      Amino acid 21 is only absent in the construct used for crystallization experiments; all other experiments looking at Ab activity were performed with bacteria bearing full-length PcrV. The difference in P3D6 activity between variants V1 and V2-appears to be explained by the nature of the residue at position 225, according to the structural data, as explained now in more detail in the manuscript. Accordingly, the difference in efficiency of P3D6 against the V1 and V2  variants is explained by the residue at position 225, as both variants have the same residue at position 21. However, while the nature of the residue at position 225 appears to explain the absence of efficiency of the Ab for the variants studied, an impact of residue 21 could not be totally ruled out in putative variants with a Ser at 225 but different amino acids at 21.

      (11) Line 569: Missing word - ESRF stands for European Synchrotron Radiation Facility. 

      This has been corrected.

      (12) Line 268-269 (Figure 5A): The description of the alpha helices in relation to the figure is incomplete. Helices 2,3 and 5 are not indicated. 

      Indeed, since the structure is well-known and in the interest of visibility and simplicity, we only included the most relevant secondary structure features.

      (13) Line 271-272: It would be good to elaborate on the exact binding platform between LC and HC of the Fab and the residues on the PcrV side. For example, the author could apply the structure to PDBePISA (EMBL-EBI) which will provide details about the interface between the PcrV and the antibody. It is very interesting to learn what regions of the antibody are in charge of the binding, such as: is the H-CDR3 the major contributor of the binding or are other CDRs more involved? Additionally, in line 275 they state that the substitution of Ser 225 with Arg or Lys is consistent with the P3D6 insufficient binding. What contributed to this result on the antibodies side? 

      In order to address this question, we are now providing a LigPlot figure (supplementary Figure 3) in which specific interactions between PcrV* and the Fab are shown.

      (14) Line 291: It is unclear from what data the authors concluded that anti-PscF targets 3 distinct regions of PscF. 

      The data are shown in Supplementary Table 2, as mentioned in the manuscript. We have now modified the order of the anti-PcrV mAbs in the table to better illustrate the three identified epitope clusters (Sup table 2). Similarly, the anti-PscF mAbs appear to group into three clusters as P3G9 and P5E10 only compete with themselves, while mabs P3D6 and P5B3 compete with themselves and each other.

      (15) Line 315: It is preferable to introduce results in the results section instead of the discussion. 

      While preparing the manuscript, we initially included these results as a separate paragraph in the Results section, but ultimately chose the current format to improve flow and avoid redundancy.

      (16) Supplement Figure 2: What was the regression model used to evaluate IC50, and what is presented in the graph? What is the dashed line (see comment for Figure 4 above)? 

      The regression is based on a three-parameters log-logistic model and the light-colors area correspond to the 95% IC. The dashed lines visually represents 100% of ExoS-Bla injection. These information are now mentioned in the figure caption.

      (17) Figure 6B: It would be better to show an additional rotation of the PcrV bound by Fab 30-B8 that corresponds to the same as the one represented with Fab MEDI3092. This would clear up the differences in binding regions. Same for Fab P3D6. 

      Figure 6 already depicts two orientations. Despite the fact that we agree that additional orientations could be of interest, we believe that this would add unnecessary complexity to the figure, and would prefer to maintain the figure as is, if possible.

      (18) Line 356-358: The author proposes an experiment to support the suggested mechanism of P3D6, it would follow up with a bio-chemical analysis showing the prevention of PcrV oligomerization in its presence. 

      We understand the reviewers’ comment regarding the potential use of biochemical approaches to test our hypothesis. However, this not currently feasible as we have been unable to achieve in vitro oligomerization of PcrV alone, possibly due to the absence of other T3SS components, such as the polymerized PscF needle.

      (19) Line 456: Missing details about how the ELISA was conducted including temperature, how the antigen was absorbed, plate type, etc. 

      Experimental details have been added.

      (20) Line 460: Missing substrate used for alkaline phosphatase. 

      The nature of the substrate was added to the methods.

    1. eLife Assessment

      This study makes the valuable claim that people track, specifically, the elasticity of control (that is, the degree to which outcome depends on how many resources - such as money - are invested), and that control elasticity is impaired in certain types of psychopathologies. A novel task is introduced that provides solid evidence that this learning process occurs and that human behavior is sensitive to changes in the elasticity of control. Evidence that elasticity inference is distinct from more general learning mechanisms and is related to psychopathology remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the elasticity of controllability by developing a task that manipulates the probability of achieving a goal with a baseline investment (which they refer to as inelastic controllability) and the probability that additional investment would increase the probability of achieving a goal (which they refer to as elastic controllability). They found that a computational model representing the controllability and elasticity of the environment accounted better for the data than a model representing only the controllability. They also found that prior biases about the controllability and elasticity of the environment was associated with a composite psychopathology score. The authors conclude that elasticity inference and bias guide resource allocation.

      Strengths:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      Weaknesses:

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggesting of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperforms theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, gaussian processes, or nonparametric estimators); see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings, and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This also pertains to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors test whether controllability beliefs and associated actions/resource allocation are modulated by things like time, effort, and monetary costs (what they call "elastic" as opposed to "inelastic" controllability). Using a novel behavioral task and computational modeling, they find that participants do indeed modulate their resources depending on whether they are in an "elastic," "inelastic," or "low controllability" environment. The authors also find evidence that psychopathology is related to specific biases in controllability.

      Strengths:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output, and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability.

      Weaknesses:

      The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations nor were there results of any regression analyses. That said, the authors did preregister the CCA analysis, so while perhaps not the best method, it was justified to complete it. Regardless of method, the psychopathology results are not particularly convincing, but provide an interesting jumping-off point for further exploration in future work.

    4. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way, and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a *very good idea*.

      The more concrete contributions, however, are not as strong. In particular, evidence for the paper's most striking claims is weak. Quoting the abstract, these claims are (1) "the elasticity of control [is] a distinct cognitive construct guiding adaptive behavior" and (2) "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control."

      Main issues

      I'll highlight the key points.

      - The task cannot distinguish elasticity inference from general learning processes

      - Participants were explicitly instructed about elasticity, with labeled examples

      - The psychopathology claims rely on an invalid interpretation of CCA, and are contradicted by simple correlations (elasticity bias and the sense of agency scale is r=0.03)

      Distinct construct

      Starting with claim 1, there are three subclaims here. (1A) People's behavior is sensitive to differences in elasticity; (1B) there are mental processes specific to elasticity inference, i.e., not falling out of general learning mechanisms; and, implicitly, (1C) people infer elasticity naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not well supported.

      (1B) The data cannot support the "distinct cognitive construct" claim because the task is too simple to dissociate elasticity inference from more general learning processes (also raised by Reviewer 1). The key behavioral signature for elasticity inference (vs. generic controllability inference) is the transfer across ticket numbers, illustrated in Fig 4. However, this pattern is also predicted by a standard Bayesian learner equipped with an intuitive causal model of the task. Each ticket gives you another chance to board and the agent infers the probability that each attempt succeeds. Crucially, this logic is not at all specific to elasticity or even control. An identical model could be applied to inferring the bias of a coin from observations of whether any of N tosses were heads-a task that is formally identical to this one (at least, the intuitive model of the task; see first minor comment).

      Importantly, this point cannot be addressed by showing that the author's model fits data better than this or any other specific Bayesian model. It is not a question of whether one particular updating rule explains data better than another. Rather, it is a question of whether the task can distinguish between biases in *elasticity* inference versus biases in probabilistic inference more generally. The present task cannot make this distinction because it does not make separate measurements of the two types of inference. To provide compelling evidence that elasticity inference is a "distinct cognitive construct", one would need to show that there are reliable individual differences in elasticity inference that generalize across contexts but do not generalize to computationally similar types of probabilistic inference (e.g. the coin flipping example).

      (1C) The implicit claim that people infer elasticity outside of the experimental task is undermined by the experimental design. The authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips."

      In the revisions, the authors seem to go back and forth on whether they are claiming that people infer elasticity without instruction (I won't quote it here). I'll just note that the examples they provide in the most recent rebuttal are all cases in which one never receives explicit labels about elasticity. If people only infer elasticity when it is explicitly labeled, I struggle to see its relevance for understanding human cognition and behavior.

      Psychopathology

      Finally, I turn to claim 2, that "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control." The CCA analysis is in principle unable to support this claim. As the authors correctly note in their latest rebuttal, the CCA does show that "there is a relationship between psychopathology traits and task parameters". The lesion analysis further shows that "elasticity bias specifically contributes to this relationship" (and similarly for the Sense of Agency scale). Crucially, however, this does *not* imply that there is a relationship between those two variables. The most direct test of that relationship is the simple correlation, which the authors report only in a supplemental figure: there is no relationship (r=0.03). Although it is of course possible that there is a relationship that is obscured by confounding variables, the paper provides no evidence-statistical or otherwise-that such a relationship exists.

      Minor comments

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, the researcher could infer "biases" in elasticity inference that are probably better characterized as effective use of prior information (encoded in the causal model).

      The model is heuristically defined and does not reflect Bayesian updating. For example, it over-estimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on what your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggesting of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, gaussian processes, or nonparametric estimators); see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

      We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach where the weights of three archetypal functions (flat, step, linear) are learned from experience Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (335-341):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea.”

      Reviewer #2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. 

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

      We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing out on important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test for the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04 p=.43, SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.  

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

      “Conversely, in the ‘elastic controllability model’, the beta distributions represent a belief about the maximum achievable level of control (𝑎<sub>Control</sub>, 𝑏<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (𝑎<sub>elastic≥1</sub>, 𝑏<sub>elastic≥1</sub>) or specifically two (𝑎<sub>elastic2</sub>, 𝑏<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more controllability estimates can be made more precise by knowing how much resources the agent is willing and able to invest (Supplementary Note 1).”

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, along many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1). 

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me-if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success-e.g. you're more likely to get a heads in two flips than one. I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer’s concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.   

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do. 

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether-with the help of all the parameters and all the surveys-one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship. 

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one-the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical-this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.

      The logic the Reviewer describes breaks down when one considers the dynamics of participants’ resource investment choices. A low elasticity bias in a participant’s prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased). 

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now noted, the simulated parameter space encompassed both low and high elasticity biases (range=[.02,.76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no such significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered)= -.03, p=.25), showing that our experiment could accurately identify low and high elasticity biases.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p – p<sup>^</sup>2 for two tickets; the p<sup>^</sup>2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.

      We thank the Reviewer for this comment, and agree the participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps by how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants’ behavior deviates from the true task's structure, and it would be worthwhile to address this deviation in future studies. 

      That said, there is no reason that this will make participants appear to be generally underestimating elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity estimated to the second and third tickets, but on average, it would not change the overall elasticity estimated. On the other hand, such a participant is only exposed to outcomes for two and three tickets, they would come to judge the difference between the first and second tickets too highly, thereby overestimating elasticity.  

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on what your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants’ choices worse. 

      You're right; saying these analyses provides "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by \gamma_controllability)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be over-extended given the nature of the data (solely behavioral) and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

    4. Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.  

      We appreciate the reviewer’s statement highlighting the importance of our study. 

      Strengths: 

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism. 

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses: 

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure). 

      We thank the reviewer for the helpful comments. We understand that the analyses were difficult to follow, and we will work on the clarity of the Results section. However, we would like to emphasize that every d′ measure is accompanied by analyses of response rates (i.e., correct and incorrect choice rates). In addition, we applied standard psychometric analyses whenever possible. Specifically, psychometric functions were fitted to the data using logistic regression. We will rework the text to clarify these points.

      During training, only two stimulus amplitudes were presented, which precluded the construction of psychometric curves. For the categorization task, however, psychometric analyses were feasible and conducted (Figure 2). These analyses revealed no evidence of categorization bias (as measured by threshold) or accuracy (as measured by the slope) across stimulus strengths.

      The calculation of d’ is included in the Methods, but we will also report and explain its use in each part of the Results section where it has been included.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task. 

      Strengths: 

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands. 

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses: 

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. 

      We thank the reviewer for the careful reading of our manuscript and for the constructive feedback. The reviewer raises a valid point. We agree that our study is primarily descriptive and focused on behavioral data, and we appreciate the opportunity to clarify the scope and interpretation of our findings. Our primary goal was to characterize behavioral patterns during tactile discrimination and categorization, and the psychometric analyses were intended to provide a detailed description of these patterns. We do not claim to provide direct neural, causal, or computational evidence. 

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. 

      Alternative explanations of our findings, such as differences in motivation, fatigue, satiety, stereotyped licking, and reward valuation have indeed been considered. We will revise the manuscript to present these points more clearly. 

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. We do not claim to have tested the Load Theory; rather, inspired by it, we assessed behavioral patterns in our tactile categorization task. We agree that referring to the Adaptive Resonance Theory, which is based on artificial neural network models, might be misleading since we focus on behavioral results, and we will revise the text accordingly. However, our task allowed us to examine the impact of categorization on discrimination, confirming that Fmr1<sup>-/y</sup>ation can amplify perceptual differences between stimuli belonging to different categories and reduce perceived differences within a category in WT mice but not in the mice when low-salience stimuli were experienced. Finally, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced use of categories in low-salience tactile discrimination. 

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations. 

      We agree with the reviewer that our current experiments are behavioral in nature and do not provide direct mechanistic evidence for top-down pathway dysfunction. Our goal was to carefully characterize tactile responses and behavioral patterns in Fmr1<sup>-/y</sup> mice. The notion of “top-down” is used at the behavioral level, referring to the influence of higher-level cognitive processes (e.g., categorization, attention) on perception, rather than to underlying neural circuits. We will revise the manuscript to more clearly emphasize that our conclusions are based on behavioral observations, and we will frame mechanistic inferences as hypotheses rather than established findings. We will also explicitly note that future work using neural recordings or causal manipulations will be required to directly test these hypotheses.

      We also note that identifying the precise top-down circuits involved will require extensive additional experimentation. For example, one would first need to pinpoint the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself. After such a circuit is identified, further work would then be needed to rescue or manipulate this pathway in the Fmr1<sup>-/y</sup> model. These steps represent a substantial program of mechanistic research that, while important, goes well beyond the scope of the present study.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited. 

      We recognize that “reduced top-down categorization influence” and “choice consistency bias” are based on behavioral observations. However, we respectfully disagree that this makes these constructs inherently speculative. Similar behavioral inferences have been applied in previous clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021). The translational impact of our work lies in the highly translational platform we have developed – and in highlighting the complexity of tactile measures and additional analyses that can be conducted in clinical studies.

      We agree with the reviewer that the neural-based experiments would indeed provide valuable mechanistic insight into our observed behavioral alterations, and we believe future studies should therefore focus on their underlying neurobiological substrate.

      We will revise the language throughout the manuscript to clarify that all conclusions are based on behavioral measures.  

      (3) Statistical analysis: 

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on nonsignificant findings undermines confidence in the conclusions.  

      Several trends are evident in complex measures, such as d’ analyses on task sensitivity or responses pooled across different amplitudes. Additional analyses revealed which component of these measures showed a statistically significant difference across genotypes, namely the low-salience incorrect choices accounting for low task sensitivity. We chose to present all analyses to be transparent and to highlight that commonly used complex measures (like d’ analyses) may mask important findings. In the text, we described p-values between 0.05 and 0.1 as observed trends without over-interpreting their significance. 

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations. 

      The number of mice used in each genotype group is consistent with standard practices in behavioral studies using mice and sensory tasks. We have performed effect size measures (e.g., Cohen’s d) alongside some of the statistical comparisons, showing a medium effect size (>0.5). 

      As the reviewer correctly noted, no mice were excluded based on outlier analyses, since the observed variability reflects true biological differences rather than experimental or technical errors. We will reexamine our dataset for potential outliers. If any are identified, we will perform analyses both with and without the outlier and report any effects that are sensitive to single animals. These procedures and results will be explicitly described in the Methods and Results sections.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.  

      We thank the reviewer for raising this important point and we will include a clear statement on multiple comparisons in the Methods section. 

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as ttests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test. 

      We thank the reviewer for raising this point. This was not done intentionally. A repeated-measures ANOVA on miss rates for low-salience stimuli during categorization confirmed that there are statistically significant differences both across stimulus amplitudes and between genotypes. Additional correction for multiple comparisons will be performed and explained in the Methods section.  

      (4) Emphasis on theoretical models: The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed. 

      As mentioned above, our goal was not to directly test these theories but rather to apply them within our translational framework. The Discussion section will be reframed to highlight that our findings are consistent with predictions from certain cognitive theories rather than implying that these frameworks were directly tested.

      Reviewer #3 (Public review): 

      Summary: 

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice 

      Strengths: 

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD. 

      We appreciate the reviewer’s positive assessment of our study’s translational value and the importance of our behavioral findings.

      Weaknesses: 

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.  

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, could provide additional insights into learning dynamics. This analysis will be conducted and added into the revised manuscript.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While it is an interesting and important question based on previous findings in preclinical and clinical studies, it falls outside the scope of the current manuscript.    

      Feigin H, Shalom-Sperber S, Zachor DA, Zaidel A (2021) Increased influence of prior choices on perceptual decisions in autism. Elife 10.

      Soulières I, Mottron L, Saumier D, Larochelle S (2007) At ypical categorical perception in autism: Autonomy of discrimination? J Autism Dev Disord 37:481–490.

    1. eLife Assessment

      This important study reports an endometrial organoid culture system mimicking the window of implantation. The evidence supporting the conclusion drawn is convincing. The data will be of interest to embryologists and investigators working on reproductive biology and medicine.

    2. Reviewer #1 (Public review):

      Summary:

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. Although many bioinformatic analyses point in this direction, there are major concerns that must be addressed.

      Strengths:

      The addition of 3 hormones to enhance the WOI state (although not clearly supported in comparison to the secretory state).

      Comments on revisions:

      The authors did their best to revise their study according to the Reviewers' comments. However, the study remains unconvincing, incomplete and at the same time still too dense and not focused enough.

    3. Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial mesenchymal transition, like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

    1. eLife Assessment

      This study presents a valuable finding on whether executive resources mediate the impact of language predictability in reading in the context of aging. The presentation of evidence is incomplete; further conceptual clarifications, methodological details, and addressing potential confounds would strengthen the study. The work will be of interest to cognitive neuroscientists working on reading, language comprehension, and executive control.

    2. Reviewer #1 (Public review):

      This manuscript reports a dual-task experiment intended to test whether language prediction relies on executive resources, using surprisal-based measures of predictability and an n-back task to manipulate cognitive load. While the study addresses a question under debate, the current design and modeling framework fall short of supporting the central claims. Key components of cognitive load, such as task switching, word prediction vs integration, are not adequately modeled. Moreover, the weak consistency in replication undermines the robustness of the reported findings. Below unpacks each point.

      Cognitive load is a broad term. In the present study, it can be at least decomposed into the following components:

      (1) Working memory (WM) load: news, color, and rank.

      (2) Task switching load: domain of attention (color vs semantics), sensorimotor rules (c/m vs space).

      (3) Word comprehension load (hypothesized against): prediction, integration.

      The components of task switching load should be directly included in the statistical models. Switching of sensorimotor rules may be captured by the "n-back reaction" (binary) predictor. However, the switching of attended domains and the interaction between domain switching and rule complexity (1-back or 2-back) were not included. The attention control experiment (1) avoided useful statistical variation from the Read Only task, and (2) did not address interactions. More fundamentally, task-switching components should be directly modeled in both performance and full RT models to minimize selection bias. This principle also applies to other confounding factors, such as education level. While missing these important predictors, the current models have an abundance of predictors that are not so well motivated (see later comments). In sum, with the current models, one cannot determine whether the reduced performance or prolonged RT was due to affecting word prediction load (if it exists) or merely affecting the task switching load.

      The entropy and surprisal need to be more clearly interpreted and modeled in the context of the word comprehension process. The entropy concerns the "prediction" part of the word comprehension (before seeing the next word), whereas surprisal concerns the "integration" part as a posterior. This interpretation is similar to the authors writing in the Introduction that "Graded language predictions necessitate the active generation of hypotheses on upcoming words as well as the integration of prediction errors to inform future predictions [1,5]." However, the Results of this study largely ignored entropy (treating it as a fixed effect) and only focus on surprisal without clear justification.

      In Table S3, with original and replicated model fitting results, the only consistent interaction is surprisal x age x cognitive load [2-back vs. Reading Only]. None of the two-way interactions can be replicated. This is puzzling and undermines the robustness of the main claims of this paper.

    3. Reviewer #2 (Public review):

      Summary:

      This paper considers the effects of cognitive load (using an n-back task related to font color), predictability, and age on reading times in two experiments. There were main effects of all predictors, but more interesting effects of load and age on predictability. The effect of load is very interesting, but the manipulation of age is problematic, because we don't know what is predictable for different participants (in relation to their age). There are some theoretical concerns about prediction and predictability, and a need to address literature (reading time, visual world, ERP studies).

      Strengths/weaknesses

      It is important to be clear that predictability is not the same as prediction. A predictable word is processed faster than an unpredictable word (something that has been known since the 1970/80s), e.g., Rayner, Schwanenfluegel, etc. But this could be due to ease of integration. I think this issue can probably be dealt with by careful writing (see point on line 18 below). To be clear, I do not believe that the effects reported here are due to integration alone (i.e., that nothing happens before the target word), but the evidence for this claim must come from actual demonstrations of prediction.

      The effect of load on the effects of predictability is very interesting (and also, I note that the fairly novel way of assessing load is itself valuable). Assuming that the experiments do measure prediction, it suggests that they are not cost-free, as is sometimes assumed. I think the researchers need to look closely at the visual world literature, most particularly the work of Huettig. (There is an isolated reference to Ito et al., but this is one of a large and highly relevant set of papers.)

      There is a major concern about the effects of age. See the Results (161-5): this depends on what is meant by word predictability. It's correct if it means the predictability in the corpus. But it may or may not be correct if it refers to how predictable a word is to an individual participant. The texts are unlikely to be equally predictable to different participants, and in particular to younger vs. older participants, because of their different experiences. To put it informally, the newspaper articles may be more geared to the expectations of younger people. But there is also another problem: the LLM may have learned on the basis of language that has largely been produced by young people, and so its predictions are based on what young people are likely to say. Both of these possibilities strike me as extremely likely. So it may be that older adults are affected more by words that they find surprising, but it is also possible that the texts are not what they expect, or the LLM predictions from the text are not the ones that they would make. In sum, I am not convinced that the authors can say anything about the effects of age unless they can determine what is predictable for different ages of participants. I suspect that this failure to control is an endemic problem in the literature on aging and language processing and needs to be systematically addressed.

      Overall, I think the paper makes enough of a contribution with respect to load to be useful to the literature. But for discussion of age, we would need something like evidence of how younger and older adults would complete these texts (on a word-by-word basis) and that they were equally predictable for different ages. I assume there are ways to get LLMs to emulate different participant groups, but I doubt that we could be confident about their accuracy without a lot of testing. But without something like this, I think making claims about age would be quite misleading.

    4. Author response:

      Reviewer #1 (Public review):

      Cognitive Load and Task-Switching Components:

      We agree that cognitive load is multi-faceted and encompasses dimensions not fully captured in our present models, including domain and rule switching. For the revision, we will explicitly model these components in the statistical analyses by incorporating predictors reflecting attended domain switching and rule complexity, as suggested. We will also explain our inclusion of n-back reaction predictors and justify their relationship with theoretical constructs of executive function. Full details of coding schemes will be provided.

      Modeling Entropy and Surprisal:

      We appreciate the reviewer’s suggestion to further explain the distinction between entropy (predictive uncertainty) and surprisal (integration difficulty), and acknowledge that our treatment of entropy warrants extension. In the revision, we will expand the results and discussion on entropy, providing clearer theoretical motivation for its inclusion and conducting supplementary analyses to examine its role alongside surprisal.

      Replicability of Findings:

      We note the concern regarding two-way vs. three-way interactions in model replication. In the revised manuscript, we will report robustness analyses on subsets of our data (e.g., matched age and education groups), clarify degrees of freedom and group sizes, and transparently report any discrepancies.

      Predictors and Statistical Modeling:

      We will add clarifications on predictor selection, data structure, and rationale for model hierarchy. The functions of d-prime, comprehension accuracy, and performance modeling will be described in more detail, including discussion of block-level vs. participant-level effects.

      Reviewer #2 (Public review):

      Distinction Between Prediction and Predictability:

      We recognize the importance of clearly communicating the difference between prediction and predictability, as well as integration-based vs. prediction-based effects. We will clarify these distinctions throughout the introduction, methods, and discussion sections, citing the relevant theoretical literature (e.g., Pickering & Gambi 2018; Federmeier 2007; Staub 2015; Frisson 2017).

      Aging, Corpus Predictability, and Individual Differences:

      We appreciate the critical point regarding age, corpus-based predictability, and potential cohort effects in language model estimates. In the revision, we will provide conceptual clarifications on how surprisal and entropy might differ for different age groups and discuss limitations in extrapolating these metrics to participant-specific predictions. The limitations inherent in relying on LLM-derived estimates and text materials will be more directly addressed.

      Coverage of Literature and Paradigms:

      We will broaden the literature review as requested, particularly on the N400 effects and behavioral traditions in prediction research. These additions should help contextualize the present work within both neuroscience and psycholinguistics.

      Experimental Context and Predictability Metrics:

      We will address concerns regarding the context window for prediction estimation, describing more precisely how context was defined and whether broader textual cues may improve predictability metrics.

      References

      Pickering, M.J. & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychol. Bull., 144(10), 1002–1044.

      Federmeier, K.D. (2007). Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology, 44(4), 491–505.

      Frisson, S. (2017). Can prediction explain the lexical processing advantage for short words? J. Mem. Lang., 95, 121–138.\

      Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Lang. Linguist. Compass, 9(8), 311–327.Huettig, F. & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Trends Cogn. Sci., 20(10), 484–492.We appreciate the reviewers’ constructive comments and believe their suggestions will meaningfully strengthen the paper. Our planned revisions will address each of the above points with additional analyses, clarifications, and expanded discussion.

    1. eLife Assessment

      This study used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and showed that its deletion has no effect on retinal neurogenesis or cell fate specification, thereby challenging the prevailing view of Ptbp1 as a master regulator of neuronal fate. The findings are convincing, supported by transcriptome analysis, histology, and proliferation assays. This study is important, though the genetic tools employed may not fully capture Ptbp1's potential role during the earliest stages of retinal development.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers sought to determine whether Ptbp1, an RNA-binding protein formerly thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior statements about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential reimbursement from Ptbp2 warrant further research.

      Strengths:

      This study calls into doubt the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way for eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-Seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell destiny, despite identified transcriptional and splicing modifications.

      Weaknesses:

      The retina-specific method raises questions regarding whether Ptbp1 is required in other CNS locations where its neurogenic roles were first proposed. The claim that Ptbp1 is "fully dispensable" for retinal development may be toned down, given the transcriptional and splicing modifications identified. The possibility of subtle or transitory impacts, such as ectopic neuron development followed by cell death, is postulated, but not completely investigated. Furthermore, as the authors point out, the compensating potential of increased Ptbp2 warrants additional exploration. Although the study performs well in transcriptome and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine if small changes in splicing or gene expression affect retinal function. While 864 splicing events have been found, the functional significance of these alterations, notably the 7% that are neuronal-enriched and the 35% that are rod-specific, has not been thoroughly investigated. The manuscript might be improved by describing how these splicing changes affect retinal development or function.

    3. Reviewer #2 (Public review):

      Summary:

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNAseq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development.

      Strengths:

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS.

      Weaknesses:

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.

    1. eLife Assessment

      This valuable study presents a mechanistic model of predictive coding by medial entorhinal cortex grid cells, implemented with biologically detailed conductance-based neurons. The evidence supporting the emergence of this coding scheme from specific membrane currents and the anatomical connectivity among inhibitory neurons is solid. However, the justification for the choice of connectivity patterns and other network parameters remains somewhat incomplete. This work will be of interest to neuroscientists working on spatial navigation, circuit dynamics, and neuronal coding.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors aim to elucidate the mechanisms by which grid cells in the medial entorhinal cortex generate predictive representations of spatial location. To address this, they built a computational model integrating intrinsic neuronal dynamics with structured network connectivity. Specifically, they combine a conductance-based single-cell model incorporating biologically realistic HCN channels with a continuous attractor network that reflects known properties of grid cell circuitry. Their simulations show that HCN conductance can shift grid fields forward by approximately 5% of their diameter, consistent with experimental observations in layer II grid cells. Additionally, by introducing asymmetry in the connectivity of interneurons, the model produces larger forward shifts, which parallel properties observed in layer III grid cells. Together, these two mechanisms provide a unified framework for explaining layer-specific predictive coding in the entorhinal cortex.

      Strengths:

      A major strength of the study lies in its conceptual contribution. The authors propose two distinct mechanisms to generate forward-shifted grid fields for predictive coding. One mechanism is intrinsic and depends on the time constants associated with HCN channels. The other is network-based and results from asymmetries in interneuron connectivity. These two mechanisms correspond to different observed properties of grid cells in layer II and layer III, respectively. The modeling is based on previously validated frameworks of continuous attractor network models (e.g., Burak & Fiete; Kang & DeWeese), but it incorporates several novel features, including the incorporation of biophysically realistic HCN channels, a network architecture that excludes stellate-stellate connections and relies on interneurons, and asymmetric interneuron connectivity.

      Weaknesses:

      One of the proposed mechanisms for predictive coding, namely asymmetric interneuron connectivity, is a novel idea. However, this type of connectivity has not yet been demonstrated experimentally in the medial entorhinal cortex. Therefore, the biological plausibility of this mechanism remains uncertain and will need to be evaluated in future empirical studies.

    3. Reviewer #2 (Public review):

      Summary:

      This study proposes that predictive spatial representations in medial entorhinal cortex (MEC) grid cells arise through two distinct biophysical mechanisms: (1) HCN conductance-dependent temporal dynamics, which generate modest forward shifts (~5% of grid field diameter) in Layer II cells, and (2) network asymmetry, enabling larger predictive shifts (~25% of grid field diameter) in Layer III cells. The model further predicts a dorsoventral gradient in predictive coding magnitude, correlating with observed HCN conductance variations. These results provide a mechanistic framework for understanding how intrinsic cellular properties and circuit architecture collectively enable prospective spatial coding in the MEC. This is an important study.

      Strengths:

      These findings reveal how cellular properties and circuit design enable prospective spatial coding. This novel, impactful study will be of interest to the field.

      Weaknesses:

      Some of the models are too mathematical and do not fit with the biological observation.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shaikh and Assisi addresses a timely and important question related to the neural circuit mechanisms underlying spatial representations during navigation. Concretely, they present a model of the medial entorhinal cortex (MEC) with biophysically detailed conductance-based stellate cells that can perform path integration and reveal two potential mechanisms underlying two forms of predictive coding by grid cells in the MEC. One mechanism uses HCN channels to explain predictive coding in MEC layer II grid cells equivalent to ~5% of the diameter of a grid field, and the other uses asymmetric connections between interneurons and stellate cells, resulting in a ~25% predictive bias of layer III grid cells. The methods and model are technically sound, and the model is expected to be useful for computational neuroscientists studying the neural mechanisms of spatial navigation.

      Strengths:

      One strength of the model is its use of conductance-based neuron models of stellate cells and interneurons, adding important biophysical constraints and details to existing continuous attractor network models of grid cells. The model fills a gap in the literature by providing mechanisms for predictive coding constrained by biophysical properties of stellate cells and simplified network topology.

      Weaknesses:

      A weakness of the model is that the neural network is relatively small (five sheets with 71 × 71 neurons each), and the 2-D toroidal topology is further simplified to a 1-D ring attractor consisting of three rings with 192 neurons each. The model incorporates biophysical detail at the single-neuron level, but not at the network level. For example, it includes only stellate cells and a generic interneuron type, and does not implement data-driven connectivity patterns.

      The restricted network size and the limited experimental knowledge about connectivity among stellate cells, principal cells, and different interneuron types in the MEC could be addressed in more detail. Moreover, the manuscript lacks a thorough discussion of assumptions common to most continuous attractor network (CAN) models of grid cells, such as the use of "hand-crafted" connections between direction-sensitive conjunctive grid cells and network cells to drive attractor shifts. Including such a discussion would strengthen the manuscript. This is especially relevant given the authors' explicit claim that they have revealed two mechanisms underlying the emergence of a predictive code in the MEC. In this reviewer's view, the work demonstrates a potential mechanism, but one that requires experimental verification. The significance of the model would thus be increased by providing more experimentally testable predictions of the model.

    1. eLife Assessment

      This fundamental study shows how past experiences shape perception across short, medium, and long time scales, using a single behavioural paradigm and reanalysed EEG data. It provides convincing evidence for two processes across all scales: an attention-dependent mechanism that speeds responses to expected events, and an attention-independent mechanism where expected events are encoded less precisely, consistent with feedforward dampening. The work offers a unifying account of temporal context effects, though stronger brain-behaviour links, integration with serial dependence attraction and repulsion models, and extension to other timescale definitions would further strengthen the contribution.

    2. Reviewer #1 (Public review):

      Summary:

      This paper addresses an important and topical issue: how temporal context - at various time scales - affects various psychophysical measures, including reaction times, accuracy and localization. It offers interesting insights, with separate mechanisms for different phenomena, which are well discussed.

      Strengths:

      The paradigm used is original and effective. The analyses are rigorous.

      Comments on revised version:

      I think the authors have dealt adequately with my issues, none of which were fundamental.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the influence of prior stimuli over multiple time scales in a position discrimination task, using pupillometry data and a reanalysis of EEG data from an existing dataset. The authors report consistent history-dependent effects across task-related, task-unrelated, and stimulus-related dimensions, observed across different time scales. These effects are interpreted as reflecting a unified mechanism operating at multiple temporal levels, framed within predictive coding theory.

      Strengths:

      The authors have done a good job in their revision, clarifying important points and stating the limitations of the study clearly.

      I also think they made a valid effort to address and correct issues arising from the temporal dependency confound, although I still wonder whether the best approach would have been to design an experiment in a way that avoided this confound in the first place.<br /> Overall, this is a substantially improved version, and I particularly appreciate the clarification and correction regarding the direction of the bias in the EEG data (repulsive rather than attractive).

      Weaknesses:

      These are now relatively minor points.

      I believe this latter aspect, the repulsive bias, may deserve further discussion, especially in relation to their behavioral findings and, in particular, to earlier work proposing multi-stage frameworks of serial dependence, where low-level repulsion interacts with attractive biases at higher-level stages (Fritsche et al., 2020; Pascucci et al., 2019; Sheehan & Serences, 2022). The authors may also consider to cite some key reviews on serial dependence that discuss both repulsion and attraction in forced-choice and reproduction tasks (Manassi et al., 2023; Pascucci et al., 2023).

      Related to this, after finding the opposite pattern, is the sentence in line 472-473 ("Further, we found an attractive...") and the related argument still valid?

      Regarding my earlier point about former line 197 and Figure 3b,c: what I noticed-similar to the patterns reported in the studies I referenced-is that the data cannot be simply described as showing faster and more accurate responses for small deltas. Responses also appear faster and more accurate for very large deltas, with performance being worse in between. Indeed, as the authors state: "The peak in precision for large Deltas locations is consistent with alternate events being encoded more precisely, while the peak for small offsets may be explained by the attractive bias towards the previous target." I wonder whether it is necessary, or unequivocally supported by the data, to hypothesize two separate mechanisms here. An alternative could be interference effects between consecutive stimuli that are neither identical nor completely different-making the previous one more likely to interfere with the current stimulus representation.

      Finally, this is definitely a minor point, but I still find the reply to my comment about the prediction of stable retinal input rather speculative. Such a prediction would seem more plausible in world-centered coordinates.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above). 

      We agree that the original description of the planes was too terse and have expanded on this in the revised manuscript.

      Line 85 - To test the influence of attention, trials were sorted according to two spatial reference planes, based on the location of the stimulus: task-related and task-unrelated (Fig. 1b). The task-related plane corresponded to participants’ binary judgement (Fig 1b, light cyan vertical dashed line) and the task-unrelated plane was orthogonal to this (Fig 1b, dark cyan horizontal dashed line). For example, if a participant was tasked with performing a left-or-right of fixation judgement, then their task-related plane was the vertical boundary between the left and right side of fixation, while their task-unrelated plane was the horizontal boundary. The former (left-right) axis is relevant to their task while the latter (top-bottom) axis is orthogonal and task irrelevant. This orthogonality can be leveraged to analyze the same data twice (once according to the task-related plane and again according to the taskunrelated plane) in order to compare performance when the relative location of an event is either task relevant or irrelevant.

      Line 183 - whereas task planes were constant, the stimulus-related plane was defined by the location of the stimulus on the previous trial, and thus varied from trial to trial. That is, on each trial, the target is considered a repeat if it changes location by <|90°| relative to its location on the previous trial, and an alternate if it moves by >|90°|.

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both. 

      We agree that the temporal scales defined in the current study are not the only way one could categorize perceptual time. We also agree that by using events to define scales, we ignore the influence of duration. In terms of the categories, we selected these for two reasons: 1) they conveniently group previous phenomena, and 2) they loosely correspond to iconic-, short- and long-term memory. We agree that one could also potentially split it up into two categories (e.g., short- and long-term), but in general, we think any form of discretization will have limitations. For example, Reviewer 1 suggests that the meso category is simply a few micros stacked together. However, there is a rich literature on phenomena associated with sequences of an intermediate length that do not appear to be entirely explained by stacking micro effects (e.g., sequence learning and sequential dependency). We also find that when controlling for micro level effects, there are clear meso level effects. Also, by the logic that meso level effects are just stacked micro effects, one could also argue the same for macro effects. We don’t think this argument is incorrect, rather we think it exemplifies the challenge of discretising temporal scales. Ultimately, the current study was aimed to test whether seemingly disparate phenomena identified in previous work could be captured by unifying principles. To this end we found that these categories were the most useful. However, we have included a “Limitations and future directions” section in the Discussion of the revised manuscript that acknowledges both the alternative scheme proposed by Reviewer 1, and the value of extending this work to consider the influence of duration (as well as events).

      Line 488 - Limitations and future directions. One potential limitation of the current study is the categorization of temporal scales according to events, independent of the influence of event duration. While this simplification of time supports comparison between different phenomena associated with each scale (e.g., serial dependence, sequential dependencies, statistical learning), future work could investigate the role of duration to provide a more comprehensive understanding of the mechanisms identified in the current study.

      Related to this, while the temporal scales applied here conveniently categorized known sensory phenomena, and partially correspond to iconic-, short-, and long-term memory, they are but one of multiple ways to delineate time. For example, temporal scales could alternatively be defined simply as short- and long-term (e.g., by combining micro and meso scale phenomena). However, this could obscure meaningful differences between phenomena associated with sensory persistence and short-term memory, or qualitative differences in the way that shortsequences of events are processed.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      In the case of a binary decision, is seems reasonable to use the term “accuracy” to refer to the correspondence between the target state and the response on a task. However, we agree that while our (main) task is binary, the target is not and nor is the secondary task. We thank the reviewer for bringing this to our attention, as we agree that this will be a likely cause of confusion. To avoid confusion we have specifically referred to “task accuracy” throughout the revised manuscript.

      With regards to precision, our measure of precision is consistent with what Reviewer 1 describes as such, i.e., the clustering of responses. In particular, the von Mises distribution is essentially a Gaussian distribution in circular space, and the kappa parameter defines the width of the distribution, regardless of the mean, with larger values of kappa indicating narrower (more precise) distributions. We could have used standard deviation to assess precision; however, this would incorrectly combine responses on which participants failed to encode the target (e.g., because of a blink) and were simply guessing. To account for these trials, we applied mixture modelling of guess and genuine responses to isolate the precision of genuine responses, as is standard in the visual working memory literature. However, we agree that this was not sufficiently described in the original manuscript and have elaborated on this method in the revised version.

      Line 598 - From the reproduction task, we sought to estimate participant’s recall precision. It is likely that on some trials participants failed to encode the target and were forced to make a response guess. To isolate the recall precision from guess responses, we used mixture modelling to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively (Bays et al., 2009). The k parameter of the von Mises distribution reflects its width, which indicates the clustering of responses around a common location.

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so..) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision. 

      Previous studies have shown that when serial dependence is attractive there is a corresponding increase in precision around small offsets from the previous item (citations). Indeed, attractive biases will lead to reduced scattering (increased precision) around a central attracter. Consistent with previous studies, and this rational, we also found an attractive bias coupled with increased precision. To clarify, for the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the offset between the current and previous target and then performing the same mixture modelling described above to estimate the mean (bias) and kappa (precision) parameters of the von Mises distribution fit to the angular errors. This was not explained in the original manuscript, so we thank Reviewer 1 for bringing this to our attention and have clarified the analysis in the revised version.

      Line 604 - For the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the angular offset between the current and previous target and then performing mixture modelling to estimate the mean (bias) and k (precision) parameters of the von Mises distribution.

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing. 

      As explained in our response to Reviewer 1’s previous comment, we are indeed measuring precision.

      Reviewer #2 (Public review):

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      It is not clear what relevance the fact that the data has been analyzed previously has to the results of the current study. However, we do think that it is important to be clear that the EEG recordings were collected separately from the behavioural and eyetracking data, so we have clarified this in the revised abstract.

      Line 7 - By integrating behavioural and pupillometry recordings with electroencephalographical recordings from a previous study, we identify two distinct mechanisms that operate across all scales.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported. 

      As stated above, we agree that it is important that readers are aware that the EEG recordings were collected separately to the behavioural and eyetracking data. We were forthright about this in the original manuscript and how now clarified this in the revised abstract. We agree that collecting both sets of data in the same experiment would be a useful validation of the current results and have acknowledged this in a new Limitations and future directions section of the Discussion of the revised manuscript.

      Line 501 - Another limitation of the current study is that the EEG recordings were collected in the separate experiment to the behavioural and pupillometry data. The stimuli and task were similar between experiments, but not identical. For example, the EEG experiment employed coloured arc stimuli presented at a constant rate of ~3.3 Hz and participants were tasked with counting the number of stimuli presented at a target location. By contrast, in the behavioural experiment, participants viewed white blobs presented at an average rate of ~2.8 Hz and performed a binary spatial task coupled with an infrequent reproduction task. An advantage of this was that the sensory responses to stimuli in the EEG recordings were not conflated with motor responses; however, future work combining these measures in the same experiment would serve as a validation for the current results.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the stimulus time should be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      Electrode selection was based on several factors: 1) reduction of eye movement/blink artifacts (as noted in the original manuscript), 2) consistency with the previous EEG study (Rideaux, 2024) and other similar decoding studies (Buhmann et al., 2024; Harrison et al., 2023; Rideaux et al., 2023), 3) improved signal-to-noise by including only sensors that carry the most position information (as shown in Supplementary Figure 1a and the previous EEG study). We agree that this was insufficiently explained in the original manuscript and have clarified our sensor selection in the revised version.

      Line 631 - We only included the parietal, parietal-occipital, and occipital sensors in the analyses to i) reduce the influence of signals produced by eye movements, blinks, and non-sensory cortices, ii) for consistency with similar previous decoding studies (Buhmann et al., 2024; Rideaux, 2024; Rideaux et al., 2025), and iii) to improve decoding accuracy by restricting sensors to those that carried spatial position information (Supplementary Fig. 1a).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      We agree that the nature of this artifact requires more clarification and disambiguation. Due to relatively slow changes in the neural signal, which are not stimulus-related, there is a degree of temporal autocorrelation in the recordings. This can be filtered out, for example, by using a stricter high-pass filter; however, we tried a range of filters and found that a cut-off of at least 0.7 Hz is required to remove the artifact, and even a filter of 0.2 Hz introduces other (stimulus-related) artifacts, such as above-chance decoding prior to stimulus onset. These stimulus-related artifacts are due to the temporal smearing of data, introduced by the filtering, and have a more pronounced and complex influence on the results and are more difficult to remove through other means, such as the baseline correction applied in the original manuscript.

      The temporal autocorrelation is detected by the decoder during training and biases it to classify/decode targets that are presented nearby in time as similar. That is, it learns the neural pattern for a particular stimulus location based on the activity produced by the stimulus and the temporal autocorrelation (determined by slow stimulus unrelated fluctuations). The latter only accounts for a relatively smaller proportion of the variance in the neural recordings under normal circumstances and would typically go undetected when simply plotting decoding accuracy as a function of position. However, it becomes weakly visible when decoding accuracy is plotted as a function of distance from the previous target, as now the bias (towards temporally adjacent targets) aligns with the abscissa. Further, it becomes highly visible when the stimulus labels are shuffled, as now the decoder can only learn from the variance associated with the temporal autocorrelation (and not from the activity produced by the stimulus).

      In the linear discriminant analysis, this led to temporally proximal items being more likely to be classified as on the same side. This is why there is above-chance performance for repeat trials (Supplementary Figure 2b), and below-chance performance for alternate trials, even when the labels are shuffled – the temporal autocorrelation produces a general bias towards classifying temporally proximate stimuli as on the same side, which selectively improves the classification accuracy of repeat trials. Fortunately, the bias is relatively constant as a function of time within the epoch and is straightforward to estimate by shuffling the labels, which means that it can be removed through a baseline correction. However, to further demonstrate that the autocorrelation confound cannot account for the differences observed between repeat and alternate trials in the micro classification analysis, we now additionally show the results from a more strictly filtered version of the data (0.7 Hz). These results show a similar pattern as the original, with the additional stimulusrelated artifacts introduced by the strict filter, e.g., above chance decoding prior to stimulus onset.

      In the inverted encoding analysis, the same temporal autocorrelation manifests as temporally proximal trials being decoded as more similar locations. This is why there is increased decoding accuracy for targets with small angular offsets from the previous target, even when the labels are shuffled (Supplementary Figure 3c), because it is on these trials that the bias happens to align with the correct position. This leads to an attractive bias towards the previous item, which is most prominent when the labels are shuffled.

      To demonstrate the phenomenon, we simulated neural recordings from a population of tuning curves and performed the inverted encoding analysis on a clean version of the data and a version in which we introduced temporal autocorrelation. We then repeated this after shuffling the labels. The simulation produced very similar results to those we observed in the empirical data, with a single exception: while precision in the simulated shuffled data was unaffected by autocorrelation, precision in the unshuffled data was clearly affected by this manipulation. This may explain why we did not find a correlation between the shuffled and unshuffled precision in the original manuscript. 

      These results echo those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and delta location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180 to 180 degrees, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this removed the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis (Supplementary Figure 3f), but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset (Supplementary Figure 3d). However, given thar we were primarily interested in the pattern of accuracy, precision, and bias as a function of delta location, and less concerned with the precise temporal dynamics of these changes, which appeared relatively stable in the filtered data. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3.

      We have updated the revised manuscript in light of these changes, including a fuller description of the artifact and the results from the abovementioned control analyses.

      Figure 3 updated.

      Figure 3 caption - e) Decoding accuracy for stimulus location, from reanalysis of previously published EEG data (17). Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). f) Decoding accuracy for location, as a function of time and D location. Bright colours indicate higher decoding accuracy; absolute accuracy values can be inferred from (e). g-i) Average location decoding  (g) accuracy, (h) precision, and (h) bias from 50 – 500 ms following stimulus onset. Horizontal bar in (e) indicates cluster corrected periods of significance; note, all time points were significantly above chance due to temporal smear introduced by strict high-pass filtering (see Supplementary Figure 3 for full details). Note, the temporal abscissa is aligned across (e & f). Shaded regions indicate ±SEM.

      Line 218 - To further investigate the influence of serial dependence, we applied inverted encoding modelling to the EEG recordings to decode the angular location of stimuli. We found that decoding accuracy of stimulus location sharply increased from ~60 ms following stimulus onset (Fig. 3e). Note, to reduce the influence of general temporal dependencies, we applied a 0.7 Hz high-pass filter to the data, which temporally smeared the stimulus-related information, resulting in above chance decoding accuracy prior to stimulus presentation (for full details, see Supplementary Figure 3). To understand how serial dependence influences the representation of these features, we inspected decoding accuracy for location as a function of both time and D location (Fig. 3f). We found that decoding accuracy varied depending not only as a function of time, but also as a function of D location. To characterise this relationship, we calculated the average decoding accuracy from 50 ms until the end of the epoch (500 ms), as a function of D location (Fig. 3g). This revealed higher accuracy for targets with larger D location. We found a similar pattern of results for decoding precision (Fig. 3h). These results are consistent with the micro temporal context (behavioural) results, showing that targets that alternated were recalled more precisely. Lastly, we calculated the decoding bias as a function of D location and found a clear repulsive bias away from the previous item (Fig. 3i). While this result is inconsistent with the attractive behavioural bias, it is consistent with recent studies of serial dependence suggesting an initial pattern of repulsion followed by an attractive bias during the response period (20–22).

      Line 726 - As shown in Supplementary Figure 3, we found the same general temporal dependencies in the decoding accuracy computed using inverted encoding that were found using linear discriminant classification. However, as a baseline correction would not have been appropriate or effective for the parameters decoded with this approach, we instead used a high-pass filter of 0.7 Hz to remove the confound, while being cautious about interpreting the timing of effects produced by this analysis due to the temporal smear introduced by the filter.

      Supplementary Figure 2 updated.

      Supplementary Figure 2 caption - Removal of general micro temporal dependencies in EEG responses. We found that there were differences in classification accuracy for repeat and alternate stimuli in the EEG data, even when stimulus labels were shuffled. This is likely due to temporal autocorrelation within the EEG data due to low frequency signal changes that are unrelated to the decoded stimulus dimension. This signal trains the decoder to classify temporally proximal stimuli as the same class, leading to a bias towards repeat classification. For example, in general, the EEG signal during trial one is likely to be more similar to that during trial two than during trial ten, because of low frequency trends in the recordings. If the decoder has been trained to classify the signal associated with trial one as a leftward stimulus, then it will be more likely to classify trial two as a leftward stimulus too. These autocorrelations are unrelated to stimulus features; thus, to isolate the influence of stimulus-specific temporal context, we subtracted the classification accuracy produced by shuffling the stimulus labels from the unshuffled accuracy (as presented in Figure 2e, f). We confirmed that using a stricter high-pass filter (0.7 Hz) removes this artifact, as indicated by the equal decoding accuracy between the two shuffled conditions. However, the stricter high-pass filter temporally smears the stimulus-related signal, which introduces other (stimulus-related) artifacts, e.g., above-chance decoding accuracy prior to stimulus presentation, that are larger and more complex, i.e., changing over time. Thus, we opted to use the original high pass filter (0.1 Hz) and apply a baseline correction. a) The uncorrected classification  accuracy along task related and unrelated planes. Note that these results are the same as the corrected version shown in Figure 2e, because the confound is only apparent when accuracy is grouped according to temporal context.

      b) Same as (a), but split into repeat and alternate stimuli, along (left) task-related and (right) unrelated planes. Classification  accuracy when labels are shuffled is also shown. Inset in (a) shows the EEG sensors included in the analysis (blue dots). (c, d) Same as (a, b), but on data filtered using a 0.7 Hz high-pass filter. Black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). Shaded regions indicate ±SEM.

      Supplementary Figure 3 updated.

      Supplementary Figure 3 caption - Removal of general temporal dependencies in EEG responses for inverted encoding analyses. As described in Methods - Neural Decoding, we used inverted encoding modelling of EEG recordings to estimate the decoding accuracy, precision, and bias of stimulus location. Just as in the linear discriminant classification analysis, we also found the influence of general temporal dependencies in the results produced by the inverted encoding analysis. In particular, there was increased decoding accuracy for targets with low D location. This was weakly evident in the period prior to stimulus presentation, but clearly visible when the labels were shuffled. These results are mirror those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and D location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180° to 180°, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this significantly reduced the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis, but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset. However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of D location, and less concerned with the precise temporal dynamics of these changes. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3. (a) Decoding accuracy as a function of time for the EEG data filtered using a 0.1 Hz high-pass filter. Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). (b, c) The same as (a), but as a function of time and D location for (b) the original data and (c) data with shuffled labels. (d-f) Same as (a-c), but for data filtered using a 0.7 Hz high-pass filter. Shaded regions in (a, d) indicate ±SEM. Horizontal bars in (a, d) indicate cluster corrected periods of significance; note, all time points in (d) were significantly above chance. Note, the temporal abscissa is vertically aligned across plots (a-c & d-f).

      In the process of performing these additional analyses and simulations, we became aware that the sign of the decoding bias in the inverted encoding analyses had been interpreted in the wrong direction. That is, where we previously reported an initial attractive bias followed by a repulsive bias relative to the previous target, we have in fact found the opposite, an initial repulsive bias followed by an attractive bias relative to the previous target. Based on the new control analyses and simulations, we think that the latter attractive bias was due to general temporal dependencies. That is, in the filtered data, we only observe a repulsive bias. While the bias associated with serial dependence was not a primary feature of the study, this (somewhat embarrassing) discovery has led to reinterpretation of some results relating to serial dependence. However, it is encouraging to see that our results now align with those of recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan et al. 2024).

      Line 385 - Our corresponding EEG analyses revealed better decoding accuracy and precision for stimuli preceded by those that were different and a bias away from the previous stimulus. These results are consistent with finding that alternating stimuli are recalled more precisely. Further, while the repulsive pattern of biases is inconsistent with the observed behavioural attractive biases, it is consistent with recent work on serial dependence indicating an initial period of repulsion, followed by an attractive bias during the response period (20–22). These findings indicate that serial dependence and first-order sequential dependencies can be explained by the same underlying principle.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      A rapid presentation design was used in the EEG experiment, and while this is well suited to decoding analyses, unfortunately we cannot resolve ERPs because the univariate signal is dominated by an oscillation at the stimulus presentation frequency (~3 Hz). We agree that this could be useful to examine in future work.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      This point has been addressed in our response to point (4).

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020). 

      We used five previous items in the meso analyses to be consistent with previous research on sequential dependencies (Bertelson, 1961; Gao et al., 2009; Jentzsch & Sommer, 2002; Kirby, 1976; Remington, 1969). However, we agree that these effects likely extend further and have acknowledged this in the revied version of the manuscript.

      Line 240 - Higher-order sequential dependences are an example of how stimuli (at least) as far back as five events in the past can shape the speed and task accuracy of responses to the current stimulus (9, 10); however, note that these effects have been observed for more than five events (20).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022). 

      This point has been addressed in our response to point (4).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior. 

      As noted in our response to point (4), this bias was likely due to the general temporal dependency confound and has been removed in the revised version of the manuscript.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      Thank you for bringing this to our attention, we have acknowledged this in the revised manuscript.

      Line 197 - Consistent with our previous binary analysis, and with previous work (19), we also found that responses were faster and more accurate when D location was small (Fig. 3b, c).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects. 

      Yes, as noted in the original manuscript, we find significant differences between the variance task-related and -unrelated conditions. We think this is due to opposing forces in the task-related condition: 

      “The increased variability of response time differences across the taskrelated plane likely reflects individual differences in attention and prioritization of responding either quickly or accurately. On each trial, the correct response (e.g., left or right) was equally probable. So, to perform the task accurately, participants were motivated to respond without bias, i.e., without being influenced by the previous stimulus. We would expect this to reduce the difference in response time for repeat and alternate stimuli across the taskrelated plane, but not the task-unrelated plane. However, attention may amplify the bias towards making faster responses for repeat stimuli, by increasing awareness of the identity of stimuli as either repeats or alternations (17). These two opposing forces vary with task engagement and strategy and thus would be expected produce increased variability across the task-related plane.” We agree that providing effect sizes may provided a clearer sense of the observed effects and have done so in the revised version of the manuscript.

      Line 739 - For Wilcoxon signed rank tests, the rank-biserial correlation (r) was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (54). For Friedman’s ANONA tests, Kendal’s W was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (55).

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022). 

      In light of the revised analyses, this statement has been removed from the manuscript.

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      The dual reproduction task only occurred on 10% of trials. There were approximately 2000 trials, so ~200 reproduction responses. For the micro and macro analyses, this was sufficient to estimate precision within each of the experimental conditions (repeat/alternate, expected/unexpected). However, it is likely that we were not able to reproduce the effect of precision at the meso level across both experiments because we lacked sufficient responses to reliably estimate precision when split across the eight sequence conditions. Despite this, the data was always analysed within subjects.

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming. 

      We agree that the results associated with the forced-choice task (response time task accuracy) were likely due to motor priming, but that a separate (predictive) mechanism may explain the (precision) results associated with the reproduction task. These are two mechanisms we think are operating across the three temporal scales investigated in the current study.

      Reviewing Editor Comments:

      (1) Clarify task design and measurement: The dense presentation makes it difficult to understand key design elements and their implications. Please provide clearer descriptions of all task elements, and how they relate to each other (EEG vs. behaviour, stimulus plane vs. TR and TU plane, reproduction vs. discrimination and role of priming), and clearly explain how key measures were computed for each of these (e.g., precision, accuracy, reproduction bias).

      In the revised manuscript, we have expanded on descriptions of the source and nature of the data (behavioural and EEG), the different planes analyzed in the behavioural task, and how key metrics (e.g., precision) were computed.

      (2) Offer more insight into underlying data, including original ERP waveforms to aid interpretation of decoding results and the timing of effects. In particular, unpack the decoding temporal confound further.

      In the revised manuscript, we have considerably offered more insight into the decoding results, in particular, the nature of the temporal confound. We were unable to assess ERPs due to the rapid presentation design employed in the EEG experiment.

      (3) Justify arbitrary choices such as electrode selection for EEG decoding (e.g., limiting to parieto-occipital sensors), number of trials in meso scale, and the time terminology itself.

      In the revised manuscript, we have clarified the reasons for electrode selection.

      (3) Discuss deviations from literature: Several findings appear to contradict or diverge from previous literature (e.g., effects of serial dependence). These discrepancies could be discussed in more depth. 

      Upon re-analysis of the serial dependence bias and removal of the temporal confound, the results of the revised manuscript now align with those from previous literature, which has been acknowledged.

      Reviewer #1 (Recommendations for the authors):

      (1) would like to use my reviewer's prerogative to mention a couple of relevant publications. 

      Galluzzi et al (Journal of Vision, 2022) "Visual priming and serial dependence are mediated by separate mechanisms" suggests exactly that, which is relevant to this study.

      Xie et al. (Communications Psychology, 2025) "Recent, but not long-term, priors induce behavioral oscillations in peri-saccadic vision" also seems relevant to the issue of different mechanisms. 

      Thank you for bringing these studies to our attention. We agree that they are both relevant have referenced both appropriately in the revised version of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the discussion on attention and awareness (from line 127 onward) somewhat vague and requiring clarification.

      We agree that this statement was vague and referred to “awareness” without operationation. We have revised this statement to improve clarity.

      Line 135 - However, task-relatedness may amplify the bias towards making faster responses for repeat stimuli, by increasing attention to the identity of stimuli as either repeats or alternations (17).

      (2) Line 140: It's hard to argue that there are expectations that the image of an object on the retina is likely to stay the same, since retinal input is always changing. 

      We agree that retinal input is often changing, e.g., due to saccades, self-motion, and world motion. However, for a prediction to be useful, e.g., to reduce metabolic expenditure or speed up responses, it must be somewhat precise, so a prediction that retinal input will change is not necessarily useful, unless it can specify what it will change to. Given retinal input of x at time t, the range of possible values of x at time t+1 (predicting change) is infinite. By contrast, if we predict that x=x at time t+1 (no change), then we can make a precise prediction. There is, of course, other information that could be used to reduce the parameter space of predicted change from x at time t, e.g., the value of x at time t-1, and we think this drives predictions too. However, across the infinite distribution of changes from x, zero change will occur more frequently than any other value, so we think it’s reasonable to assert that the brain may be sensitive to this pattern.

      (3) Line 564: The gambler's fallacy usually involves sequences longer than just one event.

      Yes, we agree that this phenomenon is associated with longer sequences. This section of the manuscript was in regards to previous findings that were not directly relevant to the current study and has been removed in the revised version.

      (4) In the shared PDF, the light and dark cyan colors used do not appear clearly distinguishable. 

      I expect this is due to poor document processing or low-quality image embeddings. I will check that they are distinguishable in the final version.

      References: 

      Barbosa, J., Stein, H., Martinez, R. L., Galan-Gadea, A., Li, S., Dalmau, J., Adam, K. C. S., Valls-Solé, J., Constantinidis, C., & Compte, A. (2020). Interplay between persistent activity and activity-silent dynamics in the prefrontal cortex underlies serial biases in working memory. Nature Neuroscience, 23(8), Articolo 8. https://doi.org/10.1038/s41593-020-0644-4

      Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific reports, 7(1), 14739. 

      Ceylan, G., Herzog, M. H., & Pascucci, D. (2021). Serial dependence does not originate from low-level visual processing. Cognition, 212, 104709. https://doi.org/10.1016/j.cognition.2021.104709

      Fischer, C., Kaiser, J., & Bledowski, C. (2024). A direct neural signature of serial dependence in working memory. eLife, 13. https://doi.org/10.7554/eLife.99478.1

      Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590-595. 

      Fritsche, M., Spaak, E., & de Lange, F. P. (2020). A Bayesian and efficient observer model explains concurrent attractive and repulsive history biases in visual perception. eLife, 9, e55389. https://doi.org/10.7554/eLife.55389

      Gekas, N., McDermott, K. C., & Mamassian, P. (2019). Disambiguating serial effects of multiple timescales. Journal of vision, 19(6), 24-24. 

      Luo, M., Zhang, H., Fang, F., & Luo, H. (2025). Reactivation of previous decisions repulsively biases sensory encoding but attractively biases decision-making. PLOS Biology, 23(4), e3003150. https://doi.org/10.1371/journal.pbio.3003150

      Ozkirli, A., Pascucci, D., & Herzog, M. H. (2025). Failure to replicate a superiority effect in crowding. Nature Communications, 16(1), 1637. https://doi.org/10.1038/s41467025-56762-5

      Sheehan, T. C., & Serences, J. T. (2022). Attractive serial dependence overcomes repulsive neuronal adaptation. PLoS biology, 20(9), e3001711. 

      Stewart, N. (2007). Absolute identification is relative: A reply to Brown, Marley, and

      Lacouture (2007).  Psychological  Review, 114, 533-538. https://doi.org/10.1037/0033-295X.114.2.533

      Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological review, 91(1), 68. 

      Zhang, G., & Luck, S. J. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM- and LDA-based decoding of EEG signals. NeuroImage, 316, 121304. https://doi.org/10.1016/j.neuroimage.2025.121304

    1. eLife Assessment

      Complementing previous work (Namiki et al, 2018), this study provides an important resource for the Drosophila community as it reports 500 lines targeting descending neurons (DN), in addition to compiling 306 existing DN lines from the literature. The compelling work characterizes 146 DNs and makes a critical link with the DNs identified in Electron microscopy (EM). The lines in this paper will be of interest to Drosophila neuroscientists who will be able to use the reported genetic drivers for further functional characterization of DNs and circuit mapping in conjunction with existing EM datasets.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Zung et al. describes a curated library of genetic lines labeling a class of important neurons called Descending Neurons in the fruit fly, Drosophila melanogaster. These neurons are especially important in their critical role in relaying information from the brain to motor circuits within the ventral nerve cord - the insect analogy of the vertebrate spinal cord. The authors screened through a vast resource of Gal4 lines to generate 500 new genetic lines that allow for the precise labeling of 190 (40%) of all Descending Neurons. The tools introduced here will allow researchers to perform precise circuit dissection of the exact roles these neurons play in linking the brain to the ventral nerve cord.

      Strengths:

      This manuscript represents an important follow-up to the author's 2018 paper in the extension of the genetic toolkit from 178 genetic lines that target 65 Descending Neuron (DN) classes to 806 lines that target 190 DN classes. The presentation of this toolkit is comprehensive with confocal images, informative classifications of lines based on specificity/consistency, and identification of the neuron types - when possible - in the EM dataset.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Descending neurons (DNs) are critical nodes in the neural computation underlying sensorimotor transformation. Building on their earlier work, the authors have substantially expanded the genetic resources for labeling these cell types in D. melanogaster, offering a valuable public resource.

      Strengths:

      The authors identified 146 additional DN types and generated 500 new DN driver lines, expanding the genetic reagents from labeling 98 cell types to 244, representing approximately 50% of all DN types estimated by EM connectomes. While the EM connectomes offer unprecedented resolution of neuronal cell types and their connectivity, genetic access to these cell types remains essential for studying their functions and testing hypotheses. Given the broad interest in DNs, the reagents generated in this study will be of important value for addressing a wide range of questions in sensorimotor transformation.

      The organization of the dataset is overall intuitive and comprehensive. The authors also provided clear information and guidance on accessing the relevant resources, such as stack images and fly lines. In addition, the authors have thoughtfully handled the information updated from the earlier collection they generated (Namiki et al. 2018) and incorporated previously published DN lines, providing a consolidated and up-to-date resource for the DN community.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides the Drosophila community with a large collection of new split-Gal4 descending neuron genetic lines. They extend previous efforts to characterize and identify genetic lines for this important class of neurons by providing images of descending neurons and a metric for genetic lines based on specificity and consistency. Their discussion highlights several applications of this collection, for example, to understand the function of new descending neurons through optogenetic and/or physiological characterization. They also helpfully discuss caveats, encouraging users of this collection to validate expression patterns and to be careful when interpreting optogenetic experimental results, considering potential off-target labeling in the lines. Overall, members of the Drosophila community interested in understanding the function of descending neurons and their role in behavior will find this a helpful resource.

      Strengths:

      (1) The authors extend the previous genetic access of descending neurons in Drosophila to over 800 split-Gal4 lines and 190 cell types (nearly half of the known population of descending neurons). The authors update and at times correct the previous identification of descending neurons from a previous, large-scale analysis. The authors extend and, at times, correct previous efforts at characterizing these neurons.

      (2) Clear images of descending neurons labeled by new genetic lines are presented in the main figure papers for reference.

      (3) This study classifies lines labeling descending neurons using a quality score to indicate specificity and consistency. They provide this for the entire set of genetic lines, a valuable assessment for researchers interested in targeting these neurons for optogenetic or physiological characterization.

      Weaknesses:

      Although this paper represents a substantial effort and useful contribution to the Drosophila community, a few weaknesses, primarily regarding the specificity and reliability of genetic lines, remain:

      (1) The authors state that optogenetic activation of DN types using the new split-GAL4 lines is expected to reliably activate the target neurons with virtually no off-target effects in the rest of the central nervous system. More data supporting this conclusion, including both qualitative and quantitative anatomical evidence, would strengthen this claim.

      (2) The authors do recommend that researchers using these lines examine expression patterns themselves to evaluate line cleanliness and consistency, but some analysis by the authors would be useful, for example, providing guidelines for best practices to perform this evaluation.

      (3) Changes in expression patterns after several generations are noted by the authors, weakening confidence somewhat in the long-term usefulness of this collection of genetic lines.

    1. eLife Assessment

      This important study presents the development of a novel inhibitor for SARS-CoV-2 Mac1 that has potential utility both as an antiviral therapeutic and as a tool for probing the molecular mechanisms by which infection-induced ADP-ribosylation triggers robust host antiviral responses. Though minor gaps in understanding the compound's precise molecular mechanism of action and its ability to target Mac1 from other coronaviruses remain, the evidence for its effects on SARS-CoV-2 in relevant biological models is compelling.

    2. Reviewer #1 (Public review):

      SARS-CoV-2 encodes a macrodomain (Mac1) within the nsp3 protein that removes ADP-ribose groups from proteins. However, its role during infection is not well understood. Evidence suggests that Mac1 antagonizes the host interferon response by counteracting the wave of ADP ribosylation that occurs during infection. Indeed, several PARPs are interferon-stimulated genes. While multiple targets have been proposed, the mechanistic links between ADP ribosylation and a robust antiviral response remain unclear.

      Genetic inactivation of Mac1 abrogates viral replication in vivo, suggesting that small-molecule inhibitors of Mac1 could be developed into antivirals to treat COVID-19 and other emerging coronaviruses. The authors report a potent and selective small molecule inhibitor targeting Mac1 (AVI-4206) that demonstrates efficacy in human airway organoids and animal models of SARS-CoV-2 infection. While these results are compelling and provide proof of concept for the therapeutic targeting of Mac1, I am particularly intrigued by the potential of this compound as a probe to elucidate the mechanistic connections between infection-induced ADP ribosylation and the host antiviral response.

      The precise function of Mac1 remains unclear. Given its presence in multiple viruses, it likely acts on a fundamental host immune pathway(s). AVI-4206, while promising as a lead compound for the development of antivirals targeting coronaviruses, could also be a valuable tool for uncovering the function of the Mac1 domain. This may lead to fundamental insights into the host immune response to viral infection.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development of a novel inhibitor (AVI-4206) for the first macrodomains of the nsp3 protein of SARS-CoV-2 (Mac1). This involves both medical chemical synthesis, structural work as well as biochemical characterisation. Subsequently the authors present their finding of the efficacy of the inhibitor both on cell culture as well as animal models of SARS-CoV-2 infection. They find that despite high affinity for Mac1 and the known replicatory defects of catalytically inactive Mac1 only moderate beneficial effects can be observed in their chosen models.

      Strengths:

      The authors employ a variety of different assay to study the affinity, selectivity and potency of the novel inhibitor and thus the in vitro data are very compelling.<br /> Similarly, the authors use several cell culture and in vivo models to strengthen their findings. In addition, the authors address several aspects of the health impact of coronaviral infections from animal survival, over viral load to histological assessment of lung damage.

      Weaknesses:

      The selection of Targ1 and MacroD2 as off-target human macrodomains is sub-optimal as several studies have shown that the first macrodomains of PARP9 and PARP14 are much closer related to coronaviral macrodomains and both macrodomains are implicated in antiviral defence and immunity. However, the authors address this issue by providing modeling data that show clashes with AVI-4206 similarly to their models with MacroD2 and TARG1.

      Comments on revisions:

      While the authors have not addressed all my suggestions experimentally, I would like to nevertheless congratulate them on a significantly strengthened manuscript that will provide a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were trying to validate SARS-CoV-2 Mac1 as a drug discovery target and by extension other viral macrodomains.

      Strengths:

      The medicinal chemistry and structure based optimization is exemplary. Macrodomains and ADPribosyl hydrolases have the reputation for being undruggable, yet the authors managed to optimize hits from a fragment screen using structure based approaches and fragment linking to make a 20nM inhibitor as a tool compound to validate the target.<br /> In addition, the in vivo work is also a strength. The ability to reduce the viral count at a rate comparable to nirmatrelvir is impressive. Tracking the cytokine expression levels also supports much of the genetic data and mechanism of action for macrodomains.

      Weaknesses:

      The main compound AVI-4206, while being very potent and selective is not appreciably orally bioavailable. The fact that they have to use high doses of the compound IP to see in vivo effects may lead to questions regarding off target effects. The authors acknowledge this and point it out as a potential avenue for further optimization.

      The cellular models are not as predictive of antiviral activity as one would expect. However, the authors had enough chutzpah to test the compound in vivo knowing that cellular models might not be an accurate representation of a living system with a fully functional immune system all of which is most likely needed in an antiviral response to test the importance of Mac1 as a target.

      Comments on revisions:

      All previous suggestions were addressed. I am satisfied with the author's modifications.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      Although this study is rigorous and the paper is well-written, I have a few concerns that the authors should address before publication.

      (1) Cellular levels of protein ADP-ribosylation should be analyzed using anti-ADPR antibodies following infection, both with and without Mac1 and AVI-4206 treatment. While the authors have provided impressive in vivo data, these experiments could ideally be conducted in mice. However, I would be amenable to these analyses being performed in human airway organoids, as they demonstrate clear phenotypes following AVI-4206 treatment post-infection. For a more in-depth exploration, the authors could consider affinity purifying ADP-ribosylated proteins and identifying them via mass spectrometry. I would find it particularly compelling if this approach revealed components of the NF-kB signaling pathway, given the intriguing results presented in Fig. 5. I am also curious if there are differences in ADP ribosylated proteins when comparing Mac1 KO SARS-C0V-2 to AVI-4206 treatment.

      We note that despite the recent flurry of activity around Mac1, there is a surprising lack of public data on overall ADPr levels or targets. While we will address the literature precedence for PARP14 signals specifically below (Reviewer 2 point (h)) by immunofluorescence, we note that overall levels have not been characterized biochemically previously. Recent PARP14 papers and the ASAP AViDD preprint show changes by immunofluorescence only: and the evidence in that preprint is quite modest - see Figure 7B - https://pmc.ncbi.nlm.nih.gov/articles/PMC11370477/.

      We suspect the difficulty in tracking changes biochemically is due to multiple factors that influence the overall detectability and reproducibility. First, with regard to detectability - it is quite possible that only a small change in the ADPr status of a small number of targets is responsible for the phenotypes in vivo. Virus levels are very low in the organoid system and the variability in ADPr levels from tissue samples from in vivo experiments is high. Given the difficulty in translating back to cellular models, this problem is therefore magnified further. Second, with regard to reproducibility - we observe a great deal of reagent dependence on ADPr signals by Western blot+/- Mac1 expression in both cellular and tissue lysates (including when stimulated with H2O2, interferon, or during viral infection). Similarly, we do not observe reproducible proteins that pulldown with Mac1 when assayed by mass spectrometry. It is quite likely that these issues are a result of tissue/sample preparation that results in a loss of the ADPr modification during preparation (especially for acidic residue modifications). This also explains the reliance on IF assays in the PARP14 literature. A very good discussion of these issues is also contained in this paper: https://doi.org/10.1042/BSR20240986.

      Nonetheless we have attempted one final experiment. Here, we have measured ADPr modification of cellular lysates upon uninfected conditions as well as upon infection with either WT or N40D mutant virus. For all conditions, this was done with or without treatment of cells with 100 μM of AVI-4206. Measurement of ADPr modifications by western blot using a  pan-ADPr antibody revealed a single prominent band with a molecular weight of ~130kDa, that showed a uniform increase in signal upon treatment of cells with AVI-4206 regardless of infection status. While this general trend was also observed with the mono-ADPr antibody, it was not statistically significant in its regulation upon AVI-4206 treatment. We suspect that the major band observed in these western blots is PARP1, as upon enrichment of ADPr proteins from these lysates by Af1521 immunoprecipitation, we find PARP1 to be among the most abundant proteins detected within this molecular weight range. We note that there is a baseline increase in polyADPr detection upon infection of virus with WT Mac1 (relative to uninfected and virus with N40D) and further increase when treated with AVI-4206. This compound-dependent increase is paralleled in the uninfected and N40D conditions. The counterintuitive increase upon WT Mac1 virus infection, which should erase ADPr marks, and the compound-dependent increase in the uninfected condition suggest that there are many indirect effects on ADPr signalling dynamics in this experiment. These results are difficult to reconcile with the specificity profiling of AVI-4206 (Supplementary Figure5: Thermal proteome profiling in A549 cellular lysates). As mentioned above, the lack of consistent signal across reagents for ADPr detection and the timing of monitoring ADPr levels are additional complicating factors.

      We added to the results:

      “However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8).”

      Methods for experiment:

      Calu3 cells were obtained from ATCC and cultured in Advanced DMEM (Gibco) supplemented with 2.5% FBS, 1x GlutaMax, and 1x Penicillin-Streptomycin at 37°C and 5% CO<sub>2</sub>. 5x10<sup>6</sup> cells were plated in 15-cm dishes and media was changed every 2-3 days until the cells were 80% confluent. The cells were treated with INFy 50 ng/mL (R&D Systems) w/without AVI-4206 100 μM. After 6 hours, the cells were infected with WA1 or WA1 NSP3 Mac1 N40D at a multiplicity of infection (MOI) of 1 for 36 hours. The cells were washed with PBS x 3 and scraped in Pierce IP Lysis Buffer (ThermoFisher) containing 1x HALT protease and phosphatase inhibitor mix (ThermoFisher) on ice. The lysate was stored at -80C until further processing.

      The cell lysate was incubated for 5 minutes at room temperature with recombinant benzonase. Following incubation, the lysate was centrifuged at 13,000 rpm at 4°C for 20 minutes, and the supernatant was collected. The samples were then boiled for 5 minutes at 95°C in 1x NuPAGE LDS sample buffer (Invitrogen) with a final concentration of 1X NuPAGE sample reducing agent (Invitrogen). For the detection of ADPr levels in whole-cell lysates, the samples were subjected to SDS-PAGE and Immunoblotting. All primary and secondary antibodies (pan-ADP-ribose antibody (MABE1016, Millipore), Mono-ADP-ribose antibody (AbD33204, Bio-Rad), HRP-conjugated (Cell signaling), used at a 1:1000 dilution were diluted in 5% non-fat dry milk in TBST. Signals were detected by chemiluminescence (Thermo) and visualized using the ChemiDoc XRS+ System (Bio-Rad). Densitometric analysis was performed using Image Lab (Bio-Rad). Quantification was normalized to Actin. The data are expressed as mean ± SD. Statistical differences were determined using an unpaired t-test in GraphPad Prism 10.3.1.

      (2) SARS-CoV-2 escape mutants for AVI-4206 should be generated, sequenced, and evaluated for both ADP-ribosyl hydrolase activity and their susceptibility to inhibition by AVI-4206.

      We thank the reviewer for this suggestion. These are indeed key experiments which are currently hampered by the lack of a cell line that is fully responsive to drug treatment. Although infected organoids and macrophages show an effect in response to AVI-4206, viral levels are ~3 logs lower than in cell lines and difficult to sequence. In the absence of a system that would allow meaningful screening for outgrowth of resistant viruses, we have conducted mass spectrometry studies that showed that Mac1 is the only significant hit for AVI-4206 (SupplementaryFigure 5). The suggested outgrowth experiments will be conducted once a responsive cell line model has been established.

      (3) Given that Mac1 is found in several coronaviruses, it would be insightful for the authors to test a selection of Mac1 homologs from divergent coronaviruses to assess whether AVI-4206 can inhibit their activity in vitro.

      As mentioned above, inconsistencies in ADPr staining limit our ability to directly measure cellular activity. As an alternative approach to measure AVI-4206 selectivity in cells, we have adapted our CETSA assay for SARS-1 and MERs macrodomain proteins and find evidence that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1. In line with MERS being more structurally divergent than SARS-1 from SARS CoV2, the ΔTagg for SARS-1 and MERS are 4℃ and 1℃, respectively, compared to 9℃ for Mac1.  These data have been added as Supplementary Fig S3C. Development of broader spectrum pan-inhibitors is on our radar for future work which will more thoroughly assess homologs from divergent coronaviruses.

      We added the following sentence to the main results:

      “Encouragingly, we were also able to adapt our CETSA assay for SARS-1 and MERs macrodomain proteins and find that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1 (Supplementary Figure 3C).”

      We also added this supplementary figure 3:

      Minor

      (1) Line 88, "respectively.heir potency"

      Fixed, thank you!

      (2) Line 149 add a period after proteome

      Fixed, thank you!

      Reviewer #2 (Recommendations for the authors):

      (a) The authors assess inhibition of MacroD2 and Targ1 as of-targets for AVI-4206. However, Mac1 belongs to the MacroD-type class of macrodomains of which MacroD1, MacroD2 and MOD1s of PARP9 and PARP14 are the human members. In contrast Targ1 belongs to the ALC1-like class, which is only very distantly related to Mac1. Furthermore, recent studies have shown that the first macrodomains of PARP9 and PARP4 (MOD1 of PARP9/14) are much closer related to Mac1 and PARP9/14 were implicated in antiviral immunity. As such the authors should include assays showing the activity of their compounds against MacroD1 and MOD1s of PARP9/14.

      We emphasize that we detect no significant shift for any protein other than Mac1 in A549 cells by CETSA-MS (Supplementary Figure 6). For Mac1 CESTA, we see an average of 6 PARP14 spectral counts across conditions and did not detect PARP9.  In addition, for separate work in MPro, we ran similar CETSA experiments where we observed an average of 2 PARP9 and 15 PARP14 spectral counts across conditions. Although PARP9 and PARP14 massively increase expression upon IFN treatment in A549 cells, both proteins have been detected by Western Blot in A549 cells previously at baseline.

      Nonetheless, we have included modeling of more diverse macrodomains as a supplemental figure and added to the text:

      Modeling of other diverse macrodomains, including those within human PARP9 and PARP14 further suggests that AVI-4206 is selective for Mac1 (Supplementary Figure 4)

      (b) In the context of SARS-CoV-2 superinfection are a known major complication of infections. These superinfections are associated with lung damage and therefore it would be good if the authors could assess lung damage, e.g. by histology, to see if their treatment has a positive impact on lung damage and thus may help to suppress complications.

      We performed histology and the results are inconclusive, but suggest that AVI-4206 treatment could lower apoptosis.There is no difference in pathology between the N40D cohort and vehicle with these markers. This could suggest that AVI-4206 provides an additional mechanism that results in protection.  We added to the results:

      Caspase 3 staining shows that AVI-4206 treatment reduces apoptosis in the lungs compared to vehicle controls. Additionally, Masson's Trichrome staining reveals  a significant reduction in collagen deposition, a surrogate for lung pathology, in the lungs of AVI-4206 treated animals.(Supplementary Figure 9).

      Histology:

      Mouse lung tissues were fixed in 4% PFA (Sigma Aldrich, Cat #47608) for 24 hours, washed three times with PBS and stored in 70% ethanol. All the stainings were performed at Histo-Tec Laboratory (Hayward, CA). Samples were processed, embedded in paraffin, and sectioned at 4μm. The slides were dewaxed using xylene and alcohol-based dewaxing solutions. Epitope retrieval was performed by heat-induced epitope retrieval (HIER) of the formalin-fixed, paraffin-embedded tissue using citrate-based pH 6 solution (Leica Microsystems, AR9961) for 20 mins at 95°C. The tissues were stained for H&E, caspase-3 (Biocare #CP229c 1:100), and trichrome, dried, coverslipped (TissueTek-Prisma Coverslipper), and visualized using Axioscan 7 slide scanner (ZEISS) at 40X. Image quantification was performed with Image J software and GraphPad Prism.

      (c) Fig. 1D labelling is wrong

      Thank you - fortunately the data were plotted correctly and it was just the inset table of values that was incorrect. This is now fixed!

      (d) Line 88: "T" missing at start of sentence

      Fixed, thank you!

      (e) Line 118: NudT5/AMP-Glo assay was developed in https://doi.org/10.1021/acs.orglett.8b01742

      We have added this foundational reference, thank you!

      (f) Line 147ff: It would be good if the authors could highlight that the TPP methodology has known limitations (e.g. detection of low abundance proteins and low thermal shift of some binders) and thus is not an absolute proof that AVI-4206 "engage with high specificity for Mac1"

      We added this important context to the concluding sentence of this paragraph:

      “While this assay may not be sensitive to detection of proteins with low abundance proteins or low thermal shift upon ligand binding, collectively, these results indicate that AVI-4206 can cross cellular membranes and engage with high specificity for Mac1.”

      (g) The authors use their well established in vitro Mac1 model as well as the SARS-CoV-2 WA strain. Given the ongoing diversification of SARS-CoV-2 and the current prevalence of the Omicron VOC it would be good if the authors could investigate whether alteration in Mac1 occurred or are detected which could influence the efficacy of their inhibitor. Similarly, it would be interesting to know how effective their drug is on other clinically relevant beta-CoV Mac1, e.g. from MERS or SARS1.

      We thank the reviewer for the suggestion. Mac1 is one of the more conserved areas of the SARS-CoV-2 genome as there has only been one nonsynonymous mutation V34L (Orf1a:V1056L) that recently emerged in the BA.2.86 lineage and is now in all of the JN.1 derivatives. Currently, the mutation is only ~80% penetrant in circulating SARS-CoV-2 sequences suggesting that it might revert to wild-type and is not associated with a fitness benefit. Based on our structural analysis (shown in Supplementary Figure4D above), we do not believe this mutation affects AVI-4206 binding, but we are including this variant in our future in vitro and in vivo studies as well as other beta-CoV.  For SARS and MERS, see response to Reviewer 1 using CETSA to show that these targets are engaged by AVI-4206.

      (h) As methods to detect PARP14-derived ADP-ribosylation are available and it was shown that Mac1 can reverse this modification in cells. It would be good if the authors could investigate the impact of AVI-4206 on ADP-ribosylation in vivo.

      To test this idea we adapted the IF assay used by others in the field and show an effect of AVI-4206. We have added to the text:

      Although the IFN response was not sufficient to control viral replication, it is possible that the changes in ADP-ribosylation, in particular marks catalyzed by PARP14, downstream of IFN treatment could serve as a marker for Mac1 efficacy  (Ribeiro et al. 2025). To investigate whether downstream signals from PARP14 were specifically erased by Mac1, we used an immunofluorescence assay that showed that Mac1 could remove IFN-γ-induced ADP-ribosylation that is mediated by PARP14 (Kar et al. 2024).  We stably expressed wild-type Mac1 and the N40D mutant Mac1 in A549 cells. The data showed that Mac1 expression decreased IFN-γ-induced ADP-ribosylation, whereas the Mac1-N40D mutant did not (Figure 3E, F), indicating that Mac1 mediates the hydrolysis of IFN-γ-induced ADP-ribosylation. The PARP14 inhibitor RBN012759 completely blocked IFN-γ-induced ADP-ribosylation (Figure 3E, F), further confirming that IFN-γ-induced ADP-ribosylation is mediated by PARP14. AVI-4206 reversed the Mac1-induced hydrolysis of ADP-ribosylation and enhanced the ADP-ribosylation signal in Mac1-overexpressing cells (Figure 3E, F), further demonstrating its ability to inhibit the hydrolase activity of Mac1. We further validated this result using different ADP-ribosylation antibodies for immunofluorescence (Supplementary Figure 7). However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8). Collectively, these results provide further evidence that simple cellular models are insufficient to explore the effects of Mac1 inhibition and that monitoring specific PARP14-mediated ADP-ribosylation patterns can provide an accessible biomarker for the efficacy of Mac1 inhibition.

      A549 Mac1 expression cell construction

      Mac1 wild-type (Mac1) and N1062D mutant (Mac1 N1062D) gene fragments were loaded into pLVX-EF1α-IRES-Puro (empty vector, EV) using Gibson cloning kit (NEB E5510). Lentivirus was prepared as previously described (PMID: 30449619; DOI: 10.1016/j.cell.2018.10.024). Briefly, 15 million HEK293T cells were grown overnight on 15 cm poly-L-Lysine coated dishes and then transfected with 6 ug pMD2.G (Addgene plasmid # 12259 ; http://n2t.net/addgene:12259 ; RRID:Addgene_12259), 18 ug dR8.91 (since replaced by second generation compatible pCMV-dR8.2, Addgene plasmid #8455) and 24 ug pLVX-EF1α-IRES-Puro (EV, Mac1, Mac1-N1062D) plasmids using the lipofectamine 3000 transfection reagent per the manufacturer’s protocol (Thermo Fisher Scientific, Cat #L3000001). pMD2.G and dR8.91 were a gift from Didier Trono. The following day, media was refreshed with the addition of viral boost reagent at 500x as per the manufacturer’s protocol (Alstem, Cat #VB100). Viral supernatant was collected 48 hours post transfection and spun down at 300 g for 10 minutes, to remove cell debris. To concentrate the lentiviral particles, Alstem precipitation solution (Alstem, Cat #VC100) was added, mixed, and refrigerated at 4°C overnight. The virus was then concentrated by centrifugation at 1500 g for 30 minutes, at 4°C. Finally, each lentiviral pellet was resuspended at 100x of original volume in cold DMEM+10%FBS+1% penicillin-streptomycin and stored until use at -80°C. To generate Mac1 overexpressing cells, 2 million A549 cells were seeded in 10 cm dishes and transduced with lentivirus in the presence of 8 μg/mL polybrene (Sigma, TR-1003-G). The media was changed after 24h and, after 48 hours, media containing 2μg/ml puromycin was added. Cells were selected for 72 hours and then expanded without selection. The expression of Mac1 was confirmed by Western Blot.

      Immunofluorescence assay:

      To assess the effect of Mac1 on IFN-induced ADP-ribosylation. A549-pLVX-EV, A549-pLVX-Mac1 and A549-pLVX-Mac1-N1062D cells were seeded in 96-well plate (10,000 cells/well). Cells were pre-treated with medium or 100 unit/mL IFN-γ (Sigma, SRP3058) for 24 hours to induce the expression of ADP-ribosylation. These 3 cell lines were then treated the next day with the indicated concentrations of AVI-4206 or RBN012759 (Medchemexpress, HY-136979). After 24 hours of exposure to drugs, treated cells were fixed in pre-cooled methanol at -20°C for 20 min, blocked in 3% bovine serum albumin for 15 min, incubated with Poly/Mono-ADP Ribose (E6F6A) Rabbit mAb (CST, 83732S) or Poly/Mono-ADP Ribose (D9P7Z) Rabbit mAb (CST, 89190S) antibodies for 1 h, and then incubated with Goat anti-Rabbit IgG Secondary Antibody, Alexa Fluor 488 (ThermoFisher, A-11008) secondary antibodies for 30 min and stained with DAPI for 10 minutes. Fluorescent cells were imaged with an IN Cell Analyzer 6500 System (Cytiva) and analyzed using IN Carta software (Cytiva).

      Reviewer #3 (Recommendations for the authors):

      Just a couple of observations/details that might help strengthen the article:

      (1) The caco-1 data for AVI4206 would suggest that there is some sort of efflux going on, yet there is no mention of it in the paper. This might be useful in the optimization paradigm moving forward.

      We thank the reviewer for this observation and suggestion.  Indeed, we believe that efflux is behind the low oral bioavailability of AVI-4206.  We are working specifically to remove this liability in next-generation analogs, using the caco2 assay to guide this ongoing effort. Keep an eye out for a preprint on this soon!  We have added to the discussion:

      “In addition to dissecting such molecular mechanisms of macrodomain function and inhibition, future efforts will focus on improving pharmacokinetic properties, including a cellular efflux liability that results in low oral bioavailability of AVI-4206. ”

      (2) There are some spectroscopic anomalies/mistakes in the NMR data. The carbon NMR for 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one should only have 14 unique carbons, but the authors report 15. The HNMR for AVI1500 should only have 19 H's, but the authors list 20. The HNMR data for AVI3762/3763 should have 16 H's, but the authors only report 13. The CNMR for AVI4206 should only have 19 unique carbons, but the authors report 20.

      Thank you for noting these inconsistencies regarding the reported NMR spectra. We have rectified them by more closely examining the spectra and in some cases acquiring new data. We identified one peak (47.9) in the 13C NMR of 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one that is apparently an artifact of the automated peak picking in the data analysis software.  In the 1H NMR of AVI-1500, the triplet peak at 7.20 integrates to 1H, but was erroneously reported as 2H in the original manuscript.  This error has been corrected.  Spectra were re-acquired for AVI-3762, AVI-3763, and AVI-4206 with longer acquisition times, and/or on a 600 MHz spectrometer to afford the complete line lists now reported in the revised manuscript. Please note AVI-4206 has 18 distinct 13C resonances due to the equivalence of the gem-dimethyl methyl groups.

    1. eLife Assessment

      This study reanalyzed previously published scRNA-seq and TCR-seq data to examine the proportion and characteristics of dual-TCR-expressing Treg cells in mice, presenting some useful insights into TCR diversity and immune regulation. However, the evidence is incomplete, particularly with respect to data interpretation, statistical rigor, and the functionality of dual -TCR Treg cells. The study is potentially of interest to immunologists studying T-cell biology.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript, by Xu and Peng, et al. investigates whether co-expression of 2 T cell receptor (TCR) clonotypes can be detected in FoxP3+ regulatory CD4+ T cells (Tregs) and if it is associated with identifiable phenotypic effects. This paper presents data reanalyzing publicly available single-cell TCR sequencing and transcriptional analysis, convincingly demonstrating that dual TCR co-expression can be detected in Tregs, both in peripheral circulation as well as among Tregs in tissues. They then compare metrics of TCR diversity between single-TCR and dual TCR Tregs, as well as between Tregs in different anatomic compartments, finding the TCR repertoires to be generally similar though with dual TCR Tregs exhibiting a less diverse repertoire and some moderate differences in clonal expansion in different anatomic compartments. Finally, they examine the transcriptional profile of dual TCR Tregs in these datasets, finding some potential differences in expression of key Treg genes such as Foxp3, CTLA4, Foxo3, Foxo1, CD27, IL2RA, and Ikzf2 associated with dual TCR-expressing Tregs, which the authors postulate implies a potential functional benefit for dual TCR expression in Tregs.

      Strengths:

      This report examines an interesting and potentially biologically significant question, given recent demonstrations that dual TCR co-expression is a much more common phenomenon than previously appreciated (approximately 15-20% of T cells) and that dual TCR co-expression has been associated with significant effects on the thymic development and antigenic reactivity of T cells. This investigation leverages large existing datasets of single-cell TCRseq/RNAseq to address dual TCR expression in Tregs. The identification and characterization of dual TCR Tregs is rigorously demonstrated and presented, providing convincing new evidence of their existence.

      Weaknesses:

      The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans, limiting the novelty of the reported findings. The presented results should be considered in the context of these prior important findings. The focus on self-citation of their previous work, using the same approach to measure dual TCR expression in other datasets. limits the discussion of other more relevant and impactful published research in this area. Also, Reference #7 continues to list incorrect authors. The authors do not present a balanced or representative description of the available knowledge about either dual TCR expression by T cells or TCR repertoires of Tregs.

      The approach used follows a template used previously by this group for re-analysis of existing datasets generated by other research groups. The descriptions and interpretations of the data as presented are still shallow, lacking innovative or thoughtful approaches that would potentially be innovation or provide new insight.

      This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells. The response to this criticism in a previous review is considered non-responsive and does not improve the data or findings.

      Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. The interpretations of the gene expression analyses are somewhat simplistic, focusing on single-gene expression of some genes known to have function in Tregs. However, the investigators continue to miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. Wyss. 2016. 16:220; Nat Immunol. 17:1093; Zenmour. 2018. Nat Immunol. 19:291). No attempt to define clusters is made. No comparison is made of the proportions of dual TCR cells in transcriptionally-defined clusters. The broad assessment of key genes by single- and dual TCR cells is conceptually interesting, but likely to be confounded by the heterogeneity of the Treg populations. This would need to be addressed and considered to make any analyses meaningful.

      The study design, re-analysis of existing datasets generated by other scientific groups, precludes confirmation of any findings by orthogonal analyses.

    3. Reviewer #3 (Public review):

      Summary:

      This study addressed the TCR pairing types and CDR3 characteristics of Treg cells. By analyzing scRNA and TCR-seq data, it claims that 10-20% of dual TCR Treg cells exist in mouse lymphoid and non-lymphoid tissues and suggests that dual TCR Treg cells in different tissues may play complex biological functions.

      Strengths:

      The study addresses an interesting question of how dual-TCR-expressing Treg cells play roles in tissues.

      Weaknesses:

      This study is inadequate, particularly regarding data interpretation, statistical rigor, and the discussion of the functional significance of Dual TCR Tregs.

      Comments on revisions:

      Although the authors have provided brief explanations in response to the reviewers' comments, they do not present any additional analyses that would address the fundamental concerns in a convincing manner.

      Moreover, the in silico analyses presented in the manuscript alone are insufficient to support the conclusions, and the functional experiments requested by the reviewers have not been conducted.

      In the current rebuttal, while some textual additions have been made to the manuscript, the only substantial revision to the figures appears to be the inclusion of statistical significance annotations (e.g., Fig. 1G, Fig. 3G). These changes do not adequately strengthen the overall data or address the core issues raised.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed only analyzed dual  TCR Treg cells from different tissue locations based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets corresponding to these tissue locations. The main reason for this is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cell populations in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than other T cell types. This will help understand the distribution characteristics of dual TCR Treg cells in different tissues and provide a basis for mRNA expression levels to conduct functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis considered the sample size issue, we conducted a detailed review and analysis. This study utilized the inverse Simpson index to evaluate TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg cell and dual TCR Treg cell repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, differences in detected T cell numbers by sequencing cannot be excluded from the diversity analysis. Following recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates TCR composition of dual TCR Treg cells exhibits diversity, similar to single TCR Treg cells; however, diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences; cells with identical TCR sequences are part of the same clone, with ≥2 counts defined as expansion. For example, in Blood, there are 958 clonal types and 1,228 cells, of which 449 are expansion cells. In R1, we systematically verified and revised clonal expansion cells across all tissue samples according to a unified standard.

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications,In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Based on the original manuscript, we have supplemented our reading, understanding, and citation of closely related literature (Tuovinen, 2006, Blood, 108:4063 (line 44,line175 in R1); Schuldt, 2017, J Immunol, 199:33 (line 44,line178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, where we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, conducting a comparative analysis of the origins of dual TCR Treg cells and non-T cells with dual TCRs will be a meaningful direction. Currently, peripheral induced Treg cells can originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj<0.05 for comparisons.

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. Wyss. 2016. 16:220; Nat Immunol. 17:1093; Zenmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding research on other cell types in each tissue sample, preventing analysis of other cells and factors involved in development and differentiation of single TCR Treg and dual TCR Treg. The literature suggested by the reviewer indicates that development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissuespecific differences found in the original authors' studies of Treg cells across different organ sites. This suggests further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have supplemented in the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal:This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient?We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data across multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount values of individual cells (i.e., the total sequencing reads or UMI counts for each cell) as auxiliary parameters to further optimize the assessment of cell quality. Generally, due to the possibility that doublet cells may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. Among the five tissue sample datasets, we further utilized hashtag oligonucleotide (HTO) labeling (where HTO labeling provides each cell with a unique barcode to differentiate cells from different tissue sources. By analyzing HTO labels, doublets and negative cells can be accurately identified) to eliminate doublets and negative cells.After the removal of chimeric cells, all samples exhibited T cells that possessed two or more TCR clones. This phenomenon validates the reliability of the methodological approach employed in this study and indicates that the analytical results accurately reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, the discussed dual TCR Treg cells and single TCR Treg cells specifically refer to those T cells that possess both functional α and β chains, which are capable of forming TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as those Treg cells in which the α or β chains involved in TCR pairing are non-functional.

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Based on the original manuscript, we have further supplemented and elaborated on the uniqueness and relative proportions of double TCR T cell pairs in skin tissue samples in Section R1. Due to the scarcity of T cells in skin samples, we included some non-Treg cells during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-regulatory T cells may indeed impact the statistical representation of double TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type double TCR pairings are primarily found within the non-regulatory T cell population in the skin. In response to this point, we have provided a detailed explanation of this analytical result in the revised manuscript R1. Furthermore, concerning the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have thoroughly elaborated upon in the discussion section. We thank you once again for your valuable suggestions. 

      (3) Issue of Cell Contamination:In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified data sources for tissues such as blood, kidneys, and liver. In the study by Oliver T et al., various techniques were employed to differentiate between leukocytes from blood and those from tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, authors performed perfusion on anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, authors utilized HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity:The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.

      (5) Functional Evaluation of Dual TCR Tregs:This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions: In this analysis of scRNA+TCR-seq data, we innovatively discovered a higher proportion of dual TCR Treg cells in different tissue sites, which exhibited differences in tissue characteristics. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have supplemented R1 with the feasibility of further exploring the functions of tissue-resident dual TCR T cells and the necessity for potential application research.

      (6) Appropriateness of Statistical Analysis:When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. They should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions: Based on the original manuscript, we have supplemented the specific statistical methods for the differences in cell proportions and gene expression in R1.

    1. eLife Assessment

      This study proposes an important new approach to analyzing cell-count data, which are often undersampled and cannot be accurately assessed using traditional statistical methods. The case studies presented in the article provide compelling evidence of the superiority of the proposed methodology over existing approaches, which could promote the use of Bayesian statistics among neuroscientists. The authors have taken steps to make the methodology accessible, although some implementation difficulties are likely to remain.

    2. Reviewer #1 (Public review):

      Summary:

      This work proposes a new approach to analyse cell-count data from multiple brain regions. Collecting such data can be expensive and time-intensive, so, more often than not, the dimensionality of the data is larger than the number of samples. The authors argue that Bayesian methods are much better suited to correctly analyse such data compared to classical (frequentist) statistical methods. They define a hierarchical structure, partial pooling, in which each observation contributes to the population estimate to more accurately explain the variance in the data. They present two case studies in which their method proves more sensitive in identifying regions where there are significant differences between conditions, which otherwise would be hidden.

      Strengths:

      The model is presented clearly, and the advantages of the hierarchical structure are strongly justified. Two alternative ways are presented to account for the presence of zero counts. The first involves the use of a horseshoe prior, which is the more flexible option, while the second involves a modified Poisson likelihood, which is better suited to datasets with a large number of zero counts, perhaps due to experimental artifacts. The results show a clear advantage of the Bayesian method for both case studies.<br /> The code is freely available, and it does not require a high-performance cluster to execute for smaller datasets. As Bayesian statistical methods become more accessible in various scientific fields, the whole scientific community will benefit from the transition away from p-values. Hierarchical Bayesian models are an especially useful tool that can be applied to many different experimental designs. However, while conceptually intuitive, their implementation can be difficult. The authors provide a good framework with room for improvement.

      Weaknesses:

      As with any Bayesian model, the choice of prior can significantly influence the results. The authors explain how the methodology can be adapted to different data properties, though selecting an appropriate prior or likelihood may not always be straightforward. They propose a 'standard workflow' as an alternative to traditional approaches, which could and should be used alongside established methods while Bayesian techniques continue to evolve and improve.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Alternative possibilities are discussed regarding the prior and likelihood of the model. Given that the second case study inspired the introduction of the zero-inflation likelihood, it is not clear how applicable the general methodology is to various datasets. If every unique dataset requires a tailored prior or likelihood to produce the best results, the methodology will not easily replace more traditional statistical analyses that can be applied in a straightforward manner. Furthermore, the differences between the results produced by the two Bayesian models in case study 2 are not discussed. In specific regions, the models provide conflicting results (e.g., regions MH, VPMpc, RCH, SCH, etc.), which are not addressed by the authors. A third case study would have provided further evidence for the generalizability of the methodology.”

      We hope in this paper to propose a ‘standard workflow’ for these data; this standard workflow uses the horseshoe prior and we propose that this is the approach used to describe cell count data instead of the better established, but to our thinking, inefficient, t-testing approach.

      The horseshoe prior is robust and allows a partially-pooled model to used while weighing-up the contribution of different data points. This is an analogue of excluding outliers and, in any analysis it is normal to investigate further if there are points being excluded as outliers. Often this reveals a particular challenge with the data, in the case of the data here, there are a lot of zeros, indicating that some samples should be excluded because the preparation failed to tag cells rather than because there were no cells to tag. This idea behind the ZIP example is to show that the Bayesian method can allow for this sort of further investigation and, indeed, as the reviewer notes this sort of extended analysis is often bespoke, tailored to the data.

      We have clearly failed to explain that the ‘standard workflow’ we propose replace the more traditional methods is the first one we describe, with the horseshoe prior; this produces better results on both datasets than the traditional approach. However, we also feel it is useful to show how a more tailored follow-on can be useful; we need to make it clear that this is intended as an illustration of an ‘optional extra’ rather than a part of the more straightforward ‘standard workflow’.

      To make this clearer we have made altered the text in several locations:

      • end of Introduction: added clarifying sentence “Here, our aim is to introduce a ‘standard’ Bayesian model for cell count data. We illustrate the application of this model to two datasets, one related to neural activation and the other to developmental lineage. For the second dataset, we also demonstrate a second example extension Bayesian model.”

      • Section Hierarchical modeling: “Our goal in both cases is to quantify group differences in the data. We present a ‘standard’ hierarchical model. This model reflects the experimental features common to cell count experiments and reflects the hierarchical structure of cell count data; the standard model is designed to deal robustly and efficiently with noise. On some occasions, to reflect a specific hypotheses, the structure of a particular experiment or an observed source of noise, this model can be further refined or changed to target the analysis. We will give an example of this for our second dataset.”

      • Section Horseshoe prior: “The alternative is via a flexible prior such as the horseshoe Carvalho et al., 2010; Piironen and Vehtari, 2017. This more generic option may be suitable as a default ‘standard’ approach in the typical case where outliers are poorly understood.”

      • Discussion: word ‘standard’ added to sentence: “Our standard workflow uses a horseshoe prior, along with the partial pooling, this allows our model to deal effectively with outliers.”

      • Discussion: modified sentence “The horseshoe prior model workflow we have exhibited here is intended as a standard approach.”

      Indeed, because the horseshoe prior deals robustly with outliers, whereas the ZIP is intended to model the outliers, any substantial difference between the two should be examined carefully. The referee is right to point out that we have not explained this in any detail and has helpfully listed a few brain regions were there are differences. This is useful, particularly since the examples listed illustrate in a useful way the opportunities and hazards this sort of data presents. To address this, we have added a new version of Figure 6 to the revised manuscript

      Previously Figure 6 showed two example brain regions: MPN and TMd. We have now added MH and SCH to the figure, and new text commenting on the insights the plots provide, both in the Results and Discussion.

      Reviewer #2 (Public review):

      “A clearer link between the experimental data and model-structure terminology would be a benefit to the non-expert reader.”

      This is a very good point and we are acutely aware through our own work how difficult it can be moving between fields with different research goals, different scientific cultures and different technical vocabularies. Just as it can be difficult translating from one language to another without losing nuance and meaning, it can be a real challenge finding technical terms that are useful for the non-expert reader while retaining the precision the application requires! In the long run, we hope that, just as some of the very specialized vocabulary that surrounds frequentist statistics has become familiar to to the working experimental scientists, the precise terminology involved in Bayesian modelling will become familiar and transparent. However, in advance of that day, we have included a glossary of terms at the end of the main text, and have made numerous small tweaks to make sure that link between data and model terminology is clearer and better explained.

      Reviewer #1 (Recommendations fro the authors):

      (1) “I would strongly recommend that the authors include more case studies in the manuscript, and address the qualitative differences between the different versions of the model.”

      We agree that our method will only become established when it is applied to more datasets, we hope to contribute to further analysis and we know other people are already using the approach on their own data. We do, however, feel that adding more datasets to this paper will make it longer and more complex; the plan, instead, is to use the method on novel datasets to test specific hypotheses, so that the results will include novel scientific findings as well as adding another illustration of the Bayesian approach applied to data that is already well studied.

      (2) “Figure 6 is not discussed in the main text.”

      We had discussed the results presented in Figure 6 in the second paragraph of the section “Case study two – Ontogeny of inhibitory interneurons of the mouse thalamus”, however the reviewer is right in that we did not directly refer to the Figure – this was an oversight. In any case, in the revised manuscript we present a new version of Figure 6 (in response to above comment), which is now explicitly cited in the text.

      Revised Figure 6: Example data and inferences highlighting model discrepancies. On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: HDIs and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian ZIP model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (MPN, SCH, TMd) or large-valued outliers (MH) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.

      Reviewer #2 (Recommendations from the authors):

      (1) “This is a generally well-written methodology paper that also provides the underlying code as a resource. As a reviewer outside both cell-count modelling and hierarchical-Bayesian approaches (though with a general interest in the topics) I found the method a little difficult to follow and would have liked to have been left with a better understanding of how the method is applied to the data. For example, in Figure 1 we are introduced to brain region count, animal count, and “items”. Then in the next line: pooling, model, structure, population and etc in subsequent lines. It is not clear what the subscripts (the pools?) are referring to: are they different regions R or animals N? These terms need to be better linked to the data and/or trimmed. Having said that, the later results look like a solid contribution to the field with a significant reduction in uncertainty from the Bayesian approach over the frequentist one. A future version of the manuscript, therefore, would benefit from greater precision of language as well as an economy and greater focus of terms linking the method to the biology. This is particularly the case around the exposition parts in Figure 1, Figure 2, and the “Hierarchical modelling” section.”

      This is another important point. We have now made numerous small changes to tighten up the text in the paper, in response to both this point and the next point.

      (2) “Language throughout could be sharpened. Subjectivity like “surprising outliers” could be removed and quirky grammar like “often small, ten is a typical” improved. There are also typos “an rate” etc that should be tidied up.”

      As per previous response, we have made numerous tweaks and small improvements and feel that the paper is stronger in this respect.

      (3) “Figure 1 caption. “It is a spectrum that depends” Is spectrum the right word here? Also, “thicker stroke” what does this refer to? Wasn’t immediately clear. In A, why is the whole animal within the R bracket that signifies brain regions, and then the brain regions are within the N bracket that signifies whole animals? Apart from the teal colouring, what are the other coloured regions in the image referring to? Improving this first figure would greatly help a reader unfamiliar with the context of the approach.”

      We have replaced the word “spectrum” with “continuum”. We have replaced “ Observed quantities have been highlighted with a thicker stroke in the graphical model.” with “The observed data quantities, y<sub>i</sub> to y<sub>n</sub>, are highlighted with a thick line in the model diagrams”. We have added the following text to describe the red and green lines in panel A: “green and red lines indicate regions labeled as damaged”.

      (4) “On P2 there is no discussion of priors when running through the advantage of the Bayesian approach. Is this a choice or an oversight? Priors do have a role in the later analysis.”

      A short additional paragraph has been added to the introduction outlining the advantage of having a prior, but also noting that the obligation to pick a prior can be intimidating and that suggesting priors is one of the contributions of our paper: “A Bayesian model also includes a set of probability distributions, referred to as the prior, which represent those beliefs it is reasonable to hold about the statistical model parameters before actually doing the experiment. The prior can be thought of as an advantage, it allows us to include in our analysis our understanding of the data based on previous experiments. The prior also makes explicit in a Bayesian model assumptions that are often implicit in other approaches. However, having to design priors is often considered a challenge and here we hope to make this more straightforward by suggesting priors that are suitable for this class of data.”

      (5) “On P4 more explanation would help greatly. Formulas like 23*10*4 or 50*6+50*4 are presented without explanation. What are the various numbers being multiplied? Regions, animals? Again, a clearer link between biological data and model structure would be advantageous.”

      We have now modified this line to clearly state the numbers’ sources: “The index i runs over the full set of samples, which in this case comprises 23 brain regions ×10 animals ×4 groups ≈920 datapoints in the first study, and 50 brain regions × 6 HET animals + 50 brain regions × 4 KO animals ≈500 datapoints in the second.”

      (6) “P6 and Results. Is it possible to show examples of the data set sampled from? Perhaps an image or two for the two experiments. Both Figures 4 and 5 as they currently are could be made slightly smaller to provide space for a small explanatory sub-panel. This would help ground the results.”

      This is a good idea. We have now added heatmap visualisations of both entire datasets to revised versions of Figures 4 and 5 (assuming that this is what the reviewer was suggesting).

    1. eLife Assessment

      Using single-cell transcriptomic data from adult mouse inner ear hair cells, the authors identify the differences and similarities of the four hair cell types. They make an important finding: that vestibular hair cells can express many ciliary motility-related genes. Some hair cell kinocilia display motility, suggesting that the kinocilium of vestibular hair cells may function as an active force generator to increase sensitivity. The evidence is incomplete as to whether all kinocilia beat and what the function of kinocilia movement is.

    2. Reviewer #1 (Public review):

      Summary

      Xu et al. use transcriptomic comparisons of mouse cochlear and vestibular hair to show that the vestibular hair cells alone are enriched in gene expression for proteins necessary for cilia motility and to further argue that such motility is a normal function of the kinocilia.

      Background:

      Cilia are prominent in sensory receptors, including vertebrate photoreceptors, olfactory neurons, and mechanosensitive hair cells of the inner ear and lateral line. Cilia can be motile or nonmotile depending on their axonemal structure: motile cilia require dynein and the inner 2 singlet microtubules of the 9+2 array. Primary cilia, present early in development, are considered to have sensory functions and to be nonmotile (Mill et al., Nature Rev Gen 2023).

      In hair cells, the kinocilium anchors and polarizes the mechanosensitive hair bundle of specialized microvilli. The kinocilium matures from the primary cilium of a newborn hair cell; behind it, the bundle of mechanosensory microvilli rises in a descending staircase of rows. During maturation of the mammalian cochlea, all hair cells lose the kinocilium, though not the associated basal body. The consensus for many years has been that most vertebrate kinocilia, and especially mammalian kinocilia, are nonmotile, based largely on the lack of spontaneous motility in excised mammalian vestibular organs, but also on the impression that the rare examples of spontaneous beating motility even in non-mammalian hair cells are associated with deterioration of the preparation (Rüsch & Thurm 1990).

      Strengths

      In comparing RNA expression across the 4 major types of mouse hair cells - 2 cochlear and 2 vestibular - Xu et al. noted that some ciliary genes related to motility are expressed by vestibular but not cochlear hair cells. They curated the ciliary genes into types known to be associated with different aspects of beating motility, and also investigated the expression of genes typical of primary cilia, which are considered to have sensory and cell signaling functions and to be nonmotile. They add immunostaining to back up some of the RNA data, and also evaluate relative expression by neonatal mouse cochlear and vestibular hair cells from a published dataset. The focus on kinociliary genes is an appropriate use of the comparative expression data for cochlear and vestibular hair cells, and the paper overall is readable and interesting. The transcriptome data are rounded off by comparing the authors' results in adult hair cells with published neonatal mouse cochlear and vestibular transcriptomes.

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

    4. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A).  We would like to point out that the dual identity of primary and motile cilia is not just based on the TEM images. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).  

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we will moderate our interpretation accordingly in the revision.

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      We aimed to show that kinocilia in neonatal cochlear and vestibular hair cells are largely similar, except that neonatal cochlear hair cells lack key genes and proteins required for the motile apparatus. While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we will mark such cytoplasmic or multifunctional genes with stars in both Figure 5G and Figure 6D together with legend in the revised manuscript.

      Although those genes (i.e., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) are highly expressed in neonatal cochlear hair cells, key genes for motile machinery are not detected. For example, Dnah6, Dnah5, and Wdr66 are not expressed in the P2 cochlear hair cells.  Dnah6 and Dnah5 encode axonemal dynein and are part of inner and outer dynein arms while Wdr66 is a component of radial spokes. Importantly, we did not detect the expression of CCDC39 and CCDC40 in kinocilia of P2 cochlear hair cells.  Axonemal CCDC39 and CCDC40 are the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome and are required for the assembly of IDAs and N-DRC for ciliary motility (Becker-Heck et al., 2011; Merveille et al., 2011; Oda et al., 2014). We will modify Figure 6D to highlight the key difference between P2 cochlear and vestibular hair cells in the revised manuscript. We will also revise the text so that the key differences will clearly be described.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We will revise the discussion to clarify this important point of the reviewer. Specifically, we will emphasize that our observations of ciliary beating in the ex vivo conditions may not reflect its properties in the mature in vivo context, but rather a byproduct of motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode—modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approach will be needed to reveal the unexplored role(s) of kinocilia for vestibular hair cells in vertebrates. 

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We will revise the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations:

      We will make changes in the revision based on the joint recommendations of the two reviewers.

      References

      Becker-Heck, A., Zohn, I.E., Okabe, N., Pollock, A., Lenhart, K.B., Sullivan-Brown, J., McSheene, J., Loges, N.T., Olbrich, H., Haeffner, K., Fliegauf, M., Horvath, J., Reinhardt, R., Nielsen, K.G., Marthin, J.K., Baktai, G., Anderson, K.V., Geisler, R., Niswander, L., Omran, H., Burdine, R.D., 2011. The coiled-coil domain containing protein CCDC40 is essential for motile cilia function and left-right axis formation. Nat Genet 43, 79–84. https://doi.org/10.1038/ng.727

      Benser, M.E., Marquis, R.E., Hudspeth, A.J., 1996. Rapid, Active Hair Bundle Movements in Hair Cells from the Bullfrog’s Sacculus. J. Neurosci. 16, 5629–5643. https://doi.org/10.1523/JNEUROSCI.16-18-05629.1996

      Fawcett, D.W., Ito, S., 1965. The fine structure of bat spermatozoa. American Journal of Anatomy 116, 567–609. https://doi.org/10.1002/aja.1001160306

      Flock, Å., Flock, B., Murray, E., 1977. Studies on the Sensory Hairs of Receptor Cells in the Inner Ear. Acta Oto-Laryngologica 83, 85–91. https://doi.org/10.3109/00016487709128817

      Kikuchi, T., Takasaka, T., Tonosaki, A., Watanabe, H., 1989. Fine structure of guinea pig vestibular kinocilium. Acta Otolaryngol 108, 26–30.https://doi.org/10.3109/00016488909107388

      Lechtreck, K.-F., Gould, T.J., Witman, G.B., 2013. Flagellar central pair assembly in Chlamydomonas reinhardtii. Cilia 2, 15. https://doi.org/10.1186/2046-2530-2-15

      Martin, P., Bozovic, D., Choe, Y., Hudspeth, A.J., 2003. Spontaneous Oscillation by Hair Bundles of the Bullfrog’s Sacculus. J. Neurosci. 23, 4533–4548. https://doi.org/10.1523/JNEUROSCI.23-11-04533.2003

      Merveille, A.-C., Davis, E.E., Becker-Heck, A., Legendre, M., Amirav, I., Bataille, G., Belmont, J., Beydon, N., Billen, F., Clément, A., Clercx, C., Coste, A., Crosbie, R., de Blic, J., Deleuze, S., Duquesnoy, P., Escalier, D., Escudier, E., Fliegauf, M., Horvath, J., Hill, K., Jorissen, M., Just, J., Kispert, A., Lathrop, M., Loges, N.T., Marthin, J.K., Momozawa, Y., Montantin, G., Nielsen, K.G., Olbrich, H., Papon, J.-F., Rayet, I., Roger, G., Schmidts, M., Tenreiro, H., Towbin, J.A., Zelenika, D., Zentgraf, H., Georges, M., Lequarré, A.-S., Katsanis, N., Omran, H., Amselem, S., 2011. CCDC39 is required for assembly of inner dynein arms and the dynein regulatory complex and for normal ciliary motility in humans and dogs. Nat Genet 43, 72–78. https://doi.org/10.1038/ng.726

      Moon, K.-H., Ma, J.-H., Min, H., Koo, H., Kim, H., Ko, H.W., Bok, J., 2020. Dysregulation of sonic hedgehog signaling causes hearing loss in ciliopathy mouse models. eLife 9, e56551. https://doi.org/10.7554/eLife.56551

      Oda, T., Yanagisawa, H., Kamiya, R., Kikkawa, M., 2014. A molecular ruler determines the repeat length in eukaryotic cilia and flagella. Science 346, 857–860. https://doi.org/10.1126/science.1260214

      O’Donnell, J., Zheng, J., 2022. Vestibular Hair Cells Require CAMSAP3, a Microtubule Minus-End Regulator, for Formation of Normal Kinocilia. Front Cell Neurosci 16, 876805. https://doi.org/10.3389/fncel.2022.876805

      Pfister, K.K., Shah, P.R., Hummerich, H., Russ, A., Cotton, J., Annuar, A.A., King, S.M., Fisher, E.M.C., 2006. Genetic Analysis of the Cytoplasmic Dynein Subunit Families. PLOS Genetics 2, e1. https://doi.org/10.1371/journal.pgen.0020001

      Polino, A.J., Sviben, S., Melena, I., Piston, D.W., Hughes, J.W., 2023. Scanning electron microscopy of human islet cilia. Proceedings of the National Academy of Sciences 120, e2302624120. https://doi.org/10.1073/pnas.2302624120

      Rüsch, A., Thurm, U., 1990. Spontaneous and electrically induced movements of ampullary kinocilia and stereovilli. Hearing Research 48, 247–263. https://doi.org/10.1016/0378-5955(90)90065-W

      Shi, H., Wang, H., Zhang, C., Lu, Y., Yao, J., Chen, Z., Xing, G., Wei, Q., Cao, X., 2022. Mutations in OSBPL2 cause hearing loss associated with primary cilia defects via sonic hedgehog signaling [WWW Document]. https://doi.org/10.1172/jci.insight.149626

      Stooke-Vaughan, G.A., Huang, P., Hammond, K.L., Schier, A.F., Whitfield, T.T., 2012. The role of hair cells, cilia and ciliary motility in otolith formation in the zebrafish otic vesicle. Development 139, 1777–1787. https://doi.org/10.1242/dev.079947

      Wu, D., Freund, J.B., Fraser, S.E., Vermot, J., 2011. Mechanistic Basis of Otolith Formation during Teleost Inner Ear Development. Developmental Cell 20, 271–278. https://doi.org/10.1016/j.devcel.2010.12.00

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na<SUP>+</SUP>/K<SUP>+</SUP>ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase could be a factor in neuronal dysfunctions and diseases

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics-the effects of Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channelsand extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2)The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      Comments on revisions:

      The revised manuscript is notably improved.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the Na<SUP>+</SUP>/K<SUP>+</SUP> pump. By integrating the net pump current over time and dividing by one elemental charge, one can find the rate of ATP that is consumed by the Na<SUP>+</SUP>/K<SUP>+</SUP> pump for either compensatory mechanism. The difference in net pump current is thus proportional to ATP consumption, which allows for a direct comparison of the cost efficiency of the Na<SUP>+</SUP>/K<SUP>+</SUP> pump for each proposed compensatory mechanism. The Na<SUP>+</SUP>/K<SUP>+</SUP> pump is however not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide rough estimates on the influence of the compensatory mechanisms on the total metabolic costs of the cell and membrane space occupation. Although we argue that according these rough estimates, the impact of discussed compensatory mechanisms could be significant, due to the absence of more detailed experimental quantification, a plausible quantitative cost estimate on the whole cell level remains beyond the scope of this article.

      Reviewer #1 (Recommendations for the authors):

      I just have a few recommendations on the updated manuscript.

      (1) When exploring the different roles of Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase in the Results section, the authors employed many different models. For instance, the voltage equation on page 15, voltage equation (2) on page 22, voltage equation (12) on page 24, voltage equation (30) on page 32, and voltage equation (38) on page 35 are presented as the master equations for their respective biophysical models. Meanwhile, the phase models are presented on page 29 and page 33. I would recommend that the authors clearly specify which equations correspond to each subsection of the Results section and explicitly state which equations were used to generate the data in each figure. This would help readers more easily follow the connections between the models, the results, and the figures.

      We thank the reviewer for pointing out that the links of the different voltage equations to the results could be expressed more explicitly in the article. All simulations were done using the ‘master equation’  expressed in Eq. 2, and the other voltage equations that are specified in the article (in the new version of the article Eqs. 13, 31, and 39) are reformulations of Eq. 2 to analytically show different properties of the voltage equation (Eq. 2). This has now been mentioned in the article when formulating the voltage equations, and the equation for the total leak current (in the new version Eq. 3) has been added for completeness.

      (2) The authors may want to revisit their description and references concerning Eigenmannia virescens. For example, wave-type weakly electric fish (e.g., Eigenmannia) and pulse-type weakly electric fish (e.g., Gymnotus carapo) exhibit large differences, making references 52-55 may be inappropriate for subsection 4.3.1, as these studies focus on Gymnotus carapo. Additionally, even within wave-type species, chirp patterns vary. For example, Eigenmannia can exhibit short "pauses"-type chirps, whereas Apteronotus leptorhynchus (another waver-form fish) does not (https://pubmed.ncbi.nlm.nih.gov/14692494/).

      We thank the reviewer for pointing this out. The citations and phrasing in sections 4.3.1 and 4.3.2 have been updated to specifically refer to the weakly electric fish e. Virescens.

      (3) Table on page 21: Please explain why the parameter value (13.5mM) of [Na<SUP>^</SUP>+]_{in} is 10 timeslarger than its value (1.35mM) in reference [26]? How does this value (13.5mM) compare with the range of variable [Na<SUP>^</SUP>+]_{in} in equation (6)?

      The intracellular sodium concentration in reference [26] was reported to be 1.35 mM, but the authors also reported an extracellular sodium concentration of 120 mM, and a sodium reversal potential of 55 mV. Upon calculating the sodium reversal potential, we found that an intracellular sodium concentration of 1.35 mM would give a sodium reversal potential of 113 mV. An intracellular sodium concentration of 13.5 mM, on the other hand, leads to the reported and physiological reversal potential of 55 mV. This has now been clarified in the article, and the connection between this value and Eq. 6 (Eq. 7 in the new version) has also been clarified.

      Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes - specialized, highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium and efflux of potassium ions into these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

      Weaknesses:

      The model for action potential generation simplifies ion dynamics by considering only sodium and potassium currents, excluding other ions like calcium. The ion channels considered are assumed to be static, without any dynamic regulation such as post-translational modifications. For instance, a sodium-dependent potassium pump could modulate potassium leak and spike amplitude (Markham et al., 2013).

      This work considers only the sodium-potassium (NaK) pumps to restore ion gradients. However, in many cells, several other ion pumps, exchangers, and symporters are simultaneously present and actively participate in restoring ion gradients. When sodium currents dominate action potentials, and thus when NaK pumps play a critical role, such as the case in Eigenmannia virescens, the present study is valid. However, since other biological processes may find different solutions to address the pump's non-electroneutral nature, the generalizability of the results in this work to other fast-spiking cell types is limited. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      We thank the reviewer for the detailed summary and the updated identified strengths and weaknesses. The current article indeed focuses on and isolates the interplay between sodium currents, potassium currents, and sodium-potassium pump currents. As discussed in section 5.1, in excitable cells where these currents are the main players in action-potential generation, the results presented in this article are applicable. The contribution of post-translational effects of ion channels, other ionic currents, and other active transporters and pumps, could be exciting avenues for further studies

      .

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments.

      All the figures are now consistent. The color schema used is clear.

      The methods and discussions expansions improve the paper.

      Including the model assumptions and simplifications is appreciated.

      Including internal references is helpful.

      The equations are clear, and the references have been fixed.

      I am content with the changes. I have updated my review accordingly.

      We thank the reviewer for their initial constructive comments that lead to the significant improvement of the article.

      Page : 3 Line : 113 Author : Unknown Author 07/24/2025 

      Although this is technically correct, the article is about electrocommunication signals and does not focus on sensing.

      Page : 3 Line : 153 Author : Unknown Author 07/24/2025

      electrocommunication

      Page : 4 Line : 164 Author : Unknown Author 07/24/2025 

      Judging from the cited article, I think this should be a sodium-dependent potassium current.

    2. Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes - specialized, highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium and efflux of potassium ions into these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

    3. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics-the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Comments on revisions:proposes

      The revised manuscript is notably improved.

    4. eLife Assessment

      This important study provides new insights into the lesser-known effects of the sodium-potassium pump on how nerve cells process signals, particularly in highly active cells like those of weakly electric fish. The computational methods used to establish the claims in this work are compelling and can be used as a starting point for further studies.

    1. eLife Assessment

      This important study presents a sequence-based method for predicting drug-interacting residues in intrinsically disordered proteins (IDPs), addressing a significant challenge in understanding small-molecule:IDP interactions. The findings have solid support through examples underscoring the role of aromatic interactions. While predicted binding sites remain coarse, validation was done on a total of 10 IDPs at varying depths. The method builds on the authors' previous work and, with ad hoc modifications, is poised to benefit this emerging field.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential druginteracting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts druginteracting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed we did not apply the helix boost portion of SeqDYN to DIRseq, and now state as such (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei at al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2.  In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with a-syn (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, explanations on why 12 Qs are chosen should be given. How about other numbers of Q or using other residues (e.g., the commonly used residues in making links, like GS/PS or A?

      As we already explained, “Gln was selected because its 𝑞 value is at the middle of the 20 𝑞 values.” (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but form a helix-loop-loop structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewer’s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out “drug-interacting residues”.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We did add caveats of CSP in Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. For annotation of display items, it is also made in the Figs. 2 and 3 captions; we now add it to the Fig. 4 caption.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      “Augur well” means to be a good sign (for something). We use this phrase here in this meaning.

      (6) page 5: "we raised the 𝑞 value of Asp to be the same as that of Glu" → suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

    4. Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      Comments on revised version:

      I'm satisfied with the authors' response and the public review does not need further changes.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

      We disagree with the eLife assessment that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate technology. Rather, eLife should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells.

      The reviewer is mistaken. We do not claim that the internalized cfChPs are incorporated into the nucleus. We show throughout the paper that the cfChPs perform their novel functions autonomously outside the genome without being incorporated into the nucleus. This is clearly seen in all our chromatin fibre images, metaphase spreads and our video abstract. Occasionally, when the cfChPs fluorescent signal overlie the chromosomes, we have been careful to state that the cfChPs are associated with the chromosomes without implying that they have integrated.

      These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Again the reviewer makes the same mistake. We do not claim that the internalized cfChPs are incorporated into the chromosomes. We have addressed this issue above.

      We have a feeling that the reviewer has not understood our work – which is the discovery of “satellite genomes” which function autonomously outside the nuclear genome.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer has raised a related issue below and we have responded to both of them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for taking my comments and those of the other reviewer into account and for adding new material to this new version of the manuscript. Among other modifications/additions, they now mention that they think that NIH3T3 cells treated with cfChPs die out after 250 passages because of genomic instability which might be caused by horizontal transfer of cfChPs DNA into the genome of treated cells (pp. 45-46, lines 725-731). However, no definitive formal proof of genomic instability and horizontal transfer is provided.

      We mention that the NIH3T3 cells treated with cfChPs die out after 250 passages in response to the reviewer’s earlier comment “Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism”.

      We have agreed with the reviewer and have simply speculated that the cells may die because of extreme genomic instability. We have left it as a speculation without diverting our paper in a different direction to prove genomic instability.

      The authors now refer to an earlier study they conducted in which they Illumina-sequenced NIH3T3 cells treated with cfChPs (pp. 48, lines. 781-792). This study revealed the presence of human DNA in the mouse cell culture. However, it is unclear to me how the author can conclude that the human DNA was inside mouse cells (rather than persisting in the culture medium as cfChPs) and it is also unclear how this supports horizontal transfer of human DNA into the genome of mouse cells. Horizontal transfer implies integration of human DNA into mouse DNA, through the formation of phosphodiester bounds between human nucleotides and mouse nucleotides. The previous Illumina-sequencing study and the current study do not show that such integration has occured. I might be wrong but I tend to think that DNA FISH signals showing that human DNA lies next to mouse DNA does not necessarily imply that human DNA has integrated into mouse DNA. Perhaps such signals could result from interactions at the protein level between human cfChPs and mouse chromatin?

      With due respect, our earlier genome sequencing study that the reviewer refers to was done on two single cell clones developed following treatment with cfChPs. So, the question of cfChPs lurking in the culture medium does not arise.

      The authors should be commended for doing so many FISH experiments. But in my opinion, and as already mentioned in my earlier review of this work, horizontal transfer of human DNA into mouse DNA should first be demonstrated by strong DNA sequencing evidence (multiple long and short reads supporting human/mouse breakpoints; discarding technical DNA chimeras) and only then eventually confirmed by FISH.

      As mentioned earlier, we disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Regarding my comment on the quantity of human cfChPs that has been used for the experiments, the authors replied that they chose this quantity because it worked in a previous study. Could they perhaps explain why they chose this quantity in the earlier study? Is there any biological reason to choose 10 ng and not more or less? Is 10 ng realistic biologically? Could it be that 10 ng is orders of magnitude higher than the quantity of cfChPs normally circulating in multicellular organisms and that this could explain, at least in part, the results obtained in this study?

      The reviewer again raises the same issue to which we have already addressed in our revised manuscript. To quote “We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and activation of apoptotic pathways using this concentration of cfChPs (Mittra I et. al., 2015)”.

      It is also mentioned in the response that RNA-seq has been performed on mouse cells treated with cfChPs, and that this confirms human-mouse fusion (genomic integration). Since these results are not included in the manuscript, I cannot judge how robust they are and whether they reflect a biological process rather than technical issues (technical chimeras formed during the RNA-seq protocol is a well-known artifact). In any case, I do not think that genomic integration can be demonstrated through RNA-seq as junction between human and mouse RNA could occur at the RNA level (i.e. after transcription). RNA-seq could however show whether human-mouse chimeras that have been validated by DNA-sequencing are expressed or not.

      We did perform transcriptome sequencing as suggested earlier by the reviewer, but realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript.

      Given these comments, I believe that most of the weaknesses I mentioned in my review of the first version of this work still hold true.

      An important modification is that the work has been repeated in other cell lines, hence I removed this criticism from my earlier review.

      Additional changes made

      (1) We have now rewritten the “Abstract” to 250 words to fit in eLife’s instructions. (It was not possible to reduce the word count further.

      (2) We have provided the Video 1 as separate file instead of link.

      (3) Some of Figure Supplements (which were stand-alone) are now given as main figures. We have re-arranged Figures and Figure Supplements in accordance with eLife’s instructions.

      (4) We have now provided a list of the various cell lines used in this study, their tissue origin and procurement source in Supplementary File 3.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 1-4)”.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 6)”.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~ 250 passages the cfChPs treated NIH3T3 cells begin to die out apparently become their genomes have become too unstable for survival. This point will be highlighted in the revised version (pp. 45-46, lines 725-731).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We agree. We have removed the term “function” wherever we felt we had used it inappropriately.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We agree with the reviewer’s viewpoint. We have replaced the term “predatory genome” with a more realistic term “satellite genome” in the title and throughout the manuscript. We have also thoroughly revised the discussion section and elaborated on the potential role of LINE-1 and Alu elements carried by the concatemers in mammalian evolution. (pp. 46-47, lines 743-756).

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      As mentioned above, we have revised the “discussion” section taking into account the issues raised by the reviewer and highlighted the potential role of cfChPs in evolution by acting as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      As mentioned above, we have replaced the term “predatory genome” with “satellite genome” and revised the “discussion” section taking into account the issues raised by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) I strongly recommend validating the findings of this study using other approaches. Whole genome sequencing using both short and long reads should be used to validate the presence of human DNA in the mouse cell line, as well as its integration into the mouse genome and concatemerization. Breakpoints between mouse and human DNA can be searched in individual reads. Finding these breakpoints in multiple reads from two or more sequencing technologies would strengthen their biological origin. Illumina and ONT sequencing are now routinely performed by many labs, such that this validation should be straightforward. In addition to validating the findings of the current study, it would allow performance of an in-depth characterization of the rearrangements undergone by both human cfChPs and the mouse genome after internalization of cfChPs, including identification of human TE copies integrated through bona fide transposition events into the mouse genome. New copies of LINE and Alu TEs should be flanked by target site duplications. LINE copies should be frequently 5' truncated, as observed in many studies of somatic transposition in human cells.

      (2) Furthermore, should the high level of cell-to-cell HGT detected in this study occur on a regular basis within multicellular organisms, validating it through a reanalysis of whole genome sequencing data available in public databases should be relatively easy. One would expect to find a high number of structural variants that for some reason have so far gone under the radar.

      (3) Short and long-read RNA-seq should be performed to validate the expression of human cfChPs in mouse cells. I would also recommend performing ChIP-seq on routinely targeted histone marks to validate the chromatin state of human cfChPs in mouse cells.

      (4) The claim that fused human proteins are produced in mouse cells after exposing them to human cfChPs should be validated using mass spectrometry.

      The reviewer has suggested a plethora of techniques to validate our findings. Clearly, it is neither possible to undertake all of them nor to incorporate them into the manuscript. However, as suggested by the reviewer, we did conduct transcriptome sequencing of cfChPs treated NIH3T3 cells and were able to detect the presence of human-human fusion sequences (representing concatemerisation) as well as human-mouse fusion sequences (representing genomic integration). However, we realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript. However, to address the reviewer’s concerns we have now referred to results of our earlier whole genome sequencing study of NIH3T3 cells similarly treated with cfChPs wherein we had conclusively detected the presence of human DNA and human Alu sequences in the treated mouse cells. These findings have now been added as an independent paragraph (pp. 48, lines. 781-792).

      (5) It is unclear from what is shown in the paper (increase in FISH signal intensity using Alu and L1 probes) if the increase in TE copy number is due to bona fide transposition or to amplification of cfChPs as a whole, through mechanisms other than transposition. It is also unclear whether human TEs end up being integrated into the neighboring mouse genome. This should be validated by whole genome sequencing.

      Our results suggest that TEs amplify and increase their copy number due to their association with DNA polymerase and their ability to synthesize DNA (Figure 14a and b). Our study design cannot demonstrate transposition which will require real time imaging.

      The possibility of incorporation of TEs into the mouse genome is supported by our earlier genome sequencing work, referred to above, wherein we detected multiple human Alu sequences in the mouse genome (pp. 48, lines. 781-792).

      (6) In order to be able to generalize the findings of this study, I strongly encourage the authors to repeat their experiments using other cell types.

      We thank the reviewer for this suggestion. We have now used four different cell lines derived from four different species and demonstrated that horizontal transfer of cfChPs occur in all of them suggesting that it is a universal phenomenon. (pp. 37, lines 560-572) and (Supplementary Fig. S14a-d).

      We have also mentioned this in the abstract (pp. 3, lines 52-54).

      (7) Since the results obtained when using cfChPs isolated from healthy individuals are identical to those shown when using cfChPs from cancer sera, I wonder why the authors chose to focus mainly on results from cancer-derived cfChPs and not on those from healthy sera.

      Most of the experiments were conducted using cfChPs isolated from cancer patients because of our especial interest in cancer, and our earlier results (Mittra et al., 2015) which had shown that cfChPs isolated from cancer patients had significantly greater activity in terms of DNA damage and activation of apoptotic pathways than those isolated from healthy individuals. We have now incorporated the above justification on (pp. 6, lines. 124-128).

      (8) Line 125: how was the 10-ng quantity (of human cfChPs added to the mouse cell culture) chosen and how does it compare to the quantity of cfChPs normally circulating in multicellular organisms?

      We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and apoptotic pathways using this concentration of cfChPs (Mittra I et. al. 2015). We have now incorporated the justification of using this dose in our manuscript (pp. 51-52, lines. 867-870).

      (9) Could the authors explain why they repeated several of their experiments in metaphase spreads, in addition to interphase?

      We conducted experiments on metaphase spreads in addition to those on chromatin fibres because of the current heightened interest in extra-chromosomal DNA in cancer, which have largely been based on metaphase spreads. We were interested to see how the cfChP concatemers might relate to the characteristics of cancer extrachromosomal DNA and whether the latter in fact represent cfChPs concatemers acquired from surrounding dying cancer cells. We have now mentioned this on pp. 7, lines 150-155.

      (10) Regarding negative controls consisting in checking whether human probes cross-react with mouse DNA or proteins, I suggest that the stringency of washes (temperature, reagents) should be clearly stated in the manuscript, such that the reader can easily see that it was identical for controls and positive experiments.

      We were fully aware of these issues and were careful to ensure that washing steps were conducted meticulously. The careful washing steps have been repeatedly emphasized under the section on “Immunofluorescence and FISH” (pp. 54-55, lines. 922-944).

      (11) I am not an expert in Immuno-FISH and FISH with ribosomal probes but it can be expected that ribosomal RNA and RNA polymerase are quite conserved (and thus highly similar) between humans and mice. A more detailed explanation of how these probes were designed to avoid cross-reactivity would be welcome.

      We were aware of this issue and conducted negative control experiment to ensure that the human ribosomal RNA probe and RNA polymerase antibody did not cross-react with mouse. Please see Supplementary Fig. S4c.

      (12) Finally, I could not understand why the cfChPs internalized by neighboring cells are called predatory genomes. I could not find any justification for this term in the manuscript.

      We agree and this criticism has also been made by #Reviewer 2. We have now replaced the term “predatory” genomes with “satellite” genomes.

      Reviewer #2 (Recommendations for the authors):

      (1) P2 L34: The term "role" seems to imply "what something is supposed to do" (similar to "function"). Perhaps "impact" would be more neutral. Additionally, "poorly defined" is vague-do you mean "unknown"?

      We thank the reviewer for this suggestion. We have now rephrased the sentence to read “Horizontal gene transfer (HGT) plays an important evolutionary role in prokaryotes, but it is thought to be less frequent in mammals.” (pp. 2, lines. 26-27).

      (2) P2 L35: It seems that the dash should come after "human blood."

      Thank you, we have changed the position of the dash (pp. 2, line. 29).

      (3) P2 L37: Must we assume these structures have a function? Could they not simply be side effects of other processes?

      We think this is a matter of semantics, especially since we show that cfChPs once inside the cell perform many functions such as replication, DNA synthesis, RNA synthesis, protein synthesis etc. We, therefore, think the word “function” is not inappropriate.

      (4) Abstract: After reading the abstract, I am unclear on the concept of a "predatory genome." Based on the summarized results, it seems one cannot conclude that these elements provide any adaptive value to the genome.

      We agree. We have now replaced the term “predatory” genomes with a more realistic term viz. “satellite” genomes.

      (5) Video abstract: The video abstract does not currently stand on its own and needs more context to be self-explanatory.

      Thank you for pointing this out. We have now created a new and much more professional video with more context which we hope will meet with the reviewer’s approval.

      (6) P4 L67: Again, I am uncertain that HGT should be said to have "a role" in mammals, although it clearly has implications and consequences. Perhaps "role" here is intended to mean "consequence"?

      We have now changed the sentence to read as follows “However, defining the occurrence of HGT in mammals has been a challenge” (pp. 4, line. 73).

      (7) P6 L111: The phrase "to obtain a new perspective about the process of evolution" is unclear. What exactly is meant by this statement?

      We have replaced this sentence altogether which now reads “The results of these experiments are presented in this article which may help to throw new light on mammalian evolution, ageing and cancer” (pp. 5-6, lines 116-118).

      (8) P38 L588: The term "predatory genome" has not been defined, making it difficult to assess its relevance.

      This issue has been addressed above.

      (9) P39 L604: The statement "transposable elements are not inherent to the cell" suggests that some TEs could originate externally, but this does not rule out that others are intrinsic. In other words, TEs are still inherent to the cell.

      This part of the discussion section has been rewritten and the above sentence has been deleted.

      (10) P39 L609: The phrase "may have evolutionary functions by acting as transposable elements" is unclear. Perhaps it is meant that these structures may serve as vehicles for TEs?

      This sentence has disappeared altogether in the revised discussion section.

      (11) P41 L643: "Thus, we hypothesize ... extensively modified to act as foreign genetic elements." This sentence is unclear. Are the authors referring to evolutionary changes in mammals in general (which overlooks the role of standard mutational processes)? Or is it being proposed that structural mutations (including TE integrations) could be mediated by cfChPs in addition to other mutational mechanisms?

      We have replaced this sentence which now reads “Thus, “within-self” HGT may occur in mammals on a massive scale via the medium of cfChP concatemers that have undergone extensive and complex modifications resulting in their behaviour as “foreign” genetic elements” (pp. 47, lines 763-766).

      (12) P41 L150: The paragraph beginning with "It has been proposed that extreme environmental..." transitions too abruptly from HGT to adaptation. Is it being proposed that cfChPs are evolutionary processes selected for their adaptive potential? This idea is far too speculative at this stage and requires clarification.

      We agree. This paragraph has been removed.

      (13) P43 L681: This summary appears overly speculative and unclear, particularly as the concept of a "predatory genome" remains undefined and thus cannot be justified. It suggests that cfChPs represent an alternative lifestyle for the entire genome, although alternative explanations seem far more plausible at this point.

      We have now replaced the term “predatory” genome with “satellite” genome. The relevant part of the summary section has also been partially revised (pp. 49-50, lines 817-831).

      Changes independent of reviewers’ comments.

      We have made the following additions / modifications.

      (1) The abstract has been modified and it’s “conclusion” section has been rewritten.

      (2) Section 1.14 has been newly added together with accompanying Figures 15 a,b and c.

      (3) The “Discussion” section has been greatly modified and parts of it has been rewritten.

    1. eLife Assessment

      This fundamental study reveals that aging in yeast leads to chromosome mis-segregation due to asymmetric partitioning of chromosomes, driven by disruption of the nuclear pore complex and pre-mRNA leakage. The findings are convincingly supported by carefully-designed experimental data with a combination of genetic, molecular biology and cell biology approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodeling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability.

      Strengths:

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging.

      Weaknesses:

      The authors have satisfactorily addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors make the interesting discovery of increased chromosome non-dysjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron containing mRNAs that encode portions in resolving sister chromatid separation during mitosis, are unspliced in this age-associated defect and thus lead to the non-dysjunction problem.

      Strengths:

      The discovery of age-associated chromosome non-dysjunction is an interesting discovery, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications.

      Weaknesses:

      The authors have addressed my major concerns with experimentation or clarification.

    4. Reviewer #3 (Public review):

      Summary:

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodeling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i.

      Strengths:

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained.

      Weaknesses:

      My main concerns have been thoroughly addressed by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodelling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability. 

      Strengths: 

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging. 

      We thank the reviewer for this very positive assessment of our work

      Weaknesses: 

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work. 

      We thank the reviewer for bringing this point, which we have addressed in the revised version of the manuscript.  In short, chromosome loss is an abrupt, late event in the lifespan of the cells. To examine its prevalence, we have quantified the combined loss frequency of two chromosomes when both are labelled in the same cell. Whereas single chromosomes are lost at a frequency of 10-15% per cell, less than 5% of the cells lose both at the same time.  Thus, the different chromosomes are lost largely but not fully independently from each other. Based on these data, and on the fact that yeast cells have 16 chromosomes, we evaluate that about half of the cells lose at least one chromosome in their final cell cycle.

      We also tried to estimate the prevalence of the pre-mRNA leakage phenotype, based on the increased mCherry to GFP ratio observed between 0h and 24 hours of aging for 146 individual cells. For this analysis, we compared the mCherry/GFP ratio at 0 and 24h for the same individual cell. This analysis indicates that 81% of the cells show a fold change strictly above 1 as they age. Furthermore, the data appears to be unimodal. Thus, we can conservatively conclude that a majority of the cells show premRNA leakage at 24 hours.  Since not all cells are at the end of their life at that time, this is possibly an underestimate.

      In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging. 

      Genomic mis-segregation is characterized by the entry of both SPBs and all the chromosomes into the daughter cell compartment (PMID: 31714209).  We have observed these events in our movies as well.  However, the chromosome loss phenotype that we are focusing on affects only some chromosomes (as discussed above) and takes place under proper elongation of the spindle, with one SPB remaining in the mother cell whereas the other one goes to the bud, as shown in the manuscript’s Figure 2.  In our movies, chromosome loss is at least three-fold more frequent (for a single chromosome) than full genome mis-segregation (Sup Fig 1A-B). Furthermore, whereas chromosome loss is alleviated by the removal of the introns of MCM21, NBL1 and GLC7, genomic mis-segregation is not (Sup Fig 1B).  Thus, genomic mis-segregation mentioned by the reviewer is a process distinct from the chromosome loss that we report.  This discussion and the relevant data have been added to the manuscript.

      We thank the reviewer for bringing up the possible confusion between these two phenotypes, allowing us to clarify this point.

      Reviewer #2 (Public review): 

      Summary: 

      The authors make the interesting discovery of increased chromosome non-dysjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron containing mRNAs that encode portions in resolving sister chromatid separation during mitosis, are unspliced in this age-associated defect and thus lead to the non-dysjunction problem. 

      Strengths: The discovery of age-associated chromosome non-dysjunction is an interesting discovery, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past. 

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications. 

      We thank the reviewer for this assessment of our work.  To avoid confusion, we would like to stress out, however, that our data do not show that splicing per se is defective in old cells.  Actually, we specifically show that the cells are unlikely to show splicing defect (last figure of the original and the revised version of the manuscript). Our data specifically show that unspliced mRNAs tend to leak out of the nucleus of old cells.

      Weaknesses: 

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-dysjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis, could give the cell time to resolve the syntelic attachment of the chromatids.  

      The possibility that intron-removal leads to a kinetic fix is an interesting idea that we have now considered.  In the revised manuscript, we now provide measurements of mitotic duration in the “triple intron” mutant compared to wild type cells and the duration of their last cell cycle (See supplementary figure 3A-D). There is no evidence that removing these introns slows down mitosis.  Thus, the kinetic fix hypothesis is unlikely to explain our observation about the effect of intron removal.

      To this point, I note that the intron-less version of GLC7, which affects the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254)

      The reviewer is right, removing the intron of GLC7 reduces the expression levels of the gene product (PMID: 16816425) to about 50% of the original value and causes a slow growth phenotype.  However, the cells revert fairly rapidly through duplication of the GLC7-∆i gene (see supplementary Figure 3EF).  As a consequence, neither the GLC7-∆i nor the 3x∆i mutant strains show noticeable growth phenotypes by spot assays.  We now document these findings in supplementary figure 3.  

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed through this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is improbable to distinguish between these differences. 

      This is correct, this experiment was not the easiest of the manuscript... However, despite the limitations of the assay, the data presented in figure 7B are very clear.  300 cells aged by MEP were analysed, divided in the cohorts of 100 each, and the distribution of foci (nuclear vs cytoplasmic) in these aged cells were compared to the distribution in three cohorts of young cells.  For all 3 aged cohorts, over 70% of the visible foci were cytoplasmic, while in the young cells, this figure was around 3%.  A t-test was conducted to compare these frequencies between young and old cells (Figure 7B). The difference is highly significant.  Therefore, we are clearly not at the statistical limit.

      What the reviewer refers to is the supplementary Figure 4, where we were simply asking i) is the signal lost in cells lacking the intron of GLC7 (the response is unambiguously yes) and ii) what is the general number of dots per cell between young and old wild type cells (without distinguishing between nuclear and cytoplasmic) and the information to be taken from this last quantification is indeed that there is no clearly distinguishable difference between these two population of cells, as the reviewer rightly concludes.  In other word, the reason why there are more dots in the cytoplasm of the old cells in the Figure 7B is not because the old cells have much more dots in general (see supplementary Figure 4C).  We hope that these clarifications help understand the data better.  We have edited the manuscript to avoid confusion.

      Reviewer #3 (Public review): 

      Summary: 

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodelling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i. 

      Strengths: 

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained. 

      We thank the reviewer for their very positive assessment of our work

      Weaknesses:  

      In some cases, controls for experiments were not presented or were depicted in other figures. 

      We are sorry about this confusion.  We have improved our presentation of the controls, bringing them back each time they are relevant.  We have also added those that were missing (such as those mentioned by reviewer 2, see above). Note that the frequencies of centromeric plasmid loss at 0h in Figure 1C is not meaningful and therefore not presented. Since the cells were grown on selective medium before loading on to the ageing chip, we cannot report a plasmid loss frequency here. The ageing experiments themselves were subsequently conducted in full medium, to allow for centromeric plasmid loss without killing the cell. We explain this in the materials and methods section.

      High variability was seen in chromosome loss data, leading to large error bars. 

      We thank the reviewer for this comment. The variance in those two figures (3A and 5D) comes from the suboptimal plotting of this data. This is now corrected as follows.  We divided the available data into 4 cohorts and then plotted the average loss frequency across these cohorts for the indicated age groups.  This filters out much of the noise and improves the statistical resolution.

      The text could have been more polished. 

      Thank you for this comment.  We have gone through the manuscript again in detail.

      Reviewer #1 (Recommendations for the authors):

      (1) A previous study (PMID: 31714209). showed that aging yeast cells undergo genomic missegregation in which material was abnormally segregated to the daughter cells, leading to cell cycle arrest. After that, the missegregation is either corrected by returning aberrantly segregated genetic material to the mother cells so that they can resume cell cycles, or if not corrected, the mother cells will terminally exist the cell cycle and eventually die. That paper also showed that this agedependent genomic missegregation is related to rDNA instability. Is the chromosome loss in this work related to the genomic missegregation reported before? Is it partially reversible like genomic missegregation? Are all the chromosomes lost in one cell division, like in the case of genomic missegregation? Some additional characterization and a discussion would be helpful. 

      As mentioned above, indeed the phenotype of full genome mis-segregation described by Crane et al. (2019) is observable in our data as well. At 24h ~3% of the cells segregate both SPBs to the bud, as they previously described (Supp Figure 1A and B).  This phenomenon is clearly distinct from asymmetric chromosome partition, where cells undergo anaphase, separate the SPBs and segregate one to the mother cell and one to the bud (Figure 2A).  Also, asymmetric chromosome partitioning affects only a subset of the chromosomes (see below), not the entire genome. Finally, unlike asymmetric chromosome partitioning, the frequency of genome mis-segregation in ageing was not alleviated by intron removal (Supp Figure 1B). Thus, these two processes are clearly distinct and driven by different mechanisms. Note that asymmetric chromosome partitioning appears 3 to 5 times more frequently than genomic mis-segregation.

      Supporting further the notion that these two processes are distinct, chromosome loss seals the end of the life of the cell, as we reported, indicating that this is not a reversible event.  Also, it does not involve all chromosomes at once. Cells that contain the labelled versions of both chromosome II and IV at the same time, the loss frequency of both chromosomes is less than 5%, whereas each chromosome is lost in 10-15% of the cells (Figure 1C). Thus, most cells lose one and keep the other. Furthermore, this indicates that there are many more cells losing at least one chromosome than the 15% that lose chromosome IV for example, probably 50% or more.  Thus, chromosome loss by asymmetric segregation is much more frequent than the partly transient transfer of the entire nucleus to the bud.

      (2) What percentage of aging WT cells undergo pre-mRNA leakage (using the GFP/mCherry reporter) during their entire lifespan? Is it a sporadic, reversible process or an accumulative, one-way deterioration? Previous studies (PMID: 32675375; PMID: 24332850; PMID: 36194205; PMID: 31291577) showed that only a fraction of yeast cells age with rDNA instability and ERC accumulation, as indicated by excessive rRNA transcription and nucleolar enlargement. Are they the same fraction of aging cells that undergo pre-mRNA leakage and chromosome loss? This information will indicate the prevalence of the key aging phenotypes reported in this work and should be readily obtainable from microfluidic experiments. In addition, a careful discussion would be helpful. 

      Pre-mRNA leakage is relatively widespread in the population, but it is difficult to put a precise number on it. Analysis of how the mCherry/GFP ratio changes in 146 individual cells between 0 and 24 hours and imaging in our microfluidics platform indicates that ~80% show an increase and 50% of the cells show an increase above 1.5-fold. Therefore, the frequencies of pre-mRNA leakage and chromosome loss are probably similar.  We have modified the discussion to account for these considerations.  This would be in the same range as the frequency of aging by ERC accumulation (mode 1) estimated by PMID: 32675375. 

      Reviewer #2 (Recommendations for the authors)

      The manuscript could use a bit of editing in places - please go through it once more. 

      Editing suggestions: 

      Line 80 – irrespective

      Corrected.

      Line 97 - these are not "rates" but frequencies. Please correct this error throughout. 

      Replaced “rate” with “frequency throughout the manuscript and the figures, when pertaining to chromosome loss

      Line 328 - increase in chromosome... 

      Corrected.

      Line 379 - tampering 

      Reviewer #3 (Recommendations for the authors):

      Specific Feedback to Authors 

      (a) Major Points 

      (i) While the proposed connection between ERC-mediated nuclear basket removal and erroneous error correction was clearly stated, this connection is correlative and was not directly tested. Specifically, although mutants impacting ERC levels were tested for missegregation, it was not directly tested if increased missegregation levels occurred due to ERC tethering to the NPC and subsequent nuclear basket removal. It is possible that the increased ERCs may be driving missegregation via a different pathway. Authors should consider experiments to strengthen this idea, such as looking at chromosome loss frequency in a sir2∆ 3x∆i double mutant, or a sir2∆ sgf73∆ double mutant. 

      This connection is addressed in the original version of the manuscript, where we show that preventing attachment of ERCs to the NPC, by removing the linker protein Sgf73, alleviates chromosome loss.  The link is further substantiated by the fact that removing the basket on its own promote chromosome loss and that in both cases, namely during normal aging, i.e., upon ERC accumulation, and upon basket removal the mechanism of chromosome loss is the same.  In both cases, it depends on the introns of the GLC7, MCM21 and NBL1 genes.  

      However, we acknowledge that the mutants tested have pleiotropic effects, making interpretation somewhat difficult, even when examining chromosome loss in multiple mutants that affect ERC formation and NPC remodelling, as we have done.  As recommended by the reviewer, we have characterized the phenotype of the sir2∆ 3x∆i mutant strain. Intron removal in the sir2∆ mutant cells largely rescued the elevated chromosome loss frequency of these cells and slightly extended their replicative lifespan (Figure 6D-E). We conclude that intron removal can remedy the chromosome loss phenotype of the sir2∆. Although clearly significant, the effect on the replicative lifespan was not very strong, likely due to the sir2∆ affecting other ageing processes.

      Touching on this question, we added a new set of experiments asking whether any accumulating DNA circle causes chromosome loss in an intron-dependent manner.  Thus, we have introduced a noncentromeric replicative plasmid in wild type and 3x∆i mutant strains carrying the labelled version of chromosome II (Figure 6A-C).  These studies show that these cells age much faster than wild type cells, as expected, and lose chromosomes at a higher frequency than non-transformed cells.  Finally, the effect is at least in part alleviated by removing the introns of NBL1, MCM21 and GLC7.

      Therefore, after adding this new and more direct test of the role of DNA circles in chromosome loss, we are confidently concluding that ERC-mediated basket removal is the trigger of chromosome loss in old cells.

      (b) Minor Points 

      (i) In Figure 1C, the text (lines 91-92) argues that chromosome loss happens abruptly as cells age; however the data only show loss at young and old time points, not an intermediate, which leaves open the possibility that chromosome loss is occurring gradually. While cells that lost chromosomes should fail to divide further, we don't know if these events happened and were simply excluded.

      We agree with the reviewer that formally the conclusion drawn in the lines 91-92 (of the original manuscript), namely that chromosome loss takes place abruptly as cells age, cannot be drawn from the Figure 1C alone but only from subsequent observations. However, since chromosome loss is lethal in haploid, as we mention in the text and the reviewer notes as well, it is difficult to envision how cells could lose chromosomes before the end of their lifespan and must therefore increase abruptly as the cells reach that point.  This is now underlined in the revised version of the manuscript. Accordingly, the frequency of chromosome loss per age group, which is depicted in Figure 3A, shows that the wild type cells that have budded less than 10 times show no chromosome loss. The chromosome loss frequency starts to ramp up only pass that point. Therefore, chromosome loss does not increase linearly with age.

      Additionally, cells that lost minichromosome should not arrest. We suggest that the interpretation of these data should be softened in the text, or that chromosome loss fraction could be more effectively portrayed as a Kaplan-Meier survival curve depicting cells that have not lost chromosomes, if these data are easily available. Or, chromosome loss at an intermediate time point could be depicted. 

      Since we cannot visualize more than 2 chromosomes at a time, it is not possible to plot the KaplanMeier curve of cells that have not lost chromosomes. However, as mentioned above, the chromosome loss frequencies at intermediate time points are depicted in Figure 3A and Figure 4B and shows that it increases with age.

      (ii) Also regarding Figure 1, it would be helpful to expound on the purpose of the minichromosomes, as well as how the Ubi-GFP minichromosome is constructed. 

      We now explained why we tested the loss of minichromosome, namely, as a mean to test whether the centromere is necessary and sufficient to drive the loss of the genetic material linked to it, i.e., chromosomes, in old cells.  Concerning the Ubi-GFP minichromosome, the Materials and methods section is now updated and reports plasmid construction, backbone used, primers as well as the plasmid sequence being available in the supplementary data.

      The purpose of the minichromosome initially appears to be the engineering of an eccDNA (ERC) with a CEN to demonstrate distinct behaviour, but it is unclear whether this was actually conducted or if the minichromosome are simply CEN plasmids and/or if this was the intended goal. Furthermore, lines 102-103 state that the presence of a centromere was necessary and sufficient for minichromosome loss. However, since no constructs lacking a centromere were tested, necessity cannot be concluded. Please clarify this in the text and include experimental details to help readers understand what was tested. 

      We apologize for having been too short here. The behaviour of the CEN-less version of this plasmid has been characterized in detail in previous studies (Shcheprova et al., 2008; Denoth-Lippuner 2014, Meinema et al 2022). Here we focused on the behaviour of the CEN+ version of an otherwise Identical plasmid.  We now clarify in the text that this plasmid is retained in the mother cell when CEN-less and cite the relevant literature. 

      (iii) It is unclear how cells at 0-3 budding events were identified in assays using the microfluidics platform. Can the authors clarify the known "age" of the cells once captured, i.e. how do the authors know how many divisions a cell has undergone prior to capture? 

      The reviewer is right; we do not know the exact age of these cells.  However, in any asynchronous population of yeast cells, which is what we start from, 50% of the cells are newborn daughters, 25% have budded once, 12.5 have budded twice, 6.25 % have budded three times…  Therefore, at the time of loading, 93% of the cells have budded between 0 and 3 times.  For this reason, we report to this population as cells age 0-3 CBE. We acknowledge that this is an approximation, but it remains a relatively safe one.  

      (iv) While the schematic in Figure 2D is generally helpful, a different depiction of the old and new SPBs would be beneficial in cases where the new SPB and TetR-GFP are depicted as colocalized, it is difficult to see that the red is fainter for the new SPB. 

      We have corrected this issue by completely separating the SPB and the Chromosome signals in the Figure 2D.

      (v) In Figure 2F, the grey colour of the 12h Ipl1-321 data bar did not have high enough contrast when the manuscript was printed-would recommend changing this to a darker shade. 

      We have corrected this issue by using a darker shade of grey.

      (vi) In Figure 3A, 'Budding' is misspelled on X-axis label  

      We have corrected this error.

      (vii) In Figure 4, the authors should clarify the differences between the analyses in panels B and C. The distinction is not immediately clear and may be difficult to grasp upon initial reading. 

      We have corrected this issue in the main text as well as figure legend.

      (viii) In Figure 5, It would aid comparisons to depict the 3x∆i only as well on panels B, D, and E. 

      We have added 3x∆i data to Figure 5,6 and 8.

      (ix) In Figure 6D, it is unclear why there was an appreciable level of unspliced RNA in the wild-type and sir2∆ young cells. Additionally, it is unclear why there is so much signal observed in the Merge image for the old wild-type cell, especially regarding the apparent bright spot. Is that nuclear signal? Please clarify. 

      The pre-mRNA processing reporter is not very efficiently spliced. It was selected as such during design (Sorenson et al 2014; DOI: 10.1261/rna.042663.113) to provide sensitivity. As for the bright spot occurring, translation of the unspliced reporter produces the N-terminal part of a ribosomal protein, a fraction of which forms some sort of nuclear aggregate in a fraction of the population. 

      (x) In Figure 6E, why does the sir2∆ exhibit higher mCherry/GFP than the wild-type and fob1∆ at "young age"? Is this due to disrupted proteostasis in the sir2∆, or a different pleiotropic effect of sir2∆? Please comment on this observation in the text.

      Indeed, as we have stated in the text the sir2∆ mutation already perturbs pre-mRNA processing in young cells. We do not know the reason of this but indeed it is most probably reflective of its pleiotropic function. Following the reviewer’s request, we now state this in the text. For example, Sir2 may regulate the acetylation state of the basket itself.  The genetic interactions observed between sir2∆ and quite a few nucleoporin mutations seem to support this possibility. 

      (xi) Throughout, the authors switch between depicting aging in Completed Budding Events versus hours, which made it difficult to compare data across figures

      Ideally, all the data in this manuscript should be plotted according to the CBE age of the cell. To ensure that the major findings are plotted in such a way, we have done so for over ~3000 combined cells and thousands of replicative divisions in Figures 3,5-7. All the measurements of chromosome loss at a specific CBE had to be done manually, due to the absence of algorithms that would be able to accurately detect chromosome loss and replicative age. Therefore, doing this for the entirety of our dataset, encompassing well over 50 ageing chips and tens of thousands of cells is not easily doable at this stage. 

      (xii) Typo on line 12 (Sindle Pole Body) 

      We have corrected this error.

      (xiii) The phrase should be 'chromosome partitioning' rather than 'chromosome partition', throughoutfor example, line 17 

      Replaced “chromosome partition” with “chromosome partitioning” throughout the text.

      (xiv) There are inconsistencies between plural and singular references throughout sentences-example, lines 35-37, and lines 44-45. 

      We carefully combed through the manuscript again and hope that we caught all inconsistencies.

    1. eLife Assessment

      This important study of artificial selection in microbial communities shows that the possibility of selecting a desired fraction of slow and fast-growing types is impacted by their initial fractions. The evidence, which relies on mathematical analysis and simulations of a stochastic model, is compelling. It highlights the tension between selection at the strain and the community level. This study should be of interest to researchers interested in ecology, both theoretical and experimental.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.

      Strengths:

      To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle is important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.

      This work may or may not be related to hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamical systems.

      Weaknesses:

      (1) Connecting structure and function.<br /> In typical artificial selection literature, most of them select the community based on collective function. Here in this paper, the authors are selecting a target composition. Although there is a schematic cartoon illustrating the relationship between collective function (y-axis) and the community composition in the main figure 1, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea about how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that the intra-collective selection makes it hard to artificially select for a community that has an intermediate frequency of f (or s). If there is really evidence or theoretical derivation from this framework that indeed the highest function comes from the intermediate frequency of f, then the impact of this paper would increase because the conclusions of this stochastic model could allude to the reasons for the prevalent failures of artificial selection in literature.

      (2) Explain intra-collective and inter-collective selection better for readers.<br /> The abstract, the introduction, and the result section use these terms or intra-collective and inter-collective selection without much explanation. For the wide readership of eLife, a clear definition in the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.

      (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.<br /> I would expect that the experimentalists would find these results interesting and would want to consider these results during their artificial selection experiments. The main figure 4 indicates that the Newborn size N0 is a very important factor to consider during the artificial selection experiment. This would be equivalent to how much bottleneck you impose on the artificial selection process in every iteration step (i.e., the ratio of serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes, the initial frequency now does not matter much. It would be great for the authors to provide what the N0 parameter actually means during the artificial selection experiments. Maybe relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by the experimentalists.

      (4) Consideration of environmental stochasticity.<br /> The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.

      (5) Assumption about mutation rates<br /> If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.

      (6) Minor points<br /> In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.<br /> In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?

      Comments on revisions:

      I thank the authors for addressing many points raised by the reviewers. Overall, the readability of the manuscript has improved with more context provided around why they were solving this specific problem. However, I've found many of the responses to be too terse. It would have been nicer if there had been more discussion and description of the thought process that led up to the conclusions they made for each comment or question. Instead, many of the responses only showed the screenshot of the text they added.

      Most of my comments or questions were answered. Below are my comments on some of the authors' responses.

      (2) Explain intra-collective and inter-collective selection better for readers.<br /> In the Abstract and Introduction, you've added more sentences about the intra-collective or inter-collective selection. However, these are either making analogies to the waterfall or just describing the result of the intra/inter-collective selection. I would still appreciate a proper definition of those terms, which is paramount for readers to understand the entire paper.

      (4) Consideration of environmental stochasticity.<br /> I think providing the reason 'why' the paper focuses on demographic stochasticity and not environmental stochasticity will greatly justify the paper's work. For example, citing papers that actually performed artificial selection and pointing out that your model captures the stochasticity from those kinds of experiments would be great.

      (5) Assumption about mutation rates.<br /> It would be great if you could add a citation in the added sentence to support your claim: "This scenario is encountered in biotechnology: .....".

    3. Reviewer #3 (Public review):

      The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such difference in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) large.

      A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.

      I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.

      A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.

      The phenomenon the authors characterize is ecological in nature, though it is maintained even when switching between types is possible. Calling this dynamics community evolution reflects a widespread ambiguity in the field, not ascribable just to this work.

      Although different types compete for being represented in the next generation's propagules, within-generation ecology is here representative of exponential growth. As species interactions are commonly manifest in lab serial dilution experiments, it would be interesting if future work explores the extent of the robustness of these results to density-dependent demography.