  1. Sep 2025
    1. Reviewer #1 (Public review):

      Summary:

      In this study, the authors attempt to devise general rules for aptamer design based on structure and sequence features. The main system they are testing is an aptamer targeting a viral sequence.

      Strengths:

      The method combines a series of well-established protocols, including docking, MD, and a lot of system-specific knowledge, to design several new versions of the Ta aptamer with improved binding affinity.

      Weaknesses:

      The approach requires a lot of existing knowledge and, importantly, an already known aptamer, which presumably was found with SELEX. In addition, although the new variants may have stronger binding affinity, it is not clear whether any of them has additional useful properties, such as stability.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript proposes a workflow for discovering and optimizing RNA aptamers, with application to the optimization of an aptamer that binds the SARS-CoV-2 RBD. The authors took a previously identified RNA aptamer, computationally docked it into one specific RBD structure, and searched for variants with higher predicted affinity. The variants were subsequently tested for RBD binding using gel retardation assays and competition with antibodies, and one was found to be a stronger binder by about three-fold than the founding aptamer.

      Overall, this would be an interesting study if it were performed with truly high-affinity aptamers, and specificity was shown for RBD or several RBD variants.

      Strengths:

      The computational workflow appears to mostly correctly find stronger binders, though not de novo binders.

      Weaknesses:

      (1) Antibody competition assays are reported with RBD at 40 µM, aptamer at 5 µM, and a titration of antibody between 0 and 1.2 µg. This approach does not make sense. The antibody concentration should be reported in µM. The estimated amount is 0-8 pmol (from 0-1.2 µg), but that is an amount, not a concentration, so it is unknown whether enough antibody molecules were present to saturate all RBD molecules, let alone whether they could have displaced all aptamers.

      (2) These are not by any means high-affinity aptamers. The starting sequence has an estimated (not measured, since the titration is incomplete) Kd of 110 µM. That's really the same as non-specific binding for an interaction between an RNA and a protein. This makes the title of the manuscript misleading. No high-affinity aptamer is presented in this study. If the docking truly presented a bound conformation of an aptamer to a protein, a sub-micromolar Kd would be expected, based on the number of interactions that they make.

      (3) The binding energies estimated from the calculations differ vastly from those derived from the Kd values measured in the gel-shift experiments, making them useless for comparison, except for estimating relative affinities.
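
      The arithmetic behind points (1) to (3) can be sketched as follows. This is only an illustration under stated assumptions, not the reviewer's or the authors' calculation: the antibody is taken to be an IgG of about 150 kDa (which reproduces the 8 pmol estimate), the 20 µL reaction volume is hypothetical because the source does not report one, and the free energy uses the standard relation ΔG = RT ln(Kd) at 298.15 K with a 1 M standard state.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def mass_to_pmol(mass_ug, mw_kda):
    """Convert a mass in micrograms to picomoles, given a molecular weight in kDa."""
    # 1 ug of a 1 kDa molecule is 1e-6 g / 1e3 g/mol = 1e-9 mol = 1000 pmol
    return mass_ug / mw_kda * 1000.0


def pmol_to_micromolar(amount_pmol, volume_ul):
    """Convert an amount in pmol to a concentration in uM (1 pmol/uL equals 1 uM)."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_pmol / volume_ul


def binding_free_energy(kd_molar, temp_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)


# Assumption: IgG of ~150 kDa; the source gives only the mass range 0-1.2 ug.
antibody_pmol = mass_to_pmol(1.2, 150.0)

# Hypothetical reaction volume; a concentration cannot be stated without it.
assumed_volume_ul = 20.0
antibody_um = pmol_to_micromolar(antibody_pmol, assumed_volume_ul)

# Free energy implied by the estimated Kd of 110 uM, and the change
# implied by a three-fold tighter binder.
dg_parent = binding_free_energy(110e-6)
ddg_threefold = binding_free_energy(110e-6 / 3) - dg_parent

print(f"antibody amount: {antibody_pmol:.1f} pmol")
print(f"antibody concentration in {assumed_volume_ul:.0f} uL: {antibody_um:.2f} uM")
print(f"dG at Kd = 110 uM: {dg_parent:.2f} kcal/mol")
print(f"ddG for a three-fold improvement: {ddg_threefold:.2f} kcal/mol")
```

      Under these assumptions a three-fold change in Kd corresponds to well under 1 kcal/mol, which is one way to see why computed binding energies that differ greatly from the measured ones can support only a relative ranking.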

    1. eLife Assessment

      This study is a valuable contribution to the evidence base. However, the evidence provided is incomplete as the study results only partially support the study conclusions. Addressing the methodological and reporting issues raised by the peer reviewers and properly aligning the claim made for providing a tool for early warning with the study analysis/results would improve the study quality and usefulness of its findings.

    2. Reviewer #1 (Public review):

      This is my first review of this manuscript. The authors included previous reviews for a different journal with a length of 90 and 39 pages; I did not review this reply in my assessment of the paper itself. Influenza prediction is not my area of expertise.

      A major concern is that the model is trained in the midst of the COVID-19 pandemic and its associated restrictions and validated on 2023 data. The situation before, during, and after COVID is fluid, and one may not be representative of the other. The situation in 2023 may also not have been normal and reflective of 2024 onward, both in terms of the amount of testing (and positives) and measures taken to prevent the spread of these types of infections. A further worry is that the retrospective/prospective split occurred in October 2020, right in the first year of COVID, so it will be impossible to compare both cohorts to assess whether grouping them is sensible.

      The outcome of interest is the number of confirmed influenza cases. This is not only a function of weather, but also of the amount of testing. The amount of testing is also a function of historical patterns. This poses the real risk that the model confirms historical opinions through increased testing in those higher-risk periods. Of course, the models could also be run to see how meteorological factors affect testing and the percentage of positive tests. The results only deal with the number of positives (only the overall number of tests is noted briefly), which means there is no way to assess how reasonable and/or variable these other measures are. This is especially concerning as there was massive testing for respiratory viruses during COVID in many places, possibly including China.

      (1) Although the authors note a correlation between influenza and the weather factors, they do not discuss some of the high correlations between weather factors (e.g., solar radiation and UV index). Because of the many weather factors, those plots are hard to parse.

      (2) The authors do not actually compare the results of both methods and what the LSTM adds.

      Minor comments:

      (3) The methods are long and meandering. They could be cleaned up and shortened. E.g., there is no need for 30 lines on PCR testing; the study area should come before the study design. The authors discuss similar elements in multiple places; this whole section can be shortened considerably without affecting the content.

      (4) How reliable is the "Our World in Data" website for subnational coverage of restrictions? Some of the authors are from Putian and should be able to confirm the accuracy for both studied areas.

      (5) Figure 2A is hard to parse; it would make more sense to plot these as line plots (y=count, x=month).

    3. Reviewer #2 (Public review):

      Summary:

      The study's aim, assessing the associations between meteorological drivers and influenza, is important although not new. The authors used only 6 years of surveillance data and deep learning models, combining distributed lag non-linear models (DLNM) with Bayesian-optimized LSTM neural networks for predictive modeling. The key interest in this area is to explore the subtropical locations, where influenza is less common and circulates year-round. The authors further claimed that such an association could provide an early warning in the community. In this direction, the current manuscript has scope for several improvements and clarifications of the claims, as I list here.

      Strengths:

      Study design based on a prospective cohort to analyse the data for retrospective outcomes.

      Weaknesses:

      (1) The rationale of the study is not clearly stated.

      (2) Several issues with methodological and data integration should be clarified.

      (3) Validation of the models is not presented clearly.

      (4) The claim for providing tools for 'early warning' was not validated by analysis and results.

    4. Author response:

      Reviewer #1 (Public review):

      A major concern is that the model is trained in the midst of the COVID-19 pandemic and its associated restrictions and validated on 2023 data. The situation before, during, and after COVID is fluid, and one may not be representative of the other. The situation in 2023 may also not have been normal and reflective of 2024 onward, both in terms of the amount of testing (and positives) and measures taken to prevent the spread of these types of infections. A further worry is that the retrospective/prospective split occurred in October 2020, right in the first year of COVID, so it will be impossible to compare both cohorts to assess whether grouping them is sensible.

      We fully concur with the reviewer that the COVID-19 pandemic represents a profound confounding factor that fundamentally impacts the interpretation and generalizability of our model. This is a critical point that deserves a more thorough treatment. In the revised manuscript, we will add a dedicated subsection in the Discussion to explicitly analyze the pandemic’s impact. We will reframe our model’s contribution not as a universally generalizable tool for a hypothetical “normal” future, but as a robust framework demonstrated to capture complex epidemiological dynamics under the extreme, non-stationary conditions of a real-world public health crisis. We will argue that its strong performance on the 2023 validation data, a unique post-NPI “rebound” year, specifically showcases its utility in modeling volatile periods.

      The outcome of interest is the number of confirmed influenza cases. This is not only a function of weather, but also of the amount of testing. The amount of testing is also a function of historical patterns. This poses the real risk that the model confirms historical opinions through increased testing in those higher-risk periods. Of course, the models could also be run to see how meteorological factors affect testing and the percentage of positive tests. The results only deal with the number of positives (only the overall number of tests is noted briefly), which means there is no way to assess how reasonable and/or variable these other measures are. This is especially concerning as there was massive testing for respiratory viruses during COVID in many places, possibly including China.

      The reviewer raises a crucial point regarding surveillance bias, which is inherent in studies using reported case data. We acknowledge this limitation and will address it more transparently.

      (1) Clarification of Available Data: Our manuscript states that over the six-year period, a total of 20,488 ILI samples were tested, yielding 3,155 positive cases (line 471; Figure 1). We will make this denominator more prominent in the Methods section. However, the reviewer is correct that our models for Putian and the external validation for Sanming utilize the daily positive case counts as the outcome. The reality of our surveillance data source is that while we have the aggregate total of tests over six years, obtaining a reliable daily denominator of all respiratory virus tests conducted (not just for ILI patients as per the surveillance protocol) is not feasible. This is a common constraint in real-world public health surveillance systems.

      (2) Justification and Discussion: We will add a detailed paragraph to the Limitations section to address this. We will justify our use of case counts as it is the most direct metric for assessing public health burden and planning resource allocation (e.g., hospital beds, antivirals). We will also explain that modeling the positivity rate presents its own challenges, as the ILI denominator is also subject to biases (e.g., shifts in healthcare-seeking behavior, co-circulation of other pathogens causing similar symptoms). We will thus frame our work as forecasting the direct surveillance signal that public health officials monitor daily.
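
      The distinction between case counts and positivity discussed above can be illustrated with a small sketch. Only the aggregate figures (3,155 positives out of 20,488 ILI samples) come from the response; the two weekly records are hypothetical and exist purely to show how counts can rise with testing volume while positivity stays flat.

```python
def positivity_rate(positives, tests):
    """Fraction of tested samples that were positive."""
    if tests <= 0:
        raise ValueError("number of tests must be positive")
    if positives < 0 or positives > tests:
        raise ValueError("positives must lie between 0 and tests")
    return positives / tests


# Aggregate figures quoted in the author response.
overall = positivity_rate(3155, 20488)
print(f"overall positivity: {overall:.1%}")

# Hypothetical weeks: testing triples, positivity does not change.
hypothetical_weeks = [
    {"label": "low-testing week", "tests": 100, "positives": 15},
    {"label": "high-testing week", "tests": 300, "positives": 45},
]
for week in hypothetical_weeks:
    rate = positivity_rate(week["positives"], week["tests"])
    print(f'{week["label"]}: {week["positives"]} cases, positivity {rate:.1%}')
```

      In the hypothetical weeks the case count triples although the underlying positivity is identical, which is the surveillance-bias scenario the reviewer describes. A daily denominator would be needed to separate the two effects.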

      Although the authors note a correlation between influenza and the weather factors, they do not discuss some of the high correlations between weather factors (e.g., solar radiation and UV index). Because of the many weather factors, those plots are hard to parse.

      This is an excellent point. Our preliminary analysis (Supplementary Figure S2) indeed confirms a strong positive correlation between solar radiation and the UV index. Perhaps the reviewer overlooked the contents of the supplementary information document. We have included the figure for their review. Our original discussion did explicitly address this multicollinearity, summarized as follows: We acknowledge the high correlation between certain meteorological variables. We then explain that our two-stage modeling approach is designed to mitigate this issue. In the first stage, the DLNM models assess the impact of each variable individually, thus isolating their non-linear and lagged effects without being confounded by interactions. In the second stage, the LSTM network, by its nature, is a powerful non-linear function approximator that is robust to multicollinearity and can learn the complex, interactive relationships between all input features, including correlated ones.

      Figure S2. Scatterplot matrix illustrating correlations between influenza cases and meteorological factors. This comprehensive scatterplot matrix visualizes the relationships between influenza-like illness (ILI) cases, influenza A and B cases, and multiple meteorological variables, including average temperature, humidity, precipitation, wind speed, wind direction, solar radiation, and ultraviolet (UV) index. The figure is composed of three distinct sections that collectively provide an in-depth analysis of these relationships:

      (1) Upper-right triangle: This section presents a Pearson correlation coefficient matrix, with color intensity reflecting the strength of correlations between the variables. Red cells represent positive correlations, while green cells represent negative correlations. The closer the coefficient is to 1 or -1, the darker the cell and the stronger the correlation, with statistically significant correlations marked by asterisks. This matrix allows for a rapid identification of notable relationships between influenza cases and meteorological factors.

      (2) Lower-left triangle: This section contains scatterplots of pairwise comparisons between variables. These scatterplots facilitate the visual identification of potential linear or non-linear relationships, as well as any outliers or anomalies. This visualization is essential for evaluating the nature of interactions between meteorological factors and influenza cases.

      (3) Diagonal: The diagonal displays the density distribution curves for each individual variable. These curves provide an overview of the distribution characteristics of each variable, revealing central tendencies, variance, and any skewness present in the data.
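
      For readers unfamiliar with the upper-right triangle of such a figure, a minimal sketch of a Pearson correlation matrix follows. The variable names and values are hypothetical and are not the study data, and the 0.9 threshold for flagging collinear pairs is an arbitrary illustrative choice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical daily values, for illustration only.
weather = {
    "solar_radiation": [10.0, 14.0, 18.0, 22.0, 25.0],
    "uv_index": [3.0, 4.0, 6.0, 7.0, 8.0],
    "humidity": [85.0, 80.0, 70.0, 66.0, 60.0],
}
matrix = correlation_matrix(weather)

# Flag strongly correlated pairs (threshold chosen arbitrarily).
names = list(weather)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = matrix[a][b]
        if abs(r) > 0.9:
            print(f"{a} vs {b}: r = {r:.2f} (strongly correlated)")
```

      A screen like this only identifies collinear pairs; it does not by itself show how a downstream model handles them.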

      The authors do not actually compare the results of both methods and what the LSTM adds.

      We thank the reviewer for this comment and realize we may not have signposted the comparison clearly enough. Our manuscript does present a direct comparison between the LSTM and ARIMA models in the Results section (lines 737-745) and Table 2, where performance metrics (MAE, RMSE, MAPE, SMAPE) for both models on the 2023 validation set are detailed, showing LSTM’s superior performance, particularly for Influenza A. Furthermore, Figure 6 (panels A and B) visualizes the LSTM’s predictions against observed values, and Supplementary Figure S3 does the same for the ARIMA model, allowing for a visual comparison of their fit.

      To address the reviewer’s concern, in the revised manuscript, we will:

      (1) Add a more explicit comparative statement in the Results section, directly contrasting the key metrics and highlighting the LSTM’s advantages in capturing peak activities.

      (2) Consider combining the visualizations from Figure 6 and Supplementary Figure S3 into a single, more powerful comparative figure that shows the observed data, the LSTM predictions, and the ARIMA predictions on the same plot.
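
      The four error metrics named above (MAE, RMSE, MAPE, SMAPE) can be sketched as follows. These are common textbook definitions, not necessarily the exact formulas used in the manuscript; in particular, SMAPE has several variants, and the handling of zero counts shown here is an assumption. The example values are hypothetical.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mape(observed, predicted):
    """Mean absolute percentage error, skipping days with zero observed cases."""
    terms = [abs(o - p) / abs(o) for o, p in zip(observed, predicted) if o != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every observed value is zero")
    return 100.0 * sum(terms) / len(terms)


def smape(observed, predicted):
    """Symmetric MAPE (one common variant), skipping pairs that are both zero."""
    terms = [
        2.0 * abs(o - p) / (abs(o) + abs(p))
        for o, p in zip(observed, predicted)
        if (abs(o) + abs(p)) != 0
    ]
    if not terms:
        raise ValueError("SMAPE is undefined when all pairs are zero")
    return 100.0 * sum(terms) / len(terms)


# Hypothetical daily case counts and forecasts from two models.
observed = [10, 20, 30]
forecasts = {"model_a": [12, 18, 33], "model_b": [15, 14, 38]}
for name, predicted in forecasts.items():
    print(
        f"{name}: MAE={mae(observed, predicted):.2f} "
        f"RMSE={rmse(observed, predicted):.2f} "
        f"MAPE={mape(observed, predicted):.1f}% "
        f"SMAPE={smape(observed, predicted):.1f}%"
    )
```

      Because daily influenza counts are often zero, the percentage-based metrics depend on how zero days are treated, so that choice should be stated alongside any reported comparison.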

      Meandering methods; reliability of “Our World in Data”; Figure 2A is hard to parse.

      We will address these points comprehensively.

      (3) Methods: We will significantly streamline and restructure the Methods section. We also wish to provide context that the manuscript’s current structure reflects an effort to incorporate feedback from multiple rounds of peer review across different journals, which may have led to some repetition. We will perform a thorough edit to improve its conciseness and logical flow.

      (4) Data Reliability: The reviewer raises a crucial and highly insightful question regarding the validity of using a national-level index to represent local public health interventions. This is a critical aspect of our model’s construction, and we are grateful for the opportunity to provide a more thorough justification.

      We acknowledge that the ideal variable would be a daily, quantitative, city-level index of non-pharmaceutical interventions (NPIs). However, the practical reality of the data landscape in China is that such granular, publicly accessible databases for subnational regions do not exist. Given this constraint, our choice of the Our World in Data (OWID) national stringency index was the result of a careful consideration process, and we believe it serves as the best available proxy for our study context.

      In the revised manuscript, we will significantly expand the Methods section to articulate our rationale, which is threefold:

      National Policy Coherence: During the COVID-19 pandemic in mainland China, core NPIs, particularly mandatory face-covering policies in shared public spaces, were implemented with a high degree of national uniformity. While local governments had some autonomy, they operated within a centrally defined framework, ensuring a baseline level of policy consistency across the country.

      Local Context Alignment: A key factor supporting the use of this national proxy is the specific epidemiological context of Putian during the study period. For the vast majority of the pandemic, Putian was classified as a low-risk area with only sporadic COVID-19 cases. Consequently, the city’s public health measures consistently aligned with the standard national guidelines. It did not experience prolonged or exceptionally strict local lockdowns that would cause a significant deviation from the national-level policy trends captured by the OWID index.

      Validation by Local Public Health Experts: Most critically, and to directly address your suggestion, our co-authors from the Putian Center for Disease Control and Prevention have meticulously reviewed the OWID stringency index against their on-the-ground, institutional knowledge of the mandates that were in effect. They have confirmed that the categorical levels (0-4) and the temporal trends of the OWID index provide a faithful representation of the public health restrictions concerning face coverings as experienced by the population of Putian.

      Therefore, we will revise our manuscript to make it clear that the use of the OWID index was not a choice of convenience, but a necessary and well-vetted decision. Given the unavailability of official local data, the OWID index, cross-validated by our local experts, represents the most rigorous and appropriate variable available to account for the profound impact of NPIs on influenza transmission in our model.

      (5) Figure 2A: We agree completely and will replace the heatmap with a multi-line plot or a stacked area chart to better visualize the temporal dynamics of influenza subtypes.

      We have preliminarily completed the redrawing of Figure 3A. The new and old versions are presented for your review to determine which figure is more suitable for this manuscript in terms of scientific accuracy and visual impact.

      Reviewer #2 (Public review):

      Weakness (1):

      The rationale of the study is not clearly stated.

      We appreciate the reviewer’s critique and acknowledge that the unique contribution of our study needs to be articulated more forcefully. Our introduction (lines 105-140) attempted to outline the limitations of existing studies, but we will revise it to be much sharper. The revised introduction will state unequivocally that our study’s rationale is to address a confluence of specific, unresolved gaps in the literature: 1) The persistent challenge of forecasting influenza in subtropical regions with their erratic seasonality; 2) The lack of studies that build subtype-specific models for Influenza A and B, which we show have distinct meteorological drivers; 3) The methodological gap in integrating the explanatory power of DLNM with the predictive power of a rigorously Bayesian-optimized LSTM network; and 4) The unique opportunity to develop and test a model on data that encompasses the unprecedented disruption of the COVID-19 pandemic, a critical test of model robustness.

      Weakness (2):

      Several issues with methodological and data integration should be clarified.

      We interpret this as a general statement, with the specific issues detailed in the reviewer’s subsequent points and the “Recommendations for the authors” section. We will meticulously address each of these specific points in our revision. For instance, as a demonstration of our commitment to clarification, we will provide a much more detailed justification for our choice of benchmark model (ARIMA), as detailed in our response to Recommendation #11.

      Reviewer #2 (Recommendations for the authors):

      The authors should justify why the LSTM model was compared only with ARIMA as the baseline. How sensitive would the outcomes be to other commonly used machine learning methods, such as Random Forest or XGBoost, as benchmarks for performance?

      The reviewer raises a highly pertinent question regarding the selection of our benchmark model. A robust comparison is indeed essential for contextualizing the performance of our proposed LSTM network. Our choice to benchmark against the ARIMA model was a deliberate and principled decision, grounded in the specific literature of influenza forecasting at the intersection of climatology and epidemiology.

      In the revised manuscript, we will expand our justification within the Methods section and reinforce it in the Discussion. Our rationale is as follows:

      (1) ARIMA as the Established Standard: As we briefly noted in our original introduction (lines 110-113), the ARIMA model is arguably the most widely established and frequently cited statistical method for time-series forecasting of influenza incidence, including studies investigating meteorological drivers. It serves as the conventional benchmark against which novel methods in this specific domain are often evaluated. Therefore, demonstrating superiority over ARIMA is the most direct and scientifically relevant way to validate the incremental value of our deep learning approach.

      (2) A Focused Scientific Hypothesis: Our primary hypothesis was that the LSTM network, with its inherent ability to capture complex non-linearities and long-term dependencies, could overcome the documented limitations of linear autoregressive models like ARIMA in the context of climate-influenza dynamics. Our study was designed specifically to test this hypothesis.

      (3) Avoiding a “Bake-off” without a Clear Rationale: While other machine learning models like Random Forest or XGBoost are powerful, they are not established as the standard baseline in this particular niche of literature. Including them would shift the focus from a targeted comparison against the conventional standard to a broader, less focused “bake-off” of various algorithms. Such an exercise, while potentially interesting, would risk diluting the core message of our paper and would be undertaken without a clear, literature-driven hypothesis for why one of these specific tree-based models should be the next logical benchmark.

      Therefore, we will argue in the revised manuscript that our focused comparison with ARIMA provides the clearest and most meaningful assessment of our model’s contribution to the existing body of work on climate-informed influenza forecasting. We will, however, explicitly acknowledge in the Discussion that future work could indeed benefit from a broader comparative analysis as the field continues to evolve and adopt a wider array of machine learning techniques.

      Similarly, for some of the reviewer’s recommendations that do not require significant time and effort to implement, such as recommendation 7, we have also redrawn Figure 3 based on your feedback. It is provided for your review.

      Figure 3 presents the time series of the cases. I wonder whether the data for these factors and outcomes are daily or aggregated by week/month? I suggest representing it in 9x1 format with a single x-axis to compare, instead of 3x3 format. The authors can refer to a similar plot in Figure 1 of https://doi.org/10.1371/journal.pcbi.1012311.

      We are deeply grateful for the reviewer’s valuable suggestion and thoughtful provision of reference illustrations. Based on their input, we have redrawn Figure 3 and have included it for their review.

      Weakness (3):

      Validation of the models is not presented clearly.

      We were concerned by this comment and conducted a thorough self-assessment of our manuscript. We believe we have performed a multi-faceted validation, but we have evidently failed to present it with sufficient clarity and structure. Our validation strategy, detailed across the Methods and Results sections, includes:

      • Internal Out-of-Time Validation: Using 2023 data as a hold-out set to test the model trained on 2018-2022 data (lines 695-696, 705-710; Figure 6A, B).

      • External Validation: Testing the trained model on an independent dataset from a different city, Sanming (lines 730-736; Figure 6I, J).

      • Benchmark Model Comparison: Quantitatively comparing the LSTM’s performance against the standard ARIMA model using multiple error metrics (lines 737-745; Table 2).

      • Interpretability Validation (Sanity Check): Using SHAP analysis to ensure the model’s predictions are driven by epidemiologically plausible factors (lines 746-755; Figure 6E-H).
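
      The internal out-of-time validation listed first can be sketched as a purely chronological split. This is a minimal illustration assuming daily records with a date field; the record layout, field names, and values are hypothetical and do not describe the authors' pipeline.

```python
from datetime import date


def out_of_time_split(records, holdout_year):
    """Split records so that everything before holdout_year trains the model
    and holdout_year itself is reserved for testing (no shuffling)."""
    ordered = sorted(records, key=lambda r: r["date"])
    train = [r for r in ordered if r["date"].year < holdout_year]
    test = [r for r in ordered if r["date"].year == holdout_year]
    if not train or not test:
        raise ValueError("both the training and hold-out periods must contain data")
    return train, test


# Hypothetical daily case records spanning the study period.
records = [
    {"date": date(2018, 1, 5), "cases": 2},
    {"date": date(2020, 7, 14), "cases": 0},
    {"date": date(2022, 12, 30), "cases": 4},
    {"date": date(2023, 3, 2), "cases": 9},
    {"date": date(2023, 11, 20), "cases": 6},
]
train, test = out_of_time_split(records, holdout_year=2023)
print(f"training days: {len(train)}, hold-out days: {len(test)}")
print(f"last training date: {train[-1]['date']}, first hold-out date: {test[0]['date']}")
```

      Keeping the split strictly chronological ensures that no observation from the hold-out year can inform training, which is what distinguishes this check from a random train/test split.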

      To address the reviewer’s valid critique of our presentation, we will significantly restructure the relevant parts of the Results section. We will create explicit subheadings such as “Internal Validation,” “External Validation,” and “Comparative Performance against ARIMA Benchmark” to make our comprehensive validation process unambiguous and easy to follow.

      Weakness (4):

      The claim for providing tools for 'early warning' was not validated by analysis and results.

      We agree with this assessment entirely. This aligns with the eLife Assessment and comments from Reviewer #1. Our primary revision will be to systematically recalibrate the manuscript's language. We will replace all instances of “early warning tool” with more accurate and modest phrasing, such as “high-performance forecasting framework” or “a foundational model for future warning systems.” We will ensure that our revised title, abstract, and conclusions precisely reflect what our study has delivered: a robust predictive model, not a field-ready public health intervention tool.

    1. eLife Assessment

      This landmark manuscript comprehensively examines the roles of nine structural proteins in herpes simplex virus 1 (HSV-1) assembly and nuclear egress. By integrating cryo-light microscopy and soft X-ray tomography, the study presents an innovative approach to investigating viral assembly within cells. The research is thoroughly executed, yielding exceptional data that explain previously unknown functions expected to bear widespread influence. This work is of broad interest to virologists, cellular biologists, and structural biologists, offering a robust, contextually rich methodology for studying large protein complex assembly within the cellular environment, serving as an excellent starting point for high-resolution techniques.

    2. Reviewer #1 (Public review):

      Summary:

      Nahas et al. investigated the roles of herpes simplex virus 1 (HSV-1) structural proteins using correlative cryo-light microscopy and soft X-ray tomography. The authors generated nine viral variants with deletions or mutations in genes encoding structural proteins. They employed a chemical fixation-free approach to study native-like events during viral assembly, enabling observation of a wider field of view compared to cryo-ET. The study effectively combined virology, cell biology, and structural biology to investigate the roles of viral proteins in virus assembly and budding.

      Strengths:

      (1) The study presented a novel approach to studying viral assembly in cellulo.

      (2) The authors generated nine mutant viruses to investigate the roles of essential proteins in nuclear egress and cytoplasmic envelopment.

      (3) The use of correlative imaging with cryoSIM and cryoSXT allowed for the study of viral assembly in a near-native state and in 3D.

      (4) The study identified the roles of VP16, pUL16, pUL21, pUL34, and pUS3 in nuclear egress.

      (5) The authors demonstrated that deletion of VP16, pUL11, gE, pUL51, or gK inhibits cytoplasmic envelopment.

      (6) The manuscript is well-written, clearly describing findings, methods, and experimental design.

      (7) The figures and data presentation are of good quality.

      (8) The study effectively correlated light microscopy and X-ray tomography to follow virus assembly, providing a valuable approach for studying other viruses and cellular events.

      (9) The research is a valuable starting point for investigating viral assembly using more sophisticated methods like cryo-ET with FIB-milling.

      (10) The study proposes a detailed assembly mechanism and tracks the contributions of studied proteins to the assembly process.

      (11) The study includes all necessary controls and tests for the influence of fluorescent proteins.

      Weaknesses:

      Overall, the manuscript does not have any major weaknesses, just a few minor comments, which were mostly solved in the revised version of the manuscript.

      Comments on the latest version:

      I reviewed the responses and the updated manuscript, and I am very pleased with how the authors have revised it. The manuscript was already strong, but with the addition of the summary table and the separated images, it is now excellent.

    3. Reviewer #2 (Public review):

      Summary:

      For centuries, humans have been developing methods to see ever smaller objects, such as cells and their contents. This has included studies of viruses and their interactions with host cells during processes extending from virion structure to the complex interactions between viruses and their host cells: virion entry, virus replication and virion assembly, and release of newly constructed virions. Recent developments have enabled simultaneous application of fluorescence-based detection and intracellular localization of molecules of interest in the context of sub-micron resolution imaging of cellular structures by electron microscopy.

      The submission by Nahas et al. extends the state-of-the-art for visualization of important aspects of herpesvirus (HSV-1 in this instance) virion morphogenesis, a complex process that involves virus genome replication, and capsid assembly and filling in the nucleus, transport of the nascent nucleocapsid and some associated tegument proteins through the inner and outer nuclear membranes to the cytoplasm, orderly association of several thousand mostly viral proteins with the capsid to form the virion's tegument, envelopment of the tegumented capsid at a virus-tweaked secretory vesicle or at the plasma membrane, and release of mature virions at the plasma membrane.

      In this groundbreaking study, cells infected with HSV-1 mutants that express fluorescently tagged versions of capsid (eYFP-VP26) and envelope (gM-mCherry) proteins were visualized with 3D correlative structured illumination microscopy and X-ray tomography. The maturation and egress pathways thus illuminated were studied further in infections with fluorescently tagged viruses lacking one of nine viral proteins.

      Strengths:

      This outstanding paper meets the journal's definitions of Landmark, Fundamental, Important, Valuable, and Useful. The work is also Exceptional, Compelling, Convincing, and Solid. The work is a tour de force of classical and state-of-the-art molecular and cellular virology. Beautiful images accompanied by appropriate statistical analyses and excellent figures. The numerous complex issues addressed are explained in a clear and coordinated manner; the sum of what was learned is greater than the sum of the parts. Impacts go well beyond cytomegalovirus and the rest of the herpesviruses, to other viruses and cell biology in general.

      Comments on the latest version:

      This is a very nice paper. The authors responded affirmatively to the suggestions and questions of the reviewers.

    4. Reviewer #3 (Public review):

      Summary:

      Kamal L. Nahas et al. demonstrated that pUL16, pUL21, pUL34, VP16, and pUS3 are involved in the egress of capsids from the nucleus, since the ΔpUL16, ΔpUL21, ΔUL34, ΔVP16, and ΔUS3 HSV-1 mutant viruses show attenuated nuclear egress, as determined by measuring the nuclear:cytoplasmic ratio of capsids for the dfParental virus and the mutants. Then, they showed that gM-mCherry+ endomembrane association and capsid clustering were different in pUL11, pUL51, gE, gK, and VP16 mutants. Furthermore, the 3D view of cytoplasmic budding events suggests an envelopment mechanism where capsid budding into spherical/ellipsoidal vesicles drives the envelopment.

      Strengths:

      The authors employed both structured illumination microscopy and cellular ultrastructure analysis to examine the same infected cells, using cryo-soft-X-ray tomography to capture images. This combination, used here for the first time, enabled the authors to obtain holistic data regarding a biological process such as viral assembly. Using this approach, the researchers studied various stages of HSV-1 assembly. For this, they constructed a dual-fluorescently labelled recombinant virus, consisting of eYFP-tagged capsids and mCherry-tagged envelopes, allowing for the independent identification of both unenveloped and enveloped particles. They then constructed nine mutants, each targeting a single viral protein known to be involved in nuclear egress and envelopment in the cytoplasm, using this dual-fluorescent virus as the parental one. The experimental setting, both the microscopic and the virological, is robust and well-controlled. The manuscript is well-written, and the data generated is robust and consistent with previous observations made in the field.

      I congratulate the authors. The work is robust, and I personally highlight the way they managed to include others' results merged among their own, providing a complete view of the story.

      Comments on the latest version:

      I reviewed the responses and the updated manuscript, and I agree with Reviewer #1's words: "The manuscript was already strong, but with the addition of the summary table and the separated images, it is now excellent."

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Nahas et al. investigated the roles of herpes simplex virus 1 (HSV-1) structural proteins using correlative cryo-light microscopy and soft X-ray tomography. The authors generated nine viral variants with deletions or mutations in genes encoding structural proteins. They employed a chemical fixation-free approach to study native-like events during viral assembly, enabling observation of a wider field of view compared to cryo-ET. The study effectively combined virology, cell biology, and structural biology to investigate the roles of viral proteins in virus assembly and budding.

      Strengths:

      (1) The study presented a novel approach to studying viral assembly in cellulo.

      (2) The authors generated nine mutant viruses to investigate the roles of essential proteins in nuclear egress and cytoplasmic envelopment.

      (3) The use of correlative imaging with cryoSIM and cryoSXT allowed for the study of viral assembly in a near-native state and in 3D.

      (4) The study identified the roles of VP16, pUL16, pUL21, pUL34, and pUS3 in nuclear egress.

      (5) The authors demonstrated that deletion of VP16, pUL11, gE, pUL51, or gK inhibits cytoplasmic envelopment.

      (6) The manuscript is well-written, clearly describing findings, methods, and experimental design.

      (7) The figures and data presentation are of good quality.

      (8) The study effectively correlated light microscopy and X-ray tomography to follow virus assembly, providing a valuable approach for studying other viruses and cellular events.

      (9) The research is a valuable starting point for investigating viral assembly using more sophisticated methods like cryo-ET with FIB-milling.

      (10) The study proposes a detailed assembly mechanism and tracks the contributions of studied proteins to the assembly process.

      (11) The study includes all necessary controls and tests for the influence of fluorescent proteins.

      Weaknesses:

      Overall, the manuscript does not have any major weaknesses, just a few minor comments:

      (1) The gel quality in Figure 1 is inconsistent for different samples, with some bands not well resolved (e.g., for pUL11, GAPDH, or pUL20).

      We thank the reviewer for their suggestion. We tried to resolve the bands several times, but unfortunately this was the best outcome we could achieve.

      (2) The manuscript would benefit from a summary figure or table to concisely present the findings for each protein. It is a large body of work, and a summary figure showing the discovered functions would be great.

      We thank the reviewer for their suggestion. We have created a summary table (Table 2).

      (3) Figure 2 lacks clarity on the type of error bars used (range, standard error, or standard deviation). The legend says range, however, so I am just checking that this is what the authors meant.

      We thank the reviewer for double-checking, but it is meant to be range, as reported in the legend. We used range because there are only two data points for each time point, which are insufficient to calculate standard deviation or standard error.

      (4) The manuscript could be improved by including details on how the plasma membrane boundary was estimated from the saturated gM-mCherry signal. An additional supplementary figure with the data showing the saturation used for the boundary definition would be helpful.

      We appreciate the suggestion and have included an example of how saturated gM-mCherry signal was used to delineate the cytoplasm in Supp. Fig. 4A.

      (5) Additional information or supplementary figures on the mask used to filter the YFP signal for Figure 4 would be helpful.

      Thanks, we have adapted the text in the results section to clarify: “eYFP-VP26 signal was manually inspected to determine threshold values that filtered out background and included pixels containing individual or clustered puncta that represent capsids.”

      (6) The figure legends could include information about which samples are used for comparison for significance calculations. As the colour of the brackets is different from the compared values (dUL34), it would be great to have this information in the figure legend.

      Thanks, we have adapted Fig. 4B to make the colour of the brackets match the colour used for the ΔUL34 mutant, and we have included labels next to the brackets for clarity. We have applied similar adjustments to Fig. 5D & E and Supp. Fig. 4C.

      (7) In Figure 5B, the association between YFP and mCherry signals is difficult to assess due to the abundance of mCherry signal; single-channel and combined images might improve visualization.

      Thanks, we have provided split and combined channel views in Supp. Fig. 4B to improve visualization.

      (8) In Figure 6D, staining for tubulin could help identify the cytoskeleton structures involved in the observed virus arrays.

      We thank the reviewer for their suggestion, which we think would be interesting future work to build on the current study. Given the competitive nature of access to cryoSIM and cryoSXT (CLXT), including staining for tubulin was outside the scope of additional experiments we were able to conduct at this time.

      (9) It is unclear in Figure 6D if the microtubule-associated capsids are with the gM envelope or not, as the signal from mCherry is quite weak. It could be made clearer with the split signals to assess the presence of both viral components.

      We have provided split channels to the figure to aid with visualization.

      (10) The representation of voxel intensity in Figure 8 is somewhat confusing. Reversing the voxel intensity representation to align brighter values with higher absorption would simplify interpretation.

      We thank the reviewer for this suggestion. In contrast to fluorescence microscopy where high intensities reflect signal, low intensities represent signal (absorbance of X-rays) in cryoSXT. We respectfully decided not to reverse the values, as we believe that could cause more confusion. We have instead added a black-to-white gradient bar to illustrate that low voxel intensities correspond to dark signal in Fig 8.

      (11) The visualization in panel I of Figure 8 might benefit from a more divergent colormap to better show the variation in X-ray absorbance.

      We thank the reviewer for their suggestion. We experimented with a few different colour schemes but concluded that the current one produced the clearest results and was most accessible for color-blind viewers.

      (12) Figure 9 would be enhanced by images showing the different virus sizes measured for the comparative study, which would help assess the size differences between different assembly stages.

      We thank the reviewer for their suggestion and have included images to accompany the graph.

      Overall, this is an excellent manuscript and an enjoyable read. It would be interesting to see this approach applied to the study of other viruses, providing valuable insights before progressing to high-resolution methods.

      Reviewer #2 (Public review):

      Summary:

      For centuries, humans have been developing methods to see ever smaller objects, such as cells and their contents. This has included studies of viruses and their interactions with host cells during processes extending from virion structure to the complex interactions between viruses and their host cells: virion entry, virus replication and virion assembly, and release of newly constructed virions. Recent developments have enabled simultaneous application of fluorescence-based detection and intracellular localization of molecules of interest in the context of sub-micron resolution imaging of cellular structures by electron microscopy.

      The submission by Nahas et al. extends the state-of-the-art for visualization of important aspects of herpesvirus (HSV-1 in this instance) virion morphogenesis, a complex process that involves virus genome replication, and capsid assembly and filling in the nucleus, transport of the nascent nucleocapsid and some associated tegument proteins through the inner and outer nuclear membranes to the cytoplasm, orderly association of several thousand mostly viral proteins with the capsid to form the virion's tegument, envelopment of the tegumented capsid at a virus-tweaked secretory vesicle or at the plasma membrane, and release of mature virions at the plasma membrane.

      In this groundbreaking study, cells infected with HSV-1 mutants that express fluorescently tagged versions of capsid (eYFP-VP26) and tegument (gM-mCherry) proteins were visualized with 3D correlative structured illumination microscopy and X-ray tomography. The maturation and egress pathways thus illuminated were studied further in infections with fluorescently tagged viruses lacking one of nine viral proteins.

      Strengths:

      This outstanding paper meets the journal's definitions of Landmark, Fundamental, Important, Valuable, and Useful. The work is also Exceptional, Compelling, Convincing, and Solid. The work is a tour de force of classical and state-of-the-art molecular and cellular virology. Beautiful images accompanied by appropriate statistical analyses and excellent figures. The numerous complex issues addressed are explained in a clear and coordinated manner; the sum of what was learned is greater than the sum of the parts. Impacts go well beyond cytomegalovirus and the rest of the herpesviruses, to other viruses and cell biology in general.

      Reviewer #3 (Public review):

      Summary:

      Kamal L. Nahas et al. demonstrated that pUL16, pUL21, pUL34, VP16, and pUS3 are involved in the egress of capsids from the nucleus, since the ΔpUL16, ΔpUL21, ΔUL34, ΔVP16, and ΔUS3 HSV-1 mutant viruses show attenuated nuclear egress, as determined by measuring the nuclear:cytoplasmic ratio of capsids for the dfParental virus and the mutants. Then, they showed that gM-mCherry+ endomembrane association and capsid clustering were different in pUL11, pUL51, gE, gK, and VP16 mutants. Furthermore, the 3D view of cytoplasmic budding events suggests an envelopment mechanism where capsid budding into spherical/ellipsoidal vesicles drives the envelopment.

      Strengths:

      The authors employed both structured illumination microscopy and cellular ultrastructure analysis to examine the same infected cells, using cryo-soft-X-ray tomography to capture images. This combination, used here for the first time, enabled the authors to obtain holistic data regarding a biological process such as viral assembly. Using this approach, the researchers studied various stages of HSV-1 assembly. For this, they constructed a dual-fluorescently labelled recombinant virus, consisting of eYFP-tagged capsids and mCherry-tagged envelopes, allowing for the independent identification of both unenveloped and enveloped particles. They then constructed nine mutants, each targeting a single viral protein known to be involved in nuclear egress and envelopment in the cytoplasm, using this dual-fluorescent virus as the parental one. The experimental setting, both the microscopic and the virological, is robust and well-controlled. The manuscript is well-written, and the data generated is robust and consistent with previous observations made in the field.

      Weaknesses:

      It would be helpful to find out what role the targeted proteins play in nuclear egress or envelopment acquisition in a different orthoherpesvirus, like HSV-2. This would confirm the suitability of the technical approach set and would also act as a way to validate their mechanism at least in one additional herpesvirus beyond HSV-1. So, using the current manuscript as a starting point and for future studies, it would be advisable to focus on the protein functions of other viruses and compare them.

      We appreciate the suggestion and agree that this would be a great starting point for future studies. At present, we do not have a panel of mutant viruses in HSV-2 or another orthoherpesvirus, and it would be significant work to generate them, so we consider this outside the scope of the current study.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) There are enough uncommon abbreviations in the text to justify the inclusion of an abbreviation list.

      We thank the reviewer for the suggestion, but we define all uncommon abbreviations at first mention and an abbreviations list is not part of eLife’s house style.

      (2) The complex paragraph on p. 7 would be much easier to digest if broken into smaller chunks. Consider similar treatment for other lengthy landmark-free blocks of text, e.g., the one that begins on p. 14. Subheadings would help.

      We thank the reviewer for this suggestion. We have divided large paragraphs into more easily digestible chunks throughout the manuscript, for example in the discussion where the previous monolithic 3rd paragraph has been divided into five shorter, focussed paragraphs.

      (3) Table 1 needs units.

      We thank the reviewer for noticing our omission and apologise for the oversight - the table has been updated accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) Toward the end of the manuscript, I missed some lines attempting to speculate on the origin/nature of the spherical/ellipsoidal vesicles providing the envelopment. Would it be possible to incorporate this in the Discussion section?

      Thank you for noticing that omission. We have now included a few lines speculating that they may represent recycling endosomes, trans-Golgi network vesicles, or a hybrid compartment.

      (2) I congratulate the authors. The work is robust, and I personally highlight the way they managed to include others' results merged with their own, providing a complete view of the story.

      We thank the reviewer for their kind words.

      Note to editors

      In addition to these responses to the reviewer’s comments, we have also now included in the methods section details of the Tracking of Indels by Decomposition (TIDE) analysis we performed (data in Supplementary Figure 3) that was omitted by mistake from the original submission.

    1. eLife Assessment

      The ratio of nuclei to cell volume is a well-controlled parameter in eukaryotic cells. This study reports important findings that expand our understanding of the regulatory relationship between cell size and number of nuclei. The evidence supporting the conclusions is convincing and was obtained by applying appropriate and validated methodology in line with the current state of the art. The paper will be of broad interest to cell biologists and fungal biotechnologists seeking to understand the mechanisms that determine cell size and number of nuclei, and why this knowledge might also matter for enzyme production and thus for production strains not only of Aspergillus oryzae but also of other industrially used fungi.

    2. Reviewer #1 (Public review):

      Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it with different production capacities. They used microfluidic devices to directly correlate production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase of ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them with the capability of increasing their nuclear numbers.

      The methods used in the paper range from high quality cell biology, Raman spectroscopy to atomic force and electron microscopy and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology.

      Comments on revised version:

      The authors addressed all suggestions satisfactorily.

    3. Reviewer #2 (Public review):

      Summary:

      In the study presented by Itani and colleagues it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data that glycosyltransferases, calcium channels and the TOR regulatory cascade are involved in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology.

      Strengths:

      The study is very comprehensive and involves application of diverse state-of-the-art cell biological, biochemical and genetic methods. Overall, the data are properly controlled and analyzed, and the figures and movies are of excellent quality. The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and the number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      In the revision the authors addressed all my comments and as a result produced an even stronger study.

    4. Reviewer #3 (Public review):

      Summary:

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis.

      Strengths:

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences.

      Weaknesses:

      The authors addressed all suggestions satisfactorily.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors): 

      The authors addressed all suggestions satisfactorily. 

      Reviewer #2 (Recommendations for the authors):

      The authors have adequately dealt with the comments. 

      Reviewer #3 (Recommendations for the authors):

      (1) Line 157. Although the authors have added a statement acknowledging that addition of YE increased hyphal width and secretion in A. nidulans without increasing nuclear number, they have not indicated how this result might impact their model. It might just boil down to variation between the different Aspergilli, but it merits attention. 

      (2) Line 341. To extend the argument, you might consider adding this citation (https://elifesciences.org/articles/76075), which provides evidence that nuclear size might scale with osmotic pressure based on the density of macromolecules in the nucleus vs. cytoplasm.

      Thanks for the suggestion.

      L341 This is likely related to the phenomenon in which a decrease in cell size is accompanied by a reduction in nuclear size (66).

      (3) Line 343. Neurospora crassa hyphal cells can exceed 100 nuclei...

      Changed.

    1. eLife Assessment

      This study presents a valuable finding regarding the role of Arp2/3 and the actin nucleators N-WASP and WAVE complexes in myoblast fusion. The data presented is convincing, and the work will be of interest to biologists studying skeletal muscle stem cell biology in the context of skeletal muscle regeneration.

    2. Reviewer #1 (Public review):

      Overall, the manuscript reveals a role for actin polymerization in driving fusion of myoblasts during adult muscle regeneration. This pathway regulates fusion in many contexts, but whether it was conserved in adult muscle regeneration remained unknown. Robust genetic tools and histological analyses were used to convincingly support the claims.

    3. Reviewer #2 (Public review):

      To fuse, differentiated muscle cells must rearrange their cytoskeleton and assemble actin-enriched cytoskeletal structures. These actin foci are proposed to generate mechanical forces necessary to drive close membrane apposition and fusion pore formation. While the study of these actin-rich structures has been conducted mainly in Drosophila and in vertebrate embryonic development, the present manuscript presents clear evidence that this mechanism is necessary for fusion of adult muscle stem cells in vivo, in mice. The data presented here clearly demonstrate that ARP2/3 and SCAR/WAVE complexes are required for the fusion of differentiating satellite cells into multinucleated myotubes during skeletal muscle regeneration.

    4. Reviewer #3 (Public review):

      The authors have satisfactorily addressed my inquiries. However, I had to look quite hard to find where they responded to my final comment regarding the potential role of Arpc2 post-fusion during myofiber growth and/or maintenance, which I eventually located on page 7. I would appreciate it if the authors could state this point more explicitly, perhaps by adding a sentence such as "However, we cannot rule out the possibility that Arpc2 may also play a role in....." to improve clarity of communication.

      While I understood from the original version that this issue falls beyond the immediate scope of the study, I believe it is important to adopt a more cautious and rigorous interpretative framework, especially given the widespread use of this experimental approach. In particular, when a gene could potentially have additional roles in myofibers, it may be helpful to explicitly acknowledge that possibility. Even if Arpc2 may not necessarily be one of them, such roles cannot be fully excluded without direct testing.

    1. eLife Assessment

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The revised manuscript presents convincing evidence that the location of synapses on dendritic branches, as well as synaptic plasticity of excitatory and inhibitory synapses, influences the ability of a neuron to discriminate combinations of sensory stimuli. The ideas in this work are very interesting, presenting an important direction in the computational neuroscience field about how to harness the computational power of "active dendrites" for solving learning tasks.

    2. Reviewer #1 (Public review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." In the absence of inhibitory plasticity, the proposed mechanisms result in good performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Interestingly, adding inhibitory plasticity improves classification performance even when input features are randomly distributed.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation.

    3. Reviewer #2 (Public review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability of synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanistic insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." In the absence of inhibitory plasticity, the proposed mechanisms result in good performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Interestingly, adding inhibitory plasticity improves classification performance even when input features are randomly distributed.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation.

      We greatly appreciate your recognition of the study’s integrative scope and the challenges of linking detailed biophysics to high-level computation. We acknowledge that the model’s complexity can obscure the contribution of individual components. However, as stated in the Introduction, the principles have already been shown in simplified theoretical models, for instance in Tran-Van-Minh et al. (2015). Our aim here was to extend those ideas into a more biologically detailed setting to test whether the same principles still hold under realistic constraints. While simplification can aid intuition, we believe that demonstrating these effects in a biophysically grounded model strengthens the overall conclusion. We agree that further comparisons with reduced models would be valuable for isolating the contribution of specific components and plan to explore that in future work.

      Reviewer #2 (Public review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Thank you for highlighting the biological plausibility of our calcium- and dopamine-dependent learning rule and its ability to exploit dendritic nonlinearities. Your positive assessment reinforces our commitment to refining the rule and exploring its implications in larger, more diverse settings.

      Reviewer #1 (Recommendations for the authors):

      Major recommendations:

      P9: When introducing the excitatory learning rule, the reader is referred to the Methods. I suggest moving Figure 7A-D, "Excitatory plasticity" to be more prominently presented in the main body of the paper where the reader needs to understand it. There are errors in the current Figure 7, and wrong/confusing acronyms. The abbreviations "LTP-K" and "MP-K" are not intuitive. In A, I would spell out "LTP kernel" and "Theta_LTP adaptation".  In B, I would spell out "LTD kernel" and "Theta_LTD adaptation".

      We have clarified the terminology in Figure 7 by replacing “LTP-K” with “LTP kernel” and “MP-K” with “metaplasticity kernel”.  While we kept Figure 7 in the Methods section to maintain the flow of the main text, we agree that an earlier introduction of the learning rule improves clarity. To that end, we added a simplified schematic to Figure 3 in the Results section, which provides readers with an accessible overview of the excitatory plasticity mechanism at the point where it is first introduced.

      In C, for simplicity and clarity, I would only show the initial and updated LTP kernel and Calcium and remove the Theta_LTP adaptation curve, it's too busy and not necessary. Similarly in D, I would show only the initial and updated LTD kernel and Calcium and remove the Theta_LTD adaptation curve. In the current version of the Figure, panel B, right incorrectly labels "Theta_LTD" as "Theta_LTP". Panel D incorrectly labels "LTD kernel" as "LTP/MP-K" in the subheading and "MP/LTP-K" in the graph.

      To avoid confusion and better illustrate the interactions between calcium signals, kernels, and thresholds, we have added a movie showing how these components evolve during learning. The figure panels remain as originally designed, since the LTP kernel governs both potentiation and depression through metaplastic threshold adaptation, while the LTD kernel remains fixed.

      P17: Again, instead of pointing the reader to the Methods, I would move Figure 7E, "Inhibitory plasticity" to the main body of the paper where the reader needs to understand it. For clarity, I would label "C_TL" as "Theta_Inh,low" and "C_TH" as "Theta_Inh,high". The right panel could be better labeled "Inhibitory plasticity kernel". The left panel could be better labeled "Theta_Inh adaptation", again replacing the acronyms "C_TL" and "C_TH". The same applies to Fig. 5D on P19.

      We have updated the labeling in Figures 5D and 7E for clarity, including replacing "C_TL" and "C_TH" with "Theta_Inh,low" and "Theta_Inh,high". In addition, we added a simplified schematic of the inhibitory plasticity rule to Figure 5 to assist the reader’s understanding when presenting the results. Figure 7E remains in the Methods section to preserve the flow of the main text.

      P12: I would suggest simplifying Fig. 3 panels and acronyms as well. Remove "MP-K" from C and D. Relabel "LTP-K" as "LTP kernel". The same applies to Fig. 5E on P19 and Fig. 3 - supplement 1 on P46 and Fig 6 - supplement 1 on P49.

      We have simplified the labeling across all relevant figures by replacing “MP-K” with “metaplasticity kernel” and “LTP-K” with “LTP kernel.” To maintain clarity, we retained these terms in only one panel as a reference.

      Minor recommendations:

      P4: "Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights." This sentence is jarring. BCM and metaplasticity have been discussed in hundreds of theory papers! Cite some. This sentence would more accurately read, "Our study corroborates prior theory work (citations) demonstrating that metaplasticity helps to achieve stable and physiologically realistic synaptic weights."

      We have followed the reviewer’s suggestion and updated the sentence to: Previous theoretical studies (Bienenstock et al., 1982; Fusi et al., 2005; Clopath et al., 2010; Benna & Fusi, 2016; Zenke & Gerstner, 2017) demonstrate the essential role of metaplasticity in maintaining stability in synaptic weight distributions. (page 2, lines 49-51; page 3, line 1)

      P9: Grammar. "The neuron model was during training activated..." should read "During training, the neuron model was activated..."

      Corrected

      P17: Lovett-Barron et al., 2012 is appropriately cited here. Milstein et al., Neuron, 2015 also showed dendritic inhibition regulates plateau potentials in CA1 pyramidal cells in vitro, and Grienberger et al., Nat. Neurosci., 2017 showed it in vivo.

      P19 vs P16 vs P21. Fig. 4B, Fig. 5B, and Fig. 6B choose different strategies to show variance across seeds. Please choose one strategy and apply to all comparable plots.

      We thank the reviewer for these helpful points.

      We have added the suggested citations (Milstein et al., 2015; Grienberger et al., 2017) alongside Lovett-Barron et al., 2012. 

      Variance across seeds is now displayed uniformly (mean as a solid line, standard deviation as a shaded area) in Figures 4B, 5B, and 6B.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      (1)  Quality of Scientific Writing:

      i. Mathematical and Implementation Details:

      I appreciate the authors' efforts in clarifying the mathematical details and providing pseudocode for the learning rule, significantly improving readability and reproducibility. The reference to existing models via GitHub and ModelDB repositories is acceptable. However, I suggest enhancing the presentation quality of equations within the Methods section; currently, they are low-resolution images. Please consider rewriting these equations using LaTeX or replacing them with high-resolution images to further improve clarity.

      We appreciate the reviewer’s comment regarding clarity and reproducibility. In response, we have rewritten all equations in LaTeX to improve their readability and presentation quality in the Methods section.

      ii. Figure quality.

      I acknowledge the authors' effort to improve figure clarity and consistency throughout the manuscript. However, I notice that the x-axis label "[Ca]_v (μm)" in Fig. 7E still appears compressed and unclear. Additionally, given the complexity and abundance of hyperparameters or artificial settings involved in your experimental design and learning rule (such as kernel parameters, metaplasticity kernels, and unspecific features), the current arrangement of subfigures (particularly Fig. 3C, D and Fig. 5D, E) still poses readability challenges. I recommend reordering subfigures to present primary results (e.g., performance outcomes) prominently upfront, while relegating visualizations of detailed hyperparameter manipulations or feature weight variations to later sections or the discussion, thus enhancing clarity for readers.

      We thank the reviewer for pointing out the readability issue. We have corrected the x-axis label in Figure 7D. We hope this new layout, with a simplified rule in Fig. 3 and Fig. 5, presents the key findings while retaining full mechanistic detail, making the model behavior easier to understand.

      iii. Writing clarity.

      The authors have streamlined the "Metaplasticity" section and reduced references to dopamine, which is a positive step. However, the broader issue remains: the manuscript still appears overly detailed and more like a technical report of a novel learning rule, rather than a clearly structured scientific paper. I strongly recommend that the authors further distill the manuscript by clearly focusing on one or two central scientific questions or hypotheses, for instance, emphasizing core insights such as "inhibitory inputs facilitate nonlinear dendritic computations" or "distal dendritic inputs significantly contribute to nonlinear integration." Clarifying and highlighting these primary scientific questions early and consistently throughout the manuscript would substantially enhance readability and impact.

      We appreciate the reviewer’s guidance on improving the manuscript’s clarity and focus. In response, we now highlight two central questions at the end of the Introduction and have retitled the main Results subsections to follow this thread, thereby sharpening the manuscript’s focus while retaining necessary technical detail (page 3, lines 20-28). We have also removed redundant passages and simplified technical details to improve overall readability.

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The authors acknowledge that their simulated [Ca²⁺] levels exceed typical biological measurements but claim that the learning rule remains robust across variations in calcium concentrations. However, robustness to calcium variations was not explicitly demonstrated in the main figures. To convincingly address this concern, I recommend the authors explicitly test and present whether adopting biologically realistic calcium concentrations (~1 μM) impacts the learning outcomes or synaptic weight dynamics. Clarifying this point with a supplemental analysis or an additional figure panel would significantly strengthen their argument regarding the model's biological plausibility and robustness.

      We thank the reviewer for the comment. The elevated [Ca²⁺]_NMDA values reflect localized transients in spine heads with narrow necks and high NMDA conductance. These values are not problematic for our model, because the plasticity rule depends on relative calcium differences rather than absolute levels, and the metaplasticity kernel adapts accordingly. In future versions of our detailed neuron model, we will likely decrease the axial resistance of the spine neck.

    1. eLife Assessment

      This important computational study investigates homeostatic plasticity mechanisms that neurons may employ to achieve and maintain stable target activity patterns. The work extends previous analyses of calcium-dependent homeostatic mechanisms based on ion channel density by considering activity-dependent shifts in channel activation and inactivation properties that operate on faster and potentially variable timescales. The model simulations convincingly demonstrate the potential functional importance of these mechanisms.

    2. Reviewer #1 (Public review):

      This revision of the computational study by Mondal et al addresses several issues that I raised in the previous round of reviews and, as such, is greatly improved. The manuscript is more readable, its findings are more clearly described, and both the introduction and the discussion sections are tighter and more to the point. And thank you for addressing the three timescales of half activation/inactivation parameters. It makes the mechanism clearer.

      Some issues remain that I bring up below.

      Comment:

      I still have a bone to pick with the claim that "activity-dependent changes in channel voltage-dependence alone are insufficient to attain bursting". As I mentioned in my previous comment, this is also the case for the gmax values (channel density). If you choose the gmax's to be in a reasonable range, then the statement above simply cannot be true. And if, in contrast, you choose the activation/inactivation parameters to be unreasonable, then no set of gmax's can produce proper activity. So I remain baffled what exactly is the point that the authors are trying to make.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, a few important questions are left unanswered.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The main question not addressed in the paper is the relative efficiency and/or participation of voltage-dependence regulation compared to channel conductance in achieving the expected pattern of activity. Is voltage-dependence contributing 50% or 10%? Although this is a difficult question to answer (and it might even be difficult to provide a number), it is important to determine whether channel conductance regulation remains the main parameter allowing the achievement of a precise pattern of activity (or its recovery after perturbation).

      (2) Another related question is whether the speed of recovery is significantly modified by implementing voltage-dependence regulation (it seems to be the case looking at Figure 3). More generally, I believe it would be important to give insights into the overall benefit of implementing voltage-dependence regulation, beyond its rather obvious biological relevance.

      (3) Along the same line, the conclusion about how voltage-dependence regulation and channel conductance regulation interact to provide the neuron with the expected activity pattern (summarized and illustrated in Figure 6) is rather qualitative. Consistent with my previous comments, one would expect some quantitative answers to this question, rather than an illustration that approximately places a solution in parameter space.

    4. Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement changes in ion channel conductance to support homeostatic plasticity. While it is well established that the voltage-dependent properties of ion channels influence neuronal excitability, their potential role in homeostatic regulation, alongside conductance changes, has remained largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage dependence can interact with conductance plasticity to enable neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. Notably, the timescale of these voltage-dependent shifts influences the final steady-state configuration of the model, shaping both channel parameters and activity features such as burst period and duration. A major conclusion of the study is that altering this timescale can seamlessly modulate a neuron's intrinsic properties, which the authors suggest may be a mechanism for adaptation to perturbations.

      While this conclusion is largely well-supported, additional analyses could help clarify its scope. For instance, the effects of timescale alterations are clearly demonstrated when the model transitions from an initial state that does not meet the target activity pattern to a new stable state. However, Fig. 6 and the accompanying discussion appear to suggest that changing the timescale alone is sufficient to shift neuronal activity more generally. It would be helpful to clarify that this effect primarily applies during periods of adaptation, such as neurodevelopment or in response to perturbations, and not necessarily once the system has reached a stable, steady state. As currently presented, the simulations do not test whether modifying the timescale can influence activity after the model has stabilized. In such conditions, changes in timescale are unlikely to affect network dynamics unless they somehow alter the stability of the solution, which is not shown here. That said, it seems plausible that real neurons experience ongoing small perturbations which, in conjunction with changes in timescale, could allow gradual shifts toward new solutions. This possibility is not discussed but could be a fruitful direction for future work.

      Editor's note: The authors have adequately addressed the concerns raised in the public reviews above, as well as the previous recommendations, and revised the manuscript where necessary.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I still have a bone to pick with the claim that "activity-dependent changes in channel voltage-dependence alone are insufficient to attain bursting". As I mentioned in my previous comment, this is also the case for the gmax values (channel density). If you choose the gmax's to be in a reasonable range, then the statement above simply cannot be true. And if, in contrast, you choose the activation/inactivation parameters to be unreasonable, then no set of gmax's can produce proper activity. So I remain baffled what exactly is the point that the authors are trying to make.

      We thank the reviewer for this clarification. We did not intend to imply that voltage-dependence modulation is universally incapable of supporting bursting or that conductance changes alone are universally sufficient. To avoid any overstatement, we now write:

      “…activity-dependent changes in channel voltage-dependence alone did not assemble bursting from these low-conductance initial states (cf. Figure 1B)”.

      Reviewer #2 (Public review):

      (1) The main question not addressed in the paper is the relative efficiency and/or participation of voltage-dependence regulation compared to channel conductance in achieving the expected pattern of activity. Is voltage-dependence contributing 50% or 10%? Although this is a difficult question to answer (and it might even be difficult to provide a number), it is important to determine whether channel conductance regulation remains the main parameter allowing the achievement of a precise pattern of activity (or its recovery after perturbation).

      We appreciate the reviewer’s interest in a quantitative partitioning of the contributions from voltage-dependence regulation versus conductance regulation. We agree that this would be an important analysis in principle. In practice, obtaining this would be difficult.

      Our goal here was to establish the principle: that half-(in)activation shifts can meaningfully influence recovery. This is not an obvious result, given that these two processes can act on vastly different timescales.

      That said, our current dataset does provide partial quantitative insight. Eight of the twenty models required some form of voltage-dependence modulation to recover; among these, two only recovered under fast modulation and two only under slow modulation. This demonstrates that voltage-dependence regulation is essential for recovery in some neurons, and its timescale critically shapes the outcome.

      (2) Another related question is whether the speed of recovery is significantly modified by implementing voltage-dependence regulation (it seems to be the case looking at Figure 3). More generally, I believe it would be important to give insights into the overall benefit of implementing voltage-dependence regulation, beyond its rather obvious biological relevance.

      Our current results suggest that voltage-dependence regulation can indeed accelerate recovery, as illustrated in Figure 3 and supported by additional simulations (not shown). However, a fully quantitative comparison (e.g., time-to-recovery distributions or survival analysis) would require a much larger ensemble of degenerate models to achieve sufficient statistical power across all four conditions. Generating and simulating this expanded model set is computationally intensive, requiring stochastic searches in a high-dimensional parameter space, full time-course simulations, and a subsequent selection process that may succeed or fail.

      The principal aim of the present study is conceptual: to demonstrate that this multi-timescale homeostatic model—built here for the first time—can capture interactions between conductance regulation and voltage-dependence modulation during assembly (“neurodevelopment”) and perturbation. Establishing the conceptual framework and exploring its qualitative behavior were the necessary first steps before pursuing a large-scale quantitative study.

      (3) Along the same line, the conclusion about how voltage-dependence regulation and channel conductance regulation interact to provide the neuron with the expected activity pattern (summarized and illustrated in Figure 6) is rather qualitative. Consistent with my previous comments, one would expect some quantitative answers to this question, rather than an illustration that approximately places a solution in parameter space.

      We appreciate the reviewer’s interest in a more quantitative characterization of the interaction between voltage-dependence and conductance regulation (Fig. 6). As noted in our responses to Comments 1 and 2, some of the facets of this interaction—such as the ability to recover from perturbations and the speed of assembly—can be measured.

      However, fully quantifying the landscape sketched in Figure 6 would require systematically mapping the regions of high-dimensional parameter space where stable solutions exist. In our model, this space spans 18 dimensions (maximal conductances and half‑(in)activations). Even a coarse grid with three samples per dimension would entail over 100 million simulations, which is computationally prohibitive and would still collapse to a schematic representation for visualization.

      For this reason, we chose to present Figure 6 as a conceptual summary, illustrating the qualitative organization of solutions and the role of multi-timescale regulation, rather than attempting an exhaustive mapping. We view this figure as a necessary first step toward guiding future, more quantitative analyses.
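
      The grid-size estimate in the response above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the 18 dimensions and three samples per dimension are taken from the response, while the function name `grid_points` is a hypothetical helper introduced here.

```python
# Illustrative check of the full-factorial grid size quoted in the response.
# Assumption: every combination of sampled parameter values is simulated once.

def grid_points(samples_per_dim: int, n_dims: int) -> int:
    """Number of points in a full grid with the same sampling in every dimension."""
    return samples_per_dim ** n_dims

n_dims = 18          # maximal conductances plus half-(in)activations (from the text)
coarse = grid_points(3, n_dims)

print(f"3 samples per dimension: {coarse:,} simulations")
print("exceeds 100 million:", coarse > 100_000_000)
```

      With three samples per dimension the count is 3^18, roughly 387 million, which is consistent with the "over 100 million" figure given by the authors.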

      Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement changes in ion channel conductance to support homeostatic plasticity. While it is well established that the voltage-dependent properties of ion channels influence neuronal excitability, their potential role in homeostatic regulation, alongside conductance changes, has remained largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage dependence can interact with conductance plasticity to enable neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. Notably, the timescale of these voltage-dependent shifts influences the final steady-state configuration of the model, shaping both channel parameters and activity features such as burst period and duration. A major conclusion of the study is that altering this timescale can seamlessly modulate a neuron's intrinsic properties, which the authors suggest may be a mechanism for adaptation to perturbations.

      While this conclusion is largely well-supported, additional analyses could help clarify its scope. For instance, the effects of timescale alterations are clearly demonstrated when the model transitions from an initial state that does not meet the target activity pattern to a new stable state. However, Fig. 6 and the accompanying discussion appear to suggest that changing the timescale alone is sufficient to shift neuronal activity more generally. It would be helpful to clarify that this effect primarily applies during periods of adaptation, such as neurodevelopment or in response to perturbations, and not necessarily once the system has reached a stable, steady state. As currently presented, the simulations do not test whether modifying the timescale can influence activity after the model has stabilized. In such conditions, changes in timescale are unlikely to affect network dynamics unless they somehow alter the stability of the solution, which is not shown here. That said, it seems plausible that real neurons experience ongoing small perturbations which, in conjunction with changes in timescale, could allow gradual shifts toward new solutions. This possibility is not discussed but could be a fruitful direction for future work.

      We thank the reviewer for this thoughtful comment and for highlighting an important point about the scope of our conclusions regarding timescale effects. The reviewer is correct that our simulations demonstrate the influence of voltage-dependence timescale primarily during periods of adaptation—when the neuron is moving from an initial, target-mismatched state toward a final target-satisfying state. Once the system has reached a stable solution, simply changing the timescale of voltage-dependent modulation does not by itself shift the neuron’s activity, unless a new perturbation occurs that re-engages the homeostatic mechanism. We have clarified this point in the revised Discussion.

      The confusion likely arose from imprecise phrasing in the original text describing Figure 6. Previously, we wrote:

      “When channel gating properties are altered quickly in response to deviations from the target activity, the resulting electrical patterns are shown in Figure 6 as the orange bubble labeled τ_half = 6 s”.

      We have revised this sentence to emphasize that the orange bubble represents the eventual stable state, rather than implying that timescale changes alone drive activity shifts:

      “When channel gating properties are altered quickly in response to deviations from the target activity, the neuron ultimately settles into a stable activity pattern. The resulting electrical patterns are shown in Figure 6 as the orange bubble labeled τ_half = 6 s”.

      Reviewer #1 (Recommendations for the authors):

      Unless I am missing something, Figure 2 should be a supplement to Figure 1. I would prefer to see panel B in Figure 1 to indicate that the findings of that figure are general. Panel A really is not showing anything useful to the reader.

      We appreciate the suggestion to combine Figure 2 with Figure 1, but we believe keeping Figure 2 separate better preserves the manuscript’s flow. Figure 1 illustrates the mechanism in a single model, while Figure 2 presents the population-level summary that generalizes the phenomenon across all models.

      Also, I find Figure 6 unnecessary and its description in the Discussion more detracting than useful. Even with the descriptions, I find nothing in the figure itself that clarifies the concept.

      We appreciate the reviewer’s feedback on Figure 6. The purpose of this figure is to conceptually illustrate that multiple degenerate solutions can satisfy the calcium target and that the timescale of voltage‑dependence modulation can influence which region of this solution space is accessed during the acquisition of the activity target. Reviewer 3 noted some confusion about this point. We made a small clarifying edit.

      At the risk of being really picky, I also don't see the purpose of Figure 7. And I find it strange to plot -Vm just because that's the argument of findpeaks.

      We appreciate the reviewer’s comment on Figure 7. The purpose of this figure is to illustrate exactly what the findpeaks function is detecting, as indicated by the red arrows on the traces. For readers unfamiliar with findpeaks, it may not be obvious how the algorithm interprets the waveform. Showing the peaks directly ensures that the measurements used in our analysis align with what one would intuitively expect.
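
      To make the role of the sign flip concrete, the following is a minimal sketch of peak detection on an inverted trace. It is not MATLAB's findpeaks and not the authors' analysis code; `find_peaks_simple` is a hypothetical helper, the voltage samples are made up for illustration, and the reading that -Vm is passed in order to locate minima of Vm is an assumption.

```python
# Minimal local-maximum detector (a stand-in for a peak-finding routine).
# Assumption: a "peak" is an interior sample strictly larger than both neighbours.

def find_peaks_simple(trace):
    """Return indices of strict local maxima, ignoring the two end samples."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] > trace[i + 1]
    ]

# Made-up membrane potential samples in mV, for illustration only.
vm = [-50.0, -55.0, -62.0, -58.0, -40.0, -52.0, -65.0, -60.0, -45.0]

peaks = find_peaks_simple(vm)                    # local maxima of Vm
troughs = find_peaks_simple([-v for v in vm])    # maxima of -Vm, i.e. minima of Vm

print("maxima at indices:", peaks)
print("minima at indices:", troughs)
```

      Running a maximum detector on the negated trace returns the positions of the minima of the original trace, which is one reason an analysis might be plotted in terms of -Vm.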

      Reviewer #2 (Recommendations for the authors):

      The writing of the article has been much improved since the last version. It is much clearer, and the discussion has been improved and better addresses the biological foundations and relevance of the study. However, conclusions are rather qualitative, while one would expect some quantitative answers to be provided by the modeling approach.

      We appreciate the reviewer’s concern regarding quantification and share this perspective. As noted above, our study is primarily conceptual. Many aspects of the model, such as calcium handling and channel regulation, are parameterized based on incomplete biological data. These uncertainties make robust quantitative predictions difficult, so we focus on qualitative outcomes that are likely to hold independently of specific parameter choices.

    1. eLife Assessment

      This study presents a valuable investigation into cell-specific microstructural development in the neonatal rat brain using diffusion-weighted magnetic resonance spectroscopy. The evidence supporting the core claims is solid, with innovative in vivo data acquisition and modeling, noting residual caveats with regard to the limitations of diffusion-weighted magnetic resonance spectroscopy for strict validation of cell-type-specific metabolite compartmentation. In addition, the study provides community resources that will benefit researchers in this field. The work will be of interest to researchers studying brain development and biophysical imaging methods.

    2. Reviewer #1 (Public review):

      In this work, Ligneul and coauthors implemented diffusion-weighted MRS in young rats to follow longitudinally and in vivo the microstructural changes occurring during brain development. Diffusion-weighted MRS is here instrumental in assessing microstructure in a cell-specific manner, as opposed to the claimed gold standard (manganese-enhanced MRI) that can only probe changes in brain volume. Differential microstructure and complexification of the cerebellum and the thalamus during rat brain development were observed non-invasively. In particular, lower metabolite ADCs with increasing age were measured in both brain regions, reflecting increasing cellular restriction with brain maturation. A higher sphere fraction (representing cell bodies) for neuronal metabolites (total NAA, glutamate), total creatine, and taurine was estimated in the cerebellum compared to the thalamus, reflecting the unique structure of the cerebellar granular layer with a high density of cell bodies. Decreasing sphere fraction with age was observed in the cerebellum, reflecting the development of the dendritic tree of Purkinje cells and Bergmann glia. From morphometric analyses, the authors could probe non-monotonic branching evolution in the cerebellum, matching 3D representations of Purkinje cell expansion and complexification with age. Finally, the authors highlighted taurine as a potential new marker of cerebellar development.

      From a technical standpoint, this work clearly demonstrates the potential of diffusion-weighted MRS at probing microstructure changes of the developing brain non-invasively, paving the way for its application in pathological cases. Ligneul and coauthors also show that diffusion-weighted MRS acquisitions in neonates are feasible, despite the known technical challenges of such measurements, even in adult rats. They also provide all necessary resources to reproduce and build upon their work, which is highly valuable for the community.

      From a biological standpoint, claims are well supported by the microstructure parameters derived from advanced biophysical modelling of the diffusion MRS data.

      Specific strengths:

      (1) The interpretation of dMRS data in terms of cell-specific microstructure through advanced biophysical modelling (e.g. the sphere fraction, modelling the fraction of cell bodies versus neuronal or astrocytic processes) is a strong asset of the study, going beyond the more commonly used signal representation metrics such as the apparent diffusion coefficient, which lacks specificity to biological phenomena.

      (2) The fairly good data quality despite the complexity of the experimental framework should be praised: diffusion-weighted MRS was acquired in two brain regions (although not in the same animals) and longitudinally, in neonates, including data at high b-values and multiple diffusion times, which altogether constitutes a large-scale dataset of high value for the diffusion-weighted MRS community.

      (3) The authors have publicly shared the data and code used for processing and fitting, which will allow others to reproduce this work or extend its scope to disease populations, and which is in line with the current effort of the MR(S) community toward data sharing.

      Specific weaknesses:

      Ligneul and coauthors have convincingly addressed and included my comments from the first and second round in their revised manuscript.

      I believe the following conceptual concerns, which are inherent to the nature of the study and do not require further adjustments of the manuscript, remain:

      (1) Metabolite compartmentation in one cell type or the other has often been challenged and is currently impossible to validate in vivo. Here, Ligneul and coauthors did not use this assumption a priori and also supported their claims with non-MR literature (e.g., for taurine), but the interpretation of results in that direction should be made with care.

      (2) Longitudinal MR studies of the developing brain make it difficult to extract parameters with an "absolute" meaning. Indirect assumptions used to derive such parameters may change with age and become confounding factors (brain structure, cell distribution, concentrations normalizing metabolites (here macromolecules), relaxation times...). While findings of the manuscript are convincing and supported with literature, the true underlying nature of such changes might be difficult to access.

      (3) Diffusion MRI in addition to diffusion MRS would have been complementary and beneficial to validate some of the signal contributions, but was unfeasible in the time constraints of experiments on young animals.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We thank the reviewers once again for their careful evaluation of the revised manuscript and for their constructive suggestions. In response to the remaining recommendations, we have made minor amendments to the manuscript. The main changes are as follows:

      • Metabolite Concentrations: We now report them more conventionally, i.e. normalised by water content. The original normalisation by the absolute MM content has been retained in the supplementary information, as MMs are an endogenous tissue probe (i.e., not dependent on cerebrospinal fluid). The fact that both water and MM normalisation provide similar trends supports the robustness of our conclusions. We have also updated Figure S2 to include the absolute MM concentrations, raw water content, and the MM-to-water ratios for each time point.

      • Taurine Interpretation: We have revised the wording related to the interpretation of taurine findings to clarify that we present a set of converging observations suggesting taurine may serve as a marker of early cerebellar neurodevelopment, rather than asserting it as a definitive conclusion.

      Comments to the editor & reviewers:

      We sincerely thank the reviewers and the editor for their valuable feedback, which has significantly improved the manuscript since its initial submission.

      Please note a correction in Figure S2 (added during the previous revision round): the reported evolution of metabolite/water concentrations has changed due to an earlier error in calculating the water peak integral, which has now been corrected.

      While we recognise that a study and manuscript can always be improved, we prefer not to make further changes at this stage. We cannot conduct new experiments, and redesigning the model falls outside the scope of this work. Additionally, we believe that further altering the manuscript’s structure could lead to unnecessary confusion rather than clarity.

    1. eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the work is of very high quality and that the methodological approach is praiseworthy. Although the experimental data support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, a concern remains that the central finding of paired-pulse depression at very short intervals could be due to a mechanism that does not depend on exocytosis, such as Ca²⁺ channel inactivation, rather than vesicle pool depletion. Overall, this is a solid study although the results still warrant consideration of alternative interpretations.

    2. Reviewer #3 (Public review):

      To summarize: The authors' overfilling hypothesis depends crucially on the premise that the very-quickly reverting paired-pulse depression seen after unusually short rest intervals of << 50 ms is caused by depletion of release sites whereas Dobrunz and Stevens (1997) concluded that the cause was some other mechanism that does not involve depletion. The authors now include experiments where switching extracellular Ca2+ from 1.2 to 2.5 mM increases synaptic strength on average, but not by as much as at other synapse types. They contend that the result supports the depletion hypothesis. I didn't agree because the model used to generate the hypothesis had no room for any increase at all, and because a more granular analysis revealed a mixed population with a subset where: (a) synaptic strength increased by as much as at standard synapses; and yet (b) the quickly reverting depression for the subset was the same as the overall population.

      The authors raise the possibility of additional experiments, and I do think this could clarify things if they pre-treat with EGTA as I recommended initially. They've already shown they can do this routinely, and it would allow them to elegantly distinguish between pv and pocc explanations for both the increases in synaptic strength and the decreases in the paired pulse ratio upon switching Ca2+ to 2.5 mM. Plus/minus EGTA pre-treatment trials could be interleaved and done blind with minimal additional effort.

      Showing reversibility would be a great addition too, because, in our experience, this does not always happen in whole-cell recordings in ex-vivo tissue even when electrical properties do not change. If the goal is to show that L2/3 synapses are less sensitive to changes in Ca2+ compared to other synapse types - which is interesting but a bit off point - then I would additionally include a positive control, done by the same person with the same equipment, at one of those other synapse types using the same kind of presynaptic stimulation (i.e. ChRs).

      Specific points (quotations are from the Authors' rebuttal)

      (1) Regarding the Author response image 1, I was instead suggesting a plot of PPR in 1.2 mM Ca2+ versus the relative increase in synaptic strength in 2.5 versus in 1.2 mM. This continues to seem relevant.

      (2) "Could you explain in detail why two-fold increase implies pv < 0.2?"

      a. start with power((2.5/(1 + (2.5/K1) + 1/2.97)),4) = 2*power((1.3/(1 + (1.3/K1) + 1/2.97)),4);

      b. solve for K1 (this turns out to be 0.48);

      c. then implement the premise that pv -> 1.0 when Ca2+ is high by calculating Max = power((C/(1 + (C/K1) + 1/2.97)),4) where C is [Ca] -> infinity.

      d. pv when [Ca] = 1.3 mM must then be power((1.3/(1 + (1.3/K1) + 1/2.97)),4)/Max, which is <0.2.

      Note that modern updates of Dodge and Rahamimoff typically include a parameter that prevents pv from approaching 1.0; this is the gamma parameter in the versions from Neher group.
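
      Steps a-d above can be checked numerically. The following is a minimal sketch in Python, assuming the fourth-power expression exactly as written in step a and solving for K1 by bisection; the function and variable names (`release_drive`, `solve_k1`, `pv_low`) are illustrative and not taken from the manuscript or the rebuttal.

```python
# Sketch of steps a-d; names are illustrative assumptions.
A = 1.0 + 1.0 / 2.97  # calcium-independent part of the denominator in step a

def release_drive(ca, k1):
    """The term power((C/(1 + (C/K1) + 1/2.97)),4) from step a."""
    return (ca / (A + ca / k1)) ** 4

def solve_k1(ca_low=1.3, ca_high=2.5, fold=2.0):
    """Step b: bisection for the K1 at which ca_high gives `fold` times ca_low."""
    lo, hi = 1e-6, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        ratio = release_drive(ca_high, mid) / release_drive(ca_low, mid)
        if ratio < fold:
            lo = mid  # the ratio increases with K1, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k1 = solve_k1()
max_drive = k1 ** 4                           # step c: limit as [Ca] -> infinity
pv_low = release_drive(1.3, k1) / max_drive   # step d: pv at 1.3 mM

print(round(k1, 2), pv_low < 0.2)
```

      The limit in step c follows because C/(1 + C/K1 + 1/2.97) approaches K1 as C grows, so Max equals K1 raised to the fourth power under this form of the equation.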

      (3) "If so, we can not understand why depletion-dependent PPD should lead to PPF."

      When PPD is caused by depletion and pv < 0.2, the number of occupied release sites should not be decreased by more than one-fifth at the second stimulus so, without facilitation, PPR should be > 0.8. The EGTA results then indicate there should be strong facilitation, driving PPR to something like 1.2 with conservative assumptions. And yet, a value of < 0.4 is measured, which is a large miss.
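
      The bound in point (3) can be written out as a short calculation. This is a minimal sketch in Python under simplifying assumptions that are not stated in the source: no release sites refill between the two stimuli, and facilitation acts as a single multiplicative factor on release probability. The function names and the back-calculated facilitation factor are hypothetical.

```python
def ppr_depletion_only(pv):
    """PPR when the first stimulus empties a fraction pv of occupied sites,
    nothing refills before the second stimulus, and pv is unchanged."""
    return 1.0 - pv

def ppr_with_facilitation(pv, facilitation):
    """Same, with release probability scaled by `facilitation` at the second stimulus."""
    return facilitation * (1.0 - pv)

lower_bound = ppr_depletion_only(0.2)       # depletion alone with pv = 0.2
implied_facilitation = 1.2 / lower_bound    # factor needed to reach a PPR of about 1.2

print(lower_bound, implied_facilitation)
```

      Because a smaller pv leaves more sites occupied, any pv below 0.2 gives a depletion-only PPR above 0.8, which is the sense of the inequality used in the text.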

      (4) Despite the authors' suggestion to the contrary, I continue to think there is a substantial chance that Ca2+-channel inactivation is the mechanism underlying the very quickly reverting paired-pulse depression. However, this is only one example of a non-depletion mechanism among many, with the main point being that any non-depletion mechanism would undercut the reasoning for overfilling. And, this is what Dobrunz and Stevens claimed to show; that the mechanism - whatever it is - does not involve depletion. The most effective way to address this would be affirmative experiments showing that the quickly reverting depression is caused by depletion after all. Attempting to prove that Ca2+-channel inactivation does not occur does not seem like a worthwhile strategy because it would not address the many other possibilities.

      (5) True that Kusick et al. observed morphological re-docking, but then vesicles would have to re-prime and Mahfooz et al. (2016) showed that re-priming would have to be slower than 110 ms (at least during heavy use at calyx of Held).

    1. eLife Assessment

      This valuable study introduces a non-perturbative pulse-labeling strategy for yeast nuclear pore complexes (NPCs), employing a nanobody-based approach in order to selectively capture Nup84-containing complexes for imaging and biochemical analysis. The data convincingly demonstrate that a short induction period (20 minutes to 1 hour) yields a strong and sustained signal, enabling affinity purification that faithfully recapitulates the endogenous Nup84 interactome. This tool offers a powerful framework for investigating NPC dynamics and associated interactomes through both imaging and biochemical assays.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a nanobody-based pulse-labeling system to track yeast NPCs. Transient expression of a nanobody targeting Nup84 (fused to NeonGreen or an affinity tag) permits selective visualization and biochemical capture of NPCs. Short induction effectively labels NPCs, and the resulting purifications match those from conventional Nup84 tagging. Crucially, when induction is repressed, dilution of the labeled pool through successive cell cycles allows the visualization of "old" NPCs (and potentially individual NPCs), providing a powerful view of NPC lifespan and turnover without permanently modifying a core scaffold protein.

      Strengths:

      (1) A brief expression pulse labels NPCs, and subsequent repression allows dilution-based tracking of older (and possibly single) NPCs over multiple cell cycles.

      (2) The affinity-purified complexes closely match known Nup84-associated proteins, indicating specificity and supporting utility for proteomics.

      Weaknesses:

      (1) Reliance on GAL induction introduces metabolic shifts (raffinose → galactose → glucose) that could subtly alter cell physiology or the kinetics of NPC assembly. Alternative induction systems (e.g., β-estradiol-responsive GAL4-ER-VP16) could be discussed as a way to avoid carbon-source changes.

      (2) While proteomics is solid, a comprehensive supplementary table listing all identified proteins (with enrichment and statistics) would enhance transparency.

      (3) Importantly, the authors note that the method is particularly useful "in conditions where direct tagging of Nup84 interferes with its function, while sub-stoichiometric nanobody binding does not." After this sentence, it would be valuable to add concrete examples, such as experiments examining NPC integrity in aging or stress conditions where epitope tags can exacerbate phenotypes. These examples will help readers identify situations in which this approach offers clear advantages.

    3. Reviewer #2 (Public review):

      Summary:

      This preprint describes a practical and useful approach for labeling and tracking NPCs in situ. While useful applications including timelapse imaging, affinity purification, or proximity labeling are envisioned, addressing some outstanding technical questions would give a clearer picture of the sensitivity and temporal resolution of this approach.

      Strengths:

      Clever use of a fluorescently conjugated nanobody that binds directly to the core scaffold nucleoporin Nup84 with nanomolar affinity.

      Weaknesses:

      The decrease in nanobody labeling over 8 hours of chase period is interpreted to indicate that NPCs turn over during this time. However, it is also possible that the nanobody:Nup84 association is disrupted during mitosis by phosphorylation, other PTMs, or structural remodeling.

    4. Reviewer #3 (Public review):

      Summary:

      Submitted to the Tools and Resources series, this study reports on the use of a single-domain antibody targeting the nucleoporin Nup84 to probe and track NPCs in budding yeast. The authors demonstrate their ability to rapidly label or pull down NPCs by inducing the expression of a tagged version of the nanobody (Figure 1).

      Strengths:

      This tool's main strength is its versatility as an inexpensive, easy-to-set-up alternative to metabolic labelling or optical switching. This same rationale could, in principle, be applied to the study of other multiprotein complexes using similar strategies, provided that single-chain antibodies are available.

      Weaknesses:

      This approach has no inherent weaknesses, but it would be useful for the authors to verify that their pulse labelling strategy can also be used to detect assembly intermediates, structural variants, or damaged NPCs.

      Overall, the data clearly show that Nup84 nanobodies are a valuable tool for imaging NPC dynamics and investigating their interactomes through affinity purification.

    1. eLife Assessment

      The authors examined the frequency of alternative splicing across prokaryotes and eukaryotes and found that the rate of alternative splicing varies with taxonomic groups and genome coding content. This solid work, based on nearly 1,500 high-quality genome assemblies, relies on a novel genome-scale metric that enables cross-species comparisons and that quantifies the extent to which coding sequences generate multiple mRNA transcripts via alternative splicing. This timely study provides an important basis for improving our general understanding of genome architecture and the evolution of life forms.

    2. Reviewer #2 (Public review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree, and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF.

      The revised manuscript addresses the problems identified in the first round of reviews and now serves as a guide to understand how alternative splicing has evolved within different phyla, as opposed to making unsubstantiated claims about overall trends.

    3. Reviewer #3 (Public review):

      Summary:

      In "Alternative Splicing Across the Tree of Life: A Comparative Study," the authors use rich annotation features from nearly 1,500 high-quality NCBI genome assemblies to develop a novel genome-scale metric, the Alternative Splicing Ratio, that quantifies the extent to which coding sequences generate multiple mRNA transcripts via alternative splicing (AS). This standardized metric enables cross-species comparisons and reveals clear phylogenetic patterns: minimal AS in prokaryotes and unicellular eukaryotes, moderate AS in plants, and high AS in mammals and birds. The study finds a strong negative correlation between AS and coding content, with genomes containing approximately 50% intergenic DNA exhibiting the highest AS activity. By integrating diverse lines of prior evidence, the study offers a cohesive evolutionary framework for understanding how alternative splicing varies and evolves across the tree of life.

      Strengths:

      By studying alternative splicing patterns across the tree of life, the authors systematically address an important yet historically understudied driver of functional diversity, complexity, and evolutionary innovation. This manuscript makes a valuable contribution by leveraging standardized, publicly available genome annotations to perform a global survey of transcriptional diversity, revealing lineage-specific patterns and evolutionary correlates. The authors have done an admirable job in this revised version, thoroughly addressing prior reviewer comments. The updated manuscript includes more rigorous statistical analyses, careful consideration of potential methodological biases, expanded discussion of regulatory mechanisms, and acknowledgment of non-adaptive alternatives. Overall, the work presents an intriguing view of how alternative splicing may serve as a flexible evolutionary strategy, particularly in lineages with limited capacity for coding expansion (e.g., via gene duplication). Notably, the identification of genome size and genic coding fraction thresholds (~20 Mb and ~50%, respectively) as tipping points for increased splicing activity adds conceptual depth and potential generalizability.

      Weaknesses:

      While the manuscript offers a broad comparative view of alternative splicing, its central message becomes diffuse in the revised version. The focus of the study is unclear, and the manuscript comes across as largely descriptive without a well-articulated hypothesis or explanatory evolutionary model. Although the discussion gestures toward adaptive and non-adaptive mechanisms, these interpretations are not developed early or prominently enough to anchor the reader. The negative correlation between alternative splicing and coding content is compelling, but the biological significance of this pattern remains ambiguous: it is unclear whether it reflects functional constraint, genome organization, or annotation bias. This uncertainty weakens the manuscript's broader evolutionary inferences.

      Sections of the Introduction, particularly lines 72-90, lack cohesion and logical flow, shifting abruptly between topics without a clear structure. A more effective approach may involve separating discussions of coding and non-coding sequence evolution to clarify their distinct contributions to splicing complexity. Furthermore, some interpretive claims lack nuance. For example, the assertion that splicing in plants "evolved independently" seems overstated given the available evidence, and the citation regarding slower evolution of highly expressed genes overlooks counterexamples from the immunity and reproductive gene literature.

      Presentation of the results is occasionally vague. For instance, stating "we conducted comparisons of mean values" (line 146) without specifying the metric undercuts interpretability. The authors should clarify whether these comparisons refer to the Alternative Splicing Ratio or another measure. Additionally, the lack of correlation between splicing and coding region fraction in prokaryotes may reflect a statistical power issue, particularly given their limited number of annotated isoforms, rather than a biological absence of pattern.

      Finally, the assessment of annotation-related bias warrants greater methodological clarity. The authors note that annotations with stronger experimental support yield higher splicing estimates, yet the normalization strategy for variation in transcriptomic sampling (e.g., tissue breadth vs sequencing depth) is insufficiently described. As these factors can significantly influence splicing estimates, a more rigorous treatment is essential. While the authors rightly acknowledge that splicing represents only one layer of regulatory complexity, the manuscript would benefit from a more integrated consideration of additional dimensions, such as 3D genome architecture, e.g., the potential role of topologically associating domains in constraining splicing variation.

    4. Reviewer #4 (Public review):

      The manuscript reports on a large-scale study correlating genomic architecture with splicing complexity over almost 1,500 species. We still know relatively little about alternative splicing functional consequences and evolution, and thus, the study is relevant and timely. The methodology relies on annotations from NCBI for high-quality genomes and a main metric proposed by the authors and named Alternative Splicing Ratio (ASR). It quantifies the level of redundancy of each coding nucleotide in the annotated isoforms.

      According to the authors' response to the first reviewers' comments, the present version of the manuscript seems to be a profoundly revised version compared to the original submission. I did not have access to the reviewers' comments.

      Although the study addresses an important question and the authors have visibly made an important effort to make their claims more statistically robust, I have a number of major concerns regarding the methodology and its presentation.

      (1) A large part of the manuscript is speculative and vague. For instance, the Discussion is very long (almost longer than the Results section) and the items discussed are sometimes not directly connected to the present work. I would suggest merging the last two paragraphs, for instance, since the penultimate paragraph is essentially a review of the literature without direct connection to the present work.

      (2) The Methods section lacks clarity and precision. A large part is devoted to explaining the biases in the data without any reference or quantification. The definition of ASR is very confusing. It is first defined in equation 2, with a different name, and then again in the next subsection from a different perspective on lines 512-518. Why build matrices of co-occurrences if these are, in practice, never used? It seems the authors exploit only the trace. A major revision, if I understood correctly, was the correction/normalisation of the ASR metric. This normalisation is not explained. The authors argue that they will write another paper about it; I do not think this is acceptable for the publication of the present manuscript. Furthermore, there is no information about the technical details of the implementation: which packages did the authors use?

      (3) Could the authors motivate why they do not directly focus on the MC permutation test? They motivate the use of permutations because the data contain extreme outliers and are non-normal in most cases. Hence, it seems that Welch's ANOVA is not appropriate. "To further validate our findings, we also conducted a Monte Carlo permutation test, which supported the conclusions (see Methods)." Where is the comparison shown? I did not see any report of the results for the non-permuted version of the Welch's ANOVA.

      (4) What are the assumptions for the Phylogenetic Generalized Least Squares? Which evolution model was chosen and why? What is the impact of changing the model? Could the authors define more precisely (e.g. with equations) what is lambda? Is it estimated or fixed?

      (5) I think the authors could improve their account of recent literature on the topic. For instance, the paper https://doi.org/10.7554/eLife.93629.3, published in the same journal last year, should be discussed. It perfectly fits in the scope of the subsection "Evidence for the adaptive role of alternative splicing". Methods and findings reported in https://doi.org/10.1186/s13059-021-02441-9 and https://www.genome.org/cgi/doi/10.1101/gr.274696.120 directly concern the assessment of AS evolutionary conservation across long evolutionary times and/or across many species. These aspects are mentioned in the introduction on p. 3, but without pointing to such works. Can we really qualify a work published in 2011 as "recent" (lines 348-350)?

      The generated data and code are available on Zenodo, which is a good point for reproducibility and knowledge sharing with the community.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Methodological biases in annotation and sequencing methods

      We acknowledge the reviewer’s concern regarding methodological heterogeneity in genome annotations, particularly the use of CDS annotations derived from public databases. In response, we have addressed the potential sources of bias in estimating alternative splicing (AS) across such a broad taxonomic range.

      Given the methodological challenges encountered in this study, we have undertaken an in-depth analysis of the biases associated with genome annotations and their impact on large-scale estimates of alternative splicing. This effort has resulted in the development of a comprehensive framework for quantifying, modeling, and correcting such biases, which we believe will be of interest to the broader genomics community. We are currently preparing a separate manuscript dedicated to this methodological aspect, which we intend to submit for publication in the near future.

      To account for these biases, we performed a statistical evaluation of annotation quality by examining the relationship between ASR values and multiple features of the NCBI annotation pipeline, including both technical and biological variables. Specifically, we analyzed a set of metadata descriptors related to: (i) genome assembly quality (e.g., Contig N50, Scaffold N50, number of gaps, gap length, contig/scaffold count), (ii) the amount and diversity of experimental evidence used in annotation (e.g., number of RNA-Seq reads, number of tissues, number of experimental runs, number of proteins and transcripts, including those derived from Homo sapiens), and (iii) the nature of the annotated coding sequences (e.g., total number of CDSs, percentage of CDSs supported by experimental evidence, proportion of known CDSs, percentage of CDSs derived from ab initio predictions).

      This comprehensive analysis revealed that the strongest bias affecting ASR values is associated with the proportion of fully supported CDSs, which showed a strong positive correlation with observed splicing levels. In contrast, the percentage of CDSs relying on ab initio models showed a negative correlation, indicating that computational predictions tend to underestimate splicing complexity. Based on these findings, we implemented a polynomial normalization model using the percentage of fully supported CDSs as the main predictor of annotation bias. The resulting normalized metric, ASR*, corrects for annotation-related variability while preserving biologically meaningful variation.

      We further verified the robustness of this correction by comparing the main results of our study using both the raw ASR and the normalized ASR* across all analyses. The qualitative and quantitative consistency of results obtained with both metrics demonstrates that our findings are not an artifact of methodological bias and validates the reliability of our approach.

      Conceptual and Statistical Framework

      Our aim was not to investigate specific regulatory mechanisms of alternative splicing, but rather to explore large-scale statistical patterns across the tree of life using a newly defined metric—the Alternative Splicing Ratio (ASR)—that enables genome-wide comparisons of splicing complexity across species. To clarify the conceptual framework, we have revised the manuscript to explicitly state our assumptions, objectives, and the scope of our conclusions. The ASR metric is now briefly introduced in the Results section, with a more detailed mathematical formulation included in the Methods section.

      From a methodological standpoint, we have expanded the manuscript to better support the comparative framework through additional statistical analyses. In particular, we now include:

      • Monte Carlo permutation tests to assess pairwise differences in splicing and genomic variables across taxonomic groups, which are robust to non-normality and heteroscedasticity in the data.

      • Welch’s ANOVA with Bonferroni correction, which accounts for unequal variances when comparing group means.

      • Phylogenetic Generalized Least Squares (PGLS) regression, which explicitly models phylogenetic non-independence between species and allows us to infer lineage-specific associations between genomic composition and alternative splicing.

      • Coefficient of variation analysis, used to evaluate the relative variability of splicing and genomic traits across groups in a scale-independent manner.

      • Variability ratio metrics, designed to compare the dispersion of splicing values relative to genomic features, thereby quantifying trends in regulatory plasticity versus structural constraints.

      All methods are thoroughly described in the revised Methods section, and their application is presented in the Results section.
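Two of the listed analyses are simple enough to sketch directly. The following is a minimal illustration, not the authors' implementation: a two-sided Monte Carlo permutation test on the difference in group means, and a coefficient of variation. The function names, the default number of permutations, and the choice of the mean as the test statistic are all assumptions made for this example.

```python
import random
from statistics import mean, stdev


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    Labels are shuffled n_perm times; the p-value is the share of shuffles
    whose absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (scale-independent)."""
    return stdev(values) / mean(values)
```

Because the test only reshuffles group labels, it makes no assumption of normality, which is the property the response highlights. It does not account for phylogenetic non-independence, which is what the PGLS analysis addresses.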

      Functional vs. non-functional nature of AS events

      We have included a new discussion paragraph addressing the ongoing debate regarding the functionality of alternative splicing and a possible non-adaptive explanation for the patterns observed. While many previous studies suggest that a considerable fraction of AS events might represent splicing noise or non-functional isoforms, our intention is not to adopt this view uncritically. Instead, we cite recent literature to provide a more nuanced interpretation, recognizing both the potential adaptive value and the uncertainty surrounding the functional relevance of many AS events. Thus, rather than assuming that all observed alternative splicing events are adaptive or biologically meaningful, we now emphasize that many patterns may emerge from other processes, such as those associated with genomic constraints.

      Terminology and Result Interpretation

      The manuscript has been thoroughly revised to improve both the scientific language and the conceptual framing. We have removed inappropriate terminology such as “higher/lower organisms” and “highly evolved”. Also, we have reinterpreted the results. As part of this process, the manuscript has been substantially rewritten to focus on the most meaningful findings. Ultimately, we have retained only those results that specifically concern broad-scale patterns of alternative splicing across taxa, which are now presented with greater clarity and methodological rigor.

      Reviewer #2

      Gene Regulatory Complexity Beyond Splicing Mechanisms

      While alternative splicing represents a prominent mechanism of transcriptomic diversification, we agree with the reviewer that it constitutes only one component of the broader landscape of gene regulation. Structural and behavioral complexity in organisms arises from a combination of regulatory processes, and our study focuses specifically on alternative splicing as a measurable proxy within this multifactorial system. To clarify this point, we have added a paragraph in the Discussion section, where we explicitly contextualize alternative splicing within the wider regulatory architecture. In that paragraph, we discuss additional mechanisms that contribute to phenotypic complexity—such as transcriptional control, chromatin remodeling, epigenetic modifications, and RNA editing—citing key literature.

      Alternative Splicing Measure and Methodology

      While we agree that alternative splicing is not a definitive measure of organismal complexity, we argue that it remains a meaningful proxy for transcriptomic and regulatory diversification, especially when analyzed at a large phylogenetic scale. In this version of the manuscript, our goal was not to equate alternative splicing with biological complexity, but rather to quantify its patterns across lineages and evaluate its relationship with genome structure. This point is now explicitly stated in both the Introduction and Discussion.

      We also recognize the limitations associated with the use of coding sequence (CDS) annotations from public databases such as NCBI RefSeq. To address this concern, we have conducted a detailed analysis of the potential biases introduced by heterogeneous annotation quality, sequencing depth, and computational prediction, as previously addressed in our response to Reviewer #1.

      In response to concerns about unsupported statements, we have completely rewritten the manuscript to ensure that all claims are now explicitly supported by data and grounded in up-to-date scientific literature. We have reformulated speculative statements, removed inappropriate generalizations, and improved the logical flow of the arguments throughout the text. In summary, we have strengthened both the conceptual framework and the methodological foundation of the study, while maintaining a cautious interpretation of the results.

      Trends of Alternative Splicing

      To address the reviewer’s concern, we have revised the interpretation of trends as used in our analysis. In this study, we define a trend not as a strict directional progression or a linear trajectory across all species, but rather as a broad statistical pattern observable in the relative distribution and variability of alternative splicing across major taxonomic groups. We do not claim that this pattern reflects a universal adaptive pathway. Instead, we interpret it as a signal of differences in regulatory strategies associated with genome architecture. To avoid misinterpretation, we have rephrased several sentences in the manuscript and explicitly emphasized the variability within groups and the lack of significant correlations in certain clades.

      Inconsistent statistics

      The discrepancies pointed out were due to differences between mean- and median-based analyses. These have been clarified and consistently reported in the revised manuscript. Error bars, p-values, and a supplementary table summarizing all tests are now included. Furthermore, we have not removed any species from our dataset.

    1. eLife Assessment

      This important study examines the evolution of virulence and antibiotic resistance in Staphylococcus aureus under multiple selection pressures. The evidence presented is convincing, with rigorous data that characterizes the outcomes of the evolution experiments. However, the manuscript's primary weakness is in its presentation, as claims about the causal relationship between genotypes and phenotypes are based on correlational evidence. The manuscript needs to be revised to address these limitations, clarify the implications of the experimental design, and adjust the overall narrative to better reflect the nature of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate how methicillin-resistant (MRSA) and sensitive (MSSA) Staphylococcus aureus adapt to a new host (C. elegans) in the presence or absence of a low dose of the antibiotic oxacillin. Using an "Evolve and Resequence" design with 48 independently evolving populations, they track changes in virulence, antibiotic resistance, and other fitness-related traits over 12 passages. Their key finding is that selection from both the host and the antibiotic together, rather than either pressure alone, results in the evolution of the most virulent pathogens. Genomically, they find that this adaptation repeatedly involves mutations in a small number of key regulatory genes, most notably codY, agr, and saeRS.

      Strengths:

      The main advantage of the research lies in its strong and thoroughly replicated experimental framework, enabling significant conclusions to be drawn based on the concept of parallel evolution. The study successfully integrates various phenotypic assays (virulence, growth, hemolysis, biofilm formation) with whole-genome sequencing, offering an extensive perspective on the adaptive landscape. The identification of certain regulatory genes as common targets of selection across distinct lineages is an important result that indicates a level of predictability in how pathogens adapt.

      Weaknesses:

      (1) The main limitation of the paper is that its findings on the function of specific genes are based on correlation, not cause-and-effect evidence. While the parallel evolution evidence is strong, the authors have not yet performed the definitive tests (i.e., reconstruction of ancestral genes) to ensure that the mutations identified in isolation are enough to account for the virulence or resistance changes observed. This makes the conclusions more like firm hypotheses, not confirmed facts.

      (2) In some instances, the claims in the text are not fully supported by the visual data from the figures or are reported with vagueness. For example, the display of phenotypic clusters in the PCA (Figure 6A) and the sweeping generalization about the effect of antibiotics on the mutation rates (Figure S5) could be more precise and nuanced. Such small deviations dilute the overall argument somewhat and must be corrected.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes the results of an evolution experiment where Staphylococcus aureus was experimentally evolved via sequential exposure to an antibiotic followed by passaging through C. elegans hosts. Because infecting C. elegans via ingestion results in lysis of gut cells and an immune response upon infection, the S. aureus were exposed separately across generations to antibiotic stress and host immune stress. Interestingly, the dual selection pressure of antibiotic exposure and adaptation to a nematode host resulted in increased virulence of S. aureus towards C. elegans.

      Strengths:

      The data presented provide strong evidence that in S. aureus, traits involved in adaptation to a novel host and those involved in antibiotic resistance evolution are not traded off. On the contrary, they seem to be correlated, with strains adapted to antibiotics having higher virulence towards the novel host. As increased virulence is also associated with higher rates of haemolysis, these virulence increases are likely to reflect virulence levels in vertebrate hosts.

      Weaknesses:

      Right now, the results are presented in the context of human infections being treated with antibiotics, which, in my opinion, is inappropriate. This is because<br /> (1) exposure to the host and antibiotics was sequential, not simultaneous, and thus does not reflect the treatment of infection, and<br /> (2) the site of infection is different in C. elegans and human hosts.

      Nevertheless, the results are of interest; I just think the interpretation and framing should be adjusted.

    4. Reviewer #3 (Public review):

      Summary:

      Su et al. sought to understand how the opportunistic pathogen Staphylococcus aureus responds to multiple selection pressures during infection. Specifically, the authors were interested in how the host environment and antibiotic exposure impact the evolution of both virulence and antibiotic resistance in S. aureus. To accomplish this, the authors performed an evolution experiment where S. aureus was fed to Caenorhabditis elegans as a model system to study the host environment and then either subjected to the antibiotic oxacillin or not. Additionally, the authors investigated the difference in evolution between an antibiotic-resistant strain, MRSA, and an isogenic susceptible strain, MSSA. They found that MRSA strains evolved in both antibiotic and host conditions became more virulent, and that strains evolved outside these conditions lost virulence. Looking at the strains evolved in just antibiotic conditions, the authors found that S. aureus maintained its ability to lyse blood cells. Mutations in codY, gdpP, and pbpA were found to be associated with increased virulence. Additionally, these mutations identified in these experiments were found in S. aureus strains isolated from human infections.

      Strengths:

      The data are well-presented, thorough, and are an important addition to the understanding of how certain pathogens might adapt to different selective pressures in complex environments.

      Weaknesses:

      There are a few clarifications that could be made to better understand and contextualize the results. Primarily, when comparing the number of mutations and selection across conditions in an evolution experiment, information about population sizes is important to be able to calculate the mutation supply and number of generations throughout the experiment. These calculations can be difficult in vivo, but since several steps in the methodology require plating and regrowth, those population sizes could be determined. There was also no mention of how the authors controlled the inoculation density of bacteria introduced to each host. This would need to be known to calculate the generation time within the host. These caveats should be addressed in the manuscript.
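The calculation the reviewer asks for can be sketched as follows. This is a generic illustration under stated assumptions, not a reconstruction of the authors' protocol: growth within a passage is treated as simple binary fission from a bottleneck size to a final size, and every number passed to these functions would have to come from plate counts that the manuscript does not report. The function names and the per-genome mutation rate parameter are hypothetical.

```python
import math


def generations_per_passage(n_start, n_end):
    """Number of doublings needed to grow from n_start to n_end cells."""
    return math.log2(n_end / n_start)


def mutation_supply(n_start, n_end, mu_per_genome):
    """Expected number of new mutations arising during one growth phase.

    Assumption: growing from n_start to n_end cells by binary fission takes
    about (n_end - n_start) divisions, each contributing mu_per_genome
    mutations on average.
    """
    return (n_end - n_start) * mu_per_genome


def total_generations(passages):
    """Sum doublings over a list of (n_start, n_end) pairs, one per phase."""
    return sum(generations_per_passage(s, e) for s, e in passages)
```

Applying `total_generations` separately to the in-host phases and to the intervening culture phases would show which phase contributed most of the generations, which bears directly on the reviewer's second concern about where selection acted.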

      Another concern is the number of generations the populations of S. aureus spent either with relaxed selection in rich media or under antibiotic pressure in between the host exposure periods. It is probable then that the majority of mutations were selected for in these intervening periods between host infection. Again, a more detailed understanding of population sizes would contribute to the understanding of which phase of the experiment contributed to the mutation profile observed.

    1. eLife Assessment

      This study reports on the development and characterization of chickens with genetic deficiencies in type I or type III interferon receptors, which is an important contribution to the field of avian immunology. The data reflecting the development of the new interferon-receptor-deficient chickens is compelling. However, the characterization of IFN biology and infection responses in these knockout chickens is somewhat incomplete and could be improved by addressing the noted weaknesses.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents an extensive body of work and an outstanding contribution to our understanding of the IFN type I and III system in chickens. The research started with the innovative approach of generating KO chickens that lack the receptor for IFNα/β (IFNAR1) or IFN-λ (IFNLR1). The successful deletion and functional loss of these receptors was clearly and comprehensively demonstrated in comparison to the WT. Moreover, the homozygous KO lines (IFNAR1-/- or IFNLR1-/- ) were found to have similar body weights, and normal egg production and fertility compared to their WT counterparts. These lines are a major contribution to the toolbox for the study of avian/chicken immunology.

      The significance of this contribution is further demonstrated by the use of these lines by the authors to gain insight into the roles of IFN type I and IFN-type III in chickens, by conducting in ovo and in vivo studies examining basic aspects of immune system development and function, as well as the responses to viral challenges conducted in ovo and in vivo.

      Solid, state-of-the-art methods and convincing evidence from studies comparing various immune system related functions in the IFNAR1-/- or IFNLR1-/- lines to the WT revealed that the deletion of IFNAR1 and/or IFNLR1 resulted in:<br /> (1) impaired IFN signaling and induction of anti-viral state;<br /> (2) modulation of immune cell profiles in the peripheral blood circulation and spleen;<br /> (3) modulation of the cecum microbiome;<br /> (4) reduced concentrations of IgM and IgY in the blood plasma before and following immunization with model antigen KLH, whereby also line differences in the time-course of the antibody production were observed;<br /> (5) decrease in MHCII+ macrophages and B cells in the spleen of IFNAR1 KO chickens, although the MHCII-expression per cell was not affected in this line; and<br /> (6) reduction in the response of αβ1 TCR+ T cells of IFNAR1 KO chickens as suggested by clonal repertoire analyses.

      These studies were then followed by examination of the role of type I and type III IFN in virus infection, using different avian influenza A virus strains as well as an avian gamma corona virus (IBV) in in ovo challenge experiments. These studies revealed: viral titers that reflect virus-species and strain-specific IFN responses; no differences in the secretion of IFN-α/β in both KO compared to the WT lines; a predominant role of type I IFN in inducing the interferon-stimulated gene (ISG) Mx; and that an excessive and unbalanced type I IFN response can harm host fitness (survival rate, length of survival) and contribute to immunopathology.

      Based on guidance from the in ovo studies, comprehensive in vivo studies were conducted on host-pathogen interactions in hens from the three lines (WT, IFNAR1 KO, or IFNLR1 KO). These studies revealed the early appearance of symptoms and poor survival of hens from the IFNAR1 KO line challenged with H3N1 avian influenza A virus; efficient H3N1 virus replication in IFNAR1 KO hens, increased plasma concentrations of IFNα/β and mRNA expression of IFN-λ in spleens of the IFNAR1 KO hens; a pro-inflammatory role of IFN-λ in the oviduct of hens infected with H3N1 virus; increased proinflammatory cytokine expression in spleens of IFNAR1 KO hens, and impairment of negative feedback mechanisms regulating IFN-α/β secretion in IFNAR1-KO hens and a significant decrease in this group's antiviral state; additionally, it was demonstrated that IFN-α/β can compensate for IFN-λ to induce an adequate antiviral state in the spleen during H3N1 infection, but IFN-λ cannot compensate for IFN-α/β signaling in the spleen.

      Strengths:

      (1) Both the methods and results from the comprehensive, well-designed, and well-executed experiments are considered excellent. The results are well and correctly described in the result narrative and well presented in both the manuscript and supplement Tables and Figures. Excellent discussion/interpretation of results.

      (2) The successful generation of the type I and type III IFN KO lines offers unprecedented insight and opens multiple new venues for exploring the IFN system in chickens. The new knowledge reported here is direct evidence of the high impact of this model system on effectively addressing a critical knowledge gap in avian immunology.

      (3) The thoughtful selection of highly relevant viruses to poultry and human health for the in ovo and in vivo challenge studies to examine and assess host-pathogen interactions in the IFNR KO and WT lines.

      (4) Making use of the unique opportunities in the chicken model to examine and evaluate the host's IFN system responses to various viral challenges in ovo, before conducting challenge studies in hens.

      (5) The new knowledge gained from the IFNAR1 and IFNLR1 KO lines will find much-needed application in developing more effective strategies to prevent health challenges like avian influenza and its devastating effects on poultry, humans, and other mammals.

      (6) The excellent cooperation and contributions of the co-authors and institutions.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      This study attempts to dissect the contributions of type I and type III IFNs to the antiviral response in chickens. The first part of the study characterises the generation of IFNAR and IFNLR KO chicken strains and describes basic differences. Four different viruses are then tested in chicken embryos, while the subsequent analysis of the antiviral response in vivo is performed with one influenza H3N1 strain.

      Strengths:

      Having these two KO chicken strains as a tool is a great achievement. The initial analysis is solid. Clear effect of IFNAR deficiency in in vivo infection, less so for IFNLR deficiency.

      Weaknesses:

      (1) The antibody induction by KLH immunisation: No data indicated whether or not this vaccination induces IFN responses in wt mice, so the effects observed may be due to steady-state differences or to differential effects of IFN induced during the vaccination phase. No pre-immune results are shown. The differences are relatively small and often found at only one plasma dilution - the whole of Figure 4 could be condensed into one or two panels by proper calculation of Ab titers - would these titres be significantly different? This, as all of the other in vivo experiments, has not been repeated, if I understand the methods section correctly.

      (2) The basic conundrum here and in later figures is never addressed by the authors: Situations where IFN type 1 and 3 signalling deficiency each have an independent effect (i.e., Figure 4d) suggest that they act by separate, unrelated mechanisms. However, all the literature about these IFN families suggests that they show almost identical signalling and gene induction downstream of their respective receptors. How can the same signalling, clearly active here downstream of the receptors for IFN type 1 or type 3, be non-redundant, i.e., why does the unaffected IFN family not stand in? This is a major difference from the mouse studies, which showed a rather subtle phenotype when only one of the two IFN systems was missing, but a massive reduction in virus control in double KO mice (the correct primary paper should be quoted here, not only the review by McNab). Reasons could be a direct effect of IFNab on B cells and an indirect effect of IFNL through non-B cells, timing issues, and many other scenarios can be envisaged. The authors do not address this question, which limits the depth of analysis.

      (3) In the one in vivo experiment performed with chickens, only one virus was tested; more influenza strains should be included, as well as non-influenza viruses.

      (4) The basic conundrum of point 2 applies equally to Figure 6a; both KOs have a phenotype. Again in 6d, both IFNs appear to be separately required for Mx induction. An explanation is needed.

      (5) Line 308, where are the viral titers you refer to in the text? The statement that the results demonstrate that excessive IFNab has a negative impact is overstretched, as no IFN measurements of the infected embryos are shown here.

      (6) The in vivo infection is the most interesting experiment, and the key outcome here is that IFN type 1 is crucial for anti-H3N1 protection in chickens, while type 3 is less impactful. However, this experiment suffers from the different time points when chickens were culled, so many parameters are impossible to compare (e.g., weight loss, histopathology, IFN measurements, and more). Many of these phenomena are highly dynamic in acute virus infections, so disparate time points do not allow a meaningful comparison between different genotypes. What are the stats in 7b? Is the median rather than the mean indicated by the line? Otherwise, the lines appear in surprising places. SD must be shown, and I find it difficult to believe that there is a significant difference in weight, for e.g., IFNAR KO, unless maybe with a paired t test. What is the statistical test?

      (7) Figures 7e,f: these comparisons are very difficult to interpret as the virus loads at these time points already differ significantly, so any difference could be secondary to virus load differences.

    1. eLife Assessment

      Non-essential amino acids such as glutamine have been known to be required for T cell general activation through sustaining basic biosynthetic processes, including nucleotide biosynthesis, ATP generation, and protein synthesis. In this important study, the authors found that extracellular asparagine (Asn) is required not only for T cells to generally refuel metabolic reprogramming, but also to produce helper T cell lineage-specific cytokines, for instance, IL17. In particular, the importance of Asn in IL17 production was convincingly demonstrated in the mouse experimental autoimmune encephalomyelitis (EAE) model, mimicking human multiple sclerosis disease.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors reveal that the availability of extracellular asparagine (Asn) represents a metabolic vulnerability for the activation and differentiation of naive CD4+ T cells. To deplete extracellular Asn, they employed two orthogonal approaches: activating naive CD4+ T cells in either PEGylated asparaginase (PEG-AsnASE)-treated medium or custom-formulated RPMI medium specifically lacking Asn. Importantly, they demonstrate that Asn depletion not only impaired metabolic reprogramming associated with CD4+ T cell activation but also reduced CD4+ helper T cell lineage-specific cytokine production, thereby ameliorating the severity of experimental autoimmune encephalomyelitis.

      Strengths:

      The experiments presented here are comprehensive and well-designed, providing compelling evidence for the conclusions. The conclusions will be important to the field.

      Weaknesses:

      (1) EAE is the prototypic T cell-mediated autoimmune disease model, and both Th1 and Th17 cells are implicated in its pathogenesis. In contrast, Th2 and Treg cells and their associated cytokines (such as IL-4 and IL-10) have been shown to play a role in the resolution of EAE, and potentially in the modulation of disease progression. Thus, it will be important to determine whether Asn depletion affects the differentiation of naive CD4+ T cells into corresponding subsets under Th2 and Treg polarization conditions, as well as the expression of lineage-specific transcription factors and cytokine production.

      (2) EAE is characterized by inflammation and demyelination in the central nervous system (CNS), leading to neurological deficits. Myelin destruction is directly correlated with the severity of the disease. For Figure 6, did the authors perform spinal cord histological analysis by hematoxylin and eosin (H&E) or Luxol fast blue (LFB) staining? This is important to rigorously examine pathological EAE symptoms.

    3. Reviewer #2 (Public review):

      While the importance of asparagine in the differentiation and activation of CD8 T cells has been previously reported, its role in CD4 T cells remained unclear. Using culture media containing specific amino acids, the authors demonstrated that extracellular asparagine promotes CD4 T cell proliferation. Consistent with this, depletion of extracellular asparagine using PEG-AsnASE suppressed CD4 T cell activation. Proteomic analysis focusing on asparagine content revealed that, during the early phase of T cell activation, most asparagine incorporated into proteins is derived from extracellular sources. The authors further confirmed the importance of extracellular asparagine in vivo, demonstrating improved EAE pathology.

      While the data are well organized and convincing, the mechanism by which asparagine deficiency leads to altered T cell differentiation remains unclear. It is also necessary to investigate the transporters involved in asparagine uptake. In particular, elucidating whether different T cell subsets utilize the same or distinct transport mechanisms would provide important insight into the immunoregulatory role of asparagine.

      (1) The finding that asparagine supplementation promotes T cell proliferation under various amino acid conditions is highly significant. However, the concentration at which this effect occurs remains unclear. A titration analysis would be necessary to determine the dose-dependency of asparagine.

      (2) The effects of asparagine deficiency occur during the early phase of T cell activation. Thus, it is likely that the transporters responsible for asparagine uptake are either rapidly induced upon activation or already expressed in the resting state. Since this is central to the focus of the manuscript, it is interesting to identify the transporter responsible for asparagine uptake during early T cell activation. A recent paper (DOI: 10.1126/sciadv.ads350) reported that macrophages utilize Slc6a14 to use extracellular asparagine. Is this also true for CD4+ T cells?

      (3) Given that depletion of extracellular asparagine impairs differentiation of Th1 and Th17 cells, it is possible that TCR signaling is compromised under these conditions. This point should be investigated by targeting downstream signaling molecules such as Lck, ZAP70, or mTOR. Also, does it affect the protein stability of master transcription factors such as T-bet and RORgt?

      (4) Is extracellular asparagine also important for the differentiation of helper T cell subsets other than Th1 and Th17, such as Th2, Th9, and iTreg?

      (5) Asparagine taken up from outside the cell has been shown to be used for de novo protein synthesis (Figure 3E), but are there any proteins that are particularly susceptible to asparagine deficiency? This can be verified by performing proteome analysis, and the effects on Th1/17 subset differentiation mentioned above should also be examined.

      (6) While the importance of extracellular asparagine is emphasized, Asns expression is markedly induced during early T cell activation. Nevertheless, the majority of asparagine incorporated into proteins appears to be derived from extracellular sources. Does genetic deletion of Asns have any impact on early CD4+ T cell activation? The authors indicated that newly synthesized Asns have little impact on CD8+ T cells in the Discussion section, but is this also true for CD4+ T cells? This could be verified through experiments using CRISPR-mediated Asns gene targeting or pharmacological inhibition.

    1. eLife Assessment

      This study illustrates a valuable application of BID-seq to bacterial RNA, allowing transcriptome-wide mapping of pseudouridine modifications across various bacterial species. The evidence presented includes a mix of solid and incomplete data and analyses, and would benefit from more rigorous approaches. The work will interest a specialized audience involved in RNA biology.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Xu et al. reported base-resolution mapping of RNA pseudouridylation in five bacterial species, utilizing recently developed BID-seq. They detected pseudouridine (Ψ) in bacterial rRNA, tRNA, and mRNA, and found growth phase-dependent Ψ changes in tRNA and mRNA. They then focused on mRNA and conducted a comparative analysis of Ψ profiles across different bacterial species. Finally, they developed a deep learning model to predict Ψ sites based on RNA sequence and structure.

      Strengths:

      This is the first comprehensive Ψ map across multiple bacterial species, and systematically reveals Ψ profiles in rRNA, tRNA, and mRNA under exponential and stationary growth conditions. It provides a valuable resource for future functional studies of Ψ in bacteria.

      Weaknesses:

      Ψ is highly abundant on non-coding RNA such as rRNA and tRNA, while its level on mRNA is very low. The manuscript focuses primarily on mRNA, which raises questions about the data quality and the rigor of the analysis. Many conclusions in the manuscript are speculative, based solely on the sequencing data but not supported by additional experiments.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Xu et al. present a transcriptome-wide, single-base resolution map of RNA pseudouridine modifications across evolutionarily diverse bacterial species using an adapted form of BID-Seq. By optimizing the method for bacterial RNA, the authors successfully mapped modifications in rRNA, tRNA, and, importantly, mRNA across both exponential and stationary growth phases. They uncover evolutionarily conserved Ψ motifs, dynamic Ψ regulation tied to bacterial growth state, and propose functional links between pseudouridylation and bacterial transcript stability, translation, and RNA-protein interactions. To extend these findings, they develop a deep learning model that predicts pseudouridine sites from local sequence and structural features.

      Strengths:

      The authors provide a valuable resource: a comprehensive Ψ atlas for bacterial systems, spanning hundreds of mRNAs and multiple species. By establishing both the method and the datasets for exploring post-transcriptional modifications, the work addresses a gap in the field: our limited understanding of bacterial epitranscriptomics.

      Weaknesses:

      The main limitation of the study is that most functional claims (i.e., translation efficiency, mRNA stability, and RNA-binding protein interactions) are based on correlative evidence. While suggestive, these inferences would be significantly strengthened by targeted perturbation of specific Ψ synthases or direct biochemical validation of proposed RNA-protein interactions (e.g., with Hfq). Additionally, the GNN prediction model is a notable advance, but methodological details are insufficient to reproduce or assess its robustness.

    4. Reviewer #3 (Public review):

      Summary:

      This study aimed to investigate pseudouridylation across various RNA species in multiple bacterial strains using an optimized BID-seq approach. It examined both conserved and divergent modification patterns, the potential functional roles of pseudouridylation, and its dynamic regulation across different growth conditions.

      Strengths:

      The authors optimized the BID-seq method and applied this important technique to bacterial systems, identifying multiple pseudouridylation sites across different species. They investigated the distribution of these modifications, associated sequence motifs, their dynamics across growth phases, and potential functional roles. These data are of great interest to researchers focused on understanding the significance of RNA modifications, particularly mRNA modifications, in bacteria.

      Weaknesses:

      (1) The reliability of BID-seq data is questionable due to a lack of experimental validations.

      (2) The manuscript is not well-written, and the presented work shows a major lack of scientific rigor, as several key pieces of information are missing.

      (3) The manuscript's organization requires significant improvement, and numerous instances of missing or inconsistent information make it difficult to understand the key objectives and conclusions of the study.

      (4) The rationale for selecting specific bacterial species is not clearly explained, and the manuscript lacks a systematic comparison of pseudouridylation among these species.

    1. eLife Assessment

      This study presents valuable data suggesting that ATP-induced modulation of alveolar macrophage (AM) functions is associated with NLRP3 inflammasome activation and enhanced phagocytic capacity. While the in vivo and in vitro data reveal an interesting phenotype, the evidence provided is incomplete and does not fully support the paper's conclusions. Additional investigations would be of value in complementing the data and strengthening the interpretation of the results. This study should be of interest to immunologists and the mucosal immunity community.

    2. Reviewer #1 (Public review):

      Summary:

      Alveolar macrophages (AMs) are key sentinel cells in the lungs, representing the first line of defense against infections. There is growing interest within the scientific community in the metabolic and epigenetic reprogramming of innate immune cells following an initial stress, which alters their response upon exposure to a heterologous challenge. In this study, the authors show that exposure to extracellular ATP can shape AM functions by activating the P2X7 receptor. This activation triggers the relocation of the potassium channel TWIK2 to the cell surface, placing macrophages in a heightened state of responsiveness. This leads to the activation of the NLRP3 inflammasome and, upon bacterial internalization, to the translocation of TWIK2 to the phagosomal membrane, enhancing bacterial killing through pH modulation. Through these findings, the authors propose a mechanism by which ATP acts as a danger signal to boost the antimicrobial capacity of AMs.

      Strengths:

      This is a fundamental study in a field of great interest to the scientific community. A growing body of evidence has highlighted the importance of metabolic and epigenetic reprogramming in innate immune cells, which can have long-term effects on their responses to various inflammatory contexts. Exploring the role of ATP in this process represents an important and timely question in basic research. The study combines both in vitro and in vivo investigations and proposes a mechanistic hypothesis to explain the observed phenotype.

      Weaknesses:

      First, the concept of training or trained immunity refers to long-term epigenetic reprogramming in innate immune cells, resulting in a modified response upon exposure to a heterologous challenge. The investigations presented demonstrate phenotypic alterations in AMs seven days after ATP exposure; however, they do not assess whether persistent epigenetic remodeling occurs with lasting functional consequences. Therefore, a more cautious and semantically precise interpretation of the findings would be appropriate.

      Furthermore, the in vivo data should be strengthened by additional analyses to support the authors' conclusions. The authors claim that susceptibility to Pseudomonas aeruginosa infection differs depending on the ATP-induced training effect. Statistical analyses should be provided for the survival curves, as well as additional weight curves or clinical assessments. Moreover, it would be appropriate to complement this clinical characterization with additional measurements, such as immune cell infiltration analysis (by flow cytometry), and quantification of pro-inflammatory cytokines in bronchoalveolar lavage fluid and/or lung homogenates.

      Moreover, the authors attribute the differences in resistance to P. aeruginosa infection to the ATP-induced training effect on AMs, based on a correlation between in vivo survival curves and differences in bacterial killing capacity measured in vitro. These are correlative findings that do not establish a causal role for AMs in the in vivo phenotype. ATP-mediated effects on other (i.e., non-AM) cell populations are omitted, and the possibility that other cells could be affected should be, at least, discussed. Adoptive transfer experiments using AMs would be a suitable approach to directly address this question.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Thompson et al. investigate the impact of prior ATP exposure on later macrophage functions as a mechanism of immune training. They describe that ATP training enhances bactericidal functions, which they connect to the P2x7 ATP receptor, Nlrp3 inflammasome activation, and TWIK2 K+ movement at the cell surface and subsequently at phagosomes during bacterial engulfment. With stronger methodology, these findings could provide useful insight into how ATP can modulate macrophage immune responses, though they are generally an incremental addition to existing literature. The evidence supporting their conclusions is currently inadequate. Gaps in explaining methodology are substantial enough to undermine trust in much of the data presented. Some assays may not be designed rigorously enough for interpretation.

      Strengths:

      The authors demonstrate two novel findings that have sufficient rigor to assess:

      (1) TWIK2 persists at the macrophage plasma membrane for a prolonged period following ATP exposure and can translocate to the phagosome during particle engulfment, which builds upon their prior report of ATP-driven 'training' of macrophages.

      (2) Administering intranasal ATP to mice 'trains' the lungs and protects the mice from an otherwise fatal bacterial infection.

      Weaknesses:

      (1) Missing details from methods/reported data: Substantial sections of key methods have not been disclosed (including anything about animal infection models, RNA-sequencing, and western blotting), and the statistical methods, as written, only address two-way comparisons, which would mean analysis was improperly performed. In addition, there is a general lack of transparency - the methods state that only representative data is included in the manuscript, and individual data points are not shown for assays.

      (2) Poor experimental design including missing controls: Particularly problematic are the Seahorse assay data (requires normalization to cell numbers to interpret this bulk assay - differences in cell growth/loss between conditions would confound data interpretation) and bacterial killing assays (as written, this method would be heavily biased by bacterial initial binding/phagocytosis which would confound assessment of killing). Controls need to be included for subcellular fractionating to confirm pure fractions and for dye microscopy to show a negative background. Conclusions from these assays may be incorrect, and in some cases, the whole experiment may be uninterpretable.

      (3) The conclusions overstate what was tested in the experiments: Conceptually, there are multiple places where the authors draw conclusions or frame arguments in ways that do not match the experiments used. Particularly:

      a) The authors discuss their findings in the context of importance for AM biology during respiratory infection but in vitro work uses cells that are well-established to be poor mimics of resident AMs (BMDM, RAW), particularly in terms of glycolytic metabolism.

      b) In vivo work does not address whether immune cell recruitment is triggered during training.

      c) Figure 3 is used to draw conclusions about K+ in response to bacterial engulfment, but actually assesses fungal zymosan particles.

      d) Figure 5 is framed in bacterial susceptibility post-viral infection, but the model used is bacterial post-bacterial.

      e) In their discussion, the authors propose to have shown TWIK2-mediated inflammasome activation. They link these separately to ATP, but their studies do not test if loss of TWIK2 prevents inflammasome activation in response to ATP (Figure 4E does not use TWIK2 KO).

      In summary, this work contains some useful data showing how ATP can 'train' macrophages. However, it largely lacks the expected level of rigor. For this work to be valuable to the field, it is likely to need substantial improvement in methods reporting and inclusion of missing assay controls; it may require repeating key experiments that were run with insufficient methodology (or providing details and supplemental data to prove that the methodology was sufficient); and it should either add experiments that properly test the experimental question or rewrite the conclusions.

    1. eLife Assessment

      This convincing study, which is based on a survey of researchers, finds that women are less likely than men to submit articles to elite journals. It also finds that there is no relation between gender and reported desk rejection. The study is an important contribution to work on gender bias in the scientific literature.

    2. Joint Public Review:

      Summary from an earlier round of review:

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).

      - Women are more likely to cite the demands of co-authors as a reason why they didn’t submit to highly influential journals.

      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists’ careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Editor’s note: This is the third version of this article.

      Comments made during the peer review of the second version, along with the authors’ responses to these comments, are available below. Revisions made in response to these comments include changing the colour scheme used for the figures to make the figures more accessible for readers with certain forms of colour blindness.

      Comments made during the peer review of the first version, along with the authors’ responses to these comments, are available with previous versions of the article.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).

      Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be valid reasons - even when women are not intrinsically better at research than men - why a greater fraction of female-authored submissions are accepted relative to male-authored submissions (or vice versa). For example, if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men’s papers are intrinsically better than women’s papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      I would take out the final sentence in the abstract. In my opinion, your survey evidence isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major - or even minor - contribution of your paper. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!) While it's fine to briefly discuss them at the end of your paper - as you currently do - I wouldn't highlight that in the abstract as being an important contribution of your paper.

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel like stating this in the abstract is coherent.

      Minor comments

      What is the rationale for conditioning on academic rank and does this have explanatory power on its own - i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      Thank you for this thoughtful question. We conditioned on academic rank in all regression analyses to account for structural differences in career stage that may potentially influence submission behaviors. Academic rank (e.g., assistant, associate, full professor) is a key determinant of publishing capacity and strategic considerations, such as perceived likelihood of success at elite journals, tolerance for risk, and institutional expectations for publication venues.

      Importantly, academic rank is also correlated with gender due to cumulative career disadvantages that contribute to underrepresentation of women at more senior levels. Failing to adjust for rank would conflate gender effects with differences attributable to career stage. By including rank as a covariate, we aim to isolate gender-associated patterns in submission behavior within comparable career stages, thereby producing a more precise estimate of the gender effect.

      Regarding explanatory power, academic rank does indeed contribute significantly to model fit across our analyses, indicating that it captures meaningful variation in submission behavior. However, even after adjusting for rank, we continue to observe significant gender differences in submission patterns in several disciplines. This suggests that while academic rank explains part of the variation, it does not fully account for the gender gap—highlighting the importance of examining other structural and behavioral factors that shape the publication trajectory.
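      The reasoning behind conditioning on rank can be sketched with a small worked example. The counts below are invented purely for illustration (they are not the survey data), the function names are hypothetical, and a Mantel-Haenszel odds ratio is used here as a simple stand-in for the regression adjustment described above.

```python
# Synthetic counts for illustration only; NOT the survey data.
# rank -> gender -> (submitted, did_not_submit)
COUNTS = {
    "assistant": {"women": (20, 80), "men": (12, 38)},
    "full": {"women": (20, 30), "men": (50, 50)},
}


def crude_odds_ratio(counts):
    """Odds ratio of submitting (women vs men) after pooling all ranks."""
    a = sum(stratum["women"][0] for stratum in counts.values())
    b = sum(stratum["women"][1] for stratum in counts.values())
    c = sum(stratum["men"][0] for stratum in counts.values())
    d = sum(stratum["men"][1] for stratum in counts.values())
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(counts):
    """Rank-adjusted odds ratio: a weighted combination of the per-rank 2x2 tables."""
    numerator = 0.0
    denominator = 0.0
    for stratum in counts.values():
        a, b = stratum["women"]
        c, d = stratum["men"]
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


crude = crude_odds_ratio(COUNTS)
adjusted = mantel_haenszel_odds_ratio(COUNTS)
print(f"crude OR = {crude:.3f}, rank-adjusted OR = {adjusted:.3f}")
```

      In these invented counts, women are concentrated in the junior rank, where submission is less common for everyone, so the pooled odds ratio sits further from 1 than the rank-adjusted one. Both remain below 1, which mirrors the pattern described above: rank accounts for part of the gap but not all of it.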

      Reviewer #2 (Public review):

      Basson et al. present compelling evidence supporting a gender disparity in article submission to "elite" journals. Most notably, they found that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. Overall, this work is an important addition to the study of gender disparities in the publishing process.

      I thank the authors for addressing my concerns.

      Reviewer #4 (Public review):

      Main strengths

      The topic of the MS is very relevant given that across the sciences/academia, genders are unevenly represented, which has a range of potential negative consequences. To change this, we need to have the evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and the impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with a high impact factor. While previous work has detected this gap and identified some potential mechanisms, the current MS provides strong evidence that this gap might be due to a lower submission rate of women compared to men, rather than the rejection rates. These results are based on a survey of close to 5000 authors. The survey seems to be conducted well (though I am not an expert in surveys), and data analysis is appropriate to address the main research aims. It was impossible to check the original data because of the privacy concerns.

      Interestingly, the results show no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking and are advised not to submit to prestigious journals, indicating that both intrinsic and extrinsic factors shape women's submission behaviour.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, but also to inform assessment reform at a larger scale.

      I do not find any major weaknesses in the revised manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Colour schemes of the Figures are not adjusted for colour-blindness (red-green is a big NO). Some suggestions can be found here: https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf

      We appreciate the suggestion. We’ve adjusted the colors in the manuscript to be color-blind friendly using one of the colorblind safe palettes suggested by the reviewer.

      (2) I do not think that the authors have fully addressed the comment about APCs and the decision to submit, given that PNAS has publication charges that amount to double of someone's monthly salary. I would add a sentence or two to explain that publication charges should not be a factor for Nature and Science, but might be for PNAS.

      While APCs are definitely a factor affecting researchers’ submission behavior, they mostly do so for lower prestige journals rather than for the three elite journals analyzed here. As mentioned in the previous round of revisions, Nature and Science have subscription options. And PNAS authors without funding have access to waivers: https://www.pnas.org/author-center/publication-charges

      (3) Line 268, the first suggestion here is not something that would likely work. Thus, I would not put it as the first suggestion.

      We made the suggested change.

      (4) Data availability - remove AND in 'Aggregated and de-identified data' because it sounds like both are shared. Suggest writing: 'Aggregated, de-identified data..'. I still suggest sharing data/code in a trusted repository (e.g. Dryad, ZENODO...) rather than on GitHub, as per the current recommendation on the best practices for data sharing.

      Thank you for your comment regarding data availability. Due to IRB restrictions and the conditions of our ethics approval, we are not permitted to share the survey data used in this study. However, to support transparency and reproducibility, we have made all analysis code available on Zenodo at https://doi.org/10.5281/zenodo.16327580. In addition, we have included a synthetic dataset with the same structure as the original survey data but containing randomly generated values. This allows others to understand the data structure and replicate our analysis pipeline without compromising participant confidentiality.

    1. eLife Assessment

      This valuable study introduces a modern and accessible PyTorch reimplementation of the widely used SpliceAI model for splice site prediction. The authors provide convincing evidence that their OpenSpliceAI implementation matches the performance of the original while improving usability and enabling flexible retraining across species. These advances are likely to be of broad interest to the computational genomics community.

    2. Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. Their comparisons to the original SpliceAI models are convincing on the grounds of model performance and their evaluation of how well the new models match the original's understanding of non-local mutation effects. However, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of the limitations of calibration.

      Strengths

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance and mutation effect estimation capabilities of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      Weaknesses

      (1) Their discussion of their package's calibration functionality does not adequately acknowledge the limitations of model calibration. This is problematic as this is a package intended for general use, and users who are not experienced in modeling broadly, or in the subfield of model calibration specifically, may not already understand these limitations. This could lead to serious errors and misunderstandings down the road. A model is not calibrated or uncalibrated in and of itself, only with respect to a specific dataset. In this case they calibrated with respect to the training dataset, a set of canonical transcript annotations. This is a perfectly valid and reasonable dataset to calibrate against. However, this is unlikely to be the dataset the model is applied to in any downstream use case, and this calibration is not guaranteed or expected to hold for any shift in the dataset distribution. For example, in the next section they use ISM-based approaches to evaluate which sequence elements the model is sensitive to, and their calibration would not be expected to hold for this set of predictions. This issue is particularly worrying in the case of their model because annotation of canonical transcript splice sites is a task that their model is unlikely to be applied to after training. Much more likely tasks include predicting the effects of mutations, identifying splice sites that may be used across isoforms beyond just the canonical one, identifying regulatory sequences through ISM, or evaluating human-created sequences for design or evaluation purposes (such as in the context of an MPSA or designing a gene to splice a particular way). We would not expect their calibration to hold in any of these contexts. To resolve this issue, the authors should clarify and discuss this limitation in their paper (and in the relevant sections of the package documentation) to avoid confusing downstream users.

      (2) The clarity of their analysis of mutation effects could be improved with some minor adjustments. While they report median ISM importance correlation it would be helpful to see a histogram of the correlations they observed. Instead of displaying (and calculating correlations using) importance scores of only the reference sequence, showing the importance scores for each nucleotide at each position provides a more informative representation. This would also likely make the plots in 6B clearer.
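      The calibration caveat in point (1) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the helper names are hypothetical, and expected calibration error (ECE) is used as one common way to quantify calibration, not necessarily the metric used in the package. The same scores are evaluated on a dataset where the positive rate matches the score and on a shifted dataset where positives are half as frequent at every score.

```python
def expected_calibration_error(scores, labels, n_bins=10):
    """Weighted mean of |observed positive rate - mean score| over equal-width score bins."""
    bins = [[] for _ in range(n_bins)]
    for score, label in zip(scores, labels):
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append((score, label))
    total = len(scores)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_score = sum(s for s, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(positive_rate - mean_score)
    return ece


def make_dataset(score_levels, positive_rate_fn, n_per_level=100):
    """Synthetic dataset: at each score level, a fixed fraction of examples is positive."""
    scores, labels = [], []
    for p in score_levels:
        n_pos = round(n_per_level * positive_rate_fn(p))
        scores += [p] * n_per_level
        labels += [1] * n_pos + [0] * (n_per_level - n_pos)
    return scores, labels


LEVELS = [0.1, 0.3, 0.5, 0.7, 0.9]

# Dataset the model was calibrated on: positive rate equals the score.
in_distribution = make_dataset(LEVELS, lambda p: p)
# Shifted dataset (assumed for illustration): positives are half as frequent.
shifted = make_dataset(LEVELS, lambda p: 0.5 * p)

ece_in = expected_calibration_error(*in_distribution)
ece_shifted = expected_calibration_error(*shifted)
print(f"ECE in-distribution: {ece_in:.3f}")
print(f"ECE after shift:     {ece_shifted:.3f}")
```

      The scores are identical in both cases; only the data they are evaluated against change. This is the sense in which a model is calibrated only with respect to a specific dataset.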

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al. offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species pre-training on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is not even any comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      Update for the revised version:

      The update mostly includes clarifications in response to technical questions/comments raised by the other two reviewers. There are no additional analyses/results that change our initial assessment, above, of this paper's contribution.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      The following is the revised paragraph in the manuscript’s Results section:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity between the resultant patterns from SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The Pearson correlation between SpliceAI and OSAI<sub>MANE</sub> yielded an overall median correlation of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub>  and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation was 0.857 for all splice sites. Ten additional zoom-in representative splice site DNA logo comparisons are provided in Supplementary Figure S23.
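
      The concordance procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `toy_model_a` and `toy_model_b` are hypothetical stand-ins for the two models' scoring functions, the 15 nt sequences are made up and replace the 5,001 nt windows, and the sign convention (reference score minus mutant score) is an assumption.

```python
import math
import statistics

BASES = "ACGT"

def ism_profile(seq, predict_site_prob):
    """Per-position mean decrease in the predicted score over the 3 substitutions."""
    ref_score = predict_site_prob(seq)
    profile = []
    for i, ref_base in enumerate(seq):
        deltas = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            deltas.append(ref_score - predict_site_prob(mutant))
        profile.append(sum(deltas) / len(deltas))
    return profile

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical scorers: both reward a GT dinucleotide at the window centre
# and, more weakly, overall G content. They are NOT SpliceAI or OpenSpliceAI.
def toy_model_a(seq):
    mid = len(seq) // 2
    score = 0.1 + 0.2 * seq.count("G") / len(seq)
    if seq[mid:mid + 2] == "GT":
        score += 0.6
    return score

def toy_model_b(seq):
    mid = len(seq) // 2
    score = 0.05 + 0.1 * seq.count("G") / len(seq)
    if seq[mid:mid + 2] == "GT":
        score += 0.7
    return score

# Made-up windows, each with GT at the centre position.
sites = ["ACAGCAGGTAAGTAC", "TTGACCAGTCAGGCA"]

# One correlation per site, then the median across sites.
correlations = [
    pearson(ism_profile(s, toy_model_a), ism_profile(s, toy_model_b))
    for s in sites
]
median_r = statistics.median(correlations)
print(median_r)
```

      With real models, `predict_site_prob` would wrap each model's forward pass on the one-hot encoded window and return the probability at the annotated site.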

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e. a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we didn’t measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
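
      Post-hoc temperature scaling as described here can be illustrated with a small, dependency-free sketch. It is an assumption-laden toy, not the OpenSpliceAI implementation: the logits and labels are invented, the three classes stand for non-splice, acceptor, and donor, and the temperature is fitted by a golden-section search on the held-out negative log-likelihood rather than by whatever optimizer the authors used.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a single scalar temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    return -sum(
        math.log(softmax(row, temperature)[y])
        for row, y in zip(logit_rows, labels)
    ) / len(labels)

def fit_temperature(logit_rows, labels, lo=0.05, hi=20.0, iters=100):
    """Golden-section search over the inverse temperature, where the NLL is convex."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = 1.0 / hi, 1.0 / lo
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if nll(logit_rows, labels, 1.0 / c) < nll(logit_rows, labels, 1.0 / d):
            b = d
        else:
            a = c
    return 2.0 / (a + b)

# Invented, overconfident held-out predictions: class 0 always gets logit 4,
# but is the true class for only 8 of 10 positions.
logit_rows = [[4.0, 0.0, 0.0]] * 10
labels = [0] * 8 + [1, 2]

t_fit = fit_temperature(logit_rows, labels)
before = softmax(logit_rows[0])
after = softmax(logit_rows[0], t_fit)
print(t_fit, before[0], after[0])
```

      Because every logit at a position is divided by the same positive constant, the order of the classes at that position is unchanged; only the confidence moves. A fitted temperature near 1 corresponds to the situation the authors report, where the uncalibrated model was already close to calibrated on the chosen label set.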

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”
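
      The calibration notion debated in this exchange (predicted probabilities matching observed frequencies) is commonly summarized by a reliability table and the expected calibration error (ECE). The sketch below is a generic illustration under assumed, made-up data; the binning scheme and the choice of ECE are assumptions and are not taken from the manuscript. The `labels` argument encodes which positions count as splice sites, so the result depends on that label set, which is the reviewer's point about canonical versus broader annotations.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Equal-width confidence bins.

    Returns (mean predicted probability, observed frequency, count)
    for every non-empty bin.
    """
    acc = [[0.0, 0, 0] for _ in range(n_bins)]  # prob sum, positives, count
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        acc[idx][0] += p
        acc[idx][1] += y
        acc[idx][2] += 1
    return [(s / n, pos / n, n) for s, pos, n in acc if n > 0]

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean gap between predicted and observed frequency."""
    total = len(probs)
    return sum(
        n / total * abs(mean_p - freq)
        for mean_p, freq, n in reliability_bins(probs, labels, n_bins)
    )

# Made-up example: ten positions all predicted at 0.9.
probs = [0.9] * 10
labels_matching = [1] * 9 + [0]       # 90% are true sites: calibrated
labels_broader = [1] * 5 + [0] * 5    # 50% are true sites under another label set

ece_matching = expected_calibration_error(probs, labels_matching)
ece_broader = expected_calibration_error(probs, labels_broader)
print(ece_matching, ece_broader)
```

      The same predictions are well calibrated against one label set and poorly calibrated against the other, which is why the set of splice sites used for calibration needs to be stated.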

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. We have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, designed to analyze cross-species performance gains, and supported by a thorough benchmark and the release of several pretrained models to clearly position our contribution.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result indicates that our architecture seems to capture all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.
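
      The filtering rule stated in this response can be expressed as a short sketch. Everything concrete here is hypothetical: the `Hit` record, its field names, and the gene identifiers are invented for illustration, and measuring length overlap as the aligned fraction of the held-out sequence is an assumption, since the response does not say how overlap was computed or which aligner produced the hits.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One alignment of a held-out sequence against a training sequence (hypothetical schema)."""
    query_id: str      # validation or test sequence
    target_id: str     # training sequence it aligned to
    aligned_len: int   # aligned length on the query
    query_len: int     # full length of the query
    identity: float    # fraction of identical aligned bases, 0..1

def leaky_ids(hits, min_overlap=0.8, min_identity=0.8):
    """IDs exceeding BOTH thresholds against some training sequence."""
    return {
        h.query_id
        for h in hits
        if h.aligned_len / h.query_len > min_overlap and h.identity > min_identity
    }

def filter_heldout(heldout_ids, hits, paralog_ids=frozenset()):
    """Drop annotated paralogs/homologs and threshold-exceeding sequences."""
    drop = leaky_ids(hits) | set(paralog_ids)
    return [g for g in heldout_ids if g not in drop]

# Invented example applied identically to a validation set and a test set.
hits = [
    Hit("g1", "t7", aligned_len=950, query_len=1000, identity=0.90),  # dropped
    Hit("g2", "t3", aligned_len=900, query_len=1000, identity=0.50),  # kept
]
validation = filter_heldout(["g1", "g2", "g3", "g4"], hits, paralog_ids={"g3"})
print(validation)
```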

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with:  conda install openspliceai. 

      The conda package homepage is at: https://anaconda.org/khchao/openspliceai. We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceaimane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. eLife Assessment

      This fundamental work advances our understanding of how SP5 and SP8 promote neuromesodermal competent progenitors in murine embryos. Generally the evidence is compelling, with strong developmental genetics, transcriptomic, and genomic transcription binding surveys contributing to the strength of the data. Some of the language could be softened to avoid overinterpretation of the data, and figures and diagrams could be improved.

    2. Reviewer #1 (Public review):

      This is an important, interesting, and in-depth study examining the role of Sp5/8 transcription factors in maintaining the neuromesodermal progenitor (NMP) niche. The authors first used Sp5/8 double conditional KO mouse embryos to establish that these factors function in the NMP niche to promote trunk elongation. They then conducted extensive single-cell analyses on embryos of various genetic mutant backgrounds to unravel the complex and intricate interactions between Wnt signaling and Sp5/8. The key conclusion from these experiments is that Sp5/8 function within an autoregulatory loop crucial for maintaining the NMP niche. The authors went on to identify and characterize a novel enhancer element downstream of the Wnt3a coding sequence, which mediates the effects of Sp5/8 on Wnt3a expression. Overall, the data presented are compelling and of high quality, and the study offers a prime example of how a relatively small set of signaling pathways and transcription factors can function in concert to impart robustness to developmental processes.

    3. Reviewer #2 (Public review):

      Chalamalasetty et al. investigate the regulatory circuit of signaling molecules and transcription factors that drive the fate of neuromesodermal competent progenitors (NMCs). NMCs contribute to Sox2-positive spinal cord and Tbxt/Bra-expressing somitic mesoderm, and this choice is governed by the interplay between Wnt3a and Fgf signaling. The authors discovered that the transcription factors SP5 and SP8 participate in this process. Mouse genetics, in vivo development, and transcription factors profiling point to a model where SP5 and SP8 directly regulate Wnt3a expression to foster Tbxt-marked mesoderm formation at the expense of Sox2-marked neural ectoderm. Mechanistically, SP5/8 bind to an enhancer which the authors characterize: its activity depends on the presence of SP5, CDX2, TCF7, and TBXT binding sites, and it is activated only in primitive streak cells at E7.5, in NMP, and in caudal and somitic mesoderm, underscoring the tissue and stage-specific nature of this Wnt3a enhancer.

      Moreover, the authors find that SP5/8 likely regulate the TCF7 association with the chromatin and compete for its binding to the TLE repressor.

      The study is extensive, compelling, and well written. The combination of in vivo evidence with single-cell transcriptomics, transcription factors profiling, and in vitro regulatory element characterization is notable and builds a convincing picture of the action of SP5/SP8.

      Here, I provide a series of comments and questions that, if addressed and clarified, could, in my opinion, improve the study.

      (1) While Sp5 and Sp8 are both present in NMCs, their expression does not fully overlap. Sp5 is also detected in caudal and presomitic mesoderm, notochord and gut, while Sp8 overlaps with Sox2 in neural progenitors of the spinal cord and brain (Fig. 1D). Accordingly, Sp8 expression is also activated by the neural-promoting RA+Fgf. It is not easy for me to reconcile this non-fully overlapping expression pattern - and in particular the overlap of Sp8 and Sox2 - with the presumed redundancy (or similarity of function) described later. Sp5/8 dko NMCs show reduced Tbxt and expanded Sox2, indicating that SP8 also represses Sox2 or neural fate, an observation confirmed by Sp8 overexpression (Figure 4c). What is the explanation for this, and is the function of SP8 in Sox2-positive neural progenitors different from its Wnt3a-sustaining role in NMCs? Or what am I missing?

      (2) I suggest that the authors show relevant ChIP-seq peaks in Figure 3 to lend credibility to the complicated overlapping Venn diagrams. I consider visual inspection of peak tracks as primary quality control of this type of experiment. A good choice could be the cis-regulatory elements at Sp5, Sp8, Tbxt, Cdx1, 2, 4 bound by TBXT and either CDX2, SP5, or SP8 (now referring to the Venn diagrams and the annotated peak table). On ChIP-seq visualization, in reference to Figures 5 and 7, I also suggest that the authors show the tracks of a negative control (IgG, non-related antibody, or better anti-flag in Sp5/8 dko). While I do not doubt the validity of these experiments, there are peaks in these figures bound by all factors tested that could be suspicious (even though, admittedly, they look like genuinely good TF peaks). A negative track would clearly show beyond any doubt that these are not suspect regions of positive unspecific signal caused by open chromatin, excessive cross-linking, or antibody cross-reaction.

      (3) SP5 here is found as a direct inducer of Wnt3a expression, and accordingly positive regulator of Tbxt and mesoderm, caudal development. I find this in partial contradiction with a finding by the Willert group (PMID: 29044119). They show that "genes with an associated SP5 peak, such as SP5 itself, AXIN2, AMOTL2, GPR37, GSC, MIXL1, NODAL, and T, show significant upregulation in expression upon Wnt3a treatment in SP5 mutant cells". There, essentially, SP5 inhibits Wnt target genes. While the authors are aware of this and cite Huggins et al., I find that this deserves a better discussion addressing how opposite functions could be sustained in different contexts, if these really are different cellular contexts in the first place, or if this could result from different methodologies.

      (4) The gastruloid experiment is nice, but I wonder whether there is any marker that the authors can use to show that other features of the gastruloids respond accordingly. For example, is the Sox2 expression domain expanded? And is there any unaffected marker to emphasize the specificity of the decreased Tbxt and Cdx2?

      (5) SP5/8 seems to enhance the TCF7 occupancy at WRE. And then, SP5/8 appears to counteract the presence of TLE repressor associated with TCF7. While these two mechanisms are interesting, they are not necessarily interconnected. According to the still-established view, TCF7 should be associated with WRE even in the absence of the Wnt signal, when TLEs are also present on the locus. One could expect that SP5 competes with TLE, to decrease its presence on TCF7-bound loci, leaving the abundance of TCF7 binding unchanged. Yet, the authors also observe that the TCF7 association changes. What is the mechanism implied? Do they perhaps consider a TCF7L1 > TCF7 switch, and if so, what evidence exists for this?

      (6) Along the same line as above, I wonder whether beta-catenin binding is also enhanced at these sites? Any TCF/LEF would require beta-catenin for gene upregulation.

      (7) The authors write that "Small Tle peaks were identified at these WREs in WT cells, demonstrating that both repressive Tle and activating Tcf7 could be detected at active genes". However, ChIP-seq is a population assay, and it is possible - more plausible, in fact - that cells displaying TLE binding are not expressing the target genes.

    4. Reviewer #3 (Public review):

      Summary:

      This is a well-done study. It shows, in a comprehensive manner, that Sp5 and Sp8 play essential roles in maintaining the complicated positive feedback circuitry needed for specification of neuromesodermal competent progenitors (NMCs) in caudal mesodermal development in murine embryos.

      Strengths:

      The developmental genetics, transcriptomic, and genomic survey of TF binding are all satisfactory and make a compelling story. The CRISPR deletion of the Wnt3a downstream enhancer clearly demonstrates that it plays an important role in the positive feedback circuit.

      Weaknesses:

      My only concerns are some of the language surrounding the mechanistic interpretation of the Wnt3a downstream enhancer and the relationship between TCF and TLE binding.

    1. eLife Assessment

      This work presents important information on rhythmicity of overlapping target and distractor processing and how this affects behaviour. The methods are, in general, clearly laid out and defensible, with several supplementary analyses leading to a solid base of evidence for their claims.

    2. Reviewer #1 (Public review):

      Summary:

      Using a combination of EEG and behavioural measurements, the authors investigate the degree to which processing of spatially-overlapping targets (coherent motion) and distractors (affective images) are sampled rhythmically and how this affects behaviour. They found that both target processing (via measurement of amplitude modulations of SSVEP amplitude to target frequency) and distractor processing (via MVPA decoding accuracy of bandpassed EEG relative to distractor SSVEP frequency) displayed a pronounced rhythm at ~1Hz, time-locked to stimulus onset. Furthermore, the relative phase of this target/distractor sampling predicted accuracy of coherent motion detection across participants.

      Strengths:

      - The authors are addressing a very interesting question with respect to sampling of targets and distractors, using neurophysiological measurements to their advantage in order to parse out target and distractor processing.

      - The general EEG analysis pipeline is sensible and well-described.

      - The main result of rhythmic sampling of targets and distractors is striking and very clear even on a participant-level.

      - The authors have gone to quite a lot of effort to ensure the validity of their analyses, especially in the Supplementary Material.

      - It is incredibly striking how the phase of both target and distractor processing are so aligned across trials for a given participant. I would have thought that any endogenous fluctuation in attention or stimulus processing like that would not be so phase aligned. I know there is literature on phase resetting in this context, the results seem very strong here and it is worth noting. The authors have performed many analyses to rule out signal processing artifacts, e.g. the sideband and beating frequency analyses.

      Weaknesses:

      - In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is going to most likely be related to the contrast of the dots, as opposed to representing coherent motion energy which is the actual target. These may well be linked (e.g. greater attention to the coherent motion task might increase SSVEP amplitude) but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted. Overall, this limitation remains and has been noted in the Limitations section.

      - Then comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and reflect probably different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)? Again, this has been noted in the Limitations section, but changing the data to z-scores doesn't really take care of the conceptual issue, i.e. that on-screen contrast changes would necessarily be distracting during emotional category decision-making.

    3. Reviewer #2 (Public review):

      In this study, Xiong et al. investigate whether rhythmic sampling - a process typically observed in the attended processing of visual stimuli - extends to task-irrelevant distractors. By using EEG with frequency tagging and multivariate pattern analysis (MVPA), they aimed to characterize the temporal dynamics of both target and distractor processing and examine whether these processes oscillate in time. The central hypothesis is that target and distractor processing occur rhythmically, and the phase relationship between these rhythms correlates with behavioral performance.

      Major Strengths

      (1) The extension of rhythmic attentional sampling to include distractors is a novel and interesting question.

      (2) The decoding of emotional distractor content using MVPA from SSVEP signals is an elegant solution to the problem of assessing distractor engagement in the absence of direct behavioral measures.

      (3) The finding that relative phase (between 1 Hz target and distractor processes) predicts behavioral performance is compelling.

      Major Weaknesses and Limitations

      (1) The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5 s windows with 0.25 s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      (2) The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      (3) The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      (4) Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which, while interesting, may benefit from additional converging evidence.

      (5) Phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).
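
      The frequency-resolution concern in point (1) can be made concrete with simple arithmetic. The sketch below assumes, as a simplification not stated in the manuscript, that each time-resolved value behaves like a plain (boxcar) average over its 0.5 s window; the actual SSVEP amplitude extraction and decoding steps may shape the spectrum differently. The function names are illustrative.

```python
import math

def moving_average_gain(freq_hz, window_s):
    """Magnitude response of an ideal boxcar average of length window_s."""
    x = math.pi * freq_hz * window_s
    return 1.0 if x == 0 else abs(math.sin(x) / x)

def nyquist_hz(step_s):
    """Highest frequency representable when one value is produced every step_s seconds."""
    return 0.5 / step_s

window_s, step_s = 0.5, 0.25

# One value every 0.25 s is a 4 Hz series, so nothing above 2 Hz is resolvable.
print("Nyquist (Hz):", nyquist_hz(step_s))

# Relative attenuation of fluctuations at several candidate rhythm frequencies.
for f in (0.5, 1.0, 1.5, 2.0):
    print(f, "Hz gain:", moving_average_gain(f, window_s))
```

      Under this assumption, fluctuations at 2 Hz are cancelled entirely by a 0.5 s window and those at 1 Hz are attenuated to 2/π of their amplitude, so the analysis can only ever show rhythms in a narrow band below 2 Hz. Repeating the calculation with other window lengths shows how the passband moves, which is the control the reviewer asks for.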

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Additional Considerations
      • The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.
      • In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      Comments on revisions:

      The authors have addressed my previous points, and the manuscript is substantially improved. The key methodological clarifications have been incorporated, and the interpretation of findings has been appropriately moderated. I have no further major concerns.

    1. eLife Assessment

      This fundamental work significantly advances our understanding of gravity sensing and orientation behavior in the ctenophore, an animal of major importance in understanding the evolution of nervous systems. Through comprehensive reconstruction with volumetric electron microscopy, and time-lapse imaging of cilia motion, the authors provide compelling evidence that the aboral nerve net coordinates the activity of balancer cilia. The resemblance to the ciliomotor circuit in marine annelids provides a fascinating example of how neural circuits may convergently evolve to solve common sensorimotor challenges.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons that exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the circuit implements coordination rather than feedforward sensory-motor control, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum, where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function, which ultimately allows the animal to correct its orientation. It represents an example of systems neuroscience explaining how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective, showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear, and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims, and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

      Weaknesses:

      The evidence supporting the claim that the neural circuitry presented here controls the cilia beating is largely correlational, because it relies only on the fact that the location of the two types of ANN neurons coincides with the quadrants that are affected in the behavioral recordings. Discussing ways by which causality could be established might be helpful.

      The explanation of the relevance of this work could be improved. The conclusion that the work hints at coordination instead of feedforward sensory-motor control is explained over only a few lines. The authors could provide a more detailed explanation of how the two models compete (coordination vs feedforward sensory-motor control), and why choosing one option over the other could provide advantages in this context.

      Since the fact that the ANN neurons form a syncytium is an important finding of this study, it would be useful to have additional illustrations of it. For instance, pictures showing anastomosing membranes could typically be added in Figure 2.

      Also, to better establish the importance of the study, it could be useful to explain why the balancers' cilia spontaneously beat in the first place (instead of being static and just acting as stretch sensors).

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ's balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day-old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because it is apparently the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day-old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons. This also gives some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

    1. eLife Assessment

      This valuable work presents a novel computational framework for modeling macroscopic traveling waves in the mouse cortex by integrating open-source connectomic and transcriptomic data into a spiking network model. This approach allows the computational model to assign excitatory/inhibitory connections based on neurotransmitter profiles and extends simulations to the 3D domain. The authors present results that demonstrate how spatiotemporal dynamics such as slow oscillations (0.5-4 Hz) emerge and self-organize at the whole-brain scale. This study provides convincing initial insights into the structural basis of traveling waves at the whole-brain scale in the mouse.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Realistic coupling enables flexible macroscopic traveling waves in the mouse cortex" by Sun, Forger, and colleagues presents a novel computational framework for studying macroscopic traveling waves in the mouse cortex by integrating realistic brain connectivity data with large-scale neural simulations.

      The key contributions include:
      (1) developing an algorithm that combines spatial transcriptomic data (providing detailed neuron positions and molecular properties) with voxelized connectivity data from the Allen Brain Atlas to construct neuron-to-neuron connections across ~300,000 cortical neurons;
      (2) building a GPU-accelerated simulation platform capable of modeling this large-scale network with both excitatory and inhibitory Hodgkin-Huxley neurons;
      (3) extending phase-based analysis methods from 2D to 3D to quantify traveling wave activity in the realistic brain geometry; and
      (4) demonstrating that realistic Allen connectivity generates significantly higher levels of macroscopic traveling waves compared to simplified local or uniform connectivity patterns.

      The study reveals that wave activity depends non-monotonically on coupling strength and that slow oscillations (0.5-4 Hz) are particularly conducive to large-scale wave propagation, providing new insights into how anatomical connectivity enables flexible spatiotemporal dynamics across the cortex.

      Strengths:

      The authors leverage two existing dense datasets of spatial transcriptomic data and connection strength between pairwise voxels in the mouse cortex in a novel way, allowing for the computational model to capture molecular and functional properties of neurons as determined by their neurotransmitter profiles, rather than making arbitrary assignments of excitatory/inhibitory roles. Additionally, the authors' extension of phase gradient analysis methods from 2D to 3D is important and can be widely applied to calcium imaging, LFP recordings, and likely other electrophysiological recordings.

      Weaknesses:

      Despite these important computational advancements, a few aspects of this model, particularly the inability to validate the model with experimental neural data, diminish my enthusiasm for this paper:

      (1) The model's Allen connectivity approach overlooks critical aspects of real cortical dynamics. Most importantly, it excludes subcortical structures, especially the thalamus, which drives cortical traveling waves through thalamocortical interactions. The authors' method of electrically stimulating all layer 4 neurons simultaneously to initiate waves is artificially crude and bears little resemblance to natural wave generation mechanisms.

      (2) The model handles voxel-to-voxel connections crudely when neurons have mixed excitatory/inhibitory properties and varying synaptic strengths. Real connectivity differs dramatically between neuron types (pyramidal cells vs. interneurons, across cortical layers), but the model only distinguishes excitatory and inhibitory neurons. Additionally, uniform synaptic weights ignore natural variations in connection strength based on neuron type, distance, and functional role. Integrating the updated thalamocortical dataset mentioned by the authors, even at regional resolution, would substantially improve the model.

      (3) While the authors bridge microscopic (single neuron) and mesoscopic (regional connectivity) data to study macroscopic (whole-cortex) waves, they don't integrate the distinct mechanisms operating at each scale. The framework demonstrates that realistic connectivity enables macroscopic waves but fails to connect how wave dynamics emerge and interact across spatial scales systematically.

      (4) Claims that Allen connectivity produces higher phase gradient directionality (PGD) than local connectivity appear limited to delta oscillations at very specific coupling strengths and applied currents. Few parameter combinations show significantly higher PGD for Allen connectivity, and these are generally low PGD values overall.

      (5) Broadly, it's unclear how this computational framework can study memory, learning, sleep, sensory processing, or disease states, given the disconnect between simulated intracellular voltages and the local field potentials or other electrophysiological measurements typically used to study cortical traveling waves. While computationally impressive, the practical research applications remain vague.

      (6) The paper needs a clearer explanation for why medium coupling (100%) eliminates waves in Allen connectivity (Figure 6) while stronger coupling (150%) restores them.

      (7) Does using a single connectivity parameter (ρ = 300) across all regions miss important regional differences in cortical connectivity density?
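
      For readers unfamiliar with the metric discussed in point (4), the sketch below computes phase gradient directionality under one commonly used definition: the norm of the mean phase-gradient vector divided by the mean of the gradient norms. This is an assumption on my part; the manuscript's exact 3D formulation may differ, and the function name and inputs are illustrative.

```python
import math

def phase_gradient_directionality(gradients):
    """PGD for a list of 3D phase-gradient vectors (gx, gy, gz), one per location.
    Assumed definition: ||mean(grad)|| / mean(||grad||), which lies in [0, 1]."""
    n = len(gradients)
    if n == 0:
        raise ValueError("need at least one gradient vector")
    # Component-wise mean of the gradient vectors.
    mean_vec = [sum(g[i] for g in gradients) / n for i in range(3)]
    norm_of_mean = math.sqrt(sum(c * c for c in mean_vec))
    # Mean of the individual vector lengths.
    mean_of_norms = sum(math.sqrt(sum(c * c for c in g)) for g in gradients) / n
    if mean_of_norms == 0.0:
        return 0.0  # no gradient anywhere: treat as no directionality
    return norm_of_mean / mean_of_norms

aligned = phase_gradient_directionality([(1.0, 0.0, 0.0)] * 4)                     # all gradients agree
opposed = phase_gradient_directionality([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])      # gradients cancel
```

      With this definition, values near 1 indicate a coherent planar wave and values near 0 indicate no consistent propagation direction, which is why generally low PGD values limit how strongly differences between connectivity schemes can be interpreted.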

    3. Reviewer #2 (Public review):

      Summary:

      This work presents a spiking network model of traveling waves at the whole-brain scale in the mouse neocortex. The authors use data from the Allen Institute to reconstruct connectivity between different neocortical sites. They then quantify macroscopic traveling waves following stimulation of all layer 4 neurons in the neocortex.

      Strengths:

      Overall, the results are interesting and shed new light on the dynamic organization of activity across the neocortex of the mouse. The paper uses realistic neuron models specifically fit to intracellular recordings, demonstrating that traveling waves occur in the mouse neocortex with both realistic connectivity and realistic single-neuron dynamics. The paper is also well-written in general. For these reasons, the authors have generally achieved their aims in this work.

      Weaknesses:

      (1) Description of Algorithm 1:
      While the Methods section clearly explains the density parameter ρ, the statement on line 358 concerning the "ideal" average number of connections is a little unclear. The authors should explicitly clarify that ρ is a free parameter that can be adjusted to balance computational feasibility (for a given set of computational resources) and biological fidelity.

      (2) Lines 102-103:
      The ρ parameter used here results in approximately 300 connections per neuron on average. The authors should state clearly that the number of connections per cell is the key determinant of computational feasibility (cf. Morrison et al., Neural Computation, 2005). The authors should also review neuronal density and synaptic connectivity in the mouse neocortex and clearly reference density and connectivity in their model to the biological scales found in the mouse.

      (3) Line 131:
      From the plots in Figure 2, it is not clear that the stimulus response is necessarily a rhythmic oscillation, in the sense of a single narrowband frequency.

      (4) Line 217:
      The authors should clarify how these findings relate to the results from Mohajerani et al. (Nature Neuroscience, 2013) or differ from them.

      (5) Line 230:
      Because higher temporal frequency activity also tends to be more spatially localized, a correlation between PGD and temporal frequency could be an inherent consequence of this relationship, rather than a meaningful result.

      (6) Lines 247-248:
      It is not clear that the algorithm for generating connections between neurons presented here really relates to those for community detection. For example, in the case of the Allen Institute data, the communities are essentially in the data already.

      (7) Lines 284-285:
      The relationship between conduction delay and anatomical distance is more direct than this sentence suggests. Conduction delay is fundamentally determined by the time required for action potentials to propagate along axons, making it intrinsically linked to anatomical distance.

      (8) Lines 287-288:
      The authors suggest at this point that they do not have enough information to estimate time delays due to axonal conduction along white matter fibers. However, experimental data from white matter connections typically includes information about fiber length, which does enable estimating conduction delays. These estimations have been previously implemented for Allen Institute connectome data in the mouse (Choi and Mihalas, PLoS Comput Biology, 2019) and human connectome data (Budzinski et al., Physical Review Research, 2023).

      (9) Lines 294-295:
      Several methods do exist for detecting and characterizing wave dynamics in three-dimensional data (Budzinski et al., Physical Review Research, 2023).
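
      To illustrate points (7) and (8), a delay estimate from fiber length can be as simple as dividing length by a conduction velocity. The sketch below is a minimal illustration, not the procedure used in the cited studies; the function names, the velocity, and the time step are hypothetical placeholders that the authors would need to replace with values appropriate for mouse cortex.

```python
def conduction_delay_ms(fiber_length_mm, velocity_m_per_s):
    """Axonal conduction delay in milliseconds.
    Units: mm / (m/s) = 1e-3 s = ms, so no extra conversion factor is needed."""
    if velocity_m_per_s <= 0:
        raise ValueError("conduction velocity must be positive")
    return fiber_length_mm / velocity_m_per_s

def delay_in_steps(delay_ms, dt_ms):
    """Convert a delay to an integer number of simulation time steps."""
    if dt_ms <= 0:
        raise ValueError("time step must be positive")
    return int(round(delay_ms / dt_ms))

# Hypothetical example values, chosen only to show the arithmetic.
delay = conduction_delay_ms(fiber_length_mm=5.0, velocity_m_per_s=2.5)
steps = delay_in_steps(delay, dt_ms=0.1)
```

      Because the delay scales linearly with fiber length, any connectome that reports per-connection lengths supports at least this first-order estimate.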

    1. eLife Assessment

      This important study utilizes behavioral data and computational modeling to show that spatial properties of visual attention affect human planning. The methodology and statistical analyses are solid, though the way attention is conceptualized and modeled could be refined. The findings of this study will interest cognitive scientists studying attention, perception, and decision-making.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated how visuospatial attention influences the way people build simplified mental representations to support planning and decision-making. Using computational modeling and virtual maze navigation, the authors examined whether spatial proximity and the spatial arrangement of obstacles determine which elements are included in participants' internal models of a task. The study developed and tested an extension of the value-guided construal (VGC) model that incorporates features of spatial attention for selecting simpler mental representations of the task.

      Strengths:

      (1) Original Perspective: The study introduces an explicit attentional component to established models of planning, offering an approach that bridges perception, attention, and decision-making.

      (2) Methodological Approach: The combination of computational modeling, behavioral data, and eye-tracking provides converging measures to assess the relationship between attention and planning representations.

      (3) Cross-validated data: The study relies on the analysis of three separate datasets, two already published and an additional novel one. This allows for cross-validation of the findings and enhances the robustness of the evidence.

      (4) Focus on Individual Differences: Reports of how individual variability in attentional "spillover" correlates with the sparsity of task representations and spatial proximity add depth to the analysis.

      Weaknesses:

      (1) Clarity of the VGC model and behavioral task: The exposition of the VGC model lacks sufficient detail for non-expert readers. It is not clear how this model infers which maze obstacles are relevant or irrelevant for planning, nor how the maze tasks specifically operationalize "planning" versus other cognitive processes.

      The method for classifying obstacles as relevant or irrelevant to the task and connecting metacognitive awareness (i.e., participants' reports of noticing obstacles) to attentional capture is not well justified. The rationale for why awareness serves as a valid attention proxy, as opposed to behavioral or neurophysiological markers, should be clearer.

      (2) Attention framework: The account of attention is largely limited to the "spotlight" model. When solving a maze, participants trace the correct trail, following it mentally with their overt or covert attention. In this perspective, relevant concepts are also rooted in attention literature pertaining to object-based attention using tasks like curve tracing (e.g., Pooresmaeili & Roelfsema, 2014) and to mental maze solving (e.g., Wong & Scholl, 2024), which may be highly relevant and add nuance to the current work. This view of attention may be more pertinent to the task than models of simultaneously tracking multiple objects cited here. Prior work (notably from the Roelfsema group) indicates that attentional engagement in curve-tracing tasks may be a continuous, bottom-up process that progressively spreads along a trajectory, in time and space, rather than a "spotlight" that simply travels along the path. The spread of attention depends on the spatial proximity to distractors - a point that could also be pertinent to the findings here.

      Moreover, the tracing of a "solution" trail in a maze may be spontaneous and not only a top-down voluntary operation (Wong & Scholl, 2024), a finding that requires a more careful framing of the link to conscious perception discussed in the manuscript.

      Conceptualizing attention as a spatial spotlight may therefore oversimplify its role in navigation and planning. Perhaps the observed attentional modulation reflects a perceptual stage of building the trail in the maze rather than a filter for a later representation for more efficient decision making and planning. A fuller discussion of whether the current model and data can distinguish between these frameworks would benefit readers.

      (3) Lateralization of attention: The analysis considers whether relevant information is distributed bilaterally or unilaterally across the visual display, but does not sufficiently address evidence for attentional asymmetries across the left and right visual fields due to hemispheric specialization (e.g., Bartolomeo & Seidel Malkinson, 2019). Whether effects differ for left versus right hemifield arrangements is not made explicit in the presented findings.

      (4) Individual differences: Individual differences in attentional modulation are a strength of the work, but similar analyses exploring individual variation in lateralization effects could provide further insight, and the lack of such analyses may mask important effects.

      (5) Distinction between overt and covert attention: The current report at times equates eye movement patterns with the locus of attention. However, attention can be covertly shifted without corresponding gaze changes (see, for example, Pooresmaeili & Roelfsema, 2014).

      The implications for interpreting the relationship between eye movement, memory, and attention in this setting are not fully addressed. The potential dynamics of attention along a maze trajectory and their impact on lateralization analysis would benefit from further clarification.

      Appraisal of Aims and Results:

      The study sets out to determine how spatial attention shapes the construction of task representations in planning contexts. The authors provide evidence that spatial proximity and arrangement influence which environmental features are incorporated into internal models used for navigation, and that accounting for these effects improves model predictions. There is clear documentation of individual variation, with some participants showing greater attentional spillover and more sparse awareness profiles.

      However, some conceptual and methodological aspects would be clearer with greater engagement with the broader literature on attention dynamics, a more explicit justification of operational choices, and more targeted lateralization analyses.

    3. Reviewer #2 (Public review):

      Summary:

      Castanheira et al. investigate the role of spatial attention for planning during three maze navigation experiments (one new experiment and two existing datasets). Effective planning in complex situations requires the construction of simplified representations of the task at hand. The authors find that these mental representations (as assessed by conscious awareness) of a given stimulus are influenced by (spatially) surrounding stimuli. Individual participants varied in the degree to which attention influenced their task representations, and this attentional effect correlated with the sparsity of representations (as measured by the range of awareness reports across all stimuli). Spatially grouping task-relevant information on either the left or right side of the maze led to mental representations more similar to optimal representations predicted by the value-guided construal (VGC) model - a normative model describing a theoretical approach to simplifying complex task information. Finally, the authors propose an update to this model, incorporating an attentional spotlight component; the revised descriptive model predicts empirical task representations better than the original (normative) VGC model.

      Strengths:

      The novelty of this study lies in the proposal and investigation of a cognitive mechanism through which a normative model like value-guided construal can enable human planning. After proposing attention as this mechanism, the authors make concrete hypotheses about mismatches between the VGC predictions and real human behavior, which are experimentally validated. Thus, not only does this study describe a possible mechanism for simplification of task information for planning, but the authors also propose a descriptive model, revising VGC to incorporate this attentional component.

      A strength of this paper is the variety of investigative approaches: analysis of existing data, novel experiment, and a computational approach to predict experimental findings from a theoretical model. Analyzing pre-existing datasets increases the size of the participant cohort and strengthens the authors' conclusions. Meanwhile, comparing the predictions of the existing normative model and the authors' own refined model is a clever approach to substantiate their claims. In addition, the authors describe several crucial controls, which are key to the interpretability of their results. In particular, the eye tracking results were critical.

      In summary, this paper constitutes an important step toward a more complete understanding of the human ability to plan.

      Weaknesses:

      (1) There is a critical conceptual gap in the study and its interpretation, mainly due to the reliance on a self-report metric of awareness (rather than an objective measure of behavioral performance).

      a. Awareness is tested by a 9-point self-report scale. It is currently unclear why awareness of task-irrelevant obstacles in this task would necessarily compromise optimal planning. There is no indication of whether self-reported awareness affects performance (e.g., navigation path distance, time to complete the maze, number of errors). Such behavioral evidence of planning would be more compelling.

      b. Relatedly, it would have been more convincing to have an objective measure of awareness, for instance, how the presence or absence of a "task-irrelevant" obstacle affects performance (e.g., change navigation path distance or time to complete the maze), or whether participants can accurately recall the location of obstacles.

      c. Consequently, I'm not sure that we can conclude that the spatial context does impact participants' ability to plan spatial navigation or to "incorporate task-relevant information into their construal". We know that the spatial context affects subjective (self-reported) awareness, but the authors do not present evidence that spatial context affects behavioral performance.

      d. Another concern that may complicate interpretation is the following: Figure 3c shows improved VGC model predictions (steeper slope) for mazes with greater lateralization. However, there are notable outliers in these plots, where a high lateralization index does not correspond to good model performance. There is currently no discussion/explanation of these cases.

      (2) I noticed an issue with clarity regarding task-relevance. It is currently not fully clear which obstacles are "task irrelevant". Also, the term is used inconsistently and is sometimes conflated with "awareness". For example, in the "Attentional spotlight model of task representations" section, the authors state that "task-relevant information becomes less relevant when surrounded by task-irrelevant information". But they really mean that participants become less aware of those task-relevant obstacles. I assume task-relevance is an objective characteristic related to maze organization, not to a participant's construal. Indeed, the following paragraph provides evidence of model predictions of awareness.

      (3) The behavioral paradigm has some distinct disadvantages, and the validity of the task is not backed up by behavioral data.

      a. I understand the need for central fixation, but it also makes the task less naturalistic.

      b. The task with its top-down grid view does not seem to mimic real human navigation. Though this grid may be similar to mental maps we form for navigation, the sensory stimuli corresponding to possible paths and to spatial context during real-life navigation are very different.

      c. Behavioral performance is not reported, so it is unknown whether participants are able to properly complete the task. The task seems pretty difficult to navigate, especially when the obstacles disappear, and in combination with the central fixation.

      d. There is no discussion of whether/how this navigation task generalizes to other forms of planning.

    4. Reviewer #3 (Public review):

      Summary:

      The authors build on a recent computational model of planning, the "value-guided construal" framework by Ho et al. (2022), which proposes that people plan by constructing simple models of a task, such as by attending to a subset of obstacles in a maze. They analyze both published experimental data and new experimental data from a task in which participants report attention to objects in mazes. The authors find that attention to objects is affected by spatial proximity to other objects (i.e., attentional overspill) as well as whether relevant objects are lateralized to the same hemifield. To account for these results, the authors propose a "spotlight-VGC" model, in which, after calculating attention scores based on the original VGC model, attention to objects is enhanced based on distance. They find that this model better explains participant responses when objects are lateralized to different hemifields. These results demonstrate complex interactions between filtering of task-relevant information and more classical signatures of attentional selection.

      Strengths:

      (1) The paper builds on existing modeling work in a novel manner and integrates classic results on attention into the computational framework.

      (2) The authors report new and extensive analyses of existing data that shed light on additional sources of systematic variability in responses related to attentional spillover effects.

      (3) They collect new data using new stimuli in the original paradigm that directly test predictions related to the lateralization of task-relevant information, including eye tracking data that allows them to control for possible confounds.

      (4) The extended model (spotlight-VGC) provides a formal account of these new results.

      Weaknesses:

      (1) The spotlight-VGC model has a free parameter - the "width" of the attentional spotlight. This seems to have been fixed to be 3 squares. It would be good if the authors could describe a more principled procedure for selecting the width so that others can use the model in other contexts.

      (2) Have the authors considered other ways in which factors such as attentional spillover and lateralization could be incorporated into the model? The spotlight-VGC model, as presented, involves first computing VGC predictions and only afterwards computing spillover. This seems psychologically implausible, since it supposes that the "optimal" representation is first formed and then it gets corrupted. Is there a way to integrate these biases directly into the VGC framework, perhaps as a prior on construals? The authors gesture towards this when they talk about "inductive biases", but this is not formalized.

      (3) Can the authors rule out that the lateralization effects are the result of memory biases since the main measure used is a self-report of attention?

    1. eLife Assessment

      This study presents a valuable and rigorous molecular resource, offering subtype-specific insight into the composition of ribosome-associated protein complexes in the developing cerebral cortex. The evidence is compelling in terms of data quality and is strongly supported by the results, given the rigorous technical execution. However, the findings remain primarily descriptive, as the study lacks functional validation to support mechanistic conclusions.

    2. Reviewer #1 (Public review):

      This work provides a valuable toolkit for endogenous isolation of projection neuron subtypes. With further validation, it could present a solid method for low-input ribosome affinity purification using a ribosomal RNA (rRNA) antibody. The experimental evidence for the distinct ribosomal complexes is limited to this method and indirect support from complementary analyses of pre-existing data. However, with additional experimental data to support the specificity of ribosomal complex pulldown and confirmation of the putative ribosomal complex proteins of interest, the study would provide compelling evidence for translation regulation of neuronal development through compositional ribosome heterogeneity. This work would be of interest to neuroscientists, developmental biologists, and those studying translational networks underlying gene regulation.

      Strengths

      (1) This in vivo labeling of specific projection neurons and ribosomal rRNA affinity purification method accommodates a low input of <100K somata per replicate, which is useful for the study of neuronal subtypes with limited input. In principle, this set of techniques could work across different cell types with limited input, depending on the molecule used for cell type labeling.

      (2) The authors are also able to isolate endogenous neurons with minimal perturbation up to the point of collection, preserving the native state for the neuron in vivo as long as possible prior to processing.

      (3) This study identified over a dozen potential non-ribosomal proteins associated with SCPN ribosomal complexes, as well as a ribosomal protein enriched in CPN.

      Limitations

      (1) In this study, the authors address the advantages of their ribosomal complex isolation method in SCPN and CPN against RPL22-HA affinity purification. While this does show more pull-down of the ribosomal RNA by the Y10B rRNA antibody, the authors claim this method identifies cell-type-specific ribosomal complex proteins without demonstrating a positive control for the method's specificity. There are very few experiments that delineate how specific this method is and whether there could be contamination from other complexes bound by the antibody. I see this as the major limitation that should be addressed. To boost their claims of capturing cell-type-specific ribosomal complexes, the authors could consider applying their rRNA affinity purification pipeline to compare cell types with well-characterized ribosome-associated proteins, like mouse embryonic stem cells and HeLa cells. The reviewer can completely appreciate the elegance in the neural characterization here, but it seems there needs to be a solid foothold on the specificity of the method, perhaps facilitated by cell types that can be more readily scaled up and tested.

      (2) The authors followed up on their differentially enriched ribosomal complex proteins by analyzing the ribosome association of these proteins in external datasets. While this analysis supports the ribosome-association of these proteins, there is limited experimental validation of physical association with the ribosome, much less any functional characterization. The reciprocal pulldown of PRKCE is promising; however, I would recommend orthogonal validation of several putative ribosomal complex proteins to increase confidence. Specifically, the authors could use sucrose gradient fractionation of SCPN and CPN, followed by a western blot to identify the putative interaction with the 80S monosome or polysomes. This would also provide evidence towards the pulldown capturing association with mature ribosome species, which is currently unclear. This experiment would provide substantial evidence for the direct association of these non-ribosomal proteins with subtype-specific ribosomal complexes.

      (3) The authors state interest in learning more about the differences underlying translational regulation of projection neuron development. This method only captures neuronal somata, which will only capture ribosomes in the main cell body. There are also ribosomes regulating local translation in the axons, which may also play a critical role in axonal circuit establishment and activity. These ribosomal complex interactions may also be rather transient and difficult to capture at only one developmental stage. Therefore, this method is currently limited to a single developmental snapshot of ribosomal complexes at P3 within the main cell body. It would be exciting to see the extended utility of this method to sample neurites and additional developmental stages to gain further resolution on the developmental translation regulation of these projection neurons.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      The authors introduce a unique pipeline of techniques to identify cell-type-specific ribosomal complex compositions. With more validation, there is certainly potential for those studying neuronal translation to leverage this method in limited primary cells as an alternative to existing methods that do not rely on ribosomal protein tagging, such as ARC-MS (Bartsch et al., 2023), RAPIDASH (Susanto and Hung et al., 2024), and RAPPL (Nature Communications, 2025).

    3. Reviewer #2 (Public review):

      Summary:

      This study presents a sophisticated molecular dissection of ribosome-associated complexes (RCs) in two well-defined cortical projection neuron subtypes (ScPN and CPN) during early postnatal development. The authors develop and optimize an rRNA immunoprecipitation-mass spectrometry (rRNA IP-MS) workflow to recover RCs from FACS-purified, retrogradely labeled neurons, achieving remarkable subtype specificity and biochemical resolution. Through proteomic profiling, they reveal both shared and distinct ribosome-associated proteins between ScPN and CPN, with a focus on non-core RC components and their potential functional relevance. The work advances our understanding of cell-type-specific translation regulation, moving beyond the transcriptome to explore the proteome-level complexity in neuronal subtypes.

      Strengths:

      This work stands out for its technical sophistication and innovation. The authors combine retrograde labeling, FACS purification, and an optimized rRNA IP-MS approach (low input) to isolate ribosome-associated complexes from highly specific neuronal subtypes in vivo, a challenging task that they execute with impressive rigor. The methodological pipeline is both elegant and well-controlled, yielding high-quality, reproducible data. The depth of proteomic coverage is remarkable, with nearly all known cytoplasmic ribosomal proteins identified, along with hundreds of ribosome-associated proteins (RAPs), including translation factors, chaperones, and RNA-binding proteins. The analysis not only reveals shared components between ScPN and CPN RCs but also uncovers subtype-specific differences in associated proteins.

      Particularly notable is the integration of this new proteomic dataset with previously published transcriptomic and ribosome footprinting data, which helps to validate the specificity and relevance of the findings. Overall, the clarity of the writing, the robustness of the data, and the transparency of the methods make this a strong and compelling contribution.

      Weaknesses:

      Despite the depth and high quality of the dataset, the study remains descriptive. While the identification of subtype-specific RC components is intriguing, the current version of the manuscript does not explore their functional roles or the biological consequences of their alterations. There is no perturbation, causal testing, in vitro or in vivo manipulation to demonstrate whether these proteins are necessary for ScPN or CPN identity, specific axonal targeting, metabolism, or synaptic function.

      One important point highlighted by the authors in the discussion - and critical for establishing the subtype specificity of the identified proteins - is that some ribosomal complexes may be specialized for specific developmental stages, rather than exclusively for the subtype-specific needs of projection neuron development. The work presented here provides a valuable starting point for further investigation into such RC specialization. However, it will be essential to determine to what extent these RCs exhibit true subtype specificity, independently of their temporal maturation context.

      As a result, key mechanistic insights remain a bit speculative. Although several of the identified proteins have known roles in processes like synaptogenesis or metabolism, their relevance to the specific neuronal subtypes under study is not experimentally addressed. That said, given its rich content and the comprehensive early postnatal dataset, the manuscript represents an extremely valuable resource for the community. While primarily exploratory, it lays a strong foundation for future functional studies aimed at uncovering the biological impact of the identified ribosomal complexes.

    1. eLife Assessment

      This valuable model-based study seeks to mimic bat echolocation behavior and flight under conditions of high interference, such as when large numbers of bats leave their roost together. Although some of the assumptions made in the model may be questioned, the simulations convincingly suggest that the problem of acoustic jamming in these situations may be less severe than previously thought. This finding will be of broad interest to scientists working in the fields of bat biology and collective behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      * The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      * The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      * The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      * The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      * Authors have not yet provided convincing justification for the use of different echolocation phases during emergence and in cave behaviour. In the previous modelling paper cited for the details - here the bat-agents are performing a foraging task, and so the switch in echolocation phases is understandable. While flying with conspecifics, the lab's previous paper has shown what they call a 'clutter response' - but this is not necessarily the same as going into a 'buzz'-type call behaviour. As pointed out by another reviewer - the results of the simulations may hinge on the fact that bats are showing this echolocation phase-switching, and thus improving their echo-detection. This is not necessarily a major flaw - but something for readers to consider in light of the sparse experimental evidence at hand currently.

      * The decision to model direction-of-arrival with such high angular resolution (1-2 degrees) is not entirely justifiable - and the authors may wish to do simulation runs with lower angular resolution. Past experimental paradigms haven't really separated out target-strength as a confounding factor for angular resolution (e.g. see the cited Simmons et al. 1983 paper). Moreover, to this reviewer's reading of the cited paper - it is not entirely clear how this experiment provides source-data to support the DoA-SNR parametrisation in this manuscript. The cited paper has two array-configurations, both of which are measured to have similar received levels upon ensonification. A relationship between angular resolution and signal-to-noise ratio is understandable perhaps - and one can formulate such a relationship, but here the reviewer asks that the origin/justification be made clear. On an independent line, also see the recent contrasting results of Geberl, Kugler, Wiegrebe 2019 (Curr. Biol.) - who suggest even poorer angular resolution in echolocation.

    3. Reviewer #2 (Public review):

      This manuscript describes a detailed model for bats flying together through a fixed geometry. The model considers elements which are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      The work relies on a thoughtful and detailed model which faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      With respect to the first version of the manuscript, the authors have remedied all my outstanding questions or concerns in the current version. The new supplementary figure 5 is especially helpful in understanding the geometry.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 58-64).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      • Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      • Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      • Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 84-85, 119-120). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      • Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      • Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion. We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections – see lines 346-349, 372-375.
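      The call rates quoted in the response follow directly from the inter-pulse interval, since rate (calls/s) = 1000 / IPI (ms). A minimal sketch of that conversion is given below; it uses only the IPI values stated above, and the function name and phase labels are illustrative rather than taken from the authors' code.

```python
def calls_per_second(ipi_ms: float) -> float:
    """Convert an inter-pulse interval in milliseconds to a call rate in calls/s."""
    if ipi_ms <= 0:
        raise ValueError("IPI must be positive")
    return 1000.0 / ipi_ms


# IPI values quoted in the response (attributed there to Table 2 of the manuscript).
# The phase labels are illustrative names, not identifiers from the authors' model.
ipi_by_phase = {"search": 100.0, "approach_end": 35.0, "buzz": 5.0}

rates = {phase: calls_per_second(ipi) for phase, ipi in ipi_by_phase.items()}

for phase, rate in rates.items():
    print(f"{phase}: {rate:.1f} calls/s")
```

      With these inputs the search phase corresponds to 10 calls/s, the end of the approach phase to about 28.6 calls/s, and the buzz to 200 calls/s, in line with the approximate figures quoted in the response.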

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me. 

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight. 

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 543-545. 
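      For reference, the textbook far-field directivity of a circular piston in an infinite baffle, which is the usual meaning of "piston model" in the echolocation literature, can be written as below. The symbols a (effective emitter radius), f (emission frequency), c (speed of sound in air) and θ (off-axis angle) are introduced here for illustration; the exact parametrisation used in the manuscript is not given in this response and may differ.

```latex
D(\theta) = \left| \frac{2\, J_1\!\left(k a \sin\theta\right)}{k a \sin\theta} \right|,
\qquad k = \frac{2\pi f}{c}
```

      Here J_1 is the first-order Bessel function of the first kind. Because k grows with f, a higher emission frequency at fixed a narrows the main lobe, which is consistent with the statement that directionality increases as the bat shifts to higher frequencies in the approach phase.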

      If so, what is the difference between phi_target and phi_tx in the model equations? 

      𝝓<sub>𝒕𝒂𝒓𝒈𝒆𝒕</sub> represents the angle between the bat and the reflected object (target).

      𝝓<sub>𝑻𝒙</sub> represents the angle [rad] between the masking bat and the target (from the transmitter’s perspective).

      𝝓<sub>𝑻𝒙𝑹𝒙</sub> refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      𝝓<sub>𝑹𝒙𝑻𝒙</sub> represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 525-530). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      What is a bat's response to colliding with a conspecific (rather than a wall)? 

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldshtein et al., 2025). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance rather than modeling post-collision dynamics. See lines 479-484.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both? 

      The number of repetitions for each scenario is detailed in Table 1, but we included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 110-111):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials. 

      We clarified this in the revised text (Lines 627-628 in Statistical Analysis).
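      Because the standard errors are pooled across individuals, the effective sample behind each estimate is the group size multiplied by the number of runs, while the number of independent simulations is much smaller at high densities. The sketch below works through that arithmetic using only the repetition counts quoted above; the variable names are illustrative, and the sketch does not address whether individuals within one run can be treated as statistically independent.

```python
# Repetitions per group size, as quoted in the response
# (attributed there to Table 1 of the manuscript).
repetitions = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}

# Individuals pooled per scenario = group size x number of independent runs.
pooled_individuals = {
    n_bats: n_bats * n_runs for n_bats, n_runs in repetitions.items()
}

for n_bats, total in pooled_individuals.items():
    n_runs = repetitions[n_bats]
    print(f"{n_bats:>3} bats x {n_runs:>3} runs = {total} individuals")
```

      This gives 240 pooled individuals for every scenario from 1 to 20 bats, 480 for 40 bats, and 600 for 100 bats, whereas the number of independent runs falls from 240 to 6 over the same range.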

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model often simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning recurs in several of the responses below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m2), Pipistrellus kuhli and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation? 

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 499-508).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect? 

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase most of the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma), we also have empirical recordings of individuals flying under similar conditions (Goldshtein et al., 2025). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities. See lines 500-508.

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filterbank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in errors of 2-4 cm in localization (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see lines 548-581).
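
      To give a sense of the magnitude quoted above, the following sketch computes the first-order, one-way Doppler shift of a call heard from a conspecific closing at a given speed. The 40 kHz call frequency, the 8.5 m/s closing speed, and the 343 m/s speed of sound are illustrative assumptions and are not values taken from the model.

```python
SPEED_OF_SOUND_MS = 343.0  # assumed speed of sound in air, m/s


def one_way_doppler_shift_hz(freq_hz, closing_speed_ms, c=SPEED_OF_SOUND_MS):
    """First-order (v << c) Doppler shift of a call from an approaching source.

    A positive closing speed (source and receiver approaching) gives a
    positive shift; a negative one (moving apart) gives a negative shift.
    """
    return freq_hz * closing_speed_ms / c


# Illustrative values only: a 40 kHz component and an 8.5 m/s closing speed.
shift_hz = one_way_doppler_shift_hz(40_000.0, 8.5)
print(f"Doppler shift: {shift_hz:.0f} Hz")
```

      With these assumed values the shift comes out at roughly 1 kHz, the order of magnitude stated in the response; the sign and size vary with the relative velocity of each pair of bats.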

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation. 

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming. 

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.  

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
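
      The "3 dB per doubling" figure follows from the idealized gain 10·log10(N) for N observations. The sketch below only illustrates that arithmetic, assuming independent noise across calls and a stable echo; it is not the integration algorithm described in the Methods, and the function name is hypothetical.

```python
import math


def integration_gain_db(n_calls):
    """Idealized SNR gain (dB) from combining n_calls independent observations."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return 10.0 * math.log10(n_calls)


# Each doubling of the number of calls adds about 3 dB.
for n in (1, 2, 4, 8):
    print(n, round(integration_gain_db(n), 2))
```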

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem? 

      As described in the Methods section, the bat’s collision avoidance response does not solely rely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal, and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 600-616 in the revised version.
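
      The transformation from a bat-centered detection to a global x-y position can be sketched as follows. This is a minimal illustration, assuming each detection is given as a range and a bearing relative to the bat's heading; the function and parameter names are hypothetical and do not come from the simulation code.

```python
import math


def to_allocentric(bat_x, bat_y, heading_rad, echo_range_m, bearing_rad):
    """Convert an egocentric detection (range, bearing) to global x-y.

    heading_rad is the bat's flight direction in the global frame;
    bearing_rad is the echo direction relative to that heading.
    """
    angle = heading_rad + bearing_rad
    return (bat_x + echo_range_m * math.cos(angle),
            bat_y + echo_range_m * math.sin(angle))


# A bat at (1, 2) flying along +y detects an object 1.5 m straight ahead.
x, y = to_allocentric(1.0, 2.0, math.pi / 2, 1.5, 0.0)
print(round(x, 3), round(y, 3))
```

      Because detections are stored in the global frame, points from successive calls that belong to the same wall segment stay close together even though the bat has moved between calls.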

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach. 

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      • Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m (Fujioka et al., 2021), as observed in Myotis grisescens (Sabol and Hudson, 1995) and Tadarida brasiliensis (Theriault et al., no date; Betke et al., 2008; Gillam et al., 2010).

      • Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods lines 450-455).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem. 

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler, Bioscience and 2001, no date; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 1: The impact of confusion on performance, and lines 399-404 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines 411-420 in the manuscript for further discussion. 
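
      The idea behind integration with outlier rejection can be illustrated with a simple support-counting rule: a detection is kept only if detections from enough other calls fall near the same global position. This is a sketch under stated assumptions and is not the clustering procedure used in the model; the 0.3 m radius, the support threshold, and all names are hypothetical.

```python
import math


def reject_outliers(detections, radius_m=0.3, min_support=2):
    """Keep detections that are corroborated by other calls.

    detections: list of (call_index, x, y) in global coordinates.
    A detection is kept if detections from at least min_support
    *other* calls lie within radius_m of it.
    """
    kept = []
    for call_i, xi, yi in detections:
        supporting_calls = {
            call_j
            for call_j, xj, yj in detections
            if call_j != call_i and math.hypot(xi - xj, yi - yj) <= radius_m
        }
        if len(supporting_calls) >= min_support:
            kept.append((call_i, xi, yi))
    return kept


# A wall point seen on three consecutive calls, plus one spurious echo.
detections = [
    (0, 5.00, 1.00),
    (1, 5.05, 1.02),
    (2, 4.97, 0.99),
    (1, 2.00, 3.00),  # e.g. a misassigned conspecific echo
]
print(reject_outliers(detections))
```

      Echoes from a fixed wall map to nearly the same global position from call to call and therefore support one another, whereas echoes misassigned from moving conspecifics tend to appear at unrelated positions and are dropped.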

      We actually used logarithmically frequency modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 509-512 in Methods).
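
      For readers without MATLAB, a logarithmic (geometric) sweep can be sketched in plain Python. The sketch assumes the instantaneous frequency follows f(t) = f0·(f1/f0)^(t/T) and integrates it to obtain the phase; the 80 kHz to 40 kHz sweep, the 2 ms duration, and the 500 kHz sampling rate are illustrative values, not the call parameters used in the model.

```python
import math


def instantaneous_frequency_hz(f0_hz, f1_hz, duration_s, t_s):
    """Frequency of a logarithmic sweep from f0 to f1 at time t."""
    return f0_hz * (f1_hz / f0_hz) ** (t_s / duration_s)


def log_chirp(f0_hz, f1_hz, duration_s, sample_rate_hz):
    """Samples of a unit-amplitude logarithmic FM chirp (requires f0 != f1)."""
    ratio = f1_hz / f0_hz
    log_ratio = math.log(ratio)
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Phase is the time integral of 2*pi*f(t) from 0 to t.
        phase = (2.0 * math.pi * f0_hz * duration_s / log_ratio
                 * (ratio ** (t / duration_s) - 1.0))
        samples.append(math.cos(phase))
    return samples


# Illustrative downward sweep: 80 kHz to 40 kHz in 2 ms, sampled at 500 kHz.
call = log_chirp(80_000.0, 40_000.0, 0.002, 500_000.0)
print(len(call), round(instantaneous_frequency_hz(80_000.0, 40_000.0, 0.002, 0.001)))
```

      In a logarithmic sweep the frequency at the temporal midpoint is the geometric mean of the start and end frequencies, so the call spends more time near its lower (terminal) frequency than a linear sweep would.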

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to support stable and realistic flight trajectories while maintaining a reasonable collision rate. These values reflect a trade-off between maneuverability and behavioral coherence under crowding. To address this point, we added a sensitivity analysis to the revised manuscript. Specifically, we tested the effect of varying the conspecific avoidance distance from 0.2 to 1.6 meters at bat densities of 2 to 40 bats/3m². The only statistically significant impact was at the highest density (40 bats/3m²), where exit probability increased slightly from 82% to 88% (p = 0.024, t = 2.25, DF = 958). No significant changes were observed in exit time, collision rate, or jamming probability across other densities or conditions (GLM, see revised Methods). These results suggest that the selected avoidance distances are robust and not a major driver of model performance, see lines 469-47.

      The 15-second exit limit was determined as described in the text (Lines 489-491): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such simulations would lead to excessive simulation times, making it difficult to generate repetitions, while teaching us little: these cases usually resulted from the bat slightly missing the opening (see video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?  

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions?

      Does it include masking, no masking, or which species? 

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss and Surlykke, 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window; we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off with integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking. We have revised the text to clarify these details, see lines 489-491.

      Reviewer #1 (Recommendations for the authors):

      (1) Data Availability:

      As it stands now, this reviewer cannot vouch for the uploaded code as it wasn't accessible according to F.A.I.R principles. The link to the code/data points to a private company's file-hosting account that requires logging in or account creation to see its contents, and thus cannot be accessed.

      This reviewer urges the authors to consider uploading the code onto an academic data repository from the many on offer (e.g. Dryad, Zenodo, OSF). Some repositories offer an option to share a private link (e.g. Zenodo) to the folder that can then be shared only with reviewers so it is not completely public.

      This is a computational paper, and the credibility of the results is based on the code used to generate them.

      The code is available at GitHub as required:

      https://github.com/omermazar/Colony-Exit-Bat-Simulation

      (2) Abstract:

      Line 22: 'To explore whether..' - replace 'whether' with 'how'?

      The sentence was rephrased as suggested by the reviewer.

      (3) Main text:

      Line 43: '...which may share...' - correct to '...which share...', as elegantly framed in the authors' previous work - jamming avoidance is unavoidable because all FM bats of a species still share >90% of spectral bandwidth despite a few kHz shift here and there.

      The sentence was rephrased as suggested by the reviewer.

      Line 49: The authors may wish to additionally cite the work of Fawcett et al. 2015 (J. Comp. Phys A & Biology Open)

      Thank you for the suggestion. We have included a citation to the work of Fawcett et al. (2015) in the revised manuscript.

      Line 61: This statement does not match the recent state of the literature. While the previous models may have assumed that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from the potential inability to track all neighbours, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Jhawar et al. 2020 Nature Physics.

      We have added citations to the important studies suggested by the reviewer, as detailed in the Public Review above.

      Line 89: '..took all interference signals into account...' - what is meant by 'interference signals' - are the authors referring to reflections, unclear.

      We have revised the sentence and detailed the acoustic signals involved in the process: self-generated echoes, calls from conspecifics, and echoes from cave walls and other bats evoked by those calls, see lines 99-106.

      Figure 1A: The colour scheme with overlapping points makes the figure very hard to understand what is happening. The legend has colours from subfigures B-D, adding to the confusion.

      What does the yellow colour represent? This is not clear. Also, in general, the color schemes in the simulation trajectories and the legend are not the same, creating some amount of confusion for the reader. It would be good to make the colour schemes consistent and visually separable (e.g. consp. call direct is very similar to consp. echo from consp. call), and perhaps also if possible add a higher resolution simulation visualisation. Maybe it is best to separate out the colour legends for each sub-figure.

      The updated figure now includes clearer, more visually separable colors, and consistent color coding across all sub-panels. The yellow trajectory representing the focal bat’s flight path is now explicitly labeled, and we adjusted the color mapping of acoustic signals (e.g., conspecific calls vs. echoes) to improve distinction. We also revised the figure caption accordingly and ensured that the legend is aligned with the updated visuals. These modifications aim to enhance interpretability and reduce ambiguity for the reader.

      Figure C3: What is 'FB Channel', this is not explained in the legend.

      ‘FB Channel’ stands for ‘Filter Bank Channel’. This clarification has been added to the caption of Figure 1.

      Figure 3: Visually noticing that the colour legend is placed only on sub-figure A is tricky and readers may be left searching for the colour legend. Maybe lay out the legend horizontally on top of the entire figure, so it stands out?

      We have adjusted the placement of the color legend in Figure 3 to improve visibility and consistency.

      Line 141: '..the probability of exiting..' - how is this probability calculated - not clear.

      We have clarified in the revised text that the probability of exiting the cave within 15 seconds is defined as the number of bats that exited the cave within that time divided by the total number of bats in each scenario, see lines 159-160.
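
      The definition above amounts to a simple ratio, sketched here for concreteness. The representation of the data (one exit time per bat, with None for bats that never exited) and the function name are assumptions made for illustration.

```python
def exit_probability(exit_times_s, time_limit_s=15.0):
    """Fraction of bats that exited within the time limit.

    exit_times_s: one entry per bat; None means the bat never exited.
    """
    if not exit_times_s:
        raise ValueError("at least one bat is required")
    exited = sum(1 for t in exit_times_s
                 if t is not None and t <= time_limit_s)
    return exited / len(exit_times_s)


# Five bats: three exit in time, one never exits, one exits too late.
print(exit_probability([6.2, 9.8, None, 14.9, 17.3]))
```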

      Line 142: What are the sample sizes here - i.e. how many simulation replicates were performed?

      We have clarified the number of repetitions in each scenario in the revised text, as detailed in the Public Review above.

      Line 151: 'The jamming probability,...number of jammed echoes divided by the total number of reflected echoes' - it seems like these are referring to 'own' echoes or first-order reflections, it is important to clarify this.

      The reviewer is right. We have clarified it in the revised text, see lines 173-175.

      Line 153: '..with a maximum difference of ...' - how is this difference calculated? What two quantities are being compared - not clear.

      We have revised the text to clarify that the 14.3% value reflects the maximum difference in jamming probability between the RM and PK models, which occurred at a density of 10 bats. The values at each density are shown in Figure 2D, see lines 175-177.

      Line 221: '..temporal aggregation helps..' - I'm assuming the authors meant temporal integration? However, I would caution against using the exact term 'temporal integration' as it is used in the field of audition to mean something different. Perhaps something like 'sensory integration' , or 'multi-call integration'

      To avoid ambiguity and better reflect the process modeled in our work, we have replaced the term "temporal aggregation" with "multi-call integration" throughout the revised manuscript. This term more accurately conveys the idea of combining information from multiple echolocation calls without conflicting with existing terminology.

      (4) Discussion

      Lines 302: 'Our model suggests...increasing the call-rate..' - not clear where this is explicitly tested or referred to in this manuscript. Can't see what was done to measure/quantify the effect of this variable in the Methods or anywhere else.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 346-349.

      Line 319: 'spatial interference' - unclear what this means. This reviewer would strongly caution against creating new terms unless there is an absolute need for it. What is meant by 'interference' in this paper is hard to assess given that the word seems to be used as a synonym for jamming and also for actual physical wave-based interference.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 119-120 and 366-367.

      Line 323: '..no benefit beyond a certain level...' - also not clear where this is explicitly tested. It seems like there was a set of simulations run for a variety of parameters but this is not written anywhere explicitly. What type of parameter search was done, was it all possible parameter combinations - or only a subset? This is not clear.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 372-375.

      Line 324: '..ca. 110 dB-SPL.' - what reference distance?

      All call levels were simulated and reported in dB-SPL, referenced at 0.1 meters from the emitting bat. We have clarified it in the revised text in the relevant contexts and specifically in line 529.
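
      Because source levels in the bat literature are quoted at either 0.1 m or 1 m, it can help to convert between the two conventions. The sketch below assumes spherical spreading only (20·log10 of the distance ratio) and ignores atmospheric absorption; the function name is hypothetical.

```python
import math


def convert_reference_distance_db(level_db, from_m, to_m):
    """Re-reference a source level under spherical spreading.

    Moving the reference point farther from the source lowers the level
    by 20*log10(to_m / from_m) dB; atmospheric absorption is ignored.
    """
    if from_m <= 0 or to_m <= 0:
        raise ValueError("distances must be positive")
    return level_db - 20.0 * math.log10(to_m / from_m)


# 110 dB-SPL referenced at 0.1 m, expressed at the 1 m convention.
print(round(convert_reference_distance_db(110.0, 0.1, 1.0), 1))
```

      Under this assumption, a level of 110 dB-SPL at 0.1 m corresponds to 90 dB-SPL at 1 m, which is the kind of 20 dB offset that makes the reference distance essential when comparing values across studies.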

      (5) Methods

      Line 389 : '...over a 2 x 1.5 m2 area..' It took a while to understand this statement and put it in context. Since there is no previous description of the entire L-arena, the reviewer took it to mean the simulations happened over the space of a 2 x 1.5 m2 area. Include a top-down description of the simulation's spatial setup and rephrase this sentence.

      To address the confusion, we revised the text to clarify that the full simulation environment represents a corridor-shaped cave measuring 14.5 × 2.5 meters, with a right-angle turn located 5.5 meters before the exit, as shown in Figure 1A. The 2 × 1.5 m area refers specifically to the small zone at the far end of the cave where bats begin their flight. The revised description now includes a clearer spatial overview to prevent ambiguity, see lines 456-460.

      Line 398: Replace 'High proximity' with 'Close proximity'

      Replaced.

      Line 427: 'uniform target strength of -23 dB' - at what distance is this target strength defined? Given the reference distance can vary by echolocation convention (0.1 or 1 m), one can't assess if this is a reasonable value or not.

      The reference distance for the reported target strength is 1 meter, in line with standard acoustic conventions. We have revised the text to clarify this explicitly (line 531).

      Also, independent of the reference distance, particularly with reference to bats, the target strength is geometry-dependent, based on whether the wings are open or not. Using the entire wingspan of a bat to parametrise the target strength is an overestimate of the available reflective area. The effective reflective area is likely to be somewhere closer to the surface area of the body and a fraction of the wingspan together. This is important to note and/or mention explicitly since the value is not experimentally parametrised.

      For comparison, experimentally based measurements used in Goetze et al. 2016 are -40 dB (presumably at 1 m since the source level is also defined at 1 m?), and Beleyur & Goerlitz 2019 show a range between -43 to -34 dB at 1 m.

      We agree with the reviewer that target strength in bats is strongly influenced by their geometry, particularly wing posture during flight. In our model, we simplified this aspect by using a constant target strength, as the detailed temporal variation in body and wing geometry is pseudo-random and not explicitly modeled. We acknowledge that this is a simplification, and have now stated this limitation clearly in the revised manuscript. We chose a fixed value of –23 dB at 1 meter to reflect a plausible mid-range estimate, informed by anatomical data and consistent with values reported for similarly sized species (Beleyur and Goerlitz, 2019). To support this, we directly measured the target strength of a 3D-printed RM bat model, obtaining –32 dB.

      Moreover, a sensitivity analysis across a wide range (–49 to –23 dB) confirmed that performance metrics remain largely stable, indicating that our conclusions are not sensitive to this parameter, and suggesting that our results hold for different-sized bats. See lines 384-390, 533-538, and Supplementary Figures 3 and 4 in the revised article. 

      Line 434: 'To model the bat's cochlea...'. Bats have two cochleas. This model only describes one, while the agents are also endowed with the ability to detect sound direction - which requires two ears/cochleas.... There is missing information about the steps in between that needs to be provided.

      We appreciate the reviewer’s observation. Indeed, our model is monaural, and simulates detection using a single cochlear-like filter bank receiver. We have clarified this in the revised text to avoid confusion. This paragraph specifically describes the detection stage of the auditory processing pipeline. The localization process, which builds on detection and includes directional estimation, is described in the following paragraph (see line 583 onward), as discussed in the next comment and response.

      Line 457: 'After detection, the bat estimates the range and Direction of Arrival...' This paragraph describes the overall idea, but not the implementation. What were the inputs and outputs for the range and DOA calculation performed by the agent? Or was this information 'fed' in by the simulation framework? If there was no explicit DOA step that the agent performed, but it was assumed that agents can detect DOA, then this needs to be stated.

      In the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. Instead, based on experimental studies (Simmons et al., 1983; Popper and Fay, 1995), we assumed that bats can estimate the direction of an echo with an angular error that depends on the signal-to-noise ratio (SNR). Accordingly, the inputs to the DOA estimation were the peak level of the desired echo, noise level, and the level of acoustic interference. The output was an estimated direction of arrival that included a random angular error, drawn from a normal distribution whose standard deviation varied with the SNR. We have revised the relevant paragraph (Lines 583-592) to clarify this implementation.
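
      The mechanism described above (a true direction perturbed by Gaussian noise whose spread shrinks as SNR grows) can be sketched as follows. The specific mapping from SNR to standard deviation used here, halving every 6 dB with a 1° floor, is a placeholder assumption and is not the relationship used in the model; all names are hypothetical.

```python
import random


def doa_error_std_deg(snr_db, std_at_0db_deg=10.0, min_std_deg=1.0):
    """Hypothetical SNR-to-error mapping: std halves per 6 dB, with a floor."""
    return max(min_std_deg, std_at_0db_deg * 2.0 ** (-snr_db / 6.0))


def estimate_doa_deg(true_doa_deg, snr_db, rng):
    """Return the true direction plus a normally distributed angular error."""
    return rng.gauss(true_doa_deg, doa_error_std_deg(snr_db))


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for snr_db in (0.0, 12.0, 30.0):
    estimate = estimate_doa_deg(20.0, snr_db, rng)
    print(snr_db, doa_error_std_deg(snr_db), round(estimate, 2))
```

      In such a scheme, the SNR passed in would itself be computed from the echo's peak level relative to the combined noise and interference levels, so heavier interference widens the angular error without any explicit binaural model.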

      Line 464: 'To evaluate the impact of the assumption...' - the 'self' and 'non-self' echoes can be distinguished perhaps using pragmatic time-delay cues, but also using spectro-temporal differences in individual calls/echoes. Do the agents have individual call structures, or do all the agents have the same call 'shape'? The echolocation parameters for the two modelled species are given, but whether there is call parameter variation implemented in the agents is not mentioned.

      In our relatively simple model, all individuals emit the same type of chirp call, with parameters adapted only based on the distance to the nearest detected object. However, individual variation is introduced by assigning each bat a terminal frequency drawn from a normal distribution with a standard deviation of 1 kHz, as described in the revised version (lines 519-520). This small variation is not used explicitly as a spectro-temporal cue for echo discrimination.

      In our model, all spectro-temporal variations—whether due to call structure or variations resulting from overlapping echoes from nearby reflectors—are processed through the filter bank, which compares the received echoes to the transmitted call during the detection stage. As such, the detection process itself can act as a discriminative filter, to some extent, based on similarity to the emitted call.

      We acknowledge that real bats likely rely on a variety of spectro-temporal features for distinguishing self from non-self-echoes—such as call duration, received level, multi-harmonic structure, or amplitude modulation. In our simulation, we focus on comparing two limiting conditions: full recognition of self-generated echoes versus full confusion. Implementing a more nuanced self-recognition mechanism based on temporal or spectral cues would be a valuable extension for future work.

      (6) References

      Reference 22: Formatting error - an extra '4' in the reference.

      The error has been fixed.

      (7) Thoughts/comments

      Even without 'recognition' of walls & conspecifics, bats may be able to avoid obstacles - this is a neat result. Also, using their framework the authors show that successful 'blind' object-agnostic obstacle avoidance can occur only when supported by some sort of memory. In some sense, this is a nice intermediate step showing the role of memory in bat navigation. We know that bats have good long-term and long-spatial scale memory, and here the authors show that short-term spatial memory is important in situations where immediate sensory information is unreliable or unavailable.

      We appreciate the reviewer’s thoughtful summary. Indeed, one of the main takeaways of our study is that successful obstacle avoidance can occur even without explicit recognition of walls or conspecifics—provided that a clustered multi-call integration is in place. Our model shows that when immediate sensory information is unreliable, integrating detections over time becomes essential for effective navigation. This supports the broader view that memory, even on short timescales, plays an important role in bat behavior.

      (8) Reporting GLM results

      The p-value, t-statistic, and degrees of freedom are reported consistently across multiple GLM results. However, the most important part which is the effect size is not consistently reported - and this needs to be included in all results, and even in the table. The effect size provides an indicator of the parameter's magnitude, and thus scientific context.

      We agree that the effect size provides essential scientific context. In fact, we already include the effect size explicitly in Table 1, as shown in the “Effect Size” column for each tested parameter. These values describe the magnitude of each parameter’s effect on exit probability, jamming probability, and collision rate. In the main text, effect sizes are presented as concrete changes in performance metrics (e.g., “exit probability increased from 20% to 87%,” or “with a decrease of 3.5%±8% to 5.5%±5% (mean ± s.e.)”), which we believe improves interpretability and scientific relevance.  

      To further clarify this in the main text, we have reviewed the reported results and ensured that effect sizes are mentioned more consistently wherever GLM outcomes are discussed. Additionally, we have added a brief note in the table caption to emphasize that effect sizes are provided for all tested parameters.

      The 'tStat' appears multiple times and seems to be the output of the MATLAB GLM function. This acronym is specific to the MATLAB implementation and needs to be replaced with a conventionally used acronym such as 't', or the full form 't-statistic' too. This step is to keep the results independent of the programming language used.

      We have replaced all instances of tStat with the more conventional term ‘t’ throughout the manuscript to maintain consistency with standard reporting practices.

      Reviewer #2 (Recommendations for the authors):

      In addition to my public review, I had a few minor points that the authors may want to consider when revising their paper.

      (1) Figures 2, 3, and 4 may benefit from using different marker styles, in addition to different colors, to show the different cases.

      Thank you for the suggestion. In Figures 2–4, the markers represent means with standard error bars. To maintain clarity and consistency across all conditions, we have chosen to keep a standardized marker style – and we clarify this in the legend. We found that varying only the colors is sufficient for distinguishing between conditions without introducing visual clutter.

      (2) The text "PK" in the inset for Figure 2A is very difficult to read. I would suggest using grey as with "RM" in the other inset.

      We have updated the inset in Figure 2A to improve legibility.

      (3) Are the error bars in Figure 3 very small? I wasn't able to see them. If that is the case, the authors may want to mention this in the caption.

      You are correct—the error bars are present in all plots but appear very small due to the large number of simulation repetitions and low variability. We have revised the caption to explicitly mention this.

      (4) The species name of PK is spelled inconsistently (kuhli, khulli, and kuhlii).

      We have corrected the species name throughout the manuscript.

      (5) Table 1 is a great condensation of all the results, but the time to exit is missing. It may be helpful if summary statistics on that were here as well.

      We have added time-to-exit to the effect size column in Table 1, alongside the other performance metrics, to provide a more complete summary of the simulation results.

      (6) I may have missed it, but why are there two values for the exit probability when nominal flight speed is varied?

      The exit probability was not monotonic with flight speed, but rather showed a parabolic trend with a clear optimum. Therefore, we reported two values representing the effect before and after the peak. We have clarified this in the revised table and updated the caption accordingly.

      (7) Table 2 has an extra header after the page break on page 18.

      The extra header in Table 2 after the page break has been removed in the revised manuscript.

      (8) The G functions have 2 arguments in their definitions and Equation 1, but only one argument in Equations 2 and 3. I wasn't able to see why.

      Thank you for pointing this out. You are correct—this was a typographical error. We have corrected the argument notation in Equations 2 and 3 and explicitly included the frequency dependence of the gain (G) functions in both equations.

      (9) D_txrx was not defined but it was used in Equation 2.

      The variable D_txrx is defined in the equation notation section as: D_txrx – the distance [m] between the transmitting conspecific and the receiving focal bat, from the transmitter’s perspective. We have now ensured that this definition is clearly linked to Equation 2 in the revised text. Moreover, we have added a supplementary figure that illustrates the geometric configuration defined by the equations to further support clarity, as described in the Public Review above.

      (10) It was hard for me to understand what was meant by phi_rx and phi_tx. These were described as angles between the rx or tx bats and the target, but I couldn't tell what the point defining the angle was. Perhaps a diagram would help, or more precise definitions.

      We have revised the caption to provide clearer and more precise definitions. Additionally, we have included a geometric diagram as a supplementary figure, as noted in the Public Review above, to visually clarify the spatial relationships and angle definitions used in the equations (see lines 498-499).

      (11) Was the hearing threshold the same for both species?

      Yes. We have clarified it in the revised version.

      (12) Collision avoidance is described as turning to the "opposite direction" in the supplemental figure explaining the model. Is this 90 degrees or 180 degrees? If 90 degrees, how do these turns decide between right and left?

      In our model, the bat does not perform a fixed 90° or 180° turn. Instead, the avoidance behavior is implemented by setting the maximum angular velocity in the direction opposite to the detected echo. For example, if the obstacle or conspecific is detected on the bat’s right side, the bat begins turning left, and vice versa.

      This turning direction is re-evaluated at each decision step, which occurs after every echolocation pulse. The bat continues turning in the same direction if the obstacle remains in front, otherwise it resumes regular pathfinding. We have clarified this behavior in the updated figure caption and model description, see lines 478-493.
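
      The decision rule described above can be sketched as follows. This is a hedged illustration, not the model's implementation: the sign convention (positive bearing and positive turn rate both mean "left"), the width of the frontal sector, the handling of an echo at exactly 0 degrees, and the use of `None` to mean "resume regular pathfinding" are assumptions introduced here.

```python
def avoidance_turn_rate(echo_bearing_deg, max_turn_rate_deg_s):
    """Turn at the maximum rate away from the side of the detected echo.

    Assumed convention: negative bearing = echo on the right,
    positive turn rate = turning left.
    """
    if echo_bearing_deg < 0:
        return +max_turn_rate_deg_s  # obstacle on the right: turn left
    # Obstacle on the left (or dead ahead, an arbitrary tie-break): turn right.
    return -max_turn_rate_deg_s


def decide_turn(echo_bearing_deg, previous_turn_rate, max_turn_rate_deg_s,
                frontal_half_width_deg=30.0):
    """Re-evaluate the manoeuvre after one echolocation pulse.

    Returns a turn rate in deg/s, or None to hand control back to
    regular pathfinding. `frontal_half_width_deg` is a hypothetical value.
    """
    obstacle_in_front = (echo_bearing_deg is not None
                         and abs(echo_bearing_deg) <= frontal_half_width_deg)
    if not obstacle_in_front:
        return None
    if previous_turn_rate:
        return previous_turn_rate  # keep turning in the same direction
    return avoidance_turn_rate(echo_bearing_deg, max_turn_rate_deg_s)
```

      Calling `decide_turn` once per pulse reproduces the described behavior: the first detection picks the side, and later pulses keep that side for as long as an obstacle stays inside the frontal sector.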

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 27-31: These sentences mischaracterize the results. This claim appears to equate "the model works" with "this is what bats actually do." Also, the model does not indicate that bats' echolocation strategies are robust enough to mitigate the effects of jamming - this is self-evident from the fact that bats navigate successfully via echolocation in dense groups.

      Thank you for the comment. Our aim was not to claim that the model confirms actual bat behavior, but rather to demonstrate that simple and biologically plausible strategies—such as signal redundancy and basic pathfinding—are sufficient to explain how bats might cope with acoustic interference in dense settings. We have revised the wording to better reflect this goal and to avoid overinterpreting the model's implications.

      See the abstract in the revised version.

      (2) Line 37: This number underestimates the number of bats that form some of the largest aggregations of individuals worldwide - the free-tailed bats can form aggregations exceeding several million bats.

      We have revised the text to reflect that some bat species, such as free-tailed bats, are known to form colonies of several million individuals, which exceed the typical range. The updated sentence accounts for these extreme cases, see lines 36-37.

      (3) The flight densities explained in the introduction and chosen references are not representative of the literature - without providing additional justification for the chosen species, it can be interpreted that the selection of the species for the simulation is somewhat arbitrary. If the goal is to model dense emergence flight, why not use a species that has been studied in terms of acoustic and flight behavior during dense emergence flights---such as Tadarida brasiliensis?

      Our goal was to develop a general model applicable to a broad class of FM-echolocating bat species. The two species we selected, Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM), span a wide range of signal characteristics, from wideband (PK) to narrowband (RM), providing a representative contrast in call structure.

      Although we did not include Tadarida brasiliensis (TB) specifically, its echolocation calls are acoustically similar to RM in terminal frequency and fall between PK and RM in bandwidth. Therefore, we believe our findings are likely to generalize to TB and other FM-bats.

      Moreover, as noted in a previous response, the average inter-bat distance in our highest-density simulations (0.27 m) is still smaller than those reported for Tadarida brasiliensis during dense emergences—further supporting the relevance of our model to such scenarios.

      To support broader applicability, we also provide a supplementary graphical user interface (GUI) that allows users to modify key echolocation parameters and explore their impact on behavior—making the framework adaptable to additional species, including TB.

      (4) Line 78: It is not clear how (or even if) the simulated bats estimate the direction of obstacles. The explanation given in lines 457-463 is quite confusing. What is the acoustic/neurological mechanism that enables this direction estimation? If there is some mechanism (such as binaural processing), how does this extrapolate to 3D?

      This comment echoes a similar concern raised by a previous reviewer. As explained earlier, in the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. The complete explanation is given in our response to Reviewer #1, Line 457. This implementation is now clarified in the revised text, and a detailed description of the localization process is also provided in the Methods section (lines 583-592).

      (5) The authors propose they are modeling the dynamic echolocation of bats in the simulation (line 79), but it appears (whether this is due to a lack of information in the manuscript or true lack in the simulation) that the authors only modeled a flight response. How did the authors account for bats dynamically changing their echolocation? This is unclear and from what I can tell may just mean that the bats can switch between foraging phase call types depending on the distance to a detected obstacle. Can the authors elaborate more on this?

      The echolocation behavior of the bats, including dynamic call adjustments, was implemented in the simulation and is described in detail in the Methods section (lines 498-520 and Table 2). To avoid redundancy, the Results chapter originally referred to this section, but we have now added a brief explanation in the Results to clarify that the bats’ call parameters (IPI, duration, and frequency range) adapt based on the distance to detected objects, following empirically documented echolocation phases ("search," "approach," "buzz"). These dynamics are consistent with established bat behavior during navigation in cluttered environments such as caves.

      (6) Figure 1 C3: "Detection threshold": what is this and how was it derived?

      The caption also mentions yellow arrows, but they are absent from the figure. C4: Each threshold excursion is marked with an asterisk, but there are many more excursions than asterisks. Why are only some marked? Unclear.

      C3: The detection threshold is determined dynamically. It is set to the greater of either 7 dB above the noise level (0 dB-SPL) (Kick, 1982; Saillant et al., 1993; Sanderson et al., 2003; Boonman et al., 2013) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB. This clarification has been added to the Methods section. The yellow arrow has been added.

      C4: Thank you for this important observation. Only peaks marked with asterisks represent successful detections, that is, those identified in both the interference-free and full detection conditions, as explained in the Methods. Other visible peaks result from masking signals or overlapping echoes from nearby reflectors, but they do not meet the detection criteria. To keep the figure caption concise, we have elaborated on this process more clearly in the revised Methods section. We added this information to the legend.

      (7) Figure 2: A line indicating RM, No Masking is absent

      Thank you for pointing this out. The missing line for RM, No Masking has now been added in the revised version of Figure 2.

      (8) Line 121: "reflected off conspecifics". Does this mean echoes due to conspecifics?

      The phrase "reflected off conspecifics" refers to echoes originating from the bat’s own call and reflected off the bodies of nearby conspecifics. We have clarified the wording in the revised text to avoid confusion.

      (9) Line 125: Why are low-frequency channels stimulated by higher frequencies? This needs further clarification.

      The cochlear filter bank in our model is implemented using gammatone filters, each modeled as an 8th-order Butterworth filter. Due to the non-ideal filter response and relatively broad bandwidths, especially in the lower-frequency channels, strong energy from the beginning of the downward FM chirp (at higher frequencies) can still produce residual activation in lower-frequency channels. While these stimulations are usually below the detection threshold, they may still be visible as early sub-threshold responses. Because this explanation is technical (a property of the filter implementation) and the effect does not influence the detection outcomes, we have chosen not to elaborate on it in the figure caption or Methods.

      (10) Lines 146-150: This is an interesting finding. Is there a theoretical justification for it?

      This outcome arises directly from the simulation results. As noted in the Discussion (lines 359-365), although Pipistrellus kuhlii (PK) shows a modest advantage in jamming resistance due to its broader bandwidth, the redundancy in sensory information across calls—enabled by frequent echolocation—appears to compensate for these signal differences. As a result, the small variations in echo quality between species do not translate into significant differences in performance. We speculate that if the difference in jamming probability had been larger, performance disparities would likely have emerged.

      (11) Line 151: The authors define a jammed echo as an echo entirely missed due to masking. Is this appropriate? Doesn't echo mis-assignment also constitute jamming?

      We agree that echo mis-assignment can also degrade performance; however, in our model, we distinguish between two outcomes: (1) complete masking (echo not detected), and (2) detection with a localization error. As explained in the Methods (lines 500–507), we run the detection analysis twice—once with only desired echoes (“interference-free detection”) and once including masking signals (“full detection”). If a previously detected echo is no longer detected, it is classified as a jammed echo. If the echo is still detected but the delay shifts by more than 100 µs compared to the interference-free condition, it is also considered jammed. If the delay shift is smaller, it is treated as a detection with localization error rather than full jamming. We have clarified this distinction in the revised Methods section.
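
      The two-pass classification can be expressed compactly. The sketch below assumes that each echo found in the interference-free pass has already been paired with its counterpart in the full pass (or with `None` if it was lost); the pairing step, the function names, and the label strings are hypothetical, while the 100 µs criterion is taken from the response above.

```python
MAX_DELAY_SHIFT_S = 100e-6  # 100 microseconds, as stated in the response


def classify_echo(clean_delay_s, full_delay_s, max_shift_s=MAX_DELAY_SHIFT_S):
    """Classify one echo that was detected in the interference-free pass.

    clean_delay_s: echo delay in the interference-free detection.
    full_delay_s:  delay of the same echo in the full detection,
                   or None if it was no longer detected.
    """
    if full_delay_s is None:
        return "jammed"  # completely masked
    if abs(full_delay_s - clean_delay_s) > max_shift_s:
        return "jammed"  # detected, but the delay shifted too far
    return "detected"    # detected, possibly with a small localization error


def jamming_probability(paired_delays):
    """Fraction of interference-free detections that end up jammed."""
    if not paired_delays:
        return 0.0
    jammed = sum(classify_echo(clean, full) == "jammed"
                 for clean, full in paired_delays)
    return jammed / len(paired_delays)
```

      For instance, an echo at 6.00 ms that reappears at 6.05 ms in the full pass counts as detected, whereas one that reappears at 6.20 ms counts as jammed.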

      (12) Figure 2-E: Detection probability statistics are of limited usefulness without accompanying false alarm rate (FAR) statistics. Do the authors have FAR numbers?

      We understand FAR to refer to instances where masking signals or other acoustic phenomena are mistakenly interpreted as real echoes from physical objects. As explained in the manuscript, we implemented two model versions: one without confusion, and one with full confusion.

      Figure 2E reports detection performance under the non-confusion model, in which only echoes from actual physical reflectors are used, and no false detections occur—hence, the false alarm rate is effectively zero in this condition. In the full-confusion model, all detected echoes—including those originating from masking signals or conspecific calls—are treated as valid detections, which may include false alarms. However, we did not explicitly quantify the false alarm rate as a separate metric in this simulation.

      We agree that tracking FAR could be informative and will consider incorporating it into future versions of the model.

      (13) Line 161: RM bats suffered from a significantly higher probability of the "desired conspecific's echoes" being jammed. What does "desired conspecific's echoes" mean? This is unclear.

      The term “desired conspecific's echoes” refers to echoes originating from the bat’s own call, reflected off nearby conspecifics, which are treated as relevant reflectors for collision avoidance. We have revised the wording in the text for clarity.

      (14) Line 188: Why didn't the size of the integration window affect jamming probability? I couldn't find this explained in the discussion.

      The jamming probability in our analysis is computed at the individual-echo level, prior to any temporal integration. Since the integration window is applied after the detection step, it does not influence whether a specific echo is masked (i.e., jammed) or not. Therefore, as expected, we did not observe a significant effect of integration window size on jamming probability.

      (15) Line 217-218: Why do the authors think this would be?

      Thank you for the thoughtful question. We agree that, in theory, increasing call intensity should raise the levels of both desired echoes and masking signals proportionally. However, in our model, the environmental noise floor and detection threshold remain constant, meaning that higher call intensities increase the signal-to-noise ratio (SNR) more effectively for weaker echoes, especially those at longer distances or with low reflectivity. This could lead to a higher likelihood of those echoes crossing the detection threshold, resulting in a small but measurable reduction in jamming probability.

      Additionally, the non-linear behavior of the filter-bank receiver, including thresholding at multiple stages, can introduce asymmetries in how increased signal levels affect the detection of target versus masking signals.

      That said, the effect size was small, and the improvement in jamming probability did not translate into any significant gain in behavioral performance (e.g., exit probability or collision rate), as shown in Figure 3C.

      (16) Line 233: I'm not sure I understand how a slightly improved aggregation model that clustered detected reflectors over one-second periods is different. Doesn't this just lead to on average more calls integrated into memory?

      While increasing the memory duration does lead to more detections being available, the enhanced aggregation model (which we now refer to as multi-call clustering) differs fundamentally from the simpler one. As detailed in the Methods, it includes additional processing steps: clustering spatially close detections, removing outliers, and estimating wall directions based on the spatial structure of clustered echoes. In contrast, the simpler model treats each detection as an isolated point without estimating obstacle orientation. These additional steps allow for more robust environmental interpretation and significantly improve performance under high-confusion conditions. We have clarified this in the revised text (lines 606-616) and added a Supplementary Figure 2B.
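
      The three processing steps can be illustrated with a small stand-in pipeline. The response does not name the clustering or orientation algorithms, so the choices below (single-linkage clustering with a distance gap, discarding clusters with too few points as outliers, and taking the principal axis of each cluster as the wall direction) are assumptions, as are the parameter values `max_gap_m` and `min_points`.

```python
import math


def cluster_points(points, max_gap_m=0.3):
    """Single-linkage clustering: chains of points within max_gap_m are merged."""
    unvisited = list(points)
    clusters = []
    while unvisited:
        cluster = [unvisited.pop()]
        frontier = list(cluster)
        while frontier:
            px, py = frontier.pop()
            near = [q for q in unvisited
                    if math.hypot(q[0] - px, q[1] - py) <= max_gap_m]
            for q in near:
                unvisited.remove(q)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def wall_direction_deg(cluster):
    """Orientation of the cluster's principal axis, in degrees within [0, 180)."""
    n = len(cluster)
    mx = sum(x for x, _ in cluster) / n
    my = sum(y for _, y in cluster) / n
    sxx = sum((x - mx) ** 2 for x, _ in cluster)
    syy = sum((y - my) ** 2 for _, y in cluster)
    sxy = sum((x - mx) * (y - my) for x, y in cluster)
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.degrees(angle) % 180.0


def estimate_walls(points, max_gap_m=0.3, min_points=3):
    """Cluster detections, drop small clusters as outliers, return wall angles."""
    return [wall_direction_deg(c)
            for c in cluster_points(points, max_gap_m)
            if len(c) >= min_points]
```

      With detections pooled over several calls, a row of points along a wall yields one cluster and one orientation, while an isolated spurious detection forms a cluster of its own and is dropped.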

      (17) Table 1: What about conspecific target strength?

      We have now added the conspecific target strength as a tested parameter in Table 1, along with its tested range, default value, and measured effect sizes. A detailed sensitivity analysis is also presented in Supplementary Figure 4, demonstrating that variations in conspecific target strength had relatively minor effects on performance metrics.  

      (18) Figure 3-A: The x-axis is the number of calls in the integration window. But the leftmost sample on each curve is at 0 calls. Shouldn't this be 1?

      “0 calls” refers to the case where only the most recent call is used for pathfinding—without integrating any information from prior calls. The x-axis reflects the number of previous calls stored in memory, so a value of 0 still includes the current call. We’ve clarified this terminology in the figure caption.
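
      The axis convention can be made concrete with a small buffer that always holds the current call plus N previous ones. The class and method names are hypothetical and the sketch is not taken from the simulation; it only illustrates why N = 0 still leaves one call available for pathfinding.

```python
from collections import deque


class CallMemory:
    """Keeps detections from the current call plus n_previous_calls earlier ones."""

    def __init__(self, n_previous_calls):
        # +1 so that the current call is always retained, even when N = 0.
        self._calls = deque(maxlen=n_previous_calls + 1)

    def add_call(self, detections):
        """Store the detections from the newest call, evicting the oldest if full."""
        self._calls.append(list(detections))

    def integrated_detections(self):
        """All detections currently available to the pathfinding step."""
        return [d for call in self._calls for d in call]


if __name__ == "__main__":
    memory = CallMemory(n_previous_calls=0)
    memory.add_call([(1.0, 0.5)])
    memory.add_call([(0.8, 0.4)])
    print(memory.integrated_detections())  # only the most recent call remains
```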

      (19) Lines 282-283: This statement needs to be clarified that it is with the constraints of using a 2D simulation with at most 33 bats/m^2. It also should be clarified that it is assumed the bat can reliably distinguish between its own echoes and conspecific echoes, which is a very important caveat.

      We have revised the text to clarify that the results are based on a 2D simulation with a maximum tested density of 33 bats/m². We also now explicitly state that the model assumes bats can distinguish between their own echoes and those generated by conspecifics, an assumption we recognize as a simplification. These clarifications help place the results within the scope and constraints of the simulation. Moreover, as described in the text (and noted in a previous response), the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m.

      (20) Line 294: What is this sentence referring to?

      The sentence refers to the finding that, even under high bat densities, a substantial portion of the echoes, particularly those reflected from nearby obstacles (e.g., 1 m away), were jammed due to masking. Nevertheless, the bats in the simulation were still able to navigate successfully using partial sensory input. We have clarified the sentence in the revised text to make this point more explicit (see lines 333-336).

      (21) Line 302: Was jamming less likely when IPI was higher or lower? I could not find this demonstrated anywhere in the manuscript.

      We agree that the original text was not sufficiently clear on this point. While we did not explicitly test fixed IPI values as a parameter, the model does simulate the natural behavior of decreasing IPI as bats approach obstacles. This behavior is supported by empirical observations and is incorporated into the echolocation dynamics of the simulation. We have clarified this point in the revised text (see Lines 346-351) and explained that while lower IPI introduces more acoustic overlap, it also increases redundancy and improves detection through temporal integration.

      (22) Lines 313-314: This is an interesting assumption, but it is not evident that is substantiated by the references.

      The claim is based on well-established principles in signal processing and bioacoustics. Wideband signals, such as those emitted by PK bats, distribute their energy over a broader frequency range, which makes them inherently more resistant to narrowband interference and masking. This concept is commonly applied in both biological and artificial sonar systems and is supported by empirical studies in bats and theory in acoustic sensing.

      For example, Beleyur & Goerlitz (2019) demonstrate that broader bandwidth calls improve detection in cluttered and jamming-prone environments. Similarly, Ulanovsky et al. (2004) and Schnitzler & Kalko (200) discuss how FM bats' wideband calls enhance temporal and spatial resolution, helping to reduce the impact of overlapping signals from conspecifics. These findings align with communication theory where spread-spectrum techniques improve robustness in noisy environments.

      We agree with the reviewer that this is an important point, and we have updated the manuscript to clarify this rationale and cite the relevant literature accordingly (lines 631-363).

      (23) Lines 318-319: What is the justification for "probably"? Isn't this just a supposition?

      We agree with the reviewer’s point and have rephrased the sentence.

      (24) Line 320: How does this 63% performance match the sentence in line 295?

      The sentence in Line 295 refers to the overall ability of the bats to navigate successfully despite high jamming levels, highlighting the robustness of the strategy under challenging conditions. The figure in Line 320 (63%) quantifies this performance under the most extreme simulated scenario (100 bats / 3 m²), where both spatial and acoustic interferences are maximal. We have rephrased the text in the revised version (lines 324-327).

      (25) Lines 341-345: It seems like this is more likely to be the main takeaway of the paper.

      As noted in the Public Review above, there is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from those of conspecifics (e.g., Schnitzler, Bioscience, 2001; Kazial et al., 2001, 2008; Burnett & Masters, 2002; Chiu et al., 2009; Yovel et al., 2009; Beetz & Hechavarría, 2022). Therefore, we consider our assumption of self-recognition to be well-supported, at least under typical conditions. That said, we agree that the impact of echo confusion on performance is significant and highlights a critical challenge in dense environments.

      To our knowledge, this is the first computational model to explicitly simulate both self-recognition and full echo confusion under high-density conditions. We believe that the combination of modeled constraints and the demonstrated robustness of simple sensorimotor strategies, even under worst-case assumptions, is what makes this contribution both novel and meaningful.

      (26) Lines 349-350: What is the aggregation model? What is meant by "integration"?

      We have revised the text to clarify that the “aggregation model” refers to a multi-call clustering process that includes clustering of detections, removal of outliers, and estimation of wall orientation, as described in detail in the revised Methods and Results sections.

      (27) Line 354: Again, why isn't this the assumption we're working under?

      As addressed in our response to Comment 25, our primary model assumes that bats can recognize their own echoes—an assumption supported by substantial empirical evidence. The alternative "full confusion" model was included to explore a worst-case scenario and highlight the behavioral consequences of failing to distinguish self from conspecific echoes. We assume that real bats may experience some degree of echo misidentification; however, our assumption of full confusion represents a worst-case scenario.

      (28) Line 382: "Under the assumption that..." I agree that bats probably can, but if we assume they can differentiate them all, where's the jamming problem?

      The assumption that bats can theoretically distinguish between different signal sources applies after successful detection. However, the jamming problem arises during the detection and localization stages, where acoustic interference can prevent echoes from crossing the detection threshold or distort their timing.

      (29) Lines 386-387: The paper referenced focused on JAR in the context of foraging. What changes were made to the simulation to switch to obstacle avoidance?

      While the simulation framework in Mazar & Yovel (2020) was developed to study jamming avoidance during foraging, the core components—such as the acoustic calculations, receiver model, and echolocation behavior—remain applicable. For the current study, we adapted the simulation extensively to address colony-exit behavior. These modifications include modeling cave walls as acoustic reflectors, implementing a pathfinding algorithm, integrating obstacle-avoidance maneuvers, and adapting the integration window and integration processes. These updates are detailed throughout the Methods section.

      (30) Line 400-402: Something doesn't add up with the statement: each decision relies on an integration window that records estimated locations of detected reflectors from the last five echolocation calls, with the parameter being tested between 1 and 10 calls. Can the authors reword this to make it less confusing?

      We have reworded the sentence to clarify that the default integration window includes five calls, while we systematically tested the effect of using 1 to 10 calls, see lines 486-487.

      (31) Line 393: "30 deg/sec" why was this value chosen?

      The turning rate of 30 deg/sec was manually selected to approximate the curvature of natural foraging flight paths observed in Rhinopoma microphyllum using on-board tags. Moreover, in Mazar & Yovel (2020), we showed that the flight dynamics of simulated bats in a closed room closely matched those of Pipistrellus kuhlii flying in a room of similar dimensions. However, in the current simulation, bats rarely follow a random-walk trajectory due to the structured environment and frequent obstacle detection. As a result, this parameter has no meaningful impact on the simulation outcomes.

      (32) Line 412: "Harmony" --- do you mean harmonic? And what is the empirical evidence that RM bats use the 2nd harmonic compared to the 1st?

      Perhaps showing a spectrogram of a real RM signal would be helpful.

      The typo has been corrected. For reference, see Goldshtein et al. (2025).

      (33) Table 2: Something is incorrect with the table. The first row on the next page is the wrong species name. Also, where are the citations for these parameter values?

      The table header has been corrected in the revised version. The parameter values for flight and echolocation behavior were derived from existing literature and empirical data: Pipistrellus kuhlii parameters were based on Kalko (1995), and Rhinopoma microphyllum parameters were extracted from our own recordings using on-board tags, as described in Goldshtein et al. (2025). We have added the appropriate citations to Table 2.

      (34) Line 442: How was the threshold level chosen?

      The detection threshold in each level is set to the greater of either 7 dB above the noise level (0 dB-SPL) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB.
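
      A minimal sketch of the threshold rule stated above, assuming all levels are expressed in dB SPL. The function and parameter names are hypothetical; only the 7 dB margin, the 0 dB-SPL noise level, and the 70 dB dynamic range come from the response.

```python
def detection_threshold_db(max_received_level_db,
                           noise_level_db=0.0,
                           snr_margin_db=7.0,
                           dynamic_range_db=70.0):
    """Return the detection threshold in dB SPL.

    The threshold is the greater of (noise level + margin) and
    (maximal received level - dynamic range), as described in the text.
    """
    noise_floor_threshold = noise_level_db + snr_margin_db
    dynamic_range_threshold = max_received_level_db - dynamic_range_db
    return max(noise_floor_threshold, dynamic_range_threshold)


# Weak strongest echo: the noise-based bound dominates.
quiet_scene = detection_threshold_db(60.0)   # max(7, -10)
# Strong strongest echo: the dynamic-range bound dominates.
loud_scene = detection_threshold_db(90.0)    # max(7, 20)
```

      Under these assumptions the dynamic-range term only takes over once the strongest received signal exceeds 77 dB SPL.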

      (35) Line 445: 100 micros: This is about 3cm. The resolution of PK is about 1cm. For RM it's about 10cm. So, this window is generous for PK, but too strict for RM.

      To keep the model simple and avoid introducing species-specific detection thresholds, we selected a biologically plausible compromise that could reasonably apply to both species. This simplification ensures consistency across simulations while remaining within the known behavioral range.

      (36) Line 448: What is the spectrum of the Gaussian noise, and did it change between PK and RM?

      We used the same white Gaussian noise with a flat spectrum across the relevant frequency range (10–80 kHz) for both species. We have clarified this in the revised text in lines 570-572.

      (37) Line 451: 4 milliseconds is 1.3m. Is this appropriate?

      The 4-millisecond window was selected based on established auditory masking thresholds described in Mazar & Yovel (2020) and supported by Popper and Fay (1995, ch. 2.4.5), Blauert (1997, ch. 3.1), and Mohl and Surlykke (1989). These values provide conservative lower bounds on bats’ ability to cope with masking (Beleyur and Goerlitz, 2019). For simplicity, we used constant thresholds within each window; see lines 574-576.
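
      For readers checking the time-to-distance figures quoted in comments (35) and (37), the conversion can be sketched as below. The speed of sound (343 m/s, air at roughly 20 °C) is an assumption, and the function names are hypothetical. The reviewer's figures correspond to the one-way acoustic path; the equivalent difference in target range for a two-way echo path is half of that.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at about 20 degrees C


def path_length_m(duration_s, c=SPEED_OF_SOUND_M_PER_S):
    """Distance sound travels in `duration_s` seconds (one-way path)."""
    return duration_s * c


def target_range_m(delay_s, c=SPEED_OF_SOUND_M_PER_S):
    """Target range implied by an echo delay, halving for the out-and-back path."""
    return 0.5 * delay_s * c


window_100us = path_length_m(100e-6)  # roughly 3.4 cm of acoustic path
window_4ms = path_length_m(4e-3)      # roughly 1.4 m of acoustic path
range_4ms = target_range_m(4e-3)      # roughly 0.7 m of target range
```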

      (38) Line 452: Citation for the forward and backward masking durations?

      See the response to the previous comment.

      (39) Lines 460-461: This is unclear. How does the bat get directional information? The authors claim to be able to measure direction-of-arrival for each detection, but it is not clear how this is done.

      As noted in our response to Reviewer 1 (Comment on Line 457), directional information is not computed via an explicit binaural model. Instead, we assume the bat estimates the direction of arrival with an angular error that depends on the SNR, based on established studies (e.g., Simmons et al., 1983; Popper & Fay, 1995). We have clarified this in the revised text in lines 583-592.

      (40) Line 467: It seems like the authors are modeling pulse-echo ambiguity, at least in this one alternative model, which is good! However the alternative model doesn't get much attention in the paper. Is there a reason for this?

      We would like to clarify that we did not model pulse-echo ambiguity. In our confusion model, all echoes received within the IPI are attributed to the bat’s most recent call. This includes echoes that may in fact originate from conspecific calls, but the model does not assign self-echoes to earlier pulses or span multiple IPIs. Therefore, while the model captures echo confusion, it does not include true pulse-echo ambiguity. We have clarified this point in the revised text in lines 551-553.
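
      The attribution rule described here can be sketched as follows. This is an illustrative reading of the response, not the simulation's actual code; the function name and the tuple layout are hypothetical. Every echo is assigned to the most recent own call, so an echo that truly belongs to an earlier call simply receives a shortened apparent delay.

```python
from bisect import bisect_right


def attribute_echoes(call_times, echo_times):
    """Attribute each received echo to the bat's most recent call.

    Returns a list of (echo_time, call_index, apparent_delay) tuples, where
    call_index refers to the time-sorted calls. Echoes arriving before the
    first call are ignored. No echo is ever assigned to an earlier call,
    which is why this rule does not represent pulse-echo ambiguity.
    """
    calls = sorted(call_times)
    attributed = []
    for t in echo_times:
        i = bisect_right(calls, t) - 1  # index of the latest call at or before t
        if i >= 0:
            attributed.append((t, i, t - calls[i]))
    return attributed


# Calls every 100 ms; the echo at 0.105 s is attributed to the call at 0.1 s
# regardless of which call (own or conspecific) actually produced it.
result = attribute_echoes([0.0, 0.1, 0.2], [0.03, 0.105, 0.25])
```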

      (41) Line 41: "continuous" is more appropriate than "constant".

      Thank you, we have rephrased the text accordingly.

      (42) Line 69: "band width" should be one word.

      Thank you, we have corrected it to “bandwidth”.

      (43) Line 79: "bats" should be in the possessive.

      Thank you, the text has been rephrased.

      (44) Line 128: "convoluted" don't you mean "convolved"?

      We have replaced “convoluted” with the correct term “convolved” in the revised text.

      (45) Please check your references, as there are some incomplete citations and typos.

      Thank you, we have reviewed and corrected all references for completeness and consistency.

      References

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/FNCIR.2022.899370/BIBTEX.

      Beleyur, T. and Goerlitz, H.R. (2019) ‘Modeling active sensing reveals echo detection even in large groups of bats’, Proceedings of the National Academy of Sciences of the United States of America, 116(52), pp. 26662–26668. Available at: https://doi.org/10.1073/pnas.1821722116.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Blauert, J. (1997) ‘Spatial Hearing: The Psychophysics of Human Sound Localization (rev. ed.)’.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/JEB.204255/VIDEO-3.

      Boonman, A. et al. (2013) ‘It’s not black or white-on the range of vision and echolocation in echolocating bats’, Frontiers in Physiology, 4 SEP(September), pp. 1–12. Available at: https://doi.org/10.3389/fphys.2013.00248.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Chili, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldshtein, A. et al. (2025) ‘Onboard recordings reveal how bats maneuver under severe acoustic interference’, Proceedings of the National Academy of Sciences, 122(14), p. e2407810122. Available at: https://doi.org/10.1073/PNAS.2407810122.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3–4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4 APR(April), pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/15451542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kick, S.A. (1982) ‘Target-detection by the echolocating bat, Eptesicus fuscus’, Journal of Comparative Physiology A, 145(4), pp. 431–435. Available at: https://doi.org/10.1007/BF00612808/METRICS.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5 MAY. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Mohl, B. and Surlykke, A. (1989) ‘Detection of sonar signals in the presence of pulses of masking noise by the echolocating bat, Eptesicus fuscus’, pp. 119–124.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Popper, A.N. and Fay, R.R. (1995) Hearing by Bats. Springer-Verlag.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/PNAS.2011719117/SUPPL_FILE/PNAS.2011719117.SAPP.PDF.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), p. 557. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat, Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. et al. (1983) ‘Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus’.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-64269271-0_20.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’, cs-web.bu.edu [Preprint]. Available at: https://csweb.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. eLife Assessment

      In this fundamental manuscript, Richter et al. present a thorough anatomical characterization of the Drosophila melanogaster larval pharyngeal sensory system, which is involved in taste-guided behaviors. This study fills a major gap in the larval sensory map, providing a compelling neuroanatomical foundation for future investigations into sensory circuits and behavior. The data presented here are of exceptional quality and will be of interest to the Drosophila neurobiology community.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a detailed ultrastructural analysis of the larval pharyngeal sensory organs, including the dorsal pharyngeal sensilla, dorsal pharyngeal organ, ventral pharyngeal sensilla, and posterior pharyngeal sensilla. Using electron microscopy and 3D reconstruction, Richter et al. present a comprehensive mapping and classification of pharyngeal sensory structures, defining the morphological type of pharyngeal sensilla based on ultrastructure and generating a neuron-to-sensillum map. These findings significantly advance our understanding of internal larval sensory systems and establish a robust framework for future functional studies in coordination with external sensory systems.

      Strengths:

      The application of high-resolution electron microscopy and 3D imaging analysis successfully overcomes technical challenges associated with visualizing deep internal structures. This enables an unprecedented level of anatomical detail of the larval pharyngeal sensory system. Thus, the study complements and completes existing maps of larval sensory circuits, contributing a comprehensive neuroanatomical characterization of larval sensory input pathways. These insights will inform future studies on larval behavior, sensory processing, and may also have applied relevance for insect control strategies.

      Weaknesses:

      While the manuscript is concise, clearly written, and methodologically rigorous, it primarily addresses a specialized readership with expertise in insect neuroanatomy.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript documents the structure of the pharyngeal nervous system of the Drosophila larva. The authors wanted to achieve a detailed ultrastructural reconstruction of the gustatory sensory organs in the Drosophila pharynx. Using serial EM and the associated bioinformatics tools, they have achieved their goal. The paper is written clearly and illustrated beautifully with 3D models and annotated sections. The data will significantly enrich the field of Drosophila neurobiology.

      Strengths:

      Given the dataset, the findings presented are solid and will be an important work of reference for the future.

      Weaknesses:

      Previous work, including EM, on the pharyngeal sensory organ is not sufficiently referenced and used for comparison with the data presented in this study.

    4. Author Response:

      We thank the reviewers and editors for their thoughtful and constructive feedback on our manuscript, “Morphology and ultrastructure of pharyngeal sense organs of Drosophila larvae.” We are pleased that both reviewers found our ultrastructural analysis and 3D reconstructions of the larval pharyngeal sensory system to be of high quality, and we appreciate the recognition of the study’s significance and potential impact on the Drosophila neurobiology field.

      We want to address the concern raised regarding the limited referencing and comparison with previous work on pharyngeal sensory organs, particularly in adult Drosophila and other insect species.

      As noted by the reviewers, our manuscript is concise and focused. We want to clarify that we initially prepared and submitted this study with the intention of it being considered as a Short Report, which comes with limitations on the number of characters and figures that can be included. During the submission process, we were asked by the editors if we would like to submit our work as a full-length Research Advance, which we agreed to.

      That said, we are now happy to expand the discussion in the broader context of related studies — including prior EM and anatomical work — which would enrich the manuscript and provide readers with a deeper comparative perspective.

      We are grateful for the positive assessment of our manuscript and for the opportunity to clarify this point.

      Sincerely,

      Vincent Richter and Andreas S. Thum

    1. eLife Assessment

      This important work provides convincing evidence of the cognitive and neural mechanisms that give rise to feelings of shame and guilt, as well as their transformation into compensatory behavior. The authors combine well-designed manipulations of responsibility and harm with computational cognitive modeling and neuroimaging to provide a comprehensive account of how emotions are experienced and acted upon.

    2. Reviewer #1 (Public review):

      Summary:

      This work provides important new evidence of the cognitive and neural mechanisms that give rise to feelings of shame and guilt, as well as their transformation into compensatory behavior. The authors use a well-designed interpersonal task to manipulate responsibility and harm, eliciting varying levels of shame and guilt in participants. The study combines behavioral, computational, and neuroimaging approaches to offer a comprehensive account of how these emotions are experienced and acted upon. Notably, the findings reveal distinct patterns in how harm and responsibility contribute to guilt and shame and how these factors are integrated into compensatory decision-making.

      Strengths:

      (1) Investigating both guilt and shame in a single experimental framework allows for a direct comparison of their behavioral and neural effects while minimizing confounds.

      (2) The study provides a novel contribution to the literature by exploring the neural bases underlying the conversion of shame into behavior.

      (3) The task is creative and ecologically valid, simulating a realistic social situation while retaining experimental control.

      (4) Computational modeling and fMRI analysis yield converging evidence for a quotient-based integration of harm and responsibility in guiding compensatory behavior.

      Weaknesses:

      (1) Post-experimental self-reports rely both on memory and on the understanding of the conceptual difference between the two emotions. Additionally, it is unclear whether the 16 scenarios were presented in random order; sequential presentation could have introduced contrast effects or demand characteristics.

      (2) In the neural analysis of emotion sensitivity, the authors identify brain regions correlated with responsibility-driven shame sensitivity and then use those brain regions as masks to test whether they were more involved in the responsibility-driven shame sensitivity than the other types of emotion sensitivity. I wonder if this is biasing the results. Would it be better to use a cross-validation approach? A similar issue might arise in "Activation analysis (neural basis of compensatory sensitivity)."

      Additional comments and questions:

      (1) Regarding the traits of guilt and shame, I appreciate using the scores from the subscales (evaluations and action tendencies) separately for the analyses (instead of a composite score). An issue with using the actions subscales when measuring guilt and shame proneness is that the behavioral tendencies for each emotion get conflated with their definitions, risking circularity. It is reassuring that the behavior evaluation subscale was significantly correlated with compensatory behavior (not only the action tendencies subscale). However, the absence of significant neural correlates for the behavior evaluation subscale raises questions: Do the authors have thoughts on why this might be the case, and any implications?

      (2) Regarding the computational model finding that participants seem to disregard self-interest, do the authors believe it may reflect the relatively small endowment at stake? Do the authors believe this behavior would persist if the stakes were higher? Additionally, might the type of harm inflicted (e.g., electric shock vs. less stigmatized/less ethically charged harm like placing a hand in ice-cold water) influence the weight of self-interest in decision-making?

      Taken together, the conclusions of the paper are well supported by the data. It would be valuable for future studies to validate these findings using alternative tasks or paradigms to ensure the robustness and generalizability of the observed behavioral and neural mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors combined behavioral experiments, computational modeling, and functional magnetic resonance imaging (fMRI) to investigate the psychological and neural mechanisms underlying guilt, shame, and the altruistic behaviors driven by these emotions. The results revealed that guilt is more strongly associated with harm, whereas shame is more closely linked to responsibility. Compared to shame, guilt elicited a higher level of altruistic behavior. Computational modeling demonstrated how individuals integrate information about harm and responsibility. The fMRI findings identified a set of brain regions involved in representing harm and responsibility, transforming responsibility into feelings of shame, converting guilt and shame into altruistic actions, and mediating the effect of trait guilt on compensatory behavior.

      Strengths:

      This study offers a significant contribution to the literature on social emotions by moving beyond prior research that typically focused on isolated aspects of guilt and shame. The study presents a comprehensive examination of these emotions, encompassing their cognitive antecedents, affective experiences, behavioral consequences, trait-level characteristics, and neural correlates. The authors have introduced a novel experimental task that enables such a systematic investigation and holds strong potential for future research applications. The computational modeling procedures were implemented in accordance with current field standards. The findings are rich and offer meaningful theoretical insights. The manuscript is well written, and the results are clearly and logically presented.

      Weaknesses:

      In this study, participants' feelings of guilt and shame were assessed retrospectively, after they had completed all altruistic decision-making tasks. This reliance on memory-based self-reports may introduce recall bias, potentially compromising the accuracy of the emotion measurements.

      In many behavioral economic models, self-interest plays a central role in shaping individual decision-making, including moral decisions. However, the model comparison results in this study suggest that models without a self-interest component (such as Model 1.3) outperform those that incorporate it (such as Model 1.1 and Model 1.2). The authors have not provided a satisfactory explanation for this counterintuitive finding.

      The phrases "individuals integrate harm and responsibility in the form of a quotient" and "harm and responsibility are integrated in the form of a quotient" appear in the Abstract and Discussion sections. However, based on the results of the computational modeling, it is more accurate to state that "harm and the number of wrongdoers are integrated in the form of a quotient." The current phrasing misleadingly suggests that participants represent information as harm divided by responsibility, which does not align with the modeling results. This potentially confusing expression should be revised for clarity and accuracy.

      In the Discussion, the authors state: "Since no brain region associated with social cognition showed significant responses to harm or responsibility, it appears that the human brain encodes a unified measure integrating harm and responsibility (i.e., the quotient) rather than processing them as separate entities when both are relevant to subsequent emotional experience and decision-making." However, this interpretation overstates the implications of the null fMRI findings. The absence of significant activation in response to harm or responsibility does not necessarily imply that the brain does not represent these dimensions separately. Null results can arise from various factors, including limitations in the sensitivity of fMRI. It is possible that more fine-grained techniques, such as intracranial electrophysiological recordings, could reveal distinct neural representations of harm and responsibility. The interpretation of these null findings should be made with greater caution.

    4. Reviewer #3 (Public review):

      Summary:

      Zhu et al. set out to elucidate how the moral emotions of guilt and shame emerge from specific cognitive antecedents - harm and responsibility - and how these emotions subsequently drive compensatory behavior. Consistent with their prediction derived from functionalist theories of emotion, their behavioral findings indicate that guilt is more influenced by harm, whereas shame is more influenced by responsibility. In line with previous research, their results also demonstrate that guilt has a stronger facilitating effect on compensatory behavior than shame. Furthermore, computational modeling and neuroimaging results suggest that individuals integrate harm and responsibility information into a composite representation of the individual's share of the harm caused. Brain areas such as the striatum, insula, temporoparietal junction, lateral prefrontal cortex, and cingulate cortex were implicated in distinct stages of the processing of guilt and/or shame. In general, this work makes an important contribution to the field of moral emotions. Its impact could be further enhanced by clarifying methodological details, offering a more nuanced interpretation of the findings, and discussing their potential practical implications in greater depth.

      Strengths:

      First, this work conceptualizes guilt and shame as processes unfolding across distinct stages (cognitive appraisal, emotional experience, and behavioral response) and investigates the psychological and neural characteristics associated with their transitions from one stage to the next.

      Second, the well-designed experiment effectively manipulates harm and responsibility - two critical antecedents of guilt and shame.

      Third, the findings deepen our understanding of the mechanisms underlying guilt and shame beyond what has been established in previous research.

      Weaknesses:

      (1) Over the course of the task, participants may gradually become aware of their high error rate in the dot estimation task. This could lead them to discount their own judgments and become inclined to rely on the choices of other deciders. It is unclear whether participants in the experiment had the opportunity to observe or inquire about others' choices. This point is important, as the compensatory decision-making process may differ depending on whether choices are made independently or influenced by external input.

      (2) Given the inherent complexity of human decision-making, it is crucial to acknowledge that, although the authors compared eight candidate models, other plausible alternatives may exist. As such, caution is warranted when interpreting the computational modeling results.

      (3) I do not agree with the authors' claim that "computational modeling results indicated that individuals integrate harm and responsibility in the form of a quotient" (i.e., harm/responsibility). Rather, the findings appear to suggest that individuals may form a composite representation of the harm attributable to each individual (i.e., harm/the number of people involved). The explanation of the modeling results ought to be precise.

      (4) Many studies have reported positive associations between trait gratitude, social value orientation, and altruistic behavior. It would be helpful if the authors could provide an explanation about why this study failed to replicate these associations.

      (5) As the authors noted, guilt and shame are closely linked to various psychiatric disorders. It would be valuable to discuss whether this study has any implications for understanding or even informing the treatment of these disorders.

    1. eLife Assessment

      This is a useful analysis of STORM data that characterizes the clustering of active zones in retinogeniculate terminals across ages and in the absence of retinal waves. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling. However, the evidence is incomplete, weakening the claims the authors make regarding how activity influences the clustering of these synapses. This basic criticism has not improved with revisions.

    2. Reviewer #1 (Public review):

      Summary

      The authors previously published a study of RGC boutons in the dLGN in developing wild-type mice and developing mutant mice with disrupted spontaneous activity. In the current manuscript, they have broken down their analysis of RGC boutons according to the number of Homer/Bassoon puncta associated with each vGluT2 cluster.

      The authors find that, in the first post-natal week, RGC boutons with multiple active zones (mAZs) are about a third as common as boutons with a single active zone (sAZ). The size of the vGluT2 cluster associated with each bouton was proportional to the number of active zones present in each bouton. Within the authors' ability to estimate these values (n=3 per group, 95% of results expected to be within ~2.5 standard deviations), these results are consistent across groups: 1) dominant eye vs. non-dominant eye, 2) wild-type mice vs. mice with activity blocked, and 3) ages P2, P4, and P8. The authors also found that mAZs and sAZs have roughly the same number (about 1.5) of sAZs clustered around them (within 1.5 µm).

      However, the authors do not interpret this consistency between groups as evidence that active zone clustering is not a specific marker or driver of activity dependent synaptic segregation. Rather, the authors perform a large number of tests for statistical significance and cite the presence or absence of statistical significance as evidence that "Eye-specific active zone clustering underlies synaptic competition in the developing visual system (title)". I don't believe this conclusion is supported by the evidence.

      Strengths

      The source dataset is high resolution data showing the colocalization of multiple synaptic proteins across development. Added to this data is labeling that distinguishes axons from the right eye from axons from the left eye. The first order analysis of this data showing changes in synapse density and in the occurrence of multi-active zone synapses is useful information about the development of an important model for activity dependent synaptic remodeling.

      Weaknesses

      In my previous review I argued that it was not possible to determine, from their analysis, whether the differences they were reporting between groups were important to the biology of the system. The authors have made some changes to their statistics (paired t-tests) and use some less derived measures of clustering. However, they still fail to present a meaningfully quantitative argument that the observed group differences are important. The authors base most of their claims on small differences between groups. There are two big problems with this practice. First, the differences between groups appear too small to be biologically important. Second, the differences between groups that are used as evidence for how the biology works are generally smaller than the precision of the authors' sampling. That is, the differences are as likely to be false positives as true positives.

      (1) Effect size. The title claims: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". Such a claim might be supported if the authors found that mAZs are only found in dominant-eye RGCs and that eye-specific segregation doesn't begin until some threshold of mAZ frequency is reached. Instead, the behavior of mAZs is roughly the same across all conditions. For example, the clear trend in Figure 4C and D is that measures of clustering between mAZ and sAZ are as similar as could reasonably be expected by the experimental design. However, some of the comparisons of very similar values produced p-values < 0.05. The authors use this fact to argue that the negligible differences between mAZ and sAZs explain the development of the dramatic differences in the distribution of ipsilateral and contralateral RGCs.

      (2) Sample size. Performing a large number of significance tests and comparing p-values is not hypothesis testing and is not descriptive science. At best, with large sample sizes and controls for multiple tests, this approach could be considered exploratory. With n=3 for each group, many comparisons of many derived measures, among many groups, and no control for multiple testing, this approach constitutes a random result generator.
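
      To make the multiple-testing concern concrete, the short sketch below computes the chance of at least one false positive across a family of tests. It assumes the tests are independent and that each uses a 0.05 threshold, which is a simplification; the function names are illustrative only, and the count of 24 tests is taken from the 24 comparisons mentioned elsewhere in this review.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Assumed family sizes: a single test, 12 tests, and 24 tests.
    for m in (1, 12, 24):
        print(m, round(familywise_error_rate(m), 3), round(bonferroni_threshold(m), 4))
```

      Under these assumptions, 24 uncorrected tests give roughly a 70% chance of at least one p < 0.05 even when no true difference exists.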

      The authors argue that n=3 is a large sample size for the type of high resolution / large volume data being used. It is true that many electron microscopy studies with n=1 are used to reveal the patterns of organization that are possible within an individual. However, such studies cannot control individual variation and are, therefore, not appropriate for identifying subtle differences between groups.

      In response to previous critiques along these lines, the authors argue they have dealt with this issue by limiting their analysis to within-individual paired comparisons. There are several problems with their thinking in this approach. The main problem is that they did not change the logic of their arguments, only which direction they pointed the t-tests. Instead of claiming that two groups are different because p < 0.05, they say that two groups are different because one produced p < 0.05 and the other produced p > 0.05. These arguments are not statistically valid or biologically meaningful.

      To the best of my understanding, the results are consistent with the following model:

      • RGCs form mAZs at large boutons (known)

      • About a quarter of week-one RGC boutons are mAZs (new observation)

      • Vesicle clustering is proportional to active zone number (~new observation)

      • RGC synapse density increases during the first postnatal week (known)

      • Blocking activity reduces synapse density (known)

      • Contralateral eye RGCs form more and larger synapses in the lateral dLGN (known)

      • With n=3 and effect sizes smaller than 1 standard deviation, a statistically significant result is about as likely to be a false positive as a true positive.

      • A true-positive statistically significant result is not evidence of a meaningful deviation from a biological model.
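
      The point about false positives versus true positives at n=3 can be explored with a small Monte Carlo sketch. It is not the authors' analysis: it assumes an unpaired two-sample t-test with equal variances, normally distributed data, and a hypothetical prior probability that a tested difference is real. All names are illustrative.

```python
import math
import random

N_PER_GROUP = 3      # matches the n=3 per group discussed above
T_CRIT_DF4 = 2.776   # two-sided 5% critical value of Student's t with 4 degrees of freedom


def pooled_t(a, b):
    """Two-sample t statistic for equal group sizes and pooled variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    pooled = (var_a + var_b) / 2.0
    return (mean_a - mean_b) / math.sqrt(2.0 * pooled / n)


def rejection_rate(effect_size, trials=20000, seed=0):
    """Fraction of simulated experiments with |t| above the 5% critical value.

    effect_size is the true group difference in units of the standard deviation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(effect_size, 1.0) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(a, b)) > T_CRIT_DF4:
            hits += 1
    return hits / trials


def positive_predictive_value(power, alpha, prior_true):
    """Share of significant results that reflect a real difference."""
    true_pos = power * prior_true
    false_pos = alpha * (1.0 - prior_true)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    alpha = rejection_rate(0.0)  # empirical false-positive rate
    for d in (0.5, 1.0):         # assumed effect sizes, in standard deviations
        power = rejection_rate(d)
        # prior_true=0.5 is a hypothetical value chosen only for illustration
        print(d, round(power, 3), round(positive_predictive_value(power, alpha, 0.5), 3))
```

      When the estimated power is close to the false-positive rate, the positive predictive value approaches the prior, which is the sense in which a significant result carries little information.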

      Providing plots that show the number of active zones present in boutons across these various conditions is useful. However, I could find no compelling deviation from the above default predictions that would influence how I see the role of mAZs in activity dependent eye-specific segregation.

      Below are critiques of most of the claims of the manuscript.

      Claim (abstract): individual retinogeniculate boutons begin forming multiple nearby presynaptic active zones during the first postnatal week.

      Confirmed by data.

      Claim (abstract): the dominant-eye forms more numerous mAZ contacts,

      Misleading: The dominant-eye (by definition) forms more contacts than the non-dominant eye. That includes mAZ.

      Claim (abstract): At the height of competition, the non-dominant-eye projection adds many single active zone (sAZ) synapses

      Weak: While the individual observation is strong, it is a surprising deviation based on a single n=3 experiment in a study that performed twelve such experiments (three ages, mutant/wildtype, sAZ/mAZ).

      Claim (abstract): Together, these findings reveal eye-specific differences in release site addition during synaptic competition in circuits essential for visual perception and behavior.

      False: This claim is unambiguously false. The above findings, even if true, do not argue for any functional significance to active zone clustering.

      Claim (line 84): "At the peak of synaptic competition midway through the first postnatal week, the non-dominant-eye formed numerous sAZ inputs, equalizing the global synapse density between the two eyes"

      Weak: At one of twelve measures (age, bouton type, genotype) performed with 3 mice each, one density measure was about twice as high as expected.

      Claim (line 172): "In WT mice, both mAZ (Fig. 3A, left) and sAZ (Fig. 3B, left) inputs showed significant eye-specific volume differences at each age."

      Questionable: There appears to be a trend, but the size and consistency is unclear.

      Claim (line 175): "the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left)."

      Cherry picking. Twelve differences were measured, each with n=3 per group. The biggest difference of the group was cited. No analysis is provided for the range of uncertainty about this measure (2.5 standard deviations) as an individual sample or as one of twelve comparisons.

      Claim (line 174): "In the middle of eye-specific competition at P4 in WT mice, the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left). In contrast, β2KO mice showed a smaller 1.1 fold difference at the same age (Fig. 3A, right panel). For sAZ synapses at P4, the magnitudes of eye-specific differences in VGluT2 volume were smaller: 1.35-fold in WT (Fig. 3B, left) and 0.41-fold in β2KO mice (Fig. 3B, right). Thus, both mAZ and sAZ input size favors the dominant eye, with larger eye-specific differences seen in WT mice (see Table S3)."

      No way to judge the reliability of the analysis and trivial conclusion: To analyze effect size the authors choose the median value of three measures (whatever the middle value is). They then make four comparisons at the time point where they observed the biggest difference in favor of their hypothesis. There is no way to determine how much we should trust these numbers besides spending time with the mislabeled scatter plots. The authors then claim that this analysis provides evidence that there is a difference in vGluT2 cluster volume between dominant and non-dominant RGCs and that that difference is activity dependent. The conclusion that dominant axons have bigger boutons and that mutants that lack the property that would drive segregation would show less of a difference is very consistent with the literature. Moreover, there is no context provided about what 1.35 or 1.1 fold difference means for the biology of the system.

      Claim (189): "This shows that vesicle docking at release sites favors the dominant-eye as we previously reported but is similar for like eye type inputs regardless of AZ number."

      Contradicts core claim of manuscript: Consistent with previous literature, there is an activity dependent relative increase in vGlut2 clustering of dominant eye RGCs. The new information is that that activity dependence is more or less the same in sAZ and mAZ. The only plausible alternative is that vGlut2 scaling only increases in mAZ which would be consistent with the claims of their paper. That is not what they found. To the extent that the analysis presented in this manuscript tests a hypothesis, this is it. The claim of the title has been refuted by figure 3.

      Claim (line 235): "For the non-dominant eye projection, however, clustered mAZ inputs outnumbered clustered sAZ inputs at P4 (Fig. 4C, bottom left panel), the age when this eye adds sAZ synapses (Fig. 2C)."

      Misleading: The overwhelming trend across 24 comparisons is that the sAZ clustering looks like mAZ clustering. That is the objective and unambiguous result. Among these 24 underpowered tests (n=3), there were a few p-values < 0.05. The authors base their interpretation of cell behavior on these crossings.

      Claim (line 328): "The failure to add synapses reduced synaptic clustering and more inputs formed in isolation in the mutants compared to controls."

      Trivially true: Density was lower in mutant.

      Claim (line 332): "While our findings support a role for spontaneous retinal activity in presynaptic release site addition and clustering..."

      Not meaningfully supported by evidence: I could not find meaningful differences between WT and mutant besides the already known dramatic difference in synapse density.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes of this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of synapses based on Bassoon clustering: the multiple active zone (mAZ) synapse and single active zone (sAZ) synapse. In this revision, the authors have added EM data to support the idea that mAZ synapses represent boutons with multiple release sites. They have also reanalyzed their data set with different statistical approaches.

      Strengths:

      The data presented are of good quality and provide an unprecedented high-resolution view of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises.

      Weaknesses:

      While the interpretation of this data set is much more grounded in this second revised submission, some of the authors' conclusions/statements still lack convincing supporting evidence. In particular, the data does not support the title: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". The data show that there are fewer synapses made for both contra- and ipsi- inputs in the β2KO mice-- this fact alone can account for the differences in clustering. There is no evidence linking clustering to synaptic competition. Moreover, the findings of differences in AZ# or distance between AZs that the authors report are quite small and it is not clear whether they are functionally meaningful.

    4. Reviewer #3 (Public review):

      This study is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports, 2023). Specifically, the authors use anti-VGluT2 labeling to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label active zones with the resolution to count them, and anti-Homer to identify postsynaptic densities. Their previous study compared the detailed synaptic structure across the development of synapses made with contra-projecting vs. ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new detailed analysis on the same data set in which they classify synapses into "multi-active zone" vs. "single-active zone" synapses and assess the number and spacing of these synapses. The authors use measurements to make conclusions about the role of retinal waves in the generation of same-eye synaptic clusters. The authors interpret these results as providing insight into how neural activity drives synapse maturation, but the strength of their conclusions is not directly tested by their analysis.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate the eye of origin is what makes this data set unique over previous structural work. The addition of example images from the EM dataset provides confidence in their categorization scheme.

      Weaknesses:

      Though the descriptions of single vs multi-active zone synapses are important and represent a significant advance, the authors continue to make unsupported conclusions regarding the biological processes driving these changes. Although this revision includes additional information about the populations tested and the tests conducted, the authors do not address the issue raised by previous reviews. Specifically, they provide no assessment of what effect size represents a biologically meaningful result. For example, a more appropriate title is "The distribution of eye-specific single vs multi-active zone is altered in mice with reduced spontaneous activity" rather than concluding that this difference in clustering is somehow related to synaptic competition. Of course, the authors are free to speculate, but many of the conclusions of the paper are not supported by their results.

    1. eLife Assessment

      This manuscript uses modeling approaches to provide mechanistic insight into the structural and dynamic properties of enhancer-promoter interactions in Drosophila. Given the interest in this field, this is a timely approach, and the results give useful insights by providing predictions about the processivity of cohesin loop extrusion in Drosophila and concluding that the compartmental interaction strength is poised near criticality in the coil-globule phase space. The evidence provided to support some of the conclusions is, however, incomplete and would be strengthened by better considering some of the caveats in the data used to constrain the models, such as the use of "homie" genetic elements in the dynamic data. There is insufficient evidence provided for the dynamics being criticality-driven, and in addition, consideration of alternative models would further strengthen the conclusions of the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      This computational study investigates the physical mechanisms underlying enhancer-promoter (E-P) interactions across genomic distances in Drosophila chromosomes, motivated by a previously published study that revealed unexpectedly frequent long-range contacts challenging classical polymer models. The authors performed coarse-grained polymer simulations testing three chromatin organization models: ideal polymers, loop extrusion, and compartmental segregation, comparing their predictions to experimental Hi-C contact maps, mean E-P distances, and two-locus mean-squared displacement dynamics. They found that compartmental segregation best captured both the structural and dynamic features observed experimentally, while neither ideal chains nor loop extrusion alone could reproduce all experimental observables. The combination of compartmental segregation with loop extrusion further improved agreement with experimental data, suggesting these mechanisms might be involved in Drosophila chromatin organization.

      Strengths:

      The paper has two primary strengths:

      (1) The simulations are based on biologically interpretable mechanisms (compartmentalization and loop extrusion), which may facilitate making specific experimentally testable predictions.

      (2) The work uses a systematic approach to increase model complexity by directly fitting to data, first establishing that simple models fail to capture the data until arriving at a more complex model that does capture the data.

      Weaknesses:

      I have two major concerns (detailed below) and multiple minor concerns.

      Major concerns:

      (1) While the upside of the mechanistic simulations is that they are interpretable, the downside is that specific choices for the considered mechanism were made, and conclusions drawn from it are necessarily biased by the initial choices. In this paper, only two mechanisms were considered: loop extrusion and compartmentalization. Yet, it is not clear why these are the most likely underlying mechanisms that might determine the chromosome dynamics. Indeed, previous work (not cited in this paper) showed that Drosophila chromosome structure is not determined by loop extrusion: https://elifesciences.org/articles/94070.

      This should be acknowledged, and the main reasons for choosing these particular mechanisms should be laid out. The conclusions of the paper must then necessarily always be seen under the caveat that only these two mechanisms were considered.

      (2) Even within the framework of the approach, insufficient evidence is given to support the title of the paper "Criticality-driven enhancer-promoter dynamics in Drosophila chromosomes" for two reasons:

      (a) The fact that the best-fit parameters are near a coil-globule transition does not mean that the resulting dynamics are criticality-driven. To claim criticality, one would usually expect much more direct evidence, such as diverging correlation lengths. Furthermore, it would need to be shown that the key features of the dynamics (which should be defined, presumably the static and dynamic exponents) indeed depend on the parameters being at this transition. That is, when the simulations are tuned away from this parameter point, does the behaviour disappear? Only in this case can it be claimed that the behaviour is driven by this phenomenon.

      (b) The results section actually contains no mention of the coil-globule transition, and it is not clear in what way the parameters are close to this transition.

      Thus, three things are necessary:

      (i) How the parameters are close to the transition needs to be explained in detail.

      (ii) The divergence of observed dynamics whenever the parameters are tuned away from the transition needs to be demonstrated.

      (iii) Even if (i) and (ii) are fulfilled, a more careful title should be chosen, such as "Polymer simulations near the coil-globule transition are consistent with enhancer-promoter dynamics in Drosophila chromosomes."

      Many of the results in the figures and results section are rather repetitive and could be compressed. The main result of Figure 1 - that the data are not described by an ideal chain - was already fully shown and established in the original paper from which the data are taken. Figure 2 is a negative result with near-identical panels to Figure 3. Figure 4B is hard to interpret.

      The paper makes no concrete suggestions for new experiments to test the hypotheses formulated. Since the paper can only claim that the simulations are consistent with the data, it would significantly strengthen the paper if testable predictions could be made.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Ganesh and colleagues use experimental data from Hi-C and from live-cell imaging to evaluate different polymer models of 3D genome organization in Drosophila based on both structural and dynamic properties. The authors consider several leading hypotheses, which are examined sequentially in increasing level of complexity - from the minimal Rouse polymer, to a model combining sequence-specific compartmentalization and loop-extrusion without extrusion blockers. They conclude that the combination of both compartmentalization and loop-extrusion gives the best agreement with the data. Their analysis also leads to concrete predictions about the processivity of cohesin loop extrusion in Drosophila, and a conclusion that the compartmental interaction strength is poised near criticality in the coil-globule phase space.

      Strengths:

      There is considerable interest in the field in understanding the mechanisms responsible for the 3D spatial organization of the genome and its dynamic movement, which has major implications for our understanding of long-range transcriptional regulation and other genome behaviors. The live-cell experimental work on which this study draws highlights the limitations of existing models to explain even the dynamic behaviors observed in the data, which further motivates exploration. Therefore, this paper seeks to address an important gap in the field. The work is written in a well-organized, well-illustrated fashion. The text and figures are nicely integrated, easy to read, and explain challenging concepts with elegance and brevity in a manner that will be accessible to a broad audience.

      Weaknesses:

      The validity and utility of these conclusions are, in my view, substantially undermined by what appear to be unappreciated peculiarities of the live-cell data set that was used to constrain the model. The live-cell data come from embryos that were edited in a way that intentionally and substantively changed both the 3D genome structure and dynamics specifically at the loci which are imaged, a case which is neither explained by any of the models suggested nor acknowledged in the current work, and which is not compatible with the Hi-C data that are simultaneously used to evaluate these models. As these ignored synthetic alterations have been previously shown to be determinative of transcriptional activity, the relevance of the authors' work to transcriptional control (a prime motivation in the introduction) is unclear.

      The agreement in 3D organization, as represented in chromosome-scale contact frequency heatmaps, is substantially less impressive than the agreement seen in prior work with similar models. This discrepancy appears to be due in part to the unappreciated effects of the genetic edits mentioned in the previous limitation, as well as inappropriate choices in the metrics used to evaluate agreement. It is also not particularly surprising that combining more models, with more free parameters, results in an improvement in the quality of fit.

      Some major results, including both theoretical works and experimental ones, are ignored, despite their relevance to the stated objective of the work. The current manuscript and analysis could be improved substantially by a consideration of these works.

      I describe these issues in more detail below.

      Major issues:

      (1) The genetic element "homie" is present in a subset of the data: The experimental data used in this analysis come from different fly lines, half of which have been edited explicitly to alter genome structure and consequent transcriptional behavior, yet the authors are trying to fit with a common model - a problem which substantially undermines the utility of the analysis.

      Specifically, the authors evaluate the various models/simulations in turn by comparing them to Hi-C from wildtype Drosophila embryos on the chromosome scale and to 3D distances and dynamics from live cell imaging in genetically edited embryos. The exercise fatally overlooks a critical fact (admittedly not easily noticed in the work from Bruckner et al.): the fly embryos used for nearly all their analyses contain not only fluorescent labels, but also two copies of a powerful genetic sequence, "homie", known for its ability to dramatically change the 3D organization and dynamics of the genome. Whether or not the fluorescent labels themselves used in the study further alter structure and dynamics is not entirely clear (and will require further work beyond the scope of either study), but at least these fluorescent labels aren't known to dramatically affect 3D structure and dynamics the way homie is. The critical problem is that adding or removing the "homie", as shown in a collection of prior works I describe below in more detail, dramatically affects structure, dynamics, and gene expression. Whether or not the genome contains two distal cis-linked copies of homie fundamentally changes genome structure and dynamics, so to use one dataset which has this edit (the live-cell data) and one dataset which lacks it (the Hi-C data) is, in some sense, to guarantee failure of any model to match all the data.

      If the authors had chosen instead to focus exclusively on the 'no homie' genetic lines in the Bruckner data, they would have a much smaller dataset (just 2 distances), which would not cover all the length scales of interest, but it would at least be a dataset not known to be contradictory to the Hi-C. The two 'no homie' lines make much more plausible candidates for the sort of generalizable polymer dynamics these authors seek to explain, as will hopefully be made more clear by a brief review of what is known about homie. I next describe the published data that support these conclusions about how homie affects 3D genome spatial organization and dynamics:

      What is "homie" and how does it affect 3D genome distances, dynamics, and gene expression?

      The genetic element "homie" was named by James Jaynes' lab (Fujioka...Jaynes 2009) in reference to its remarkable "homing" ability - a fascinating and still poorly understood biological observation that some genetic sequences from Drosophila, when cloned on plasmids and reintegrated into the genome with p-elements, had a remarkable propensity to re-integrate near their endogenous sequence (Hama et al., 1990; Kassis, 2002; Taillebourg and Dura, 1999; Bender and Hudson, 2000; Fujioka...Jaynes 2009). By contrast, most genetic elements tend to incorporate at random across the genome in such assays (with some bias for active chromatin).

      The Jaynes lab subsequently showed that flies carrying two copies of homie, one integrated in cis, ~140 kb distal from the endogenous element, formed preferential cis contacts with one another. Indeed, if a promoter and reporter gene were included at this distal integration site, the reporter gene would be expressed in the pattern normally seen for the gene even-skipped. The endogenous copy of homie marks one border of a ~16 kb mini-TAD which contains the even-skipped gene (eve) and its developmental enhancers, so this functional interaction provides further evidence of physical proximity (as was also shown by 3C by Jaynes (Fujioka..., Schedl, Jaynes 2016), and later with elegant live imaging, by Jaynes and Gregor (Chen 2018)).

      Critically, if either copy of homie is deleted or substantially mutated, the 3D proximity is lost (Fujioka 2016, Chen 2018, Bruckner 2023), and the expression of the transgene is dramatically reduced (at 58 kb) or lost. Given the authors' motivation of understanding "E-P" interactions, the fact that the increased 3D proximity provided by homie is as essential for transcription as the promoter itself at the ~150 kb distance underscores that these are not negligible changes.

      These effects can be seen by plotting the data from Bruckner 2023, which includes data from labels with separations of 58 kb and ~150 kb "no homie" as well as homie. Unfortunately, the authors don't plot this data in the manuscript in the comparison of 3D distances, though the two-point MSD can be seen in Figure S13C, and laudably, the data is made public in a well-annotated repository on Zenodo, noted in the study. Note that the distance data in Figure S13 were filtered to exclude the transcriptionally off state, and are thus not the quantity the current authors are interested in. If they plot the published data for no homie, they will see the clear effect on the average 3D distance, R(s), and a somewhat stronger effect on the contact frequency P(s), which causes significant deviation from the trend-line followed by the homie-containing data.

      (2) The agreement between the "best performing" simulations for all models and the Hi-C data is not on par with prior studies using similar approaches, apparently due to some erroneous choices in how the optimization is carried out:

      Hi-C-comparison

      The 'best fit' simulation Hi-C looks strikingly different from the biological data in all comparisons, with clearly lower agreement than other authors have shown using highly similar methods (e.g., Shi and Thirumalai 2023; Di Pierro et al. 2017; Nuebler et al. 2018; Esposito et al. 2022; Conte et al. 2022), among many others. I believe this results from a few issues with how the current authors select and evaluate the data in their work:

      (a) Most works have used Pearson's correlation rather than Spearman's correlation when comparing simulation and Hi-C contact frequencies. Pearson's correlation is more appropriate when we expect the values to be linearly related, which they should be in this case, as they are constructed indeed to be measuring the same thing (contact frequency), just derived from two different methods. Spearman's correlation would have been justifiable for comparing how transcription output correlates with contact frequency. This may fix the bafflingly low correlations reported at lower adhesion values in Figure S2C.

      (b) Choice of adhesion strengths - The Hi-C map comparison in Figure 3 strongly suggests that a much more striking visual agreement would have been achieved if much weaker (but still non-zero) homotypic monomer affinity had been selected. In the authors' simulation, the monomer state (A/B identity) strongly dominates polymer position, resulting in the visual appearance of an almost black-and-white checkerboard. The data, meanwhile, look like a weak checkerboard superimposed on the polymer.

      (c) A further confounding problem is the aforementioned issue that the Hi-C data don't come from the edited cell lines, and that the interaction of the two Homie sites is vastly stronger than the compartment interactions of this region of the genome.
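
      The distinction drawn in point (a) between Pearson's and Spearman's correlation can be illustrated with a minimal sketch. The numbers are toy values, not Hi-C data, and the helper names are illustrative; the rank function assumes no tied values.

```python
import math


def pearson(x, y):
    """Pearson correlation: sensitive to whether the relation is linear."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """Ranks starting at 1; assumes no ties (a simplification)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    measured = [1.0, 2.0, 4.0, 8.0, 16.0]         # toy "experimental" contact frequencies
    proportional = [2.0 * v for v in measured]    # toy model that is linear in the data
    distorted = [v ** 3 for v in measured]        # toy model that is monotone but nonlinear
    print(pearson(measured, proportional), spearman(measured, proportional))
    print(pearson(measured, distorted), spearman(measured, distorted))
```

      Spearman's correlation scores both toy models as perfect because it only checks ordering, whereas Pearson's correlation penalizes the nonlinear one; this is why the choice matters when two methods are expected to measure the same quantity.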

      (3) Some important concepts from the field are ignored:

      The crumpled/fractal globule model is widely discussed in the literature (including the work containing the data used in this study) - its exclusion from this analysis thus appears as a substantial gap/oversight:

      A natural alternative to the much-discussed Rouse polymer model is the "crumpled polymer" (Grosberg et al. 1988; Grosberg 2016; Halverson et al. 2011; Halverson et al. 2011), also known as the "fractal globule" (Lieberman-Aiden et al. 2009; Mirny 2011; Dekker and Mirny 2016; Boettiger et al. 2016), much discussed for the way it captures the ⅓ scaling of R(s), found for much of the genome (or, equivalently, the -1 exponent of the probability of contact as a function of genome separation, P(s)). Given the 1/3rd scaling in the data, and the fact that the original authors highlighted the crumpled model in addition to the Rouse model, it seems that this comparison would be instructive and the lack of discussion an oversight. Moreover, while prior works (e.g., Bruckner, Gregor, 2023) used some traditional simplifying assumptions to estimate the MSD and relaxation time scaling of this model, I believe a more rigorous analysis with explicit simulations (as in Figure 1 for the Rouse model) would be instructive for the crumpled polymer simulations. Note the crumpled globule is not necessarily the same as the globule in the coil-globule transition discussed here - it requires some assumptions about non-entanglement to stay trapped in the meta-stable state which has the 1/3rd R(s) scaling that is indicative of this model, and not the 1/2 exhibited by equilibrium globules (for s<< length of the polymer) and dilute polymers alike.
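
      The equivalence between the 1/3 scaling of R(s) and the -1 exponent of P(s) follows from a standard mean-field estimate, sketched here under the assumption that the contact probability scales inversely with the volume explored by a segment of length s:

```latex
R(s) \sim s^{\nu}, \qquad P(s) \sim R(s)^{-3} \sim s^{-3\nu}
\quad\Longrightarrow\quad
\begin{cases}
\nu = 1/3: & P(s) \sim s^{-1} \\
\nu = 1/2: & P(s) \sim s^{-3/2}
\end{cases}
```

      The first case corresponds to the crumpled (fractal) globule and the second to the ideal-chain scaling.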

      While the fit in Figure 2 appears to get closer to the 1/3rd exponent (B = 0.32), this appears to be a largely coincidental illusion of agreement - the simulation data in truth show a systematic deviation, returning to the 1/2 scaling for distances from 500 kb to whole chromosomes. This feature is not very evident as the authors restrict the analysis to only the few points available in the experimental data, though had they tested intervening distances I expect they would show log-log P(s) is nonlinear (non-power-law) for distances less than the typical loop length up to a few fold larger than the loop length, and thereafter returns to the scaling provided by the 'base' polymer behavior. This appears to be Rouse-like in these authors' model, with R(s) scaling with exponent 1/2, even though the data are closer to 1/3rd, as is indeed the case for most published simulated P(s) curves based on loop extrusion, e.g., (Fudenberg et al. 2016; Nuebler et al. 2018). In this vein, it would be instructive to the readers if the authors would include additional predictions from the simulation on the plot that lie at genomic separation distances not tested in the data, to better appreciate the predictions.
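
      One way to make the point about a scale-dependent exponent concrete is to compute the local log-log slope between neighbouring separations rather than a single global fit. The sketch below is illustrative only: the toy curve, the 500 kb crossover, and all function names are assumptions chosen to mirror the description above, not the authors' simulation output.

```python
import math

def local_exponents(s, y):
    """Slope of log(y) against log(s) between consecutive points."""
    return [
        (math.log(y[i + 1]) - math.log(y[i])) / (math.log(s[i + 1]) - math.log(s[i]))
        for i in range(len(s) - 1)
    ]

def toy_distance(s, s_cross=500.0):
    """Hypothetical R(s): exponent 1/3 below s_cross (kb), 1/2 above.

    The prefactor keeps the curve continuous at s_cross.
    """
    if s <= s_cross:
        return s ** (1.0 / 3.0)
    return s_cross ** (1.0 / 3.0 - 0.5) * s ** 0.5

# Separations from 10 kb to about 20 Mb, doubling at each step (assumed grid).
separations = [10.0 * 2 ** k for k in range(12)]
distances = [toy_distance(s) for s in separations]
slopes = local_exponents(separations, distances)
```

      On such a curve the local slope stays at 1/3 for short separations, takes an intermediate value in the interval that straddles the crossover, and settles at 1/2 beyond it; a fit restricted to a few short-range points would report only the first regime.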

      Minor issues

      (1) I think it is too misleading to only describe the experimental data from Brukner as "E-P" interactions from Drosophila. It is important to note somewhere that this is not an endogenous interaction with a functional role in Drosophila - it is a synthetic interaction between enhancers in the vicinity of the eve gene and a synthetic promoter placed at a variable distance away. The uniformity is elegant - (it is the same pair of elements being studied at all distances), but also provides limited scope for generalization as suggested by the current text. Moreover, the enhancers were not directly labeled; rather, the 3D position of nascent RNA transcribed from eve was tracked with an RNA-binding protein and used as a proxy for the 3D position of the enhancers. There is not an individual enhancer at the eve locus that interacts with the transgene, but rather a collection of enhancers is distributed at different positions throughout the entire TAD, which contains eve, and must form separate loops to reach eve. Indeed, it was previously reported that differences in the local position of these enhancers, relative to eve, affect their ability to interact with the distal reporter gene and the endogenous eve gene (Chen 2018). There is also reported competition between these enhancers and the distal gene, which further complicates the analysis (especially since the state of eve and of its enhancers varies among the different cells as a function of stripe position) - see Chen 2018. All of this is ignored in the current work, despite the assertion of the application to understanding E-P interaction. A detailed discussion of these issues is not necessary, but I fear that ignoring them entirely is to invite further confusion and error.

      (2) I believe this sentence is overstated, given available data: "TAD borders are characterized by transitions between epigenetic states rather than by preferentially-bound CTCF [4, 23, 24]." Indeed, this claim has been repeatedly made in the literature as cited here. However, other data clearly demonstrate a strong enrichment of CTCF at TAD borders (and at epigenetic borders, which in Drosophila have a high correspondence with TAD borders, as the authors have already appropriately noted). See, for example, Figure 4 of Sexton Cell 2012, and compare to Figure 2 of Dixon 2012. Of minor note, CTCF peaks co-occupied by the Zinc Finger TF CP190 are more likely to be TAD borders than CTCF alone. How big a species-specific difference this is remains unclear, as it appears some mammalian CTCF-marked TAD boundaries may be co-occupied by additional ZNFs. While plenty of Drosophila TAD boundaries indeed lack CTCF, many are marked by CTCF; this is enriched relative to what would be expected by chance (or relative to the alignment of other TFs, like Twist or Eve with TAD boundaries), and it has been shown that CTCF loss is sufficient to remove a subset of these, see for example Figure 5 of (Kaushal et al. 2021) (though it is possible that most will require mutation of all the border-associated factors that collectively bind many of the borders: dCTCF, CP190, mod(mdg4) and others).

      (3) This assertion is overstated given available data: "Although TAD boundaries in Drosophila are often associated with insulator proteins [20], there is no direct evidence that these elements block LEFs in vivo. Therefore, we did not impose boundary constraints in our simulations; LEFs were allowed to move freely unless stalled by collisions with other LEFs, with the possibility of crossover." Deletion of insulators in Drosophila that lie within a common epigenetic state leads to fusion of TADs (e.g., Mateo et al., 2019 - deletion of the CTCF-marked Fub insulator, in posterior tissues where both flanks of Fub are active; Kaushal, 2021, has examples as well). Loss of CTCF causes a small number of TADs to fuse as measured by Hi-C. This is far from 'direct evidence that insulators block LEFs' - as the authors have already noted, even the idea that cohesin extrudes loops in Drosophila in the first place is indeed controversial. However, LEF activity and stalling at insulators would provide a very natural explanation of why chromatin in a shared epigenetic state should form distinct TADs, and why these TADs should fuse upon insulator deletion. Justifying the lack of stalling sites based on empirical data is thus not very convincing to this reviewer. I believe it would be more apt to simply describe this as a simplifying assumption, rather than the above phrase, which may be misleading.

    4. Author response:

      We thank the editors and the reviewers for their constructive comments, which have greatly helped us identify key areas to strengthen the manuscript. We acknowledge the validity of the major points raised, and we plan the following revisions:

      Criticality

      As suggested by Reviewer #1, we will carefully examine whether the dynamics we observe are indeed poised near criticality. We will perform additional analyses to assess how structural and dynamic features change when parameters are tuned away from the coil–globule transition, and we will revise the title and text to ensure that our claims are appropriately moderated.

      Role of the homie element

      We agree with Reviewer #2 that the presence of homie elements introduces major modifications to chromosome structure and dynamics. We initially considered that this factor might even explain the paradox described in Gregor’s work. In the first phase of our study, we carried out simulations including homie elements and found that the potential confounding effects are largely resolved if we restrict the analysis to trajectories prior to encounters between the two homie copies. We will include these simulations and expand the discussion accordingly in the revised version.

      Comparison to Hi-C data

      Both reviewers noted a visual discrepancy between experimental and simulated Hi-C maps. We will address this by testing alternative similarity measures (e.g., Pearson correlation, as suggested) and by exploring parameter ranges that may improve the agreement.

      Together, these modifications will strengthen the manuscript, clarify the scope of our conclusions, and directly address the reviewers’ central concerns.
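
      As a minimal sketch of the Pearson-correlation comparison of Hi-C maps mentioned in this response, the example below correlates the off-diagonal upper triangles of two symmetric contact matrices. The function names, the `min_offset` parameter, and the small matrices are hypothetical illustrations, not the authors' pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)

def upper_triangle(matrix, min_offset=1):
    """Flatten entries with j - i >= min_offset, so each bin pair counts once."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + min_offset, n)]

def map_similarity(map_a, map_b, min_offset=1):
    """Pearson correlation between two symmetric contact maps of equal size."""
    return pearson(upper_triangle(map_a, min_offset),
                   upper_triangle(map_b, min_offset))

# Toy 4x4 maps (made-up values): the second is an affine rescaling of the first.
experimental = [[0, 5, 2, 1], [5, 0, 4, 2], [2, 4, 0, 6], [1, 2, 6, 0]]
simulated = [[2 * v + 1 for v in row] for row in experimental]
similarity = map_similarity(experimental, simulated)
```

      Because contact frequency decays steeply with genomic distance, a correlation on raw maps can be dominated by that decay; correlating distance-normalised maps is a common alternative, though which variant suits this dataset is left open here.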

    1. eLife Assessment

      This paper explores the role of extracellular vesicles in providing extracellular matrix signals for migration of vascular smooth muscle cells. The evidence, based on cell culture experiments and supporting imaging of human samples, is mostly convincing. The paper will be valuable for researchers investigating cell migration during vessel repair and atherogenesis.

    2. Reviewer #1 (Public review):

      In this revised submission from Kapustin et al., the authors have made significant changes to the manuscript. Namely, the authors have addressed several of the major issues with the original submission, providing a more concrete link between fibronectin and the secretion of extracellular vesicles. Additionally, the authors have moderated some of the conclusions to better suit the rigor of the experimental results and limitations of their approach. Generally, the findings convey an interesting cell-autonomous pathway in which smooth muscle cells sense fibronectin, which canonically is a proinflammatory substrate with activating properties in many tissues. Fibronectin-mediated integrin signaling stimulates secretion of small extracellular vesicles containing collagen VI, which is deposited into the surrounding extracellular matrix. Collagen VI itself gleaned from extracellular vesicle secretion seems to further alter smooth muscle cell morphodynamics. For this latter finding, much of the mechanism behind collagen VI vesicle loading and secretion has yet to be worked out. The authors provide evidence of extracellular vesicles containing collagen VI trapped in fibronectin in atherosclerotic plaques, providing a nice validation of their in vitro findings in a diseased human cohort. Some limitations do still exist in the manuscript in its current form, such as the assessment of the vesicle origins, contents and their association with the actin cytoskeleton; however, the rigor and execution are much improved from the preceding version. Overall, the pathobiology underlying vascular smooth muscle remodeling in disease states is a critical area of research that warrants further exploration.

    3. Reviewer #2 (Public review):

      The findings in the current manuscript are interesting and valuable contributions to the fields of vascular biology and extracellular vesicle-related mechanisms. They suggest a potential role for smooth muscle cell-derived extracellular vesicles in presenting Type VI collagen to cells to orchestrate their migration, with proposed relevance to aberrant smooth muscle cell movements in the progression of atherosclerotic lesions. A wide range of assays are utilized to test various aspects of this working model, with the resulting data being largely solid and supporting several of the interpretations articulated by the authors. The revised manuscript has adequately addressed key weaknesses.

      The authors present data suggesting a working model in which vascular smooth muscle cells (vSMCs) are stimulated by fibronectin (FN) to generate small extracellular vesicles (sEVs) that harbor Type VI Collagen (collagen VI). These collagen VI-associated sEVs are suggested to accumulate in the extracellular matrix (ECM) and influence cell migration and adhesion dynamics, potentially contributing to disease progression in atherosclerosis. Major strengths of this manuscript include robust imaging data and the inclusion of human-derived samples in their analysis. The authors also make a reasonable attempt to provide data to support the potential existence of these mechanistic connections, though some minor questions remain regarding data interpretation. The authors largely achieved their aims of finding evidence consistent with their interpretations, and they have presented logical support for their conclusions while acknowledging important limitations and caveats to their current study. This work will likely have a sustained impact on the field of sEV biology and potential intersections with vascular biology, including their methodology, e.g., imaging approaches. As biologists continue to explore the role of sEVs in physiological and pathological processes, this work raises an interesting aspect that must be considered more broadly, and that is, what is the role of sEVs that are ECM-associated and not necessarily internalized by recipient cells? Are there discrete mechanisms that govern their role in maintaining and/or disrupting normal physiological processes? This manuscript makes an attempt to address these unresolved yet critical questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      In this investigation Kapustin et al. demonstrate that exposure of vascular smooth muscle cells (VSMCs) to the extracellular matrix fibronectin stimulates the release of small extracellular vesicles (sEVs). The authors provide experimental evidence that stimulation of the actin cytoskeleton boosts sEV secretion and posit that sEVs themselves harbor both fibronectin and collagen IV protein, which in turn alter cell migration parameters. It is well established that fibronectin is associated with increased cell migration and adherence; therefore, this association with VSMCs is not novel.

      The reviewer is correct that FN has been associated with migration and adherence in previous studies. However, we have extended these observations to show that the extracellular fibronectin matrix stimulates small extracellular vesicle (sEV) secretion by modulating the actin cytoskeleton. We also showed that sEVs are trapped in the extracellular matrix and that, by presenting collagen VI, they induce early focal adhesion formation, reduce excessive cellular spreading and guide cell invasion directionality through a 3D matrix. Hence, sEVs mediate cell-matrix cross talk and change cell behaviour in the context of fibronectin matrix. This is critically important for vasculature, where regulated VSMC invasion is essential for repair, with its deregulation leading to pathology.

      The authors purport that sEV are largely born of filopodia origin; however, this data is not well executed and seems generally at odds with the presented data.

      Our experimental data showed that CD63 MVs are associated with filopodia in fixed and live cells (Fig 2E, 2F and Video S1) and that inhibition of filopodia formation using the formin inhibitor, SMIFH2, reduced sEV secretion on FN (Fig 2B). However, we agree with the reviewer that further studies are required to connect sEV secretion to filopodia. To address this we have provided further data analysis but also toned down our conclusions regarding this point. Changes include:

      (1) Title: Matrix-associated extracellular vesicles modulate smooth muscle cell adhesion and directionality by presenting collagen VI.

      (2) Results, section title: 2. FN-induced sEV secretion is modulated by Arp2/3 and formin-dependent actin cytoskeleton remodelling

      (3) Results, page 6 Line 27-44 and conclusion page 7, Ln 3 “Interestingly, CD63+ MVBs can be observed in filopodia-like structures suggesting that sEV secretion can also occur spatially via cellular protrusion-like filopodia but more studies are needed to confirm this hypothesis.”

      (4) Discussion, page 12, line 19. “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Similarly, the effect of sEVs on parameters of cell migration has almost no magnitude of effect, making mechanism exploration somewhat nebulous.

      VSMCs are mesenchymal-type cells with a low migration rate, and we agree that the changes in motility are not of great magnitude, even for the positive controls, suggesting that this is a complex, multifactorial process for VSMCs. In our experiments we collected data from >5000 individual cells to measure the average speed and found that fibronectin matrix on its own increased VSMC speed from ~0.61 μm/min to ~0.68 μm/min (a ~12% increase), which was statistically significant (Fig 5A). Addition of a sEV inhibitor caused a modest but significant decrease in cellular speed. Interestingly, addition of ECM-associated sEVs did not influence cell speed in 2D or 3D assays. However, in a 3D model we observed a 22% change in cell directionality (Fig 5G) and a 235% change in cell alignment index (FMI, Fig 5H), which we believe is very strong evidence that VSMC-derived sEVs are involved in the regulation of VSMC invasion directionality. These data are also in agreement with sEV effects in tumour cells (Sung et al., 2015), though this previous study did not identify the factor driving the directionality, and we think our collagen VI data significantly extend these previous observations.
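
      The quantities discussed in this reply (relative change in speed, directionality, FMI) can be sketched from a single cell track as follows. This is an illustration under stated assumptions: it takes FMI to mean the forward migration index as commonly defined (displacement along a reference axis divided by accumulated path length) and directionality to mean net displacement over path length; the response does not spell out its exact definitions, and the function names and toy track are hypothetical.

```python
import math

def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

def track_metrics(points, dt_min, axis=(0.0, 1.0)):
    """Summarise one 2D track.

    points : list of (x, y) positions in micrometres, one per frame
    dt_min : time between consecutive frames, in minutes
    axis   : reference direction for the forward migration index (assumed)
    """
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    path_length = sum(steps)
    dx = points[-1][0] - points[0][0]
    dy = points[-1][1] - points[0][1]
    net_displacement = math.hypot(dx, dy)
    forward = (dx * axis[0] + dy * axis[1]) / math.hypot(axis[0], axis[1])
    return {
        "speed": path_length / (dt_min * len(steps)),       # um/min
        "directionality": net_displacement / path_length,   # 0..1
        "fmi": forward / path_length,                        # -1..1
    }

# Speeds quoted in the response (approximate values, um/min).
speed_gain = percent_change(0.61, 0.68)

# Toy track: 3 um along y, then 4 um along x, imaged every 10 minutes.
metrics = track_metrics([(0.0, 0.0), (0.0, 3.0), (4.0, 3.0)], dt_min=10.0)
```

      With the rounded speeds quoted above the relative change comes out near 11.5%, in line with the reported ~12% given that both speeds are themselves approximate. The toy track shows why speed and directionality can diverge: the same path length yields different directionality and FMI depending on how straight and how well aligned the path is.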

      Results, page 9: “Hence, ECM-associated sEVs have modest influence on VSMC speed but influence VSMC invasion directionality.”

      Lastly, the proposed mechanism of VSMCs responding to, and depositing, ECM proteins via sEVs was not rigorously executed; again, making the conclusions challenging for the reader to interpret.

      We appreciate the reviewer’s comment regarding the mechanistic aspects of VSMCs responding to and depositing ECM proteins via sEVs. In our revised manuscript, we have expanded the data demonstrating that sEVs can be retained within the extracellular matrix (see Figs 3A, 3B, S3A, S3B). Additionally, we show that collagen VI is present on the surface of sEVs, where it may modulate cell adhesion and influence the directionality of cell invasion (Fig 7E). Our results further indicate that both fibronectin (FN) and collagen VI can be recycled through multivesicular bodies (see Figs S3C, S3D, S3E–S3G). However, we acknowledge that the precise mechanisms governing the selective loading of ECM proteins onto sEVs, as well as the specific contributions of sEVs to overall ECM organization, remain to be fully elucidated and warrant further investigation. Based on our current evidence, we propose that collagen VI–loaded sEVs act primarily in a signaling capacity by modulating focal adhesion formation but are not directly involved in ECM structural remodeling.

      Results, page 7: To quantify ECM-trapped sEVs we applied a modified protocol for the sequential extraction of extracellular proteins using salt buffer (0.5M NaCl) to release sEVs which are loosely attached to ECM via ionic interactions, followed by 4M guanidine HCl buffer (GuHCl) treatment to solubilize strongly-bound sEVs (Fig S3A) [42]. We quantified total sEV and characterised the sEV tetraspanin profile in conditioned media, and the 0.5M NaCl and GuHCl fractions using ExoView. The total particle count showed that EVs are both loosely bound and strongly trapped within the ECM. sEV tetraspanin profiling showed differences between these 3 EV populations. There was close similarity between the conditioned media and the 0.5M NaCl fraction, with high abundance of CD63+/CD81+ sEVs as well as CD63+/CD81+/CD9+ in both fractions (Fig S3A). In contrast, the GuHCl fraction was particularly enriched with CD63+ and CD63+/CD81+ sEVs with very low abundance of CD9+ EVs (Fig S3A). The abundance of CD63+/CD81+ sEVs was confirmed independently by a CD63+ bead capture assay in the media and loosely bound fractions (Fig S3B).

      Results, page 7: We previously found that the serum protein prothrombin binds to the sEV surface both in the media and MVB lumen, showing it is recycled in sEVs and catalyses thrombogenesis while on the sEV surface [43]. We therefore investigated whether FN can also be associated with the sEV surface, where it can be directly involved in sEV-cell cross-talk [43]. We treated serum-deprived primary human aortic VSMCs with FN-Alexa568 and found that it was endocytosed and subsequently delivered to early and late endosomes together with fetuin A, another abundant serum protein that is a recycled sEV cargo and elevated in plaques (Figs S3C and S3D). CD63 visualisation with a different fluorophore (Alexa488) confirmed FN colocalization with CD63+ MVBs (Fig S3E). Next, we stained non-serum deprived VSMC cultured in normal growth media (RPMI supplemented with 20% FBS) with an anti-FN antibody and observed colocalization of CD63 and serum-derived FN. Co-localisation was reduced, likely due to competitive bulk protein uptake by non-deprived cells (Fig S3F). Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G). This correlated with nearly complete loss of CD63+/CD81+ sEV secretion by the confluent cells, indicating that confluence abrogates intracellular FN trafficking as well as sEV secretion by VSMCs (Fig S3H). Finally, FN could be co-purified with sEVs from VSMC conditioned media (Fig S3I) and detected on the surface of sEVs by flow cytometry, confirming its loading and secretion via sEVs (Fig 3C).

      Results, page 10: Collagen VI was the most abundant protein in VSMC-derived sEVs (Fig 7B, Table S7) and was previously implicated in the interaction with the proteoglycan NG2 [53] and suppression of cell spreading on FN [54]. To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEVs, indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions, showing that it is a surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).

      Discussion, page 12: “In fact, we observed that an extensive secretion of sEVs effectively ceased protrusion activity; also VSMCs acquired a rounded morphology when “hovering” over the FN matrix decorated with sEVs (data not shown). Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators, particularly collagen VI, on their surface, thereby guiding cell directionality during invasion.”

      Discussion, page 14: “In summary, cooperative activation of integrin signalling and F-actin cytoskeleton pathways results in the secretion of sEVs which associate with the ECM and play a signalling role by controlling FA formation and cell-ECM crosstalk. Further studies are needed to test these mechanisms across various cell types and ECM matrices.”

      Strengths

      The authors provide a comprehensive battery of cytoskeletal experiments to test how fibronectin and sEVs impact both sEV release and vascular smooth muscle cell migratory activation.

      We appreciate this comment reflecting our efforts to apply a range of orthogonal methods to show the role of the integrin/actin cytoskeleton in ECM-stimulated sEV secretion.

      Weaknesses

      Unfortunately, this article suffers from many weaknesses. First, the rigor of the experimental approach is low, which calls into question the merit of the conclusions. In this vein, there is a lack of proper controls or inclusion of experiments addressing alternative explanations for the phenotype or lack thereof.

      We acknowledge this comment and agree that there was not sufficient evidence to conclude that sEV secretion occurs via filopodia despite the microscopy/inhibitory data, so this claim has now been excluded from the study. However, we believe that our experimental data do clearly show that FN stimulates the secretion of collagen VI-loaded sEVs which are trapped by the ECM and have the capacity to modulate VSMC adhesion and invasion directionality. To support this, we have now extended the dataset in the revised version:

      (1) In addition to the use of inhibitors and live cell analysis, we have added quantitative data confirming that a large proportion of CD63+ endosomes are associated with F-actin/cortactin tails and that this colocalization is increased upon the inhibition of sEV secretion with 3-OMS (Fig 2D, Fig S2B).

      (2) We developed a method to extract ECM-associated sEVs and quantified/characterized these using ExoView Assays further confirming significant sEV entrapment by the ECM (Figs 3B, S3A, S3B).    

      (3) We extended the controls to confirm FN delivery to CD63+ endosomes and showed that FN recycling is stopped upon reaching cell confluence (Figs S3F, S3G and Fig S3H).

      (4) We included more intensive characterisation of human atherosclerotic plaque morphology (H&E, Masson’s trichrome staining, Orcein, elastin fibers staining) to confirm predominant accumulation of sEVs in the neointima (Figs S4A, S4B and S4C). We also excluded an endothelial origin for the CD81+ sEVs (Fig 4G).

      (5) We included individual cellular tracks to the 2D migration analysis to confirm the statistical significance and concluded that ECM-associated sEVs regulate cell invasion directionality but not the cell speed (Figs 5A and 5B).

      (6) We showed surface localisation of collagen VI on sEVs confirming that it can activate signalling pathways leading to early FA formation on the FN matrix  (Figs 7D and 7E).

      (7) We included alternative explanations for some of our data in the discussion.      

      Reviewer #2 (Public Review):

      Extracellular vesicles have recently gained significant attention across a wide variety of fields, and they have therefore been implicated in numerous physiological and pathophysiological processes. When such a discovery and an explosion of interest occur in science, there is often much excitement and hope for answers to mechanisms that have remained elusive and poorly understood. Unfortunately, there is an equal amount of hype and overstatement that may also be put forth in the name of "impact", but this temptation must be avoided so that scientists and the broader public are not misled by overreaching interpretations and statements that lack rigorous and fully convincing evidence.

      Thank you for your comment. We agree that investigating sEVs is particularly challenging due to their heterogeneity and nano-size, as well as complex biogenesis mechanisms. ECM-associated sEVs are a very new direction for the EV field but one that is particularly relevant to the vasculature, where cells must invade through a thick ECM and where the accumulation of ECM-bound EVs is a unique and documented phenomenon. To further strengthen our conclusions we have included new data to support our statements but also excluded statements regarding filopodia as the origin of sEVs, which are out of scope of our study and need to be investigated further.

      The study presented by Kapustin et al. is certainly intriguing and timely, and it offers an interesting working hypothesis for the fields of extracellular vesicles and vascular biology to consider. The authors do a reasonable job at detecting these small extracellular vesicles, though some aspects of data presentation are missing such as full Western blots with accompanying size markers for the viewer to more fully appreciate that data and comparisons being made (see Figures 1 and 7).

      We agree with the reviewer and have now included molecular weight markers (Fig 1F, 7C, 7D, S3I, S4E) and provided all original western blot scans (uncropped and unedited) to the eLife editor. 

      Much of the imaging data from cell-based experiments is strong and conducted with many cutting-edge tools and approaches. That said, the static images and the dynamic imaging fall short of being fully convincing that the small extracellular vesicles found in the neighboring extracellular matrix are indeed being deposited there via the smooth muscle cell filopodia. Many of the lines of evidence presented suggest that this could occur, but alternative hypotheses also exist that were not fully ruled out, such as the ECM-deposited vesicles were secreted more from the soma and/or the lamellipodia that are also emitted and retracted from the cells. In particular, the authors show very nice dynamic imaging (Supplementary Figure S2A and Supplemental Video S1) that is interpreted as "extracellular vesicles being released from the cell" and these are seen as "bursts" of fluorescent signal; however, none of these appear to occur in filopodia as they appear within the cell proper (a "burst" of signal vs. a more intense "streak" of signal), which would be a stronger and more consistent observation predicted by the working model proposed by the authors.

      Our live and fixed cell microscopy data as well as inhibitor analysis showed that sEV secretion can be associated with the filopodia. However, we agree with the reviewer that the data generated using the CD63-pHluorin marker clearly indicate that the majority of sEVs are secreted from the cell soma toward the ECM.

      To reflect this, we have added further changes:

      (1) Title: Matrix-associated extracellular vesicles modulate smooth muscle cell adhesion and directionality by presenting collagen VI.

      (2) Results, section title: 2. FN-induced sEV secretion is modulated by Arp2/3 and formin-dependent actin cytoskeleton remodelling

      (3) Results, page 6, Ln 27-36: “Formins and the Arp2/3 complex play a crucial role in the formation of filopodia, a cellular protrusion required for sensing the extracellular environment and cell-ECM interactions [36]. To test whether MVBs can be delivered to filopodia, we stained VSMCs for Myosin-10 (Myo10) [37]. We observed no difference between total filopodia number per cell on plastic or FN matrices (n=18±8 and n=14±3, respectively); however, endogenous CD63+ MVBs were observed along the Myo10-positive filopodia in both conditions (Fig 2E, arrows). Filopodia have been implicated in sEV capture and delivery to endocytosis “hot-spots” [38], so next we examined the directionality of CD63+ MVB movement in filopodia by overexpressing Myo10-GFP and CD63-RFP in live VSMCs. Importantly, we observed anterograde MVB transport toward the filopodia tip (Fig 2F and Supplementary Video S2) indicative of MVB secretion”.

      (4) Results, page 6, Ln 37-44: “We also attempted to visualise sEV release in filopodia using CD63-pHluorin, where fluorescence is only observed upon the fusion of MVBs with the plasma membrane [39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells [18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (5) Results, page 7 Ln 3 “Interestingly, CD63+ MVBs can be observed in filopodia-like structures suggesting that sEV secretion can also occur spatially via cellular protrusion-like filopodia but more studies are needed to confirm this hypothesis.”

      (6) Discussion, page 12, line 19. “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Imaging of related human samples is certainly a strength of the paper, and the authors are commended for attempting to connect the findings from their cell culture experiments to an important clinical scenario. However, the marker selected for marking extracellular vesicles is CD81, which has been described as present on the endothelium of atherosclerotic plaques with a proposed role in the recruitment of monocytes into diseased arteries (Rohlena et al. Cardiovasc Res 2009). More data should address this potentially confounding interpretation of the signals presented in images within Figure 4.

      We thank the reviewer for this insightful comment that the sEV marker CD81 can originate from endothelial cells, in agreement with Rohlena et al., 2009. To address this, we investigated the spatial overlap between CD81 and the endothelial marker CD31. We observed very strong CD81 staining in the intact endothelial cell (intima) layer and occasional CD31-positive cells in the neointima. Importantly, quantification of colocalization confirmed that 80% of CD81 in the neointima does not overlap with CD31, excluding an endothelial origin of these sEVs (Fig 4G). Moreover, we included complete morphological characterisation of the atherosclerotic plaques confirming that CD81 sEVs were primarily observed in the neointima where VSMCs constitute the cellular majority (Fig S4A, S4B, S4C and S4D).

      On a conceptual level, the idea that the small extracellular vesicles contain Type VI Collagen, and this element of their cargo is modulating smooth muscle cell migration, is an intriguing aspect of the authors' working model. Nevertheless, the evidence supporting this potential mechanism does not quite fit together as presented. It is not entirely clear how the collagen VI within the vesicles is somehow accessed by the smooth muscle cell filopodia during migration. Are the vesicles lysed open once on the extracellular matrix? If so, what is the proposed mechanism for that to occur? If not, how are the adhesion molecules on the smooth muscle cell surface engaging the collagen VI fibers that are contained within the vesicles? This aspect of the model does not quite fit together with the proposed mechanism and may be an interesting speculative interpretation, warranting further investigation, but it should not be considered a strong conclusion with sufficient convincing data supporting this idea.

      We thank the reviewer for their insightful comments regarding the mechanism by which collagen VI associated with sEVs could modulate smooth muscle cell adhesion and migration. To clarify, our new data suggest that collagen VI is predominantly present on the surface of the sEVs, as evidenced by Fig 7E. This surface localization strongly implies that collagen VI can be directly accessed by cell surface adhesion receptors, without the need for vesicle lysis or opening. While we cannot entirely rule out all alternative mechanisms, we consider vesicle rupture or lysis within the extracellular matrix to be a highly unlikely route for collagen VI exposure, given the known stability of sEVs under physiological conditions. We have added these points to clarify:

      (1) Results, page 10, Ln 45 “To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEV indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions showing that it is surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).”

      (2) Discussion, page 13, Ln 2 “Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators—particularly collagen VI—on their surface, thereby guiding cell directionality during invasion.”

      (3) Discussion, page 14, Ln 30: In addition to collagen VI the unique adhesion cluster in VSMC-derived sEVs also includes EGF-like repeat and discoidin I-like domain-containing protein (EDIL3), transforming growth factor-beta-induced protein ig-h3 (TGFBI) and the lectin galactoside-binding soluble 3 binding protein (LGALS3BP) and these proteins are also directly implicated in activation of integrin signalling and cellular invasiveness85-87. Although we found that collagen VI plays the key role in sEV-induced early formation of FAs in VSMCs, it is tempting to speculate that the high sEV efficacy in stimulating FA formation is driven by cooperative action of this unique adhesion complex on the sEVs surface and targeting this novel sEV-dependent mechanism of VSMC invasion may open up new therapeutic opportunities to modulate atherosclerotic plaque development or even to prevent undesired VSMC motility in restenosis.

      (4) Abstract Figure

      On a technical level, some of the statistical analysis is not readily understood from the data presented. It is very much appreciated that the authors show many of the graphs with technical and biological replicate values in addition to the means and standard deviations (though this is not clearly stated in all figure legends). However, in figures such as Figure 5, there are bars shown and indicated to be different by statistical comparison (see panel B in Figure 5). It is not clear how the values for Group 1 (no FN, no 3-OMS, no sEV) are statistically different (denoted by three asterisks but no p value provided in the legend) than Group 3 (no FN, 3-OMS added, no sEV), when their means and standard deviations appear almost identical. If this is an oversight, this needs to be corrected. If this is truly the outcome, further explanation is warranted. A higher level of transparency in such instances would certainly go a long way in helping address the current crisis of mistrust within the scientific community and at the interface with society at-large.

      We thank the reviewer for their careful reading and important comments on the statistical analysis. We acknowledge that the technical and biological replicate data were not clearly reported in all figure legends and that the statistical approach for Figures 5A and 5B required clarification. In response, we have made several changes for greater transparency and rigor:

      First, we have now explicitly included the numbers of biological replicates (N) and technical replicates (n) in all relevant figure legends for Figures 1–7. In addition, the number of individual cell tracks is now annotated for the migration/invasion analyses, along with the mean values for each dataset.

      Upon review, we found that the original statistical analyses for Figures 5A and 5B were conducted using pooled averaged data. To address this, we have repeated the statistical tests using pooled individual cell track data, applying the Kruskal–Wallis test with Dunn’s multiple comparison correction. This more stringent approach revealed revised p-values, which are now indicated in Figures 5A and 5B.
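
      As an illustration of the kind of re-analysis described above, the following is a minimal sketch of a Kruskal–Wallis test followed by Dunn's pairwise comparisons on pooled per-track values. It is not the authors' analysis script: the function names (`average_ranks`, `kruskal_dunn`), the use of a Bonferroni adjustment for Dunn's test, and the example numbers are all assumptions made for illustration. The sketch reports the H statistic and adjusted pairwise p-values only; in practice a statistics package (for example `scipy.stats.kruskal`) would also supply the omnibus p-value.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..N, assigning tied values their mean rank.

    Returns the ranks (in input order) and the size of each tie group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_dunn(groups):
    """Hypothetical helper: return (H, {(i, j): adjusted p}).

    H is the tie-corrected Kruskal-Wallis statistic; the dictionary holds
    two-sided Dunn p-values with a Bonferroni adjustment (an assumption).
    """
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    tie_sum = sum(t**3 - t for t in tie_sizes)
    if tie_sum == n_total**3 - n_total:
        raise ValueError("all observations are identical")

    # Mean rank of each group, in the order the groups were pooled.
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)

    h = 12 / (n_total * (n_total + 1)) * sum(
        len(g) * r**2 for g, r in zip(groups, mean_ranks)
    ) - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total**3 - n_total)  # tie correction

    # Rank variance term used by Dunn's z statistic (with tie correction).
    var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    pairs = list(combinations(range(len(groups)), 2))
    adjusted = {}
    for i, j in pairs:
        se = math.sqrt(var * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(i, j)] = min(1.0, p * len(pairs))
    return h, adjusted


if __name__ == "__main__":
    # Made-up per-track speeds for three conditions (illustrative only).
    tracks = [[0.21, 0.25, 0.19, 0.30], [0.33, 0.41, 0.38, 0.36], [0.22, 0.27, 0.24, 0.29]]
    h_stat, pairwise = kruskal_dunn(tracks)
    print(f"H = {h_stat:.3f}")
    for pair, p_adj in pairwise.items():
        print(pair, f"adjusted p = {p_adj:.4f}")
```

      Pooling individual tracks gives each cell equal weight in the rank sums, which is why the resulting p-values can differ from those obtained on per-experiment averages.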

      With these corrections, we reconfirm our major findings: In the 2D model, fibronectin (FN) coating promotes VSMC velocity, while inhibition of sEV secretion with 3-OMS leads to reduced cell speed (Fig. 5A). Addition of sEVs to the ECM had no effect on VSMC speed at baseline but did rescue cell speed and distance in the presence of 3-OMS, consistent with EVs acting primarily on invasion directionality rather than speed in both 2D and 3D models (Fig. 5A, 5D). Furthermore, sEVs continue to significantly impact VSMC invasion directionality (Figs. 5G, 5H), in agreement with previous reports in tumor cells (Sung et al., 2015).

      In summary, we have implemented the following revisions:

      (1) Figures 5A and 5B: Individual cell track data are now shown, and statistical analyses have been repeated using the Kruskal–Wallis test with Dunn’s multiple comparisons.

      (2) Figure legends and results sections: Numbers of biological and technical replicates, as well as individual data points, are now clearly stated.

      Results, page 9, line 14: The text has been updated to clarify the statistical approach and major findings as described above.

      We hope that these changes address the reviewer’s concerns and improve the transparency and reproducibility of our data presentation.

      Reviewer #1 (Recommendations For The Authors):

      We are very thankful for the comprehensive review and comments which helped to improve our data.

      Figure 1.
      The authors clearly show that FN stimulation (immobilized or cell-derived) promotes sEV secretion via canonical integrin pathways. FN is a promigratory substrate, hence its extensive use as a cell adhesion aid; thus one could assume that simply plating on FN induces a pro-migratory phenotype (later data supports this notion). Does the addition of growth factors also increase sEV release? An endogenous function of FN is siloing of various GFs during clot formation. Also, FAK and SRC networks intersect with canonical RTK signaling in terms of promoting Rac1, CDC42 and other migration mediators. The reason I believe this is important is because the data could be interpreted in two ways: 1) FN induces pro-migration signaling and then sEVs are released, or vice versa, FN induces sEV release and migration is initiated. GF supplementation in the absence of FN would clarify this relationship.

      We thank the reviewer for this insightful comment regarding the possible role of growth factors (GFs) and the mechanistic relationship between FN stimulation, sEV secretion, and cell migration. We agree that FN is a well-established promoter of cell migration, and it is important to distinguish whether FN directly induces a pro-migratory phenotype or does so via sEV-mediated signaling.

      Our data show that FN stimulation markedly increases VSMC motility, as reflected by enhanced cell speed (Fig. 5A), an increased number of focal adhesions (Fig. 6E), and facilitated centripetal movement of FAs (Fig. 6F). Interestingly, ECM-associated sEVs appear to play a complementary but distinct role: they do not significantly affect cell migration speed (Fig. 5A) but instead guide cell invasion directionality (Figs. 5G, 5H), reduce the number of FAs per cell (Fig. 6E), and promote early peripheral FA formation (Fig. 6F). In light of these findings, we have updated our graphical abstract to reflect the unique cross-talk mediated by sEVs between VSMCs and the ECM.

      Regarding the influence of growth factors, we acknowledge that FN can bind and present different GFs, which could also contribute to changes in sEV secretion. Although our inhibition studies and integrin-blocking antibody results support a primary role for β1 integrin activation and actin assembly in triggering sEV secretion, we cannot entirely exclude the possibility that FN-bound growth factors play a role in this process. We have now incorporated this point into the discussion to address the reviewer’s suggestion.

      Discussion, page 14, Ln 7 “Although our small inhibitors and integrin modulating antibody data clearly indicate that β1 activation triggers sEV secretion via activation of actin assembly we cannot fully rule out that FN may also be modulating growth factor activity which in turn contributes to sEV secretion by VSMCs[23]. Excessive collagen and elastin matrix breakdown in atheroma has been tightly linked to acute coronary events hence it will be interesting to study the possible link between sEV secretion and plaque stability as sEV-dependent invasion is also likely to influence the necessary ECM degradation induced by invading cells[96].”

      Figure 2.
      • The authors provide no evidence (or references) that SMIFH2 or CK666 halts filopodia extensions.

      Thank you for this important note. We have included the corresponding references:

      Results, page 5: “So next we tested the contribution of Arp2/3 and formins by using the small molecule inhibitors, CK666 and SMIFH2, respectively31, 32”.  

      • Is there an increase in filopodia density when plated on FN vs plastic? Similarly, if there are more filopodia present is that associated with more sEV? Please provide evidence in this regard.

      We agree that relating filopodia number to the level of sEV secretion may provide an important clue as to whether sEV secretion is driven by FN-induced filopodia formation. However, Myosin-10 staining to quantify filopodia (Fig 2E) showed no difference between VSMCs plated on plastic versus FN matrix. Therefore, we agree with the reviewer that the filopodia contribution to sEV secretion needs to be investigated further. This is reflected in the following passages:

      (1) Results, page 6, Ln 29 “We observed no difference between total filopodia number per cell on plastic or FN matrices (n=18±8 and n=14±3, respectively) however the presence of endogenous CD63+ MVBs along the Myo10-positive filopodia were observed in both conditions (Fig 2E, arrows).”

      (2) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia. (Fig S2C and Supplemental Video S1).”

      (3) Discussion, page 12, Ln 15: “Focal complexes either disassemble or mature into the elongated centripetally located FAs48. In turn, these mature FAs anchor the ECM to actin stress fibres and the traction force generated by actomyosin-mediated contractility pulls the FAs rearward and the cell body forward12, 13. Here we report that β1 integrin activation triggers sEV release followed by sEV entrapment by the ECM. Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      As hinted above, this data could be interpreted in the light of generally inhibiting cell migration to blunt sEV shedding. Does cell confluence affect sEV release? If cells are cultured to 100% confluency this would limit filopodia formation regardless of ECM type. If sEV secretion remains elevated on FN in this culture condition it would suggest a lack of dependency on filopodia.

      We thank the reviewer for this thoughtful suggestion regarding the influence of cell confluence on sEV release and filopodia formation. To directly address this hypothesis, we performed additional experiments comparing VSMCs cultured at low and high confluency. As described in the revised Results (page 7, line 39), we found that high cellular confluency reduced FN recycling, as indicated by the marked decrease in intracellular FN-positive spots and loss of colocalization with CD63 (Figs S3F, S3G). Importantly, this was accompanied by a significant reduction in CD63+/CD81+ sEV secretion by confluent cells (Fig S3H). These results suggest that VSMC confluence, which suppresses filopodia formation, nearly abolishes both intracellular FN trafficking and sEV secretion, even in the presence of FN. Thus, under our experimental conditions, sEV secretion by VSMCs appears to be closely linked to dynamic cell–matrix interactions and is dramatically reduced when these processes are limited by confluence:

      (1) Results, page 7, Ln 39: “Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G). This correlated with nearly complete loss of CD63+/CD81+ sEV secretion by the confluent cells indicating that confluence abrogates intracellular FN trafficking as well as sEV secretion by VSMCs (Fig S3H).”

      • Inhibition of branched actin polymerization has been shown to reduce both exocytic and endocytic activity. Thus, it is hard to interpret the results of Fig. 2B as anything more than a generalized effect of losing actin.

      We thank the reviewer for this important point regarding the broad cellular functions of branched actin polymerization, and agree that generalized actin loss can influence both exocytic and endocytic pathways. To address this, we performed additional experiments and analyses to better define the relationship between branched actin structures and sEV-related processes in VSMCs.

      As described in the revised Results (page 6), we overexpressed ARPC2-GFP (an Arp2/3 subunit) together with F-tractin-RFP in VSMCs and carried out live-cell imaging. This approach revealed that Arp2/3 and F-actin organize into lamellipodial scaffolds at the cell cortex, as expected (Fig. S2A; Supplementary Video S2). Additionally, and more unexpectedly, we observed numerous Arp2/3– and F-actin–positive dynamic spots within the VSMC cytoplasm. These structures resemble actin comet tails seen in other systems, previously implicated in endosomal propulsion (Fig. S2A, arrow; Supplementary Video S2).

      Quantitative analysis confirmed that a substantial fraction of these dynamic F-actin/cortactin spots colocalized with CD63+ endosomes (Fig. 2D), and that these structures are indeed branched actin tails based on cortactin immunostaining. Furthermore, inhibition of SMPD3 (with 3-OMS) induced enlarged cortactin/F-actin/CD63+ complexes, morphologically similar to invadopodia (Fig. 2D, arrowheads), supporting a functional link between actin branching and MVB dynamics.

      To quantify the association, we calculated Manders’ colocalization coefficients for F-actin tails and CD63+ endosomal structures in fixed VSMCs, observing that ~50% of F-actin tails were associated with ~13% of endosomes. Upon 3-OMS treatment, this overlap increased further (Fig. S2B).
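
      For readers unfamiliar with the metric, the thresholded Manders coefficients can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' image-analysis pipeline: the function name `thresholded_manders`, the flat pixel lists, and the threshold values are hypothetical, and real analyses would operate on segmented image regions with thresholds chosen by the analysis software.

```python
def thresholded_manders(ch1, ch2, t1, t2):
    """Hypothetical helper: return (tM1, tM2) for two intensity lists.

    tM1 is the fraction of above-threshold channel-1 intensity found in
    pixels where channel 2 is also above its threshold; tM2 is the
    converse for channel 2.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must contain the same number of pixels")

    total1 = coloc1 = total2 = coloc2 = 0.0
    for a, b in zip(ch1, ch2):
        above1, above2 = a > t1, b > t2
        if above1:
            total1 += a
            if above2:
                coloc1 += a
        if above2:
            total2 += b
            if above1:
                coloc2 += b

    tm1 = coloc1 / total1 if total1 else 0.0
    tm2 = coloc2 / total2 if total2 else 0.0
    return tm1, tm2


if __name__ == "__main__":
    # Made-up pixel intensities: channel 1 = F-actin, channel 2 = CD63.
    actin = [10, 20, 0, 30]
    cd63 = [5, 0, 40, 15]
    tm1, tm2 = thresholded_manders(actin, cd63, t1=1, t2=1)
    print(f"tM1 = {tm1:.2f}, tM2 = {tm2:.2f}")
```

      Because the two coefficients are normalized to different channels, they need not be equal: a large share of one signal can overlap with only a small share of the other, as in the values reported here.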

      Finally, using live-cell imaging (Fig 2C; Supplementary Video S4), we directly observed CD63+ MVBs being propelled through the cytoplasm by Arp2/3-driven actin tails, suggesting a mechanistic role for branched actin assembly in MVB intracellular transport, rather than a generalized effect of actin disruption alone.

      We believe these combined data reinforce a more specific mechanistic role for Arp2/3-mediated branched actin in MVB/endosome transport and, consequently, in sEV secretion in VSMCs—over and above an indirect effect of global actin loss. We hope these additional experiments and quantitative analyses address the reviewer’s concern and clarify the functional relevance of branched actin structures to sEV trafficking:

      (1) Results, page 6, Ln 3 “As regulators of branched actin assembly, the Arp2/3 complex and cortactin are thought to contribute to sEV secretion in tumour cells by mediating MVB intracellular transport and plasma membrane docking[28, 33]. Therefore, we overexpressed the Arp2/3 subunit, ARPC2-GFP and the F-actin marker, F-tractin-RFP in VSMCs and performed live-cell imaging. As expected, Arp2/3 and F-actin bundles formed a distinct lamellipodia scaffold in the cellular cortex (Fig S2A and Supplementary Video S2). Unexpectedly, we also observed numerous Arp2/3/F-actin positive spots moving through the VSMC cytoplasm that resembled previously described endosome actin tails observed in Xenopus eggs[33] and parasite infected cells where actin comet tails propel parasites via filopodia to neighbouring cells[34, 35] (Fig S2A, arrow, and Supplementary Video S2). Analysis of the intracellular distribution of Arp2/3 and CD63-positive endosomes in VSMCs showed CD63-MVB propulsion by the F-actin tail in live cells (Fig 2C and Supplementary Video S4).”

      (2) Results, New data Fig 2D, page 6, Ln 14. “we observed numerous F-actin spots in fixed VSMCs that were positive both for F-actin and cortactin indicating that these are branched-actin tails (Fig 2D). Moreover, cortactin/F-actin spots colocalised with CD63+ endosomes and addition of the SMPD3 inhibitor, 3-OMS, induced the appearance of enlarged doughnut-like cortactin/F-actin/CD63 complexes resembling invadopodia-like structures similar to those observed in tumour cells (Fig 2D, arrowheads)[18].”

      (3) Results, New data Fig S2B, page 6, Ln 19 “To quantify CD63 overlap with the actin tail-like structures, we extracted round-shaped actin structures and calculated the thresholded Manders colocalization coefficient (Fig S2B). We observed overlap between F-actin tails and CD63 as well as close proximity of these markers in fixed VSMCs (Fig S2B). Approximately 50% of the F-actin tails were associated with 13% of all endosomes (tM1=0.44±0.23 and tM2=0.13±0.06, respectively, N=3). Addition of 3-OMS enhanced this overlap further (tM1=0.75±0.18 and tM2=0.25±0.09) suggesting that Arp2/3-driven branched F-actin tails are involved in CD63+ MVB intracellular transport in VSMCs.”

      • In video 1 the author states (lines 8-9; pg6) "intense CD63 staining along filopodia" Although, there is some fluorescence (not strong) in these structures, there was no visible exocytic activity. This data is more suggestive that sEVs (marked by CD63) are not associated with filopodia. The following conclusion statement the authors make is overreaching given this result.

      We thank the reviewer for this careful observation and agree that the previous conclusion regarding sEV release from filopodia was overstated. In response, we have revised both the Results and Discussion sections to more accurately reflect the data:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia. (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      • Fig 2D and video 2 are wholly unconvincing with regard to sEV secretion sites. The authors could use their CD63-pHluorin construct to count exocytic events in the filopodia vs the whole cell. Given the movie, I have a suspicion this would not be significant. The authors could also perform CD63 staining in non-permeabilized cells to capture and count exocytic events at the plasma membrane as well as their location between groups.

      We thank the reviewer for these constructive suggestions and their critical assessment of our current data regarding the sites of sEV secretion. We agree that our CD63-pHluorin approach clearly indicates sEV secretion events in the soma at the cell–ECM interface, while we did not observe comparable events in filopodia. Accordingly, we have clarified these points in the revised manuscript.

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia. (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      • Fig. 2E and video 4. Again, the conclusions drawn from this data are very strained. First, no co-localization quantification is presented on the proportion of CD63 vesicles with actin. Once again, the movie, if anything, convinces the reader that 95-99% of all CD63 vesicles are not associated with actin; therefore, this is an unlikely mechanism of transport.

      We thank the reviewer for this valuable comment and for highlighting the need for quantitative co-localization analysis. In response, we developed a method to systematically quantify F-actin and CD63 co-localization in fixed VSMCs, as now presented in new Figures 2D and S2B. We acknowledge that the majority of CD63+ endosomes are not associated with F-actin, consistent with the reviewer’s interpretation. However, our quantitative data now show that a specific subpopulation of MVBs appears to utilize this actin-based mechanism for transport. We believe this addresses the concern and more accurately reflects the prevalence and significance of the mechanism described.

      (1) Results, page 6, Ln 19. “To quantify CD63 overlap with the actin tail-like structures, we extracted round-shaped actin structures and calculated the thresholded Manders colocalization coefficient (Fig S2B). We observed overlap between F-actin tails and CD63 as well as close proximity of these markers in fixed VSMCs (Fig S2B). Approximately 50% of the F-actin tails were associated with 13% of all endosomes (tM1=0.44±0.23 and tM2=0.13±0.06, respectively, N=3). Addition of 3-OMS enhanced this overlap further (tM1=0.75±0.18 and tM2=0.25±0.09) suggesting that Arp2/3-driven branched F-actin tails are involved in CD63+ MVB intracellular transport in VSMCs.”

      • Are there perturbations that increase filopodia numbers? A gain of function experiment would be valuable here.

      We thank the reviewer for this important suggestion regarding the potential value of gain-of-function experiments to clarify filopodia’s contribution to sEV release. In agreement with the reviewer’s scepticism, we have removed statements linking filopodia to sEV release from both the title and abstract to avoid overinterpretation. At present, our understanding of filopodia biology and the lack of robust tools to selectively and substantially increase filopodia numbers in VSMCs prevent us from directly addressing this question through gain-of-function assays. We acknowledge that future studies using established methods—such as overexpression of filopodia-inducing proteins (e.g., mDia2 or fascin)—could provide insight into whether an increased number of filopodia affects sEV release. However, such experiments are beyond the scope of the current manuscript. We have made the following changes to clarify these points:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia. (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Figure 3
      • Fig 3A. The CD63 staining is strongly associated with the entire plasma membrane. How are the authors distinguishing between normal membrane shedding and bona fide sEVs based on this staining alone? This is insufficient, as all membrane structures are seemingly positive. Additionally, there are very few sEVs visible when scrutinizing the provided images. For the "sEV secretion, fold change" graphs in previous figures, could the authors provide absolute values, or an indication of what these values are in absolute terms?

      We thank the reviewer for raising this important point regarding the specificity of CD63 staining and the need to distinguish bona fide sEVs from membrane fragments or general membrane shedding. We agree that CD63 staining alone at the plasma membrane or in the extracellular matrix is not sufficient to unequivocally identify sEVs. To address this, we employed several complementary approaches to rigorously characterize ECM-associated sEVs:

      First, using high-resolution iSIM imaging, we confirmed the association of CD63-positive particles specifically with the FN-rich matrix, and demonstrated that SMPD3 knockdown significantly reduced the number of CD63+ particles in the matrix (Fig. 3B; revised from Fig. S3A).

      Second, by incubating FN matrices with purified and fluorescently labeled sEVs, we directly observed efficient entrapment of these labeled sEVs within the matrices (Fig. 3E), confirming that sEVs can interact with and be retained by the ECM.

      Third, we developed and applied a sequential extraction protocol using mild salt buffer (0.5M NaCl) and strong denaturant (4M guanidine HCl) to selectively extract ECM-associated sEVs based on the strength of their association (see new Figs. S3A and S3B). Extracted vesicles were then characterized by ExoView analysis, which demonstrated a tetraspanin profile (CD63+/CD81+/CD9+) closely matching that of sEVs from conditioned media, providing evidence that these particles are true sEVs and not merely membrane debris. We also found that the more weakly bound (NaCl-extracted) fraction closely resembles media-derived sEVs, while the strongly bound (GuHCl-extracted) fraction is more enriched in CD63+ and CD63+/CD81+ sEVs but contains very few CD9+ vesicles, further supporting distinct extracellular vesicle subpopulations within the ECM.

      In addition, the abundance of CD63+/CD81+ sEVs in both media and ECM-derived fractions was independently validated by CD63 bead-capture assay (Fig. S3B).

      We hope these clarifications and the expanded data set address the reviewer’s concerns about sEV identification and quantification in the extracellular matrix:

      (1) Results, page 7, Ln 16. To quantify ECM-trapped sEVs we applied a modified protocol for the sequential extraction of extracellular proteins using salt buffer (0.5M NaCl) to release sEVs which are loosely-attached to ECM via ionic interactions, followed by 4M guanidine HCl buffer (GuHCl) treatment to solubilize strongly-bound sEVs (Fig S3A) 42. We quantified total sEV and characterised the sEV tetraspanin profile in conditioned media, and the 0.5M NaCl and GuHCl fractions using ExoView. The total particle count showed that EVs are both loosely bound and strongly trapped within the ECM. sEV tetraspanin profiling showed differences between these 3 EV populations.  While there was close similarity between the conditioned media and the 0.5M NaCl fraction with high abundance of CD63+/CD81+ sEVs as well as CD63+/CD81+/CD9+ in both fractions (Fig S3A). In contrast, the GuHCl fraction was particularly enriched with CD63+ and CD63+/CD81+ sEVs with very low abundance of CD9+ EVs (Fig S3A). The abundance of CD63+/CD81+ sEVs was confirmed independently by a CD63+ bead capture assay in the media and loosely bound fractions (Fig S3B).

      • A control for Fig. 3B would be helpful to parse out random uptake of extracellular debris versus targeted sEV internalization. It would be helpful if the authors added particles of similar size to that of the sEVs to test whether these structures are endocytosed/micropinocytosed at similar levels.

      We thank the reviewer for this useful suggestion regarding the need for better controls to distinguish specific sEV uptake from nonspecific internalization of extracellular debris or similarly sized particles. As a comparison, in our study we analyzed the uptake of both sEVs and serum proteins such as fibronectin and fetuin-A (Figs S3C and S3D), and observed similar patterns of intracellular trafficking. However, we acknowledge that inert nanoparticles or beads of a similar size to sEVs could serve as potential controls to assess nonspecific micropinocytosis or endocytosis.

      It is important to note, however, that the uptake of sEVs is strongly influenced by their surface protein composition and the so-called “protein corona.” Recent work from Prof. Khuloud T. Al-Jamal’s group underscores that exosome uptake mechanisms may be highly specific (Liam-Or et al., 2024), and studies from Mattias Belting’s lab have also shown the importance of heparan sulfate proteoglycans in exosome endocytosis (Cerezo-Magana et al., 2021). As a result, uptake comparisons with inert particles or beads may not fully recapitulate the specificity of sEV internalization, and distinct nanoparticle classes may rely on different uptake pathways.

      Figure 4

      • Fig. 4E,F,G. How are the authors determining the neointima and media compartments without ancillary staining for basement membrane or endothelial markers? Anatomic specific markers need to be incorporated here for the reader to evaluate the specificity of the FN and CD81 staining. It is also hard to understand the severity of the atherosclerotic lesion without a companion H&E cross section.

      We thank the reviewer for highlighting the need for more rigorous characterization of atherosclerotic lesion architecture and anatomical compartments in our study. In response, we have incorporated additional histological analyses and now provide ancillary staining and companion images to enable clear identification of the neointima and medial compartments, as well as to assess lesion severity (see new Figs S4A–S4D):

      (1) Results, page 8, Ln 28. “To test if FN associates with sEV markers in atherosclerosis, we investigated the spatial association of FN with sEV markers using the sEV-specific marker CD81. Staining of atherosclerotic plaques with haematoxylin and eosin revealed well-defined regions with the neointima as well as tunica media layers formed by phenotypically transitioned or contractile VSMCs, respectively (Fig S4A). Masson's trichrome staining of atherosclerotic plaques showed abundant haemorrhages in the neointima, and sporadic haemorrhages in the tunica media (Fig S4B). Staining of atherosclerotic plaques with orcein indicated weak connective tissue staining in the atheroma with a confluent extracellular lipid core, and strong specific staining at the tunica media containing elastic fibres which correlated well with the intact elastin fibrils in the tunica media (Figs S4C and S4D). Using this clear morphological demarcation, we found that FN accumulated both in the neointima and the tunica media where it was significantly colocalised with the sEV marker, CD81 (Fig. 4D, 4E and 4F). Notably CD81 and FN colocalization was particularly prominent in cell-free, matrix-rich plaque regions (Figs. 4E and 4F).”

      • Figs S4C, S4D: proper controls are not provided. Again, a non-FN internalization control as well as a 4 °C cold block negative control is required to interpret these data.

      We thank the reviewer for this valuable suggestion. To enhance the rigor of our internalization assays, we have now included several additional controls using alternative treatments, fluorophore combinations, and internalization conditions:

      a) We performed FN-Alexa568 uptake assays, followed by immunostaining for CD63 with a distinct fluorophore (Alexa488), to confirm the colocalization of internalized FN with CD63+ endosomal compartments in VSMCs (new Fig. S3E).

      b) We also stained VSMCs, cultured under normal growth conditions, with an anti-FN antibody to visualize intracellular serum-derived FN and again observed colocalization with CD63 (new Figs. S3F and S3G). Notably, in cells grown to confluence, we observed a complete loss of intracellular FN staining and FN/CD63 colocalization, suggesting that FN recycling is prominent in sparse, motile cells, but not in confluent populations.

      These additional controls strengthen our conclusions regarding FN internalization pathways and the conditions under which FN trafficking to the endosomal system occurs:

      (1) Results, page 7, Ln 31. We treated serum-deprived primary human aortic VSMCs with FN-Alexa568 and found that it was endocytosed and subsequently delivered to early and late endosomes together with fetuin A, another abundant serum protein that is a recycled sEV cargo and elevated in plaques (Figs S3C and S3D). CD63 visualisation with a different fluorophore (Alexa488) confirmed FN colocalization with CD63+ MVBs (Fig S3E). Next, we stained non-serum-deprived VSMCs cultured in normal growth media (RPMI supplemented with 20% FBS) with an anti-FN antibody and observed colocalization of CD63 and serum-derived FN. Co-localisation was reduced, likely due to competitive bulk protein uptake by non-deprived cells (Fig S3F). Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G).

      • Can the authors please provide live and fixed imaging of FN and CD63-mediated filopodial secretion to amply support their conclusions?

      We have observed CD63 MVBs in both fixed (Fig 2E) and live VSMCs (Fig 2F) yet we agree that further studies are required to establish the contribution of filopodia to sEV secretion. Therefore, we have added the following changes:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Figure 5

      • Fig. 5A,B. The authors claim that sEV supplementation enhances VSMC migration speed and distance. The provided graphs show only a marginal increase in speed with sEV addition (A) but, concerningly, there is a four-star significant difference between the FN condition compared with FN+sEV (B) while the means appear the same. How are these conditions statistically different? The statistics seem off for these comparisons.

      We thank the reviewer for highlighting concerns regarding the statistical analysis in Figures 5A and 5B. In response, we have carefully re-examined our data and statistical approach to ensure accuracy and transparency.

      First, we have now included all individual cell migration tracks in the data representation for these figures. The statistical tests were repeated using the Kruskal–Wallis test with Dunn’s multiple comparison correction across all groups. This more stringent analysis confirmed our key findings: fibronectin (FN) stimulates VSMC migration speed, while inhibition of sEV secretion (with 3-OMS) reduces cellular speed (Fig. 5A). Addition of exogenous ECM-associated sEVs modestly restored cell speed in the presence of 3-OMS, but had no effect on baseline migration speed in 2D or 3D models (Figs. 5A, 5D).
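
      As an illustration of the omnibus step named above, the following minimal Python sketch computes a Kruskal–Wallis H statistic from per-cell values for three groups. It is not the authors' analysis: the function names and the example speeds are hypothetical, the p-value uses the chi-square approximation with two degrees of freedom (valid only for exactly three groups), and Dunn's post hoc comparisons are not shown.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three groups of per-cell measurements.

    Assumption: p comes from the chi-square approximation with df = 2,
    whose survival function is exactly exp(-H / 2).
    """
    groups = [list(a), list(b), list(c)]
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Standard correction for tied observations.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, math.exp(-h / 2)


# Hypothetical per-cell speeds for three conditions (not data from the study).
control = [1.0, 2.0, 3.0]
fn_coated = [4.0, 5.0, 6.0]
fn_plus_inhibitor = [7.0, 8.0, 9.0]
h_stat, p_value = kruskal_wallis_three_groups(control, fn_coated, fn_plus_inhibitor)
print(round(h_stat, 3), round(p_value, 4))
```

      Because the test ranks every individual observation, it uses the full spread of per-cell values rather than group averages.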

      Regarding the four-star significance observed in the original Fig. 5B, the previous result reflected an analysis based on pooled group averages, which may have overstated marginal differences. The revised analysis, based on individual cell tracks, does not support a substantial difference between FN and FN+sEV groups. The revised p-values and comparisons are now provided directly on the figures and described in the figure legends. We also clearly report the numbers of biological replicates, technical replicates, and individual data points for every condition.

      Further, the modest effect of ECM-associated sEVs on speed is consistent with our observation that sEVs influence invasion directionality rather than baseline migration velocity, in agreement with previous findings in tumor models (Sung et al., 2015).

      The manuscript has been revised accordingly, with updates in:

      (1) Figures 5A and 5B: Individual cell track data are now shown, and statistical analyses have been repeated using the Kruskal–Wallis test with Dunn’s multiple comparisons.

      (2) Figure legends and results sections: Numbers of biological and technical replicates, as well as individual data points, are now clearly stated.

      (3) Results, page 9, line 14:  “FN as a cargo in sEVs promotes FA formation in tumour cells and increases cell speed14, 15. As we found that FN is loaded into VSMC-derived sEVs we hypothesized that ECM-entrapped sEVs can enhance cell migration by increasing cell adhesion and FA formation in the context of a FN-rich ECM. Therefore, we tested the effect of sEV deposition onto the FN matrix on VSMC migration in 2D and 3D models. We found that FN coating promoted VSMC velocity and inhibition of bulk sEV secretion with 3-OMS reduced VSMC speed in a 2D single-cell migration model (Figs. 5A, 5B) in agreement with previous studies using tumour cells14, 15. However, addition of sEVs to the ECM had no effect on VSMC speed at baseline but rescued cell speed and distance in the presence of the sEV secretion inhibitor, 3-OMS suggesting the EVs are not primarily regulating cell speed (Figs 5A and 5B).”

      (4) Results, page 9, Ln 29 “Hence, ECM-associated sEVs have modest influence on VSMC speed but influence VSMC invasion directionality.”

      We hope that these changes address the reviewer’s concerns and improve the transparency and reproducibility of our data presentation.

      • Fig. 5D-H. Generally, the magnitude of the differences between the presented conditions is biologically insignificant. Several of the graphs show a four-star difference with means that appear equivalent with overlapping error bars. Do the authors conclude that a 0.1%, or less, effect between groups is biologically meaningful?

      We thank the reviewer for drawing attention to the apparent mismatch between statistical significance and biological relevance in Figures 5d–h. In response, we have reanalyzed the data using individual cell tracks and more stringent non-parametric statistical tests, as described above. This reanalysis confirmed that the magnitude of differences in migration speed and related parameters between the groups is minimal and not biologically meaningful. Thus, we no longer claim that sEVs significantly affect VSMC migration speed under these conditions in either 2D or 3D assays. Our revised manuscript now accurately reflects this finding in both the Results and Discussion sections, and the updated figures and legends clarify the true extent of any differences observed.

      Figure 6

      • Generally, the authors' logic for looking into adhesion, focal adhesion and traction forces is hard to follow. If there are sEV-mediated migration differences, then there would inexorably be focal adhesion alterations. However, the data indicate few differences brought on by sEVs, which speaks to the lack of migration differences presented in Fig. 5. Overall, the sEV migration phenotype has so little of an effect that searching for a mechanism seems destined not to turn up anything significant.

      We thank the reviewer for highlighting the importance of connecting the observed phenotypic effects of sEVs to the investigation of adhesion and focal adhesion mechanisms. While our revised analysis confirms that sEVs have little to no effect on VSMC migration speed or distance in 2D and 3D models, we did observe a robust effect of sEVs on the directionality of cell invasion (Figs. 5G and 5H). This prompted us to look more closely at pathways involved in cell guidance rather than bulk cell motility.

      Our proteomic comparison between larger EVs (10K fraction) and sEVs (100K fraction) revealed a unique adhesion complex present specifically on the sEVs—comprising collagen VI, TGFBI, LGALS3BP, and EDIL3 (Figs. 7A–C)—each of which has previously been implicated in integrin signaling, cell adhesion, or invasion. Functional blocking and knockdown studies further identified collagen VI as a key mediator in the regulation of cell adhesion and invasion directionality influenced by sEVs (Figs. 7F and 7I).

      In response to this mechanistic insight, we have modified the graphical abstract and discussion to clarify our approach:

      We now explicitly state that our focus has shifted from analyzing baseline migration speed to mechanisms guiding invasion directionality, in line with our key phenotypic findings. We highlight that the unique adhesion cluster identified on sEVs, including collagen VI and its cooperative partners, provides a strong mechanistic rationale for examining focal adhesion dynamics and ECM interactions, even in the absence of changes in migration velocity. Discussion excerpts (pages 13–14) have been updated to reflect this rationale and to summarize the potential significance of these findings for vascular biology and disease.

      We hope this clarifies the logic underlying our approach and justifies the mechanistic studies performed in this context:

      (1) Discussion, page 13, Ln 2  “Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators—particularly collagen VI—on their surface, thereby guiding cell directionality during invasion.”

      (2) Discussion, page 13, Ln 30 “In addition to collagen VI the unique adhesion cluster in VSMC-derived sEVs also includes EGF-like repeat and discoidin I-like domain-containing protein (EDIL3), transforming growth factor-beta-induced protein ig-h3 (TGFBI) and the lectin galactoside-binding soluble 3 binding protein (LGALS3BP) and these proteins are also directly implicated in activation of integrin signalling and cellular invasiveness85-87. Although we found that collagen VI plays the key role in sEV-induced early formation of FAs in VSMCs, it is tempting to speculate that the high sEV efficacy in stimulating FA formation is driven by cooperative action of this unique adhesion complex on the sEV surface and targeting this novel sEV-dependent mechanism of VSMC invasion may open up new therapeutic opportunities to modulate atherosclerotic plaque development or even to prevent undesired VSMC motility in restenosis.”

      (3) Discussion, page 14, Ln 14 “In summary, cooperative activation of integrin signalling and F-actin cytoskeleton pathways results in the secretion of sEVs which associate with the ECM and play a signalling role by controlling FA formation and cell-ECM crosstalk. Further studies are needed to test these mechanisms across various cell types and ECM matrices.”

      Figure 7

      • The authors need to provide additional evidence Col IV is harbored in sEVs and not a contaminant of sEV isolation as VSMCs secrete a copious amount of this in culture. For instance, IHC of isolated sEVs stained for CD63 and Col IV as well as single cell staining of the same sort.

      We thank the reviewer for this important comment regarding the specificity of collagen VI detection in sEVs. To ensure that collagen VI is associated with bona fide sEVs—rather than being a contaminant resulting from high extracellular abundance—we performed a comparative analysis of vesicles isolated from the same conditioned media. Both proteomic mass spectrometry and western blotting revealed that collagen VI was exclusively present in the small EV (100K pellet) fraction and not in the larger EVs (10K pellet), as shown in Figs. 7B and 7C. Collagen VI was further identified in sEVs extracted from the ECM using our salt/guanidine protocol (new Fig. 7D).

      Reviewer #2 (Recommendations For The Authors):

      The authors have presented a nice collection of data with strong approaches to address their hypotheses. Nevertheless, an additional section within the Discussion would be welcome in addressing the potential limitations and important caveats to be considered alongside their study. These caveats and limitations could be reshaped by additional data supporting the ideas that: (1) small extracellular vesicles can be directly observed during their secretion from filopodia, (2) CD81 labeling in tissue can be interpreted clearly as extracellular vesicles and not the cell surface of other cell types (co-staining with an endothelial cell marker such as PECAM-1 perhaps), and (3) collagen VI within the vesicles is somehow accessed by adhesion molecules on the cell surface of migrating cells.

      We thank the reviewer for these important suggestions and we have now added further studies and modified our conclusions to reflect the data more accurately:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 18: “Here we report that β1 integrin activation triggers sEV release followed by sEV entrapment by the ECM. Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      We quantified the colocalization of CD81 and CD31 to exclude an endothelial cell origin of the sEVs, extended the characterisation of the atherosclerotic matrix, and highlighted the limitations to interpretation, i.e. regarding CD81 localisation in the ECM:

      (1) Results, page 8, Ln 43 “An enhanced expression of CD81 by endothelial cells in early atheroma has been previously reported, so to study the contribution of CD81+ sEVs derived from endothelial cells we investigated the localisation of CD31 and CD81 (ref. 45). In agreement with a previous study, we found that the majority of CD31 colocalises with CD81 (Thresholded Manders' split colocalization coefficient 0.54±0.11, N=6) indicating that endothelial cells express CD81 (Fig 4G) (ref. 45). However, only a minor fraction of total CD81 colocalised with CD31 (Thresholded Manders' split colocalization coefficient 0.24±0.06, N=6) confirming that the majority of CD81 in the neointima is originating from the most abundant VSMCs.”

      (2) Results, page 8, Ln 28: “To test if FN associates with sEV markers in atherosclerosis, we investigated the spatial association of FN with sEV markers using the sEV-specific marker CD81. Staining of atherosclerotic plaques with haematoxylin and eosin revealed well-defined regions with the neointima as well as tunica media layers formed by phenotypically transitioned or contractile VSMCs, respectively (Fig S4A). Masson's trichrome staining of atherosclerotic plaques showed abundant haemorrhages in the neointima, and sporadic haemorrhages in the tunica media (Fig S4B). Staining of atherosclerotic plaques with orcein indicated weak connective tissue staining in the atheroma with a confluent extracellular lipid core, and strong specific staining at the tunica media containing elastic fibres which correlated well with the intact elastin fibrils in the tunica media (Figs S4C and S4D). Using this clear morphological demarcation, we found that FN accumulated both in the neointima and the tunica media where it was significantly colocalised with the sEV marker, CD81 (Fig. 4D, 4E and 4F). Notably CD81 and FN colocalization was particularly prominent in cell-free, matrix-rich plaque regions (Figs. 4E and 4F).”
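
      The two split coefficients quoted in item (1) above are asymmetric by construction, which is why most CD31 can overlap CD81 while only a minority of CD81 overlaps CD31. The sketch below shows one common definition of thresholded Manders' coefficients. The function name, the toy pixel values and the zero thresholds are hypothetical, and the exact thresholding used by the authors' imaging software may differ.

```python
def thresholded_manders(ch_a, ch_b, thr_a, thr_b):
    """Return (tM_a, tM_b) for two equally sized flat lists of pixel intensities.

    tM_a is the fraction of above-threshold channel-A intensity found in
    pixels where channel B is also above its threshold, and vice versa.
    """
    if len(ch_a) != len(ch_b):
        raise ValueError("channels must have the same number of pixels")

    # Total above-threshold signal in each channel.
    total_a = sum(a for a in ch_a if a > thr_a)
    total_b = sum(b for b in ch_b if b > thr_b)

    # Signal located in pixels where both channels exceed their thresholds.
    coloc_a = sum(a for a, b in zip(ch_a, ch_b) if a > thr_a and b > thr_b)
    coloc_b = sum(b for a, b in zip(ch_a, ch_b) if a > thr_a and b > thr_b)

    tm_a = coloc_a / total_a if total_a else float("nan")
    tm_b = coloc_b / total_b if total_b else float("nan")
    return tm_a, tm_b


# Toy image of six pixels: a sparse marker (A) and an abundant marker (B).
sparse_marker = [10, 10, 0, 0, 0, 0]
abundant_marker = [10, 0, 10, 10, 10, 0]
tm_sparse, tm_abundant = thresholded_manders(sparse_marker, abundant_marker, 0, 0)
print(tm_sparse, tm_abundant)  # 0.5 0.25
```

      In this toy case half of the sparse marker's signal lies in pixels that also contain the abundant marker, whereas only a quarter of the abundant marker's signal lies in pixels containing the sparse one.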

      We showed that collagen VI is presented on the surface of sEVs:

      (1) Results, page 10, Ln 43: “Collagen VI was the most abundant protein in VSMC-derived sEVs (Fig 7B, Table S7) and was previously implicated in the interaction with the proteoglycan NG2 (ref. 53) and suppression of cell spreading on FN (ref. 54). To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEV indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions showing that it is a surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).”

    1. eLife Assessment

      Dercon and colleagues combine common psychiatric treatments with a probabilistic reward learning task and trial-by-trial ratings of affect, confidence, and engagement. Using computational cognitive modeling, they show that, while both treatments serve to counter negative biases in affect and confidence, cognitive distancing and antidepressant medication have dissociable effects on subjective evaluations and reward-based choice behavior. This work provides convincing evidence regarding an important line of investigation into the dynamic integration of affect, cognition, and learning.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines how two common psychiatric treatments, antidepressant medication and cognitive distancing, influence baseline levels and moment-to-moment changes in happiness, confidence, and engagement during a reinforcement learning task. Combining a probabilistic selection task, trial-by-trial affect ratings, psychiatric questionnaires, and computational modeling, the authors demonstrate that each treatment has distinct effects on affective dynamics. Notably, the results highlight the key role of affective biases in how people with mental health conditions experience and update their feelings over time, and suggest that interventions like cognitive distancing and antidepressant medication may work, at least in part, by shifting these biases.

      Strengths:

      (1) Addresses an important question: how common psychiatric treatments impact affective biases, with potential translational relevance for understanding and improving mental health interventions.

      (2) The introduction is strong, clear, and accessible, making the study approachable for readers less familiar with the underlying literature.

      (3) Utilizes a large sample that is broadly representative of the UK population in terms of age and psychiatric symptom history, enhancing generalizability.

      (4) Employs a theory-driven computational modeling framework that links learning processes with subjective emotional experiences.

      (5) Uses cross-validation to support the robustness and generalizability of model comparisons and findings.

      Weaknesses:

      The authors acknowledge the limitations in the discussion section.

      Additional questions:

      (1) Group Balance & Screening for Medication Use: How many participants in the cognitive distancing and control groups were taking antidepressant medication? Why wasn't medication use included as part of the screening to ensure both groups had a similar number of participants taking medication?

      (2) Assessment of the Practice of Cognitive Distancing: Is there a direct or more objective method to evaluate whether participants actively engaged in cognitive distancing during the task, and to what extent? Currently, the study infers engagement indirectly through the outcomes, but does not include explicit measures of participants' use of the technique. Would including self-report check-ins throughout the task, asking participants whether they were actively engaging in cognitive distancing, have been useful? However, including frequent self-report check-ins would increase procedural differences between groups, perhaps making the tasks less comparable beyond the intended treatment manipulation. Maybe incorporating a question at the end of the task, asking how much they engaged in cognitive distancing, could offer a useful measure of subjective engagement without overly disrupting the task flow.

      Conclusion:

      This study advances our understanding of the mechanisms underlying mental health interventions. The combination of computational modeling with behavioral and affective data offers a powerful framework for understanding how treatments influence affective biases and dynamics. These findings are of broad interest across clinical and mental health sciences, cognitive and affective research, and applied translational fields focused on improving psychological well-being.

    3. Reviewer #2 (Public review):

      In this paper, Dercon and colleagues report on affective changes related to components of reinforcement learning and on the effects of brief training in psychological distancing and participants' self-reported antidepressant use. About 1,000 participants were assessed online, with half randomized to a brief training in psychological distancing with reminders to distance during the subsequent reinforcement learning (RL) task. Participants completed a battery of psychiatric questionnaires and answered questions about medication use, with about 14% of participants reporting current antidepressant use. All participants completed the RL task and rated their happiness, confidence, engagement, and (at the end of each block of trials) fatigue throughout the task. Computational models were used to estimate trial-by-trial values of expected value and prediction error and to assess the effects of these values on self-reported affect. Participants' affect ratings decreased over time, and participants with higher psychiatric symptoms (particularly anxiety/depressive symptoms) showed lower baseline affect and greater decreases in affect. Participants randomized to the distancing intervention and those who reported antidepressant use differed in their affective ratings: distancing attenuated the decline in happiness over time, while antidepressant use was related to higher baseline happiness. Distancing also reduced the effects of trial-level expected value on happiness, while antidepressant use was related to a more enduring effect of trial-level values on happiness.
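
      For readers less familiar with the trial-by-trial quantities mentioned above, the following minimal sketch shows how expected value and prediction error can be generated by a simple delta-rule learner. It is a generic illustration, not the authors' model: the function name, the single learning rate `alpha`, the starting value `q0` and the reward sequence are all assumptions, and the paper's models may include further components such as separate learning rates or decay.

```python
def delta_rule(rewards, alpha, q0=0.5):
    """Return a list of (expected_value, prediction_error) pairs, one per trial.

    Assumptions: one option, one learning rate, rewards coded as numbers.
    """
    q = q0
    trace = []
    for r in rewards:
        prediction_error = r - q          # outcome minus current expectation
        trace.append((q, prediction_error))
        q = q + alpha * prediction_error  # move the expectation toward the outcome
    return trace


# Hypothetical outcome sequence: rewarded, then unrewarded.
for expected_value, prediction_error in delta_rule([1, 0], alpha=0.5):
    print(expected_value, prediction_error)
```

      In mood models of this kind, the per-trial expected values and prediction errors are then typically entered as predictors of the affect ratings, which is the step the review refers to.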

      Overall, this is an interesting paper with strong methods and an interesting approach. That psychiatric symptoms and cognitive distancing are related to affective ratings is not terribly novel; the relationship with antidepressant use is a bit more novel. The extension of the mood model to an RL task is a new contribution, as is the relationship of these effects with psychologically related manipulations.

      One major concern is the inference that can be drawn from the two "treatments": one is a brief instruction in a component of psychotherapy, and one is ongoing use of medication. The former is not a treatment in and of itself, but a (presumably) active ingredient of one. How to interpret antidepressant use as measured is unclear, e.g., are the residual symptoms in these participants an early indicator of treatment resistance? Are these participants with better access to health care? Are they receiving antidepressants for a mental health issue?

      There are some clarifications needed in the affect model as well.

    4. Reviewer #3 (Public review):

      Summary:

      The present manuscript investigates and proposes different mechanisms for the effects of two therapeutic approaches - cognitive distancing technique and use of antidepressants - on subjective ratings of happiness, confidence, and task engagement, and on the influence of such subjective experiences on choice behavior. Both approaches were found to link to changes in affective state dynamics in a choice task, specifically reduced drift (cognitive distancing) and increased baseline (antidepressant use). Results also suggest that cognitive distancing may reduce the weighting of recent expected values in the happiness model, while antidepressant use may reduce forgetting of choices and outcomes.

      Strengths:

      This is a timely topic and a significant contribution to ongoing efforts to improve our mechanistic understanding of psychopathology and devise effective novel interventions. The relevance of the manuscript's central question is clear, and the links to previous literature and the broader field of computational psychiatry are well established. The modelling approaches are thoughtful and rigorously tested, with appropriate model checks and persuasive evidence that modelling complements the theoretical argument and empirical findings.

      Weaknesses:

      Some vagueness and lack of clarity in theoretical mechanisms and interpretation of results leave outstanding questions regarding (a) the specific links drawn between affective biases, therapies aimed at mitigating them, and mental health function, and (b) the structure and assumptions of the modelling, and how they support the manuscript's central claims. Broadly, I do not fully understand the distinction between how choice behavior vs. affect are impacted separately or together by cognitive distancing. Clarification on this point is needed, possibly through a more explicit proposal of a mechanism (or several alternative mechanisms?) in the introduction and more explicit interpretation of the modelling results in the context of the cyclical choice-affect mechanism.

      (1) Theoretical framework and proposed mechanisms

      The link between affective biases and negative thinking patterns is a bit unclear. The authors seem to make a causal claim that "affective biases are precipitated and maintained by negative thinking patterns", but it is unclear what precisely these negative patterns are; earlier in the same paragraph, they state that affective biases "cause low mood" and possibly shift choices toward those that maintain low mood. So the directionality of the mechanism here is unclear - possibly explaining a bit more of the cyclic nature of this mechanism, and maybe clarifying what "negative thinking patterns" refer to will be helpful.

      More generally, this link between affect and choices, especially given the modelling results later on, should be clarified further. What is the mechanism by which these two impact each other? How do the models of choice and affect ratings in the RL task test this mechanism? I'm not quite sure the paper answers these questions clearly right now.

      The authors also seem to implicitly make the claim that symptoms of mental ill-health are at least in part related to choice behavior. I find this a persuasive claim generally; however, it is understated and undersupported in the introduction, to the point where a reader may need to rely on significant prior knowledge to understand why mitigating the impact of affective biases on choice behavior would make sense as the target of therapeutic interventions. This is a core tenet of the paper, and it would be beneficial to clarify this earlier on.

      It would be helpful to interpret a bit more clearly the findings from 3.4. on decreased drift in all three subjective assessments in the cognitive distancing group. What is the proposed mechanism for this? The discussion mentions that "attenuated declines [...] over time, [add] to our previously reported findings that this psychotherapeutic technique alters aspects of reward learning" - but this is vague and I do not understand, if an explanation for how this happens is offered, what that explanation is. Given the strong correlation of the drift with fatigue, is the explanation that cognitive distancing mitigates affect drift under fatigue? Or is this merely reporting the result without an interpretation around potential mechanisms?

      (Relatedly, aside from possibly explaining the drift parameter, do the fatigue ratings link with choice behavior in any way? Is it possible that the cognitive distancing was helping participants improve choices under fatigue?)

      (2) Task Structure and Modelling

      It is unclear what counted as a "rewarding" vs. "unrewarding" trial in the model. From my understanding of the task description, participants obtained positive or no reward (no losses), and verbal feedback, Correct/Incorrect. But given the probabilistic nature of the task, it follows that even some correct choices likely had unrewarding results. Was the verbal feedback still "Correct" in those cases, but with no points shown? I did not see any discussion on whether it is the #points earned or the verbal feedback that is considered a reward in the model. I am assuming the former, but based on previous literature, likely both play a role; so it would be interesting - and possibly necessary to strengthen the paper's argument - to see a model that assigns value to positive/negative feedback and earned points separately.

      From a theory perspective, it's interesting that the authors chose to assume separate learning rates for rewarding and non-rewarding trials. Why not, for example, separate reward sensitivity parameters? E.g., rather than a scaling parameter on the PE, a parameter modifying the r term inside the PE equation to, perhaps, assign different values to positive and zero points? (While I think overall the math works out similarly at the fitting time, this type of model should be less flexible on scaling the expected value and more flexible on scaling the actual #points / the subjective experience of the obtained verbal feedback, which seems more in line with the theoretical argument made in the introduction). The introduction explicitly states that negative biases "may cause low mood by making outcomes appear less rewarding" - which in modelling equations seems more likely to translate to different reward-perception biases, and not different learning rates. Alternatively, one might incorporate a perseveration parameter (e.g., similar to Collins et al. 2014) that would also accomplish a negative bias. Either of these two mechanisms seems perhaps worth testing out in a model - especially in a model that defines more clearly what rewarding vs. unrewarding may mean to the participant.
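
      The contrast drawn above, between valence-specific learning rates and valence-specific reward sensitivity, can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than the authors' model: the function names, the parameter names (alpha_pos, alpha_neg, rho_pos, rho_zero), and the outcome coding of 1 for points and 0 for no points.

```python
def update_dual_alpha(q, r, alpha_pos, alpha_neg):
    # Prediction error on the raw outcome (assumed coding: 1 = points, 0 = no points).
    pe = r - q
    # A valence-specific learning rate scales the whole prediction error.
    alpha = alpha_pos if r > 0 else alpha_neg
    return q + alpha * pe


def update_reward_sensitivity(q, r, alpha, rho_pos, rho_zero):
    # The outcome is re-valued before the prediction error is computed.
    subjective_r = rho_pos if r > 0 else rho_zero
    pe = subjective_r - q
    return q + alpha * pe


def run(update, outcomes, q0=0.5, **params):
    # Apply one update rule over a sequence of outcomes and return the final value.
    q = q0
    for r in outcomes:
        q = update(q, r, **params)
    return q
```

      Under this coding, and with learning rates between 0 and 1, an unrewarded trial in the first rule can only pull the value toward 0, whereas in the second rule a negative rho_zero lets an unrewarded trial be experienced as aversive and drives the value below 0. That is one way the two accounts could come apart in fitted behavior.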

      If I understand correctly, the affect ratings models assume that the Q-value and the PE independently impact rating (so they have different weights, w2 and w3), but there is no parameter allowing for different impact for perceived rewarding and unrewarding outcomes? (I may be misreading equations 4-5, but if not, Q-value and PE impact the model via static rather than dynamic parameters.) Given the joint RL-affect fit, this seems to carry the assumption that any perceptual processing differences leading to different subjective perceptions of reward associated with each outcome only impact choice behavior, but not affect? (whereas affect is more broadly impacted, if I'm understanding this correctly, just by the magnitude of the values and PEs?) This is an interesting assumption, and the authors seem to have tested it a bit more in the Supplementary material, as shown in Figure S4. I'm wondering why this was excluded from the main text - it seems like the more flexible model found some potentially interesting differences which may be worth including, especially as they might shed additional insight into the influence of cognitive distancing on the cyclical choice-affect mechanisms proposed.
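
      To make the question about static versus outcome-dependent weights concrete, here is a rough sketch of an affect-rating model of the general kind described above. The manuscript's equations 4-5 are not reproduced here, so the functional form (a baseline, a linear drift, and exponentially discounted sums of Q-values and PEs) is an assumption, as are all names other than w2 and w3. The second function is only one possible form of a more flexible variant; whether it matches the model behind Figure S4 is not established here.

```python
def discounted_sum(values, gamma):
    # The most recent trial has weight 1; earlier trials are discounted by gamma per step.
    n = len(values)
    return sum(gamma ** (n - 1 - j) * v for j, v in enumerate(values))


def predicted_affect(q_values, pes, w0, w1, w2, w3, gamma):
    # Static weights: w2 on expected values, w3 on prediction errors, w1 as linear drift.
    n = len(q_values)
    return w0 + w1 * n + w2 * discounted_sum(q_values, gamma) + w3 * discounted_sum(pes, gamma)


def predicted_affect_valenced(q_values, pes, rewarded, w0, w1, w2, w3_pos, w3_neg, gamma):
    # Hypothetical variant: the PE weight depends on whether the trial was rewarded.
    n = len(q_values)
    weighted_pes = [(w3_pos if rew else w3_neg) * pe for pe, rew in zip(pes, rewarded)]
    return w0 + w1 * n + w2 * discounted_sum(q_values, gamma) + discounted_sum(weighted_pes, gamma)
```

      When w3_pos equals w3_neg the second function reduces to the first, so the static model is nested within the flexible one and the two can be compared directly.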

      Minor comments:

      If fatigue ratings were strongly associated with drift in the best-fitting model (as per page 13), I wonder if it would make sense to use those fatigue ratings as a proxy rather than allow the parameter to vary freely? (This does not in any way detract from the winning model's explanatory power, but if a parameter seems to be strongly explained by a variable we have empirical data for, it's not clear what extra benefit is earned by having that parameter in the model).

    1. eLife Assessment

      This important study describes the development and validation of an Automated Reproducible Mechano-stimulator (ARM), a tool for standardizing and automating tactile behavior experiments. The data supporting the use of the ARM system are compelling, and demonstrate that by removing experimenter effects on animals, it reduces variability in various parameters of stimulus application. Moreover, the authors demonstrate that any noise emitted from the ARM does not induce an increased stress state. Once commercially available, the ARM system has the potential to increase experimental reproducibility between laboratories in the somatosensation and pain fields.

    2. Reviewer #1 (Public review):

      Allodynia is commonly measured in the pain field using von Frey filaments, which are applied to a body region (usually hindpaw if studying rodents) by a human. While humans perceive themselves as being objective, as the authors noted, humans are far from consistent when applying these filaments. Not to mention, odors from humans, including of different sexes, can influence animal behavior. There is thus a major unmet need for a way to automate this tedious von Frey testing process, and to remove humans from the experiment. I have no major scientific concerns with the study, as the authors did an outstanding job of comparing this automated system to human experimenters in a rigorous and quantitative manner. They even demonstrated that their automated system can be used in conjunction with in vivo imaging techniques.

      While it is somewhat unclear how easy and inexpensive this device will be, I anticipate everyone in the pain field will be clamoring to get their hands on a system like this. And given the mechanical nature of the device, and propensity for mice to urinate on things, I also wonder how frequently the device breaks/needs to be repaired. Perhaps some details regarding cost and reliability of the device would be helpful to include, as these are the two things that could make researchers hesitant to adopt immediately.

      The only major technical concern, which is easy to address, is whether the device generates ultrasonic sounds that rodents can hear when idle or operational, across the ultrasonic frequencies that are of biological relevance (20-110 kHz). These sounds are generally alarm vocalizations and can create stress in animals, and/or serve as cues of an impending stimulus (if indeed they are produced by the device).

      Comments on revisions:

      Was Fig. 1 updated with the new apparatus design? i.e. to address issue of animal waste affecting function over time?

      I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      Burdge, Juhmka et al describe the development and validation of a new automated system for applying plantar stimuli in rodent somatosensory behavior tasks. This platform allows the users to run behavior experiments remotely, removing experimenter effects on animals and reducing variability in manual application of stimuli. The system integrates well with other automated analysis programs that the lab has developed, providing a complete package for standardizing behavior data collection and analysis. The authors present extensive validations of the system against manual stimulus application. Proof of concept studies also show how the system can be used to better understand the effect of experimenters on behavior and the effects of how stimuli are presented on the micro features of the animal withdrawal response.

      Strengths:

      If widely adopted, ARM has the potential to reduce variability in plantar behavior studies across and within labs and provide a means to standardize results. It provides a way to circumvent the confounds that humans bring into performing sensitive plantar behavior tests (e.g. experimenter odors, experience, physical abilities, variation in stimulus application, sex). Furthermore, it can be integrated with other automated platforms, allowing for quicker analysis and potentially automated stimulus delivery. The manuscript also presents some compelling evidence on the effects of stimulus application time and height on withdrawals, which can potentially help labs that are manually applying stimuli standardize applications. The system is well validated and the results are clear and convincingly presented. Claims are well supported by experimental evidence.

      Weaknesses:

      ARM seems like a fantastic system that could be widely adopted; a primary weakness is that it is not currently available to other labs. This will eventually be remedied as it is commercialised.

    4. Reviewer #3 (Public review):

      Summary:

      This report describes the development and initial applications of the ARM (Automated Reproducible Mechano-stimulator), a programmable tool that delivers various mechanical stimuli to a select target (most frequently, a rodent hindpaw). Comparisons to traditional testing methods (e.g., experimenter application of stimuli) reveal that the ARM reduces variability in the anatomical targeting, height, velocity, and total time of stimulus application. Given that the ARM can be controlled remotely, this device was also used to assess effects of experimenter presence on reflexive responses to mechanical stimulation. Although not every experimenter had notable sex-dependent effects on animal behavior, use of the ARM never had this effect (for obvious reasons!). Lastly, the ARM was used to stimulate rodent hindpaws while measuring neuronal activity in the basolateral nucleus of the amygdala (BLA), a brain region that is associated with the negative affect of pain. This device, and similar automated devices, will undoubtedly reduce experimenter-related variability in reflexive mechanical behavior tests; this may increase experimental reproducibility between laboratories who are able to invest in this type of technology.

      Strengths:

      Clear examples of variability in experimenter stimulus application are provided and then contrasted with uniform stimulus application that is inherent to the ARM.

      The ARM is able to quickly oscillate between delivery of various mechanical stimuli; this is advantageous for experimental efficiency.

      New additions to the ARM and PAWS platforms have been methodically tested to ensure reproducibility and reliability.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Given the mechanical nature of the device and the propensity for mice to urinate on things, I also wonder how frequently the device breaks/needs to be repaired. Perhaps some details regarding the cost and reliability of the device would be helpful to include, as these are the two things that could make researchers hesitant to adopt immediately.

      We thank the reviewer for their astute observations. We also noted the problem of mouse waste and incorporated this concern into the redesign we mention in the text.

      “Mouse waste getting on mechanical parts was found to be a major concern for the initial version of the device. As part of the redesign, the linear stages were moved out from under the mice to avoid this problem. Despite this problem, the original version of the device has not had any of its stages break down yet. A common problem though was that stimulus tips would blunt or break if they hit the mesh of the mesh table, requiring replacement. This has been solved in the latest version through a new feature where the mesh is detected via the force sensor, prompting immediate stimulus withdrawal, avoiding damage.”

      In regards to cost and adoption, we have added this sentence to the final line of the discussion:

      “To promote wide adaptation of this device across as many labs as possible, a company, Tactorum Inc., has been formed.”

      (2) The only major technical concern, which is easy to address, is whether the device generates ultrasonic sounds that rodents can hear when idle or operational, across the ultrasonic frequencies that are of biological relevance (20-110 kHz). These sounds are generally alarm vocalizations and can create stress in animals, and/or serve as cues of an impending stimulus (if indeed they are produced by the device).

      The reviewer brings up an interesting question. The ARM does not make a lot of noise, but some of the noise it emits does fall within the 20-110 kHz range, although it otherwise bears no qualitative resemblance to a mouse vocalization. Based on this, we tested whether the noise produced by the ARM causes stress in naïve mice.

      “A concern was raised that the noise of the ARM may cause stress in the mice tested. To test this, the open field test was performed with naïve mice (n=10) 2 feet from the ARM while the ARM either sat silent or ran through its habituation program, producing noise. The mouse's center point movement was then tracked in relation to the chamber, its edges, and center. No significant differences were found in distance traveled, center entrances, time in center, and latency to center entrance based on a Student’s two-tailed t-test (Figure S1D-G). Based on this, neither stress nor locomotion differences were detected by this test, indicating the ARM does not induce an increased stress state due to its noise, even in non-habituated mice.”

      (3) This sentence in the intro may be inaccurate: "or the recent emergence of a therapeutic targeting voltage-gated sodium channels, that block pain in both rodents and humans such as VX-548 for NaV1.8 (Jones 2023)" Despite extensive searching, I have been unable to find a reference showing that VX-548 is antinociceptive in rodents (rats or mice). As for why this is the case, I do not know. One speculation: this drug may be selective for the human Nav1.8 channel (but again, I have found no references comparing specificity on human vs rodent Nav1.8 channels). To not mislead the field into thinking VX-548 works for rodents and humans, please remove "both rodents and" from the sentence above (unless you find a reference supporting VX-548 as being effective in pain assays with rodents. There is a PK/PD paper with rodents, but that only looks at drug metabolism, not efficacy with pain assays).

      We agree with the reviewer and have removed mention of the new Nav1.8 therapeutic also working in rodents.

      (4) In the intro paragraph where variability in measuring mechanical stimuli is described, there is a new reference from the Stucky lab that further supports the need for an automated way to measure allodynia, as they also found variability between experimenters. This would be a relevant reference to include: Rodriguez Garcia (2024) PMID: 38314814.

      We thank the reviewer for this relevant citation and have updated the text to incorporate it:

      “Recent studies utilizing the manual high-speed analysis of withdrawal behavior developed by Abdus-Saboor et al. 2019 have reproduced this sizable experimenter effect using the new technique (Rodríguez García 2024).”

      (5) "a simple sin wave motion": should be "sine", correct throughout (multiple instances of "sin")

      Corrections made where relevant.

      Reviewer #2 (Public review):

      (1) ARM seems like a fantastic system that could be widely adopted, but no details are given on how a lab could build ARM, thus its usefulness is limited.

      The reviewer raises a good point; unfortunately, the authors are constrained by university policies around patent law. That said, efforts are being made to make the ARM widely available to interested researchers. As mentioned above in response to Reviewer 1’s comments, we end the discussion section with this sentence:

      “To promote wide adaptation of this device across as many labs as possible, a company, Tactorum Inc., has been formed.”

      (2) The ARM system appears to stop short of hitting the desired forces that von Frey filaments are calibrated toward (Figure 2). This may affect the interpretation of results.

      The reviewer makes an important observation. We amended the text to add clarity on the max forces delivered and to comment on sources of error beyond the delivery mechanism. It should be noted that a newly purchased set of von Frey filaments was used.

      “With the same 1.4 and 2 g von Frey filaments Researcher 1 delivered max average forces of 1.5 g and 2.7 g, and Researcher 2 1.35 g and 2.4 g. The ARM delivered average max forces closest to the targeted forces, with 1.36 g and 1.9 g. (Figure 2C) Some of the error observed could be due to the error rate (+/- 0.05 g) in the force gauge and the von Frey set used.”

      (3) The authors mention that ARM generates minimal noise; however, if those sounds are paired with stimulus presentation, they could still prompt a withdrawal response. Including some 'catch' trials in an experiment could test for this.

      The reviewer makes a very useful suggestion that we incorporated into our carrageenan experiments. This new data can be found in Supplemental Figure 3F.

      “For the carrageenan model, three replicates of the force ramp stimulus were delivered to each paw, and catch trials were performed every third trial to test whether the mice would respond to the noise of the ARM alone. During catch trials, the stimulus was delivered to the open air behind the mouse, and any movement within 5 seconds of stimulus delivery was counted as a response. These trials found a 96% response rate in true trials, with only a 7% rate in catch trials, indicating responses were not being driven by device noise.”
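
      As a small illustration of how the true-trial and catch-trial response rates quoted above could be tallied, here is a sketch in Python. The schedule (every third trial in sequence is a catch trial) is one reading of the description, and the function names and example data are hypothetical; none of this is taken from the authors' analysis code.

```python
def is_catch_trial(trial_number, every=3):
    # Trial numbers are 1-based; trials 3, 6, 9, ... are treated as catch trials.
    return trial_number % every == 0


def response_rates(responded, every=3):
    # responded: list of booleans in trial order (True = movement within the scoring window).
    true_hits = true_n = catch_hits = catch_n = 0
    for number, did_respond in enumerate(responded, start=1):
        if is_catch_trial(number, every):
            catch_n += 1
            catch_hits += int(did_respond)
        else:
            true_n += 1
            true_hits += int(did_respond)
    true_rate = true_hits / true_n if true_n else float("nan")
    catch_rate = catch_hits / catch_n if catch_n else float("nan")
    return true_rate, catch_rate
```

      A large gap between the two rates, as reported in the quoted passage, is what would be expected if withdrawals are driven by paw contact rather than by device noise.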

      (4) The experimental design in Figure 2 is unclear- did each experimenter have their own cohort of 10 mice, or was a single cohort of mice shared? If shared, there's some concern about repeat testing.

      Further clarification was added to avoid confusion on the methods used here.

      “Separate cohorts of 10 mice were used for ARM and manual delivery, with a week given between each researcher to avoid sensitization.”

      (5) In Figure 5 and S4, the order of the legends does not match the order of the graphs. This can be particularly confusing as the color scheme is not colorblind-friendly. Please consider revising the presentation of these figures.

      Corrections made where relevant.

      Reviewer #3 (Public review):

      (1) Limited details are provided for statistical tests and inappropriate claims are cited for individual tests. For example, in Figure 2, differences between researchers at specific forces are reported to be supported by a 2-way ANOVA; these differences should be derived from a post-hoc test that was completed only if the independent variable effects (or interaction effect) were found to be significant in the 2-way ANOVA. In other instances, statistical test details are not provided at all (e.g., Figures 3B, 3C, Figure 4, Figure 6G).

      We would like to thank the reviewer for pointing out the lack of clarity in the text on these statistical methods. We have added further details across the manuscript and shown below here in order to address this concern.

      “Both manual delivery and the ARM produced significant paw withdrawal percentage curves, a standard traditional measurement of mechanical sensitivity in the field (von Frey 1896, Dixon 1980, Chaplan 1994)(Figure 2E), with a 2-way ANOVA and a posthoc Tukey test detecting significant increases in comparing the 3 lower force VFH’s (0.02g, 0.07g, 0.16g) to the 2 highest force VFH’s (1g, 1.4g). This demonstrates that the ARM delivers results comparable to highly experienced researchers. However, a 2-way ANOVA and a posthoc Tukey test found that Researcher 2 elicited a significantly higher (p=0.0008) paw withdrawal frequency than Researcher 1 (Figure S2A) which corresponded with Researcher 2’s higher VFH application time as measured by the force sensor (Figure 2B).”

      “Adjustments were then made to the PAWS software to automate the measurement of withdrawal latency based on pose tracking data of the withdrawal response and the trajectory of the stimulus delivery encoded into the ARM. Testing of C57/BL6J (n=15) at baseline found significant decreases in withdrawal latency for pinprick compared to cotton swab stimuli delivered in identical ways by the ARM (Figure 3B) based on a 2-tailed student t-test.”

      “Mice injected with carrageenan (n=15) showed elevated shaking behavior (p=0.0385, 2-way ANOVA and a posthoc Tukey test) in response to pinprick stimuli in comparison to measurements at baseline (Figure 3C).”

      “Remote habituated mice showed a significant decrease (p=0.0217, 2-way ANOVA) in time to rest over the 3 days (Figure 4B), but no significant differences for any single day. The number of turns was measured for each group during the first 10 minutes of day 1 to act as a baseline, and then from 20 to 30 minutes for each day. Turn counts were then compared as a percentage of the baseline count for each group. This period was chosen as it is the period when experiments start after the day of habituation on experimental days. It was found that remote-habituated mice showed significantly less turning on day 2 compared to mice habituated with a researcher present (p=0.024, 2-way ANOVA posthoc Tukey test), and that only the remote-habituated mice showed significantly decreased turning behavior on day 3 compared to day 1 (p=0.0234, 2-way ANOVA posthoc Tukey test) (Figure 4C).”

      “Sex-dependent differences were found in reflexive and affective behavioral components of the mouse withdrawal response when a researcher was present versus not for both reactions to innocuous and noxious stimuli. A 2-way ANOVA and a posthoc Tukey test found that cotton swab stimuli elicited increased male mouse reflexive paw withdrawal features, including max paw height (p=0.0413) and max paw velocity (Y-axis) (p=0.0424) when Researcher 1 was present compared to when no researcher was present (Figure 4E-F). Pinprick stimuli (Figure 4H-I) on the other hand led to increased max paw height (p=0.0436) and max paw velocity (Y-axis) (p=0.0406) in male mice compared to female mice when Researcher 1 was present.

      Analysis of the shaking behavior elicited by cotton swab and pinprick stimuli found no significant differences in shaking behavior duration (Figure 4SA-B) but found sex-dependent differences in paw distance traveled after the initial withdrawal, including during shaking and guarding behaviors. For cotton swab (Figure 4G) male mice showed significantly increased paw distance traveled compared to female mice when Researcher 2 was present (p=0.0468, 2-way ANOVA posthoc Tukey test) but not when Researcher 1 was present or no researcher was present. Pinprick stimuli also elicited sex-based increases in paw distance traveled (Figure 4J) in male mice when Researcher 2 was present compared to both male mice when no researcher was present (p=0.0149, 2-way ANOVA posthoc Tukey test) and female mice when Researcher 1 was present (p=0.0038, 2-way ANOVA posthoc Tukey test).”

      (2) In the current manuscript, the effects of the experimenter's presence on both habituation time and aspects of the withdrawal reflex are minimal for Researcher 2 and non-existent for Researcher 1. This is surprising given that Researcher 2 is female; the effect of experimenter presence was previously documented for male experimenters as the authors appropriately point out (Sorge et al. PMID: 24776635). In general, this argument could be strengthened (or perhaps negated) if more than N=2 experimenters were included in this assessment.

      The reviewer makes an important point regarding this data and the need for further experiments. We designed a new set of experiments to examine the effect of male and female researchers overall. It should be noted that this is rather noisy data given it was collected by three sets of male and female researchers over 3 weeks. That said, a significant difference was found between mouse sexes when a male researcher was present. This is consistent with previous data but, as we discuss, it does not invalidate our earlier findings: researcher gender appears to be only one of the factors at work in researcher presence effects on mouse behavior, so individuals can have greater or lesser effects than their overall gender group. Our new results can be found in Figure 4K.

      “These results indicate that researcher presence at baseline can lead to significant differences in reflexive and affective pain behavior. In this case, male mice showed increased behavioral responses to both touch and pain behavior depending on whether the researcher was present. This led to sex differences in the affective and reflexive component of the withdrawal response when a researcher is present, which disappears when no researcher is present, or a different researcher is present. For this set of researchers, the female researcher elicited the greater behavioral effect. This appeared at first to contradict previous findings (Sorge 2024, Sorge 2014), but it was hypothesized that the effect of an individual researcher could easily vary compared to their larger gender group. To test this, 6 new researchers, half male and half female, were recruited and a new cohort of mice (n=15 male, n=15 female) was tested in each of their presence over the course of 3 weeks, controlling for circadian rhythms (Figure 4K). The newly added force ramp stimulus type was used for these experiments, with three replicates per trial, to efficiently measure mechanical threshold in a manner comparable to previous work. It was found that female mice showed significantly decreased mechanical threshold compared to male mice (p=0.034, Šídák's multiple comparisons test and Student’s t-test) when a male researcher was present. This did not occur when a female researcher or no researcher was present. In the latter case, a slight trend towards this effect was observed, but it was not significant (p=0.21), and may be the result of a single male researcher being responsible for handling and setting up the mice for all experiments.”

      “These findings indicate that sex-dependent differences in evoked pain behavior can appear and disappear based on which researcher/s are in the room. There is a trend towards male researchers overall having a greater effect, but individuals may have a greater or lesser effect on mouse behavior, independent of the gender or sex. This presents a confound that must be considered in the analysis of sex differences in pain and touch behavior which may explain some of the variation in findings from different researchers. Together, these results suggest that remote stimulus delivery may be the best way to eliminate variation caused by experimenter presence while making it easier to compare with data from researchers in your lab and others.”

      (3) The in vivo BLA calcium imaging data feel out of place in this manuscript. Is the point of Figure 6 to illustrate how the ARM can be coupled to Inscopix (or other external inputs) software? If yes, the following should be addressed: why do the up-regulated and down-regulated cell activities start increasing/decreasing before the "event" (i.e., stimulus application) in Figure 6F? Why are the paw withdrawal latencies and paw distanced travelled values in Figures 6I and 6J respectively so much faster/shorter than those illustrated in Figure 5 where the same approach was used?

      Thanks to the reviewer for bringing up this concern. We have included further text discussing this behavioral data and how it compares to previous work in this study.

      “Paw height and paw velocity were found to be consistent with data from figures 4E-I (male researcher and male mice) and 5C (stimulus intensity 2.5 and 4.5) for similar data, with slightly elevated measures of paw distance traveled and decreased paw withdrawal latency for the pinprick stimulus. This was likely caused by sensitization due to multiple stimulus deliveries over the course of the experiment: because of logistical constraints, 30 stimulus trials were delivered per session, versus the maximum of 3 performed during previous experiments.”

      “This data indicates that the ARM is an effective tool for efficiently correlating in vivo imaging data with evoked behavioral data, including sub-second behavior. One limitation is that the neural response appears to begin slightly before stimulus impact (Figure 6F, 6SB). This was likely caused by a combination of the imprecise nature of ARM v1 paw contact detection and slight delays in the paw contact signal reaching the Inscopix device due to flaws in the software and hardware used, slowing down the signal. Improvements have been made as part of the ARM v2, and these have been shown to eliminate this delay in in vivo fiber photometry data recorded as part of new projects using the device.”

      (4) Another advance of this manuscript is the integration of a 500 fps camera (as opposed to a 2000 fps camera) in the PAWS platform. To convince readers that the use of this more accessible camera yields similar data, a comparison of the results for cotton swabs and pinprick should be completed between the 500 fps and 2000 fps cameras. In other words, repeat Supplementary Figure 3 with the 2000 fps camera and compare those results to the data currently illustrated in this figure.

      The reviewer makes a good point about the need for direct comparison between 500 fps and 2000 fps data. To address this, we added data from the same mice, recorded 2 weeks prior with a comparable setup. These new results can be found in Supplemental Figure 3.

      “Changes were made to PAWS to make it compatible with framerates lower than 2000 fps. This was tested using a 0.4 MP, 522 FPS, Sony IMX287 camera recording at 500 fps, and data recorded at 2000 fps by the previously used Photron FASTCAM (Figure 3SC-F). The camera paired with PAWS was found to be sufficient to separate between cotton swab and pinprick withdrawal responses, suggesting it may be a useful tool for labs that cannot invest in a more expensive device. PAWS features measured from 500 fps video data were not significantly different from the 2000 fps data based on a 2-way ANOVA.”

      (5) In Figure 2F, the authors demonstrate that a von Frey experiment can be completed much faster with the ARM vs. manually. I don't disagree with that fact - the data clearly show this. I do, however, wonder if the framing of this feature is perhaps too positive; many labs wait > 30 s between von Frey filament applications to prevent receptive field sensitization. The fact that an entire set of ten filaments can be applied in < 50 s (< 3 s between filaments given that each filament is applied for 2 s), while impressive, may never be a feature that is used in a real experiment.

      The reviewer makes an important point about how different researchers perform these tests and the relevant timings. We have moderated the framing of these results to address this concern.

      “Further, we found that the ARM decreased the time needed to apply a stimulus 10 times to a mouse paw by 50.9% compared to manual delivery (Figure 2F). This effect size may decrease for researchers who leave longer delays between stimulus delivery, but the device should still speed up experiments by reducing aiming time and allowing researchers to quickly switch to a new mouse while waiting for the first.”

      (6) Why are different affective aspects of the hindpaw withdrawal shown in different figures? For example, the number of paw shakes is shown in Figure 3C, whereas paw shaking duration is shown in Figure 5D. It would be helpful - and strengthen the argument for either of these measures as being a reproducible, reliable measure of pain - if the same measure was used throughout.

      Thanks to the reviewer for pointing out this discrepancy. We have adjusted the figures and text to only use the Number of Paw Shakes for better consistency (Figure 5D and Figure 5-figure supplement 1C).

      (7) Is the distance the paw traveled an effective feature of the paw withdrawal (Figure 5E)? Please provide a reference that supports this statement.

      A relevant citation and discussion of this metric based on previous studies has been added.

      “Mice injected with carrageenan (n=15) showed elevated shaking behavior (p=0.0385) in response to pinprick stimuli in comparison to measurements at baseline (Figure 3C). This aligned with previous findings where PAWS has detected elevations in shaking and/or guarding behavior, examples of affective pain behavior, and post-peak paw distance traveled, which correlates with these behaviors in carrageenan pain models and has been found to be a good measure of them in past studies (Bohic et al. 2023).”

      (8) Dedek et al. (PMID: 37992707) recently developed a similar robot that can also be used to deliver mechanical stimuli. The authors acknowledge this device's ability to deliver optogenetic and thermal stimuli but fail to mention that this device can deliver mechanical stimuli in a similar manner to the device described in this paper, even without experimenter targeting. Additional discussion of the Dedek et al. device is warranted.

      We would like to thank the reviewer for identifying this omission. Discussion of this, as well as further discussion of Dedek et al.’s automation prototyping work, has been added.

      “Previous attempts at automating mechanical stimulus delivery, including the electronic von Frey (Martinov 2013) and dynamic plantar aesthesiometer (Nirogi 2012), have focused on eliminating variability in stimulus delivery. In contrast to the ARM, both of these devices rely upon a researcher being present to aim or deliver the stimulus, can only deliver vFH-like touch stimuli, and only measure withdrawal latency/force threshold. Additionally, progress has been made in automating stimulus assays by creating devices with the goal of delivering precise optogenetic and thermal stimuli to the mouse’s hind paw (Dedek 2023, Schorscher-Petchu 2021). The Prescott team went further and incorporated a component into their design to allow for mechanical stimulation, but this piece appears to be limited to a single filament type that can only deliver a force ramp. As a result, these devices and those previously discussed lack the customization for delivering distinct modalities of mechanosensation that the ARM allows for. Moreover, in its current form the automated aiming of some of these devices may not provide the same resolution or reliability as the ARM in targeting defined targets (Figure 1C), such as regions of the mouse paw that might be sensitized during chronic pain states. Due to the nature of machine learning pose estimation, substantial work, beyond the capacity of a single academic lab, in standardizing the mouse environment and building a robust model based on an extensive and diverse training data set will be necessary for automated aiming to match the reliability or flexibility of manual aiming. That said, we believe this work, along with that of the other groups mentioned, has set the groundwork from which a new standard for evoked somatosensory behavior experiments in rodents will be built.”

      (9) Page 2: von Frey's reference year should be 1896, not 1986.

      This typo has been fixed, thanks to the reviewer for noting it.

      “For more than 50 years, these stimuli have primarily been the von Frey hair (vFH) filaments that are delivered to the mouse paw from an experimenter below the rodent aiming, poking, and subsequently recording a paw lift (von Frey 1896, Dixon 1980, Chaplan 1994).”

      (10) Page 2: Zumbusch et al. 2024 also demonstrated that experimenter identification can impact mechanical thresholds, not just thermal thresholds.

      Text has been updated in order to note this important point.

      “A meta-analysis of thermal and mechanical sensitivity testing (Chesler 2002, Zumbusch 2024) found that the experimenter has a greater effect on results than the mouse genotype, making data from different individual experimenters difficult to merge.”

      (11) Page 2: One does not "deliver pain in the periphery". Noxious stimuli or injury can be delivered to the periphery, but by definition, pain is a sensation that requires a central nervous system.

      Text has been updated for improved accuracy.

      “Combining approaches to deliver painful stimuli with techniques mapping behavior and brain activity could provide important insights into brain-body connectivity that drives the sensory encoding of pain.”

    1. eLife Assessment

      This paper discusses the cognitive implications of potential intentional burial, wall engraving creation, and fire as light source use behaviors by relatively small-brained Homo naledi hominins. The discussion presented in the paper is valuable theoretically in its healthy questioning of prior assumptions concerning the socio-biological constraints of hominin meaning-making behavior. The discussion also contributes practically given that these behaviors have been ascribed to Homo naledi in two associated papers. Still, the strength of evidence in this contribution relies on the validity of the conclusions from the two associated papers, which remain actively questioned. The ultimate assessment of this work will vary among individual readers depending on how they view this debate, but if the conclusions from the associated papers hold up, the conclusions in the current paper can be considered solid.

    1. eLife Assessment

      This manuscript introduces a useful protein-stability-based fitness model for simulating protein evolution and unifying non-neutral models of molecular evolution with phylogenetic models. The model is applied to five viral proteins that are of structural and functional importance. While the general modelling approach is solid, and effectively preserves folding stability, the evidence for the model's predictive power remains limited, since it shows little improvement over neutral models in predicting protein evolution. The work should be of interest to researchers developing theoretical models of molecular evolution.

    2. Reviewer #1 (Public review):

      Summary:

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is guided by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which has struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death processes. Unfortunately, though, the model shows little improvement over neutral models in predicting protein sequence evolution, although it can predict protein stability better than models assuming neutral evolution. It appears that more work is needed to determine exactly what aspects of protein sequence evolution are predictable under such non-neutral phylogenetic models.

      Major concerns:

      (1) The authors have clarified the mapping between birth-death model parameters and fitness, but how fitness is modeled still appears somewhat problematic. The authors assume the death rate = 1 - birth rate. So a variant with a birth rate b = 1 would have a death rate d = 0 and so would be immortal and never die, which does not seem plausible. Also I'm not sure that this would "allow a constant global (birth-death) rate" as stated in line 172, as selection would still act to increase the population mean growth rate r = b - d. It seems more reasonable to assume that protein stability affects only either the birth or death rate and assume the other rate is constant, as in the Neher 2014 model.

      (2) It is difficult to evaluate the predictive performance of protein sequence evolution. This is in part due to the fact that performance is compared in terms of percent divergence, which is difficult to compare across viral proteins and datasets. Some protein sequences would be expected to diverge more because they are evolving over longer time scales, under higher substitution rates or under weaker purifying selection. It might therefore help to normalize the divergence between predicted and observed sequences by the expected or empirically observed amount of divergence seen over the timescale of prediction.

      (3) Predictability may also vary significantly across different sites in a protein. For example, mutations at many sites may have little impact on structural stability (in which case we would expect poor predictive performance) while even conservative changes at other sites may disrupt folding. I therefore feel that there remains much work to be done here in terms of figuring out where and when sequence evolution might be predictable under these types of models, and when sequence evolution might just be fundamentally unpredictable due to the high entropy of sequence space.

    3. Reviewer #2 (Public review):

      In this study, the authors aim to forecast the evolution of viral proteins by simulating sequence changes under a constraint of folding stability. The central idea is that proteins must retain a certain level of structural stability (quantified by folding free energy, ΔG) to remain functional, and that this constraint can shape and restrict the space of viable evolutionary trajectories. The authors integrate a birth-death population model with a structurally constrained substitution (SCS) model and apply this simulation framework to several viral proteins from HIV-1, SARS-CoV-2, and influenza.

      The motivation to incorporate biophysical constraints into evolutionary models is scientifically sound, and the general approach aligns with a growing interest in bridging molecular evolution and structural biology. The authors focus on proteins where immune pressure is limited and stability is likely to be a dominant constraint, which is conceptually appropriate. The method generates sequence variants that preserve folding stability, suggesting that stability-based filtering may capture certain evolutionary patterns.

      However, the study does not substantiate its central claim of forecasting. The model does not predict future sequences with measurable accuracy, nor does it reproduce observed evolutionary paths. Validation is limited to endpoint comparisons in a few datasets. While KL divergence is used to compare amino acid distributions, this analysis is only applied to a single protein (HIV-1 MA), and there is no assessment of mutation-level predictive accuracy or quantification of how well simulated sequences recapitulate real evolutionary paths. No comparison is made to real intermediate variants available from extensive viral sequencing datasets which gather thousands of sequences with detailed collection date annotation (SARS-CoV-2, Influenza, RSV).

      The selection of proteins is narrow and the rationale for including or excluding specific proteins is not clearly justified.

      The analyzed datasets are also under-characterized: we are not given insight into how variable the sequences are or how surprising the simulated sequences might be relative to natural diversity. Furthermore, the use of consensus sequences to represent timepoints is problematic, particularly in the context of viral evolution, where divergent subclades often coexist - a consensus sequence may not accurately reflect the underlying population structure.

      The fitness function used in the main simulations is based on absolute ΔG and rewards increased stability without testing whether real evolutionary trajectories tend to maintain, increase, or reduce folding stability over time for the particular systems (proteins) that are studied. While a variant of the model does attempt to center selection around empirical ΔG values, this more biologically plausible version is underutilized and not well validated.

      Ultimately, the model constrains sequence evolution to stability-compatible trajectories but does not forecast which of these trajectories are likely to occur. It is better understood as a filter of biophysically plausible outcomes than as a predictive tool. The distinction between constraint-based plausibility and sequence-level forecasting should be made clearer. Despite these limitations, the work may be of interest to researchers developing simulation frameworks or exploring the role of protein stability in viral evolution, and it raises interesting questions about how biophysical constraints shape sequence space over time.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is guided by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which has struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death processes. Unfortunately, though, the model shows little improvement over neutral models in predicting protein sequence evolution, although it can predict protein stability better than models assuming neutral evolution. It appears that more work is needed to determine exactly what aspects of protein sequence evolution are predictable under such non-neutral phylogenetic models. 

      We thank the reviewer for the positive comments about our work. We agree that further work is needed in the field of substitution models of molecular evolution to enable more accurate predictions of specific amino acid sequences in evolutionary processes.

      Major concerns: 

      (1) The authors have clarified the mapping between birth-death model parameters and fitness, but how fitness is modeled still appears somewhat problematic. The authors assume the death rate = 1 - birth rate. So a variant with a birth rate b = 1 would have a death rate d = 0 and so would be immortal and never die, which does not seem plausible. Also I'm not sure that this would "allow a constant global (birth-death) rate" as stated in line 172, as selection would still act to increase the population mean growth rate r = b - d. It seems more reasonable to assume that protein stability affects only either the birth or death rate and assume the other rate is constant, as in the Neher 2014 model. 

      The model proposed by Neher, et al. (2014), which incorporates a death rate (d) higher than 0 for any variant, was implemented and applied in the present method. In general, this model did not yield results different from those obtained using the model that assumes d = 1 – b, suggesting that this aspect may not be crucial for the study system. Next, the imposition of arbitrary death events based on an arbitrary death rate could be a point of concern. Regarding the original model, a variant with d = 0 can experience a decrease in fitness through the mutation process. In an evolutionary process, each variant is subject to mutation, and Markov models allow for the incorporation of mutations that decrease fitness (albeit with lower probability than beneficial ones, but they can still occur). All this information is included in the manuscript.
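
      The two parameterizations discussed in this exchange can be compared with a minimal sketch. It assumes per-variant birth and death rates on a unit scale; the function names and the fixed death rate of 0.5 are illustrative assumptions and are not taken from the manuscript. Under d = 1 - b the total event rate b + d stays constant at 1, while the net growth rate r = b - d = 2b - 1 still rises with b, which is the point raised by the reviewer.

```python
def coupled_rates(b):
    """Death rate tied to birth rate (d = 1 - b), as in the original model."""
    d = 1.0 - b
    total_rate = b + d   # always 1: the constant global birth-death rate
    growth_rate = b - d  # equals 2*b - 1, so it still depends on b
    return d, total_rate, growth_rate


def constant_death_rates(b, d=0.5):
    """Only the birth rate depends on fitness; d is a fixed, assumed value."""
    total_rate = b + d
    growth_rate = b - d
    return d, total_rate, growth_rate


# Illustrative birth rates only; b = 1.0 gives d = 0 under the coupled model.
for b in (0.5, 0.75, 1.0):
    print(b, coupled_rates(b), constant_death_rates(b))
```

      In this sketch a variant with b = 1 has d = 0 only under the coupled parameterization; with a fixed death rate every variant keeps a nonzero chance of dying.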

      (2) It is difficult to evaluate the predictive performance of protein sequence evolution. This is in part due to the fact that performance is compared in terms of percent divergence, which is difficult to compare across viral proteins and datasets. Some protein sequences would be expected to diverge more because they are evolving over longer time scales, under higher substitution rates or under weaker purifying selection. It might therefore help to normalize the divergence between predicted and observed sequences by the expected or empirically observed amount of divergence seen over the timescale of prediction. 

      The protein datasets studied showed different levels of sequence divergence over their evolutionary times, as indicated for each dataset in the manuscript. For some metrics, we evaluated the accuracy (or error) of the predictions through direct comparisons between real and predicted protein variants, using percentages to facilitate interpretation: 0% indicates a perfect prediction (no error), while 100% indicates a completely incorrect prediction (total error). Regarding normalization of these evaluations, we respectfully disagree with the suggestion because diverse factors can affect divergence (not only the substitution rate, but also the sample size and structural features of the protein that may affect stability when accommodating different sequences, among others), and this complicates defining a consistent and meaningful normalization criterion. Given that the manuscript provides detailed information for each dataset, we believe that presenting the prediction accuracy through direct comparisons between real and predicted protein variants, expressed as percentages, is the clearest approach.
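
      For readers weighing the reviewer's suggestion against this response, the following is a sketch of one possible normalization, not a procedure used in the manuscript. It divides the percent divergence between predicted and observed sequences by the divergence observed between the starting and observed sequences. The function names and the toy sequences are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def normalized_error(start, observed, predicted):
    """Prediction error relative to the divergence observed over the same period."""
    baseline = percent_divergence(start, observed)
    if baseline == 0:
        raise ValueError("no observed divergence to normalize by")
    return percent_divergence(predicted, observed) / baseline


# Toy sequences (hypothetical): two observed changes, one of them predicted.
start = "MKTAYIAKQR"
observed = "MKTGYIAKHR"
predicted = "MKTGYIAKQR"
print(normalized_error(start, observed, predicted))
```

      Under this assumed definition, values below 1 mean the predicted sequence is closer to the observed one than the starting sequence is. The caveats listed in the response (sample size, structural features) are not addressed by such a ratio.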

      (3) Predictability may also vary significantly across different sites in a protein. For example, mutations at many sites may have little impact on structural stability (in which case we would expect poor predictive performance) while even conservative changes at other sites may disrupt folding. I therefore feel that there remains much work to be done here in terms of figuring out where and when sequence evolution might be predictable under these types of models, and when sequence evolution might just be fundamentally unpredictable due to the high entropy of sequence space. 

      We agree with this reflection. Mutations can have different effects on folding stability, which are accounted for by the model presented in this study. However, accurately predicting the exact sequences of protein variants with similar stability remains difficult with current structurally constrained substitution models, and therefore, further work is needed in this regard. This aspect is indicated in the manuscript.

      We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.

      Reviewer #2 (Public review): 

      In this study, the authors aim to forecast the evolution of viral proteins by simulating sequence changes under a constraint of folding stability. The central idea is that proteins must retain a certain level of structural stability (quantified by folding free energy, ΔG) to remain functional, and that this constraint can shape and restrict the space of viable evolutionary trajectories. The authors integrate a birth-death population model with a structurally constrained substitution (SCS) model and apply this simulation framework to several viral proteins from HIV-1, SARS-CoV-2, and influenza.

      The motivation to incorporate biophysical constraints into evolutionary models is scientifically sound, and the general approach aligns with a growing interest in bridging molecular evolution and structural biology. The authors focus on proteins where immune pressure is limited and stability is likely to be a dominant constraint, which is conceptually appropriate. The method generates sequence variants that preserve folding stability, suggesting that stability-based filtering may capture certain evolutionary patterns. 

      Correct. We thank the reviewer for the positive comments about our study.

      However, the study does not substantiate its central claim of forecasting. The model does not predict future sequences with measurable accuracy, nor does it reproduce observed evolutionary paths. Validation is limited to endpoint comparisons in a few datasets. While KL divergence is used to compare amino acid distributions, this analysis is only applied to a single protein (HIV-1 MA), and there is no assessment of mutation-level predictive accuracy or quantification of how well simulated sequences recapitulate real evolutionary paths. No comparison is made to real intermediate variants available from extensive viral sequencing datasets which gather thousands of sequences with detailed collection date annotation (SARS-CoV-2, Influenza, RSV). 

      There are several points in this comment.

      The presented method accurately predicts folding stability of forecasted variants, as shown through comparisons between real and predicted protein variants. However, as the reviewer correctly indicates, predicting the exact amino acid sequences remains challenging. This limitation is discussed in detail in the manuscript, where we also suggest that further improvements in substitution models of protein evolution are needed to better capture the evolutionary signatures of amino acid change at the sequence level, even between amino acids with similar physicochemical properties. Regarding the time points used for validation, the studied influenza NS1 dataset included two validation points. A key limitation in increasing the number of time points is the scarcity of datasets derived from monitoring protein evolution with sufficient molecular diversity between samples collected at consecutive time points (i.e., at least more than five polymorphic amino acid sites). 

      As described in the manuscript, calculating Kullback-Leibler (KL) divergence requires more than one sequence per studied time point. However, most datasets in the literature include only a single sequence per time point, typically a consensus sequence derived from bulk population sequencing. Generating multiple sequences per time point is experimentally more demanding, often requiring advanced methods such as single-virus sequencing or amplification of sublineages in viral subpopulations, as was done for the first dataset used in the study (Arenas, et al. 2016), which enabled the calculation of KL divergence. The extent to which the simulated sequences resemble real evolution is evaluated in the method validation. As noted, intermediate time point validation was performed using the influenza NS1 protein dataset. Although, as the reviewer indicates, thousands of viral sequences are available, these are usually consensus sequences from bulk sequencing. Indeed, many viral variants mainly differ through synonymous mutations, where the number of accumulated nonsynonymous mutations is small. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively.

      Analyzing intermediate variants of concern (i.e., Gamma or Delta) would reduce this number further, which would weaken the statistics. In addition, many available viral sequences are not consecutive in evolutionary terms (one dataset does not represent the direct origin of another dataset at a subsequent time point), which further limits their applicability in this study. There is little data from monitored protein evolution with consecutive samples. The most suitable studies usually involve in vitro virus evolution, but the data from these studies often show low genetic variability between samples collected at different time points. Finally, it is important to note that the presented method can only be applied to proteins with known 3D structures, as it relies on selection based on folding stability. Non-structural proteins cannot be analyzed using this approach. Future work could incorporate additional selection constraints, which may improve the accuracy of predictions. These considerations and limitations are indicated in the manuscript.
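
      To illustrate why Kullback-Leibler divergence needs several sequences per time point, the sketch below computes a mean per-site KL divergence between amino acid frequency distributions estimated from two sets of aligned sequences. This is a generic formulation under stated assumptions, not the manuscript's exact procedure: the pseudocount, the per-site averaging, the function names, and the toy sequences are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(column, pseudocount=1.0):
    """Amino acid frequencies at one alignment column, smoothed by a pseudocount."""
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """KL(p || q) over the 20 amino acids; both inputs are strictly positive."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def mean_site_kl(real_seqs, sim_seqs):
    """Average per-site KL divergence between real and simulated alignments."""
    length = len(real_seqs[0])
    total = 0.0
    for i in range(length):
        p = site_frequencies([s[i] for s in real_seqs])
        q = site_frequencies([s[i] for s in sim_seqs])
        total += kl_divergence(p, q)
    return total / length


# Toy alignments (hypothetical), three sequences per set.
real = ["MKTA", "MKTG", "MRTA"]
simulated = ["MKTA", "MKSA", "MKTA"]
print(mean_site_kl(real, simulated))
```

      With a single consensus sequence per time point, each column holds one residue and the estimated distribution is dominated by the pseudocount, so the comparison carries little information.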

      The selection of proteins is narrow and the rationale for including or excluding specific proteins is not clearly justified. 

      The viral proteins included in the study were selected based on two main criteria, general interest and data availability. In particular, we included proteins from viruses that affect humans and for which data from monitored protein evolution, with sufficient molecular diversity between consecutive time points, is available. These aspects are indicated in the manuscript.

      The analyzed datasets are also under-characterized: we are not given insight into how variable the sequences are or how surprising the simulated sequences might be relative to natural diversity. Furthermore, the use of consensus sequences to represent timepoints is problematic, particularly in the context of viral evolution, where divergent subclades often coexist - a consensus sequence may not accurately reflect the underlying population structure. 

      The manuscript indicates the sequence identity among protein datasets of different time points, along with other technical details. Next, the evaluation based on comparisons between simulated and real sequences reflects how surprising the simulated sequences might be relative to natural diversity, provided that the real dataset is representative. We believe that the diverse real datasets studied are useful for evaluating the accuracy of the method in predicting different molecular patterns. Regarding the use of consensus sequences, we agree that they provide an approximation. However, as previously indicated, most of the available data from monitored protein evolution consist of consensus sequences obtained through bulk sequencing. Additionally, analyzing every individual viral sequence within a viral population, which is typically large, would be ideal but computationally intractable.

      The fitness function used in the main simulations is based on absolute ΔG and rewards increased stability without testing whether real evolutionary trajectories tend to maintain, increase, or reduce folding stability over time for the particular systems (proteins) that are studied. While a variant of the model does attempt to center selection around empirical ΔG values, this more biologically plausible version is underutilized and not well validated.

      The applied fitness function, based on absolute ΔG, is well established in the field (Sella and Hirsh 2005; Goldstein 2013). The present study independently predicts ΔG for the real and simulated protein variants at each sampling point. This ΔG prediction accounts not only for negative design, informed by empirical data, but also for positive design based on the study data (Arenas, et al. 2013; Minning, et al. 2013), thereby enabling the detection of variation in folding stability among protein variants. These aspects are indicated in the manuscript. Therefore, in our view, the study provides a proper comparison of real and predicted evolutionary trajectories in terms of folding stability.
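
      As a rough illustration of a fitness function based on absolute ΔG, the sketch below assumes the two-state folded-fraction form, f = 1 / (1 + exp(ΔG / kT)), which is widely used in this literature. The exact expression and parameters used in the manuscript are not given in this response, so the function name, the temperature, and the ΔG values are assumptions.

```python
import math

# Gas constant in kcal/(mol*K) times an assumed temperature of 298.15 K.
KT_KCAL_PER_MOL = 0.0019872 * 298.15


def folded_fraction_fitness(delta_g, kt=KT_KCAL_PER_MOL):
    """Fraction of protein folded in a two-state model; delta_g in kcal/mol.

    More negative delta_g means a more stable fold and a fitness closer to 1.
    """
    return 1.0 / (1.0 + math.exp(delta_g / kt))


# Illustrative stabilities only.
for delta_g in (0.0, -1.0, -5.0, -10.0):
    print(delta_g, round(folded_fraction_fitness(delta_g), 4))
```

      Under this assumed form, fitness increases monotonically as ΔG becomes more negative and saturates near 1, so differences among already stable variants translate into very small fitness differences.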

      Ultimately, the model constrains sequence evolution to stability-compatible trajectories but does not forecast which of these trajectories are likely to occur. It is better understood as a filter of biophysically plausible outcomes than as a predictive tool. The distinction between constraint-based plausibility and sequence-level forecasting should be made clearer. Despite these limitations, the work may be of interest to researchers developing simulation frameworks or exploring the role of protein stability in viral evolution, and it raises interesting questions about how biophysical constraints shape sequence space over time. 

      The presented method estimates the fitness of each protein variant, which can reflect the relative survival capacity of the variant. Therefore, despite the error due to evolutionary constraints not considered by the method, it indicates which variants are more likely to become fixed over time. In our view, the method does not merely filter plausible variants, rather, it generates predictions of variant survival through predicted fitness based on folding stability and simulations of protein evolution under structurally constrained substitution models integrated with birth-death population genetics approaches. The use of simulation-based approaches for prediction is well established in population genetics. For example, approaches such as approximate Bayesian computation (Beaumont, et al. 2002) rely on this strategy, and it has also been applied in other studies of forecasting evolution (e.g., Neher, et al. 2014). We believe that the distinction between forecasting folding stability and amino acid sequence is clearly shown in the manuscript, including the main text and the figures.

      Reviewer #2 (Recommendations for the authors): 

      I thank the authors for addressing the question about template switching, their clarification was helpful. However, the core concerns I raised remain unresolved: the claim that the method is useful for forecasting is not substantiated.  In order to support the paper's central claims or to prove its usefulness, several key improvements could be incorporated: 

      (1) Systematic analysis of more proteins: 

      The manuscript would be significantly strengthened by a systematic evaluation of model performance across a broader set of viral proteins, beyond the examples currently shown. Many human influenza and SARS-CoV-2 proteins have well-characterized structures or high-quality homology templates, making them suitable candidates. In light of the limited success of the method, presenting the model's behavior across a more comprehensive protein set, including those with varying structural constraints and immune pressures, would help assess generalizability and clarify the specific conditions under which the model is applicable. 

      Following a comment from the reviewer in a previous revision of the study, we included the analysis of an influenza NS1 protein dataset that contains two evaluation time points. Next, to validate the prediction method, it is necessary to have monitored protein sequences collected at least at two consecutive time points, with sufficient divergence between them to capture evolutionary signatures that allow for proper evaluation. Additionally, many data involve sequences that are not consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset existing at a later time point), which precludes their use in this study. Little data from monitored protein evolution with reliable consecutive (ancestor-descendant) samples exist. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points. Although thousands of sequences are available for some viruses, they are usually consensus sequences from bulk sequencing and often show a low number of nonsynonymous mutations in the protein-coding gene under study between time points. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively. Analyzing intermediate variants of concern (i.e., Gamma or Delta) would reduce this number further, which would weaken the statistics. Thus, in practice, we found a scarcity of data derived from monitoring protein evolution, with reliable ancestor and corresponding descendant data at consecutive time points and with sufficient molecular diversity between them (i.e., at least more than five polymorphic amino acid sites). 
In all, we believe that the diverse viral protein datasets used in the present study, along with the multiple analyzed datasets collected from monitored HIV-1 populations present in different patients, provide a representative application of the method, since notice that similar patterns were generally generated from the analysis of the different datasets.
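
      The divergence criterion described in this response (a minimum number of amino acid sites that differ between an ancestor sample and its descendant) can be sketched as a simple check on two aligned consensus sequences. This is only an illustration: the function names, the toy sequences, and the default threshold argument are assumptions, not part of the authors' framework.

```python
def site_differences(ancestor: str, descendant: str) -> list[int]:
    """Return 0-based positions where two aligned protein sequences differ."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, d) in enumerate(zip(ancestor, descendant)) if a != d]


def sequence_identity(ancestor: str, descendant: str) -> float:
    """Fraction of aligned positions that carry the same residue."""
    if not ancestor:
        raise ValueError("sequences must not be empty")
    return 1.0 - len(site_differences(ancestor, descendant)) / len(ancestor)


def has_enough_divergence(ancestor: str, descendant: str, min_sites: int = 5) -> bool:
    """True when more than `min_sites` positions differ between the two samples."""
    return len(site_differences(ancestor, descendant)) > min_sites


# Toy sequences invented for illustration (not real viral data).
early = "MKTAYIAKQRQISFVK"
late = "MKSAYIGKQRNISFVR"
print(site_differences(early, late))
print(sequence_identity(early, late))
print(has_enough_divergence(early, late))
```

      With the toy pair above only four sites differ, so the pair would fall below a "more than five sites" threshold.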

      (2) Present clear data statistics: For each analyzed dataset, the authors should provide basic information about the number of unique sequences, levels of variability, and evolutionary divergence between start and end sequences. This would contextualize the forecasting task and clarify whether the simulations are non-trivial. In particular, it should be shown that the consensus sequence is indeed representative of the viral population at a given time point. In viral evolution we frequently observe co-circulation of subclades and the consensus sequence is then not representative. 

      For each dataset analyzed, the manuscript provides the sequence identity between samples at the study time points (which also informs about sequence variability), sample sizes, representative protein structure, and other technical details. The study assumes that consensus sequences, typically generated by bulk sequencing, are representative of the viral population. Next, samples at different time points should involve ancestor-descendant relationships; this requirement is one of the main obstacles to finding appropriate data for this study, as noted in our previous response.

      (3) Explore other metrics for population level sequence comparison: 

      In light of the possible existence of subclades, mentioned above, the currently used metrics for sequence comparison may underestimate the performance of the simulations. It would be sufficient to see some overlap of simulated clades and the observed clades.

      We found this to be a good idea. However, in practice, we believe that the criteria used to define subclades could introduce biases into the results. For some metrics, we evaluated the accuracy of the predictions through direct comparisons between all real and predicted protein variants, using percentages to facilitate interpretation. We believe that using subclades could potentially reduce the current prediction errors, but this would complicate the interpretation of the results, as they would be influenced by the subjective criteria used to define the subclades.

      Currently, the manuscript presents a plausible filtering framework rather than a predictive model. Without these additional analyses, the main claims remain only partially supported. 

      Please see our reply to the comment of the reviewer just before the section titled “Recommendations for the authors”.

      Response to some rebuttal statements: 

      (1) "Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016)" 

      The available influenza and SARS-CoV-2 data gather isolates annotated with exact collection dates, providing rich datasets for such analysis.

      The available influenza and SARS-CoV-2 sequences are typically derived from bulk sequencing and are therefore consensus sequences. As a result, they cannot be used to calculate KL divergence. Additionally, many of the indicated sequences from databases have not been shown to be consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset existing at a later time point), which precludes their use in this study. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points.

      (2) "Regarding extending the analysis to other time points (other variants of concern), we kindly disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is  required to properly evaluate the prediction method." 

      There have been many more variants of concern subsequent to Omicron which circulated in 2021. 

      A key aspect is the accumulation of diversity in the study proteins across different time points. The SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes from the original Wuhan variant to Omicron, respectively.

      Analyzing intermediate variants of concern (e.g., Gamma or Delta) or those closely related to Omicron would reduce the number of accumulated mutations even further.   

      We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model. 

      AU: We thank the reviewer for the positive comments about our work.

      Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Predicting the exact sequences was more challenging. In this revised version, where we added additional real data, we found that the accuracy of this prediction can vary among proteins (e.g., the SCS model was more accurate than the neutral model in predicting sequences of the influenza NS1 protein at different time points). Still, we consider that further efforts are required in the field of substitution models of molecular evolution. For example, amino acids with similar physicochemical properties can result in predictions with appropriate folding stability but a different specific sequence. The development of accurate substitution models of molecular evolution is an active area of research with ongoing progress. Next, forecasting the folding stability of future real proteins is fundamental for properly forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify them in detail in our responses to the specific comments below.

      Major concerns:

      (1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth rate r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny.

      We thank the reviewer for this comment, which aims to improve the realism of our model. In the model presented (but see below for another model, derived from the reviewer's proposal, that we have now implemented in the framework and applied to the study data), the fitness predicted for a protein variant is used to obtain the corresponding birth rate of that variant. In this way, protein variants with high fitness have high birth rates, leading to overall more birth events, while protein variants with low fitness have low birth rates, resulting in overall more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” is incorrect because, in our model, b and d can vary among lineages according to the fitness. For example, a lineage might have b=0.9, d=0.1, r=0.8, while another lineage could have b=0.6, d=0.4, r=0.2. Likewise, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Assuming that all lineages have the same fitness would not make sense: in that situation, the folding stability of the forecasted protein variants would be similar under any model, which is not the case, as shown in the results. In our model, the fitness affects the reproductive success, where protein variants with a high fitness have higher birth rates leading to more birth events, while those with lower fitness have higher death rates leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate.
Regarding the phylogenetic structure, the model presented considers variable birth and death events across different lineages according to the fitness of the corresponding protein variants, and this affects the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well-established in the field (despite their limitations), and substitutions are fixed mutations, which could still be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution as Markov models follows general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We have now provided a more detailed description of the models in the manuscript.
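
      The point under discussion (whether the growth rate r = b - d can differ among lineages when b + d = 1) can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the birth rate equals a fitness value normalized to [0, 1]; the response does not state the exact mapping from fitness to birth rate, so that mapping and the function name are hypothetical.

```python
def rates_from_fitness(fitness: float) -> tuple[float, float, float]:
    """Map a fitness in [0, 1] to (birth, death, growth) with birth + death = 1."""
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness is assumed to lie in [0, 1]")
    birth = fitness        # assumption: birth rate equals the normalized fitness
    death = 1.0 - birth    # constraint b + d = 1
    growth = birth - death  # r = b - d, which equals 2b - 1 under the constraint
    return birth, death, growth


# The two lineages used as examples in the response: b = 0.9 and b = 0.6.
for f in (0.9, 0.6):
    b, d, r = rates_from_fitness(f)
    print(f"fitness={f:.1f}  b={b:.1f}  d={d:.1f}  r={r:.1f}")
```

      Under the constraint, r reduces to 2b - 1, so the total event rate b + d is the same for every lineage while the net growth rate still changes with b.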

      Apart from these clarifications about the birth-death model used, we understand the point of the reviewer and, following the suggestion, we have now incorporated an additional birth-death model that accounts for a variable global birth-death rate among lineages. Specifically, we followed the model proposed by Neher et al. (2014), where the death rate is set to 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate can vary among lineages. We implemented this model in the computer framework and applied it to the data used for the evaluation of the models. The results indicated that, in general, this model yields similar predictive accuracy compared to the previous birth-death model. Thus, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We have now presented this additional birth-death model and its results in the manuscript.
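
      For comparison, the second parameterization (death rate 1, birth rate 1 + fitness) can be sketched as one step of a standard Gillespie-style event simulation. Only the two rate definitions come from the response; the event-sampling scheme, function names, and toy fitness values are assumptions and may differ from how the authors' framework draws events.

```python
import random


def neher_rates(fitness: float) -> tuple[float, float]:
    """Return (birth, death) with death fixed at 1 and birth = 1 + fitness."""
    birth = 1.0 + fitness
    if birth <= 0.0:
        raise ValueError("fitness must be greater than -1 for a positive birth rate")
    return birth, 1.0


def gillespie_step(lineage_fitness: list[float], rng: random.Random) -> tuple[float, int, str]:
    """Draw the waiting time, the affected lineage, and the event type."""
    if not lineage_fitness:
        raise ValueError("at least one lineage is required")
    rates = [neher_rates(f) for f in lineage_fitness]
    total = sum(b + d for b, d in rates)
    wait = rng.expovariate(total)  # time to the next event of any kind
    u = rng.uniform(0.0, total)    # pick an event in proportion to its rate
    acc = 0.0
    for idx, (b, d) in enumerate(rates):
        if u < acc + b:
            return wait, idx, "birth"
        acc += b
        if u < acc + d:
            return wait, idx, "death"
        acc += d
    return wait, len(rates) - 1, "death"  # guard against rounding at the upper edge


rng = random.Random(1)
print(gillespie_step([0.2, -0.1, 0.0], rng))
```

      Here the per-lineage total rate is 2 + fitness, so fitter lineages both reproduce more often and experience events sooner, which is the sense in which the global rate varies among lineages.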

      (2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1). 

      As indicated in our previous answer, our study shows a good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Next, predicting the exact sequences was more challenging, which was not surprising considering previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with more accurate folding stability, even when the exact amino acid sequences differ. As indicated, further work is demanded in the field of substitution models of molecular evolution. Next, in this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. In any case, as previously indicated, we believe that efforts are required in the field of substitution models of molecular evolution. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications.

      Next, also as indicated in our previous response, the birth-death model used in this study accounts for variation in fitness among lineages producing variable reproductive success. The additional birth-death model that we have now incorporated, which considers variation of the global birth-death rate among lineages, produced similar prediction accuracy, suggesting a limited role in protein evolution modeling. Molecular evolution parameters, particularly the substitution model, appear to be more critical in this regard. We have now included these aspects in the manuscript.

      (3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny. 

      In the present study, we compared the neutral model + birth-death (BD) with the SCS model + BD. Markov substitution models Q are applied over an evolutionary time (i.e., branch length, t), which allows determining the probability of substitution events during that time period [P(t) = exp(Qt)]. This approach is traditionally used in phylogenetics to model the incorporation of substitution events over time. Therefore, to compare the neutral and SCS models in terms of evolutionary inference, an evolutionary time is required, and in this case it is provided by the birth-death process. Thus, cases 1) and 2) cannot be compared without an underlying evolutionary history. Next, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models outperformed models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results obtained in the present study, where we explored the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant finding, since folding stability is fundamental to protein function and has a variety of applications. We have now indicated these aspects in the manuscript.
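
      The relation P(t) = exp(Qt) mentioned in this response can be illustrated with a toy two-state rate matrix, for which the matrix exponential has a known closed form. This is a minimal sketch in pure Python using a truncated Taylor series with scaling and squaring; the rate matrix, function names, and numerical settings are assumptions, and a real amino acid model would use a 20 x 20 matrix and a numerical library.

```python
import math


def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transition_matrix(q: list[list[float]], t: float, terms: int = 20, squarings: int = 10) -> list[list[float]]:
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling: exp(A)^(2^s)
    return result


# Toy symmetric two-state rate matrix (rows sum to zero), branch length t = 0.5.
Q = [[-1.0, 1.0], [1.0, -1.0]]
P = transition_matrix(Q, 0.5)
closed_form = 0.5 + 0.5 * math.exp(-2.0 * 0.5)  # probability of no net change
print(P[0][0], closed_form)
```

      Each row of P(t) sums to 1, and as t grows the rows approach the stationary distribution of Q, which is why a branch length from some evolutionary history is needed before substitution probabilities can be computed.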

      (4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work. 

      AU: This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is distributed with the framework, and it can be updated to incorporate new structures (further details are provided in the distributed framework documentation and practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins from a database to improve the predictions), thus incorporating background molecular diversity. We have now indicated this important aspect in the manuscript. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may affect the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We have now mentioned this aspect in the manuscript.

      (5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kühnert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference from the work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution.

      We agree with the reviewer that robust birth-death models have been developed applying statistics and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the fitness used is specific to the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of structural protein (Goldstein 2013), making it broadly applicable. Indeed, in this revised version we added the analysis of additional data from another protein (influenza NS1 protein) with predictions at different time points. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with SCS models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and the presented combination with a birth-death model constitutes a first approximation upon which future studies can build to better understand this evolutionary system. We have now indicated these considerations in the manuscript.
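
      To make the idea of a stability-based fitness function concrete, the sketch below uses a form that is common in the biophysical evolution literature: fitness as the equilibrium fraction of folded protein, 1 / (1 + exp(ΔG / kT)), with ΔG the folding free energy (more negative means more stable). Whether the authors' framework uses exactly this expression is an assumption here; the function name, temperature default, and units are illustrative choices.

```python
import math

BOLTZMANN_KCAL = 0.0019872  # Boltzmann constant in kcal / (mol K)


def folded_fraction_fitness(delta_g: float, temperature: float = 298.15) -> float:
    """Fraction of folded protein for a folding free energy delta_g (kcal/mol)."""
    kt = BOLTZMANN_KCAL * temperature
    x = delta_g / kt
    if x > 700.0:  # avoid overflow in exp for extremely unstable variants
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


# Toy free energies (kcal/mol), invented for illustration.
for dg in (-5.0, -1.0, 0.0, 2.0):
    print(dg, round(folded_fraction_fitness(dg), 4))
```

      Under this form, fitness saturates near 1 for stable variants and drops steeply once ΔG approaches zero, so destabilizing substitutions are penalized far more than further stabilization is rewarded.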

      Reviewer #2 (Public review): 

      Summary: 

      In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and coauthors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2. 

      Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

      We thank the reviewer for the positive comments on our study. Regarding the predictive power, the results showed good accuracy in predicting the folding stability of the forecasted protein variants. In this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. Still, we believe that further efforts are required in the field in improving the accuracy of substitution models of molecular evolution. Altogether, accurately forecasting the folding stability of future real proteins is fundamental for predicting their protein function and enabling a variety of applications. Also, we implemented the models into a freely available computer framework, with detailed documentation and a variety of practical examples.

      Strengths: 

      The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints. 

      We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

      Weaknesses: 

      The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported. 

      Our study showed a good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Predicting the exact sequences was more challenging, which was not surprising considering previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with more accurate folding stability, even when the exact amino acid sequences differ. As indicated, further work is demanded in the field of substitution models of molecular evolution. Next, in this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. In any case, as previously indicated, we believe that efforts are required in the field of substitution models of molecular evolution. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications. We have now expanded these aspects in the manuscript.

      Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability. 

      The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding future (forecasted) variants, and it is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the used model of evolution, unexpected external factors can be dramatic for the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as host immune responses. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic diversity between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate or apply forecasting evolution. These aspects are indicated in the manuscript. Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. 
In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). This aspect is now more clearly indicated in the manuscript. Regarding the Omicron datasets, we used 384 curated sequences of the Omicron variant of concern to construct the study data and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other time points (other variants of concern), we kindly disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. Actually, we noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID. Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations with predictions for up to four different time points. Apart from those aspects, following the proposal of the reviewer, we have now incorporated the analysis of an additional dataset of influenza NS1 protein (Bao, et al. 2008), with predictions for two different time points, to further assess the generalizability of the method. We have now included details of this influenza NS1 protein dataset and the predictions derived from it in the manuscript.
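
      The data requirement described in this response (an observed and an estimated distribution of amino acid frequencies at each site) can be made concrete with a per-site Kullback-Leibler divergence. This is a hedged sketch: the direction KL(observed || predicted), the pseudocount used to avoid zero frequencies, the averaging over sites, and the toy alignments are all assumptions rather than the manuscript's exact procedure.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(column, pseudocount: float = 1e-3) -> dict[str, float]:
    """Amino acid frequencies at one alignment column, smoothed so none is zero."""
    counts = Counter(column)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) in nats for two distributions over the 20 amino acids."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def mean_site_kl(observed: list[str], predicted: list[str]) -> float:
    """Average per-site KL divergence between two aligned sequence samples."""
    obs_columns = list(zip(*observed))
    pred_columns = list(zip(*predicted))
    if not obs_columns or len(obs_columns) != len(pred_columns):
        raise ValueError("both samples must share the same non-zero alignment length")
    per_site = [
        kl_divergence(site_frequencies(o), site_frequencies(p))
        for o, p in zip(obs_columns, pred_columns)
    ]
    return sum(per_site) / len(per_site)


# Toy samples invented for illustration: four observed and four predicted sequences.
observed = ["MKV", "MRV", "MKV", "MKI"]
predicted = ["MKV", "MKV", "MKL", "MKV"]
print(mean_site_kl(observed, predicted))
```

      With a single consensus sequence per time point, every column collapses to one residue, so the "observed distribution" carries no frequency information; this is the practical reason several independent sequences per sampling point are needed.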

      It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

      Our present study does not aim to investigate the evolution of the folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to the folding stability of the corresponding real protein variants for diverse viral proteins, which provides an important evaluation of the prediction method. Next, the folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. In that regard, which is not the aim of our present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse protein data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

      Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

      This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We have now included in a supplementary table details about the fitting of the structural templates. Indeed, our proposal assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur and, in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) Abstract: "expectedly, the errors grew up in the prediction of the corresponding sequences" <- Not entirely clear what is meant by "errors grew up" or what the errors grew with.

      This sentence refers to the accuracy of sequence prediction in comparison to that of folding stability prediction. We have now clarified this aspect in the manuscript.

      (2) Lines 162-165: "Alternatively, if the fitness is determined based on the similarity in folding stability between the modeled variant and a real variant, the birth rate is assumed to be 1 minus the root mean square deviation (RMSD) in folding stability." <- What is the biological motivation for using the RMSD? It seems like a more stable variant would always have higher fitness, at least according to Equation 1.

      RMSD is commonly used in molecular biology to compare proteins in terms of structural distance, folding stability, kinetics, and other properties. It offers advantages such as minimizing the influence of small deviations while amplifying larger differences, thereby enhancing the detection of remarkable molecular changes. Additionally, RMSD would facilitate the incorporation of other biophysical parameters, such as structural divergences from a wild-type variant or entropy, which could be informative for fitness in future versions of the method. We have now included this consideration in the manuscript.
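
      As a minimal sketch of the rule quoted in this comment (birth rate taken as 1 minus the RMSD in folding stability, with the death rate set to 1-b), the calculation could look as follows. The function names, the example ΔG values, and the clipping of the birth rate to the interval [0, 1] are assumptions made for illustration only; the source does not state how the RMSD is scaled or bounded.

```python
import math


def stability_rmsd(modeled_dg, real_dg):
    """Root mean square deviation between paired folding stabilities.

    Both inputs are sequences of folding free energies in the same units.
    With a single pair, the RMSD reduces to the absolute difference.
    """
    if len(modeled_dg) != len(real_dg) or len(modeled_dg) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - r) ** 2 for m, r in zip(modeled_dg, real_dg)]
    return math.sqrt(sum(squared) / len(squared))


def birth_rate_from_rmsd(rmsd):
    """Birth rate b = 1 - RMSD.

    Assumption of this sketch: the value is clipped to [0, 1] so that it
    can be used as a rate together with d = 1 - b.
    """
    return min(1.0, max(0.0, 1.0 - rmsd))


def death_rate(birth):
    """Death rate d = 1 - b, so that b + d is the same for every lineage."""
    return 1.0 - birth


# Hypothetical stabilities of a modeled variant and a real variant.
rmsd = stability_rmsd([-5.2], [-5.0])
b = birth_rate_from_rmsd(rmsd)
d = death_rate(b)
print(f"RMSD={rmsd:.3f}  b={b:.3f}  d={d:.3f}")
```

      Under these assumptions, a modeled variant whose stability is close to the observed one receives a birth rate near 1, while a variant that is far off, in either direction, receives a birth rate near 0.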

      (3) Lines 165-166: "In both cases, the death rate (d) is considered as 1-b to allow a constant global (birth-death) rate" <- This would give a constant R = b / (1-b) over the entire phylogenetic tree. For applications to pathogens like viruses with epidemic dynamics, this is extremely implausible. Is there any need to make such a restrictive assumption? 

      Regarding technical considerations of the model, we refer to our answer to the first public review comment. Next, a constant global rate of evolution was observed in numerous genes and proteins of diverse organisms, including viruses (Gojobori, et al. 1990; Leitner and Albert 1999; Shankarappa, et al. 1999; Liu, et al. 2004; Lu, et al. 2018; Zhou, et al. 2019). However, following the comment of the reviewer, and as we indicated in our answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows for variation in the global birth-death rate among lineages. We have implemented this additional model in the framework and described it along with its results in the manuscript.

      (4) Lines 187-188: "As a consequence, since b+d=1 at each node, tn is consistent across all nodes, according to (Harmon, 2019)." <- This would also imply that all lineages have a growth rate r = b - d, which under a birth-death model is equivalent to saying all lineages have the same fitness! 

      We clarified this aspect in our answer to the first public review comment. In particular, in the model presented, protein variants with higher fitness have higher birth rates, leading to more birth events, while protein variants with lower fitness have lower birth rates, leading to more extinction events, which is biologically meaningful for the study system. In our model, b and d can vary among lineages according to the corresponding fitness (e.g., one lineage may have b=0.9, d=0.1, r=0.8, while another may have b=0.6, d=0.4, r=0.2). Since reproductive success varies among lineages in our model, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect, although it could be interpreted that way in certain models. Fitness affects reproductive success, but fitness and growth rate of evolution are different biological processes (although a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily have to accumulate substitutions at a higher rate). An example in molecular adaptation studies is the traditional nonsynonymous to synonymous substitution rate ratio (dN/dS), where dN/dS (which informs about selection derived from fitness) can be constant at different rates of evolution (dN and dS). In any case, we thank the reviewer for raising this point, which led us to incorporate an additional birth-death model and inspired some ideas. Thus, following the comment of the reviewer and as indicated in the answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows for variation in the global birth-death rate among lineages. The results indicated that this model yields similar predictive accuracy compared to the previous birth-death model. We have now included these aspects, along with the results from the additional model, in the manuscript.
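
      The two example lineages mentioned in this answer can be checked numerically. The sketch below is only an illustration under the constraint b + d = 1 discussed in this exchange; the class and function names are hypothetical, and R = b / d follows the ratio written in the reviewer's comment (3).

```python
import math
from typing import NamedTuple


class LineageRates(NamedTuple):
    birth: float       # b
    death: float       # d = 1 - b
    net_growth: float  # r = b - d
    ratio: float       # R = b / d


def lineage_rates(birth):
    """Derive d, r and R for one lineage from its birth rate b.

    The constraint b + d = 1 fixes the total event rate, but r and R
    still depend on b and therefore differ between lineages.
    """
    if not 0.0 <= birth <= 1.0:
        raise ValueError("birth rate must lie in [0, 1]")
    death = 1.0 - birth
    ratio = math.inf if death == 0.0 else birth / death
    return LineageRates(birth, death, birth - death, ratio)


# The two lineages used as an example in the answer above.
for b in (0.9, 0.6):
    rates = lineage_rates(b)
    print(
        f"b={rates.birth:.1f}  d={rates.death:.1f}  "
        f"b+d={rates.birth + rates.death:.1f}  "
        f"r={rates.net_growth:.1f}  R={rates.ratio:.2f}"
    )
```

      In this sketch the sum b + d is the same for both lineages, whereas the net growth rate r and the ratio R are not, which is the distinction at issue between the reviewer's comment and the answer.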

      (5) Line 321-322: "For the case of neutral evolution, all protein variants equally fit and are allowed, leading to only birth events," <- Why would there only be birth events? Lineages can die regardless of their fitness. 

      In the neutral evolution model, all protein variants have the same fitness, resulting in a flat fitness landscape. Since variants are observed, we allowed birth events. However, we assumed the absence of death events because no information independent of fitness is available to support their inclusion and quantification, thereby avoiding the imposition of arbitrary death events based on an arbitrary death rate. We have now provided a justification of this assumption in the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) Clarify the purpose of the alternative fitness mode ("ΔG similarity to a target variant"): 

      The manuscript briefly introduces an alternative fitness function based on the similarity of a simulated protein's folding stability to that of a real protein variant, but does not provide a clear motivation, usage scenario, or results derived from it. 

      The presented model provides two approaches for deriving fitness from predicted folding stability. The simpler approach assumes that a more stable protein variant has higher fitness than a less stable one. The alternative approach assigns high fitness to protein variants whose stability closely matches observed stability, acknowledging that the real observed stability is derived from the real selection process, and this approach considers negative design by contrasting the prediction with real information. For the analyses of real data in this study, we used the second approach, guided by these considerations. We have now clarified this aspect in the manuscript.

      (2) Report structural template quality and modeling confidence: 

      Since folding stability (ΔG) estimates rely on structural models derived from homology templates, the accuracy of these predictions will be sensitive to the choice and quality of the template structure. I recommend that the authors report, for each protein modeled, the template's sequence identity, coverage, and modeling quality scores. This will help readers assess the confidence in the ΔG estimates and interpret how template quality might impact simulation outcomes. 

      We agree with the reviewer and we have now included additional information in a supplementary table regarding sequence identity, modeling quality and coverage of the structural templates for the proteins that required homology modeling. The selection of templates was performed using the well-established framework SWISS-MODEL and the best-fitting template was chosen. Next, a large number of protein structures are available in the PDB for the study proteins, which favors the accuracy of the homology modeling. For some datasets, homology modeling was not required, as the modeled sequence was already present in an available protein structure. We have now included this information in the manuscript and in a supplementary table.

      (3) Clarify whether structural remodeling occurs during simulation: 

      It appears that folding stability (ΔG) for all simulated protein variants is computed by mapping them onto a single initial homology model, without remodeling the structure as sequences evolve. If correct, this should be clearly stated, as it assumes that the structural fold remains valid across all simulated variants. A discussion on the potential impact of structural drift would be welcome.

      We agree with the reviewer. As indicated in our answer to a previous comment, our method assumes that the protein structure is maintained over the studied evolutionary time, which is generally acceptable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). At longer timescales the protein structure could change, requiring the modeling of structural evolution over the evolutionary time. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, can be promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real datasets with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

    1. eLife Assessment

      This work characterizes the function and localization of SLC4A1 variants associated with distal renal tubular acidosis in human patients. Cell culture and limited animal studies provide partial but incomplete support to the authors' claim that the variants disrupt normal protein degradative flux by alkalinizing the intracellular pH. The study is valuable in providing preliminary evidence for future exploration of the link between intracellular pH regulation by SLC4A1 and kidney cell function in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      This study is an evaluation of patient variants in the kidney isoform of AE1 linked to distal renal tubular acidosis. Drawing on observations in the mouse kidney, this study extends findings to autophagy pathways in a kidney epithelial cell line.

      Strengths:

      Experimental data are convincing and nicely done.

      Weaknesses:

      Some data are lacking or not explained clearly. Mutations are not consistently evaluated throughout the study, which makes it difficult to draw meaningful conclusions.

    3. Reviewer #2 (Public review):

      Context and significance:

      Distal renal tubular acidosis (dRTA) can be caused by mutations in a Cl-/HCO3- exchanger (kAE1) encoded by the SLC4A1 gene. The precise mechanisms underlying the pathogenesis of the disease due to these mutations are unclear, but it is thought that loss of the renal intercalated cells (ICs) that express kAE1 and/or aberrant autophagy pathway function in the remaining ICs may contribute to the disease. Understanding how mutations in SLC4A1 affect cell physiology and cells within the kidney, a major goal of this study, is an important first step to unraveling the pathophysiology of this complex heritable kidney disease.

      Summary:

      The authors identify a number of new mutations in the SLC4A1 gene in patients with diagnosed dRTA that they use for heterologous experiments in vitro. They also use a dRTA mouse model with a different SLC4A1 mutation for experiments in mouse kidneys. Contrary to previous work that speculated dRTA was caused mainly by trafficking defects of kAE1, the authors observe that their new mutants (with the exception of Y413H, which they only use in Figure 1) traffic and localize at least partly to the basolateral membrane of polarized heterologous mIMCD3 cells, an immortalized murine collecting duct cell line. They go on to show that the remaining mutants induce abnormalities in the expression of autophagy markers and increased numbers of autophagosomes, along with an alkalinized intracellular pH. They also reported that cells expressing the mutated kAE1 had increased mitochondrial content coupled with lower rates of ATP synthesis. The authors also observed a partial rescue of the effects of kAE1 variants through artificially acidifying the intracellular pH. Taken together, this suggests a mechanism for dRTA independent of impaired kAE1 trafficking and dependent on intracellular pH changes that future studies should explore.

      Strengths:

      The authors corroborate their findings in cell culture with a well-characterized dRTA KI mouse and provide convincing quantification of their images from the in vitro and mouse experiments.

      Weaknesses:

      The data largely support the claims as stated, with some minor suggestions for improving the clarity of the work. Some of the mutants induce different strengths of effects on autophagy and the various assays than others, and it is not clear why this is from the present manuscript, given that they propose pHi as the unifying mechanism.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have identified novel dRTA causing SLC4A1 mutations and studied the resulting kAE1 proteins to determine how they cause dRTA. Based on a previous study on mice expressing the dRTA kAE1 R607H variant, the authors hypothesize that kAE1 variants cause an increase in intracellular pH, which disrupts autophagic and degradative flux pathways. The authors clone these new kAE1 variants and study their transport function and subcellular localization in mIMCD cells. The authors show increased abundance of LC3B II in mIMCD cells expressing some of the kAE1 variants, as well as reduced autophagic flux using eGFP-RFP-LC3. These data, as well as the abundance of autophagosomes, serve as the key evidence that these kAE1 mutants disrupt autophagy. Furthermore, the authors demonstrate that decreasing the intracellular pH abrogates the expression of LC3B II in mIMCD cells expressing mutant SLC4A1. Lastly, the authors argue that mitochondrial function, and specifically ATP synthesis, is suppressed in mIMCD cells expressing dRTA variants and that mitochondria are less abundant in AICs from the kidney of R607H kAE1 mice. While the manuscript does reveal some interesting new results about novel dRTA causing kAE1 mutations, the quality of the data to support the hypothesis that these mutations cause a reduction in autophagic flux can be improved. In particular, the precise method of how the western blots and the immunofluorescence data were quantified, with included controls, would enhance the quality of the data and offer more supportive evidence of the authors' conclusions.

      Strengths:

      The authors cloned novel dRTA causing kAE1 mutants into expression vectors to study the subcellular localization and transport properties of the variants. The immunofluorescence images are generally of high quality, and the authors do well to include multiple samples for all of their western blots.

      Weaknesses:

      Inconsistent results are reported for some of the variants. For example, R295H causes intracellular alkalinization but also has no effect on intracellular pH when measured by BCECF. The authors also appear to have performed these in vitro studies on mIMCD cells that were not polarized, and therefore, the localization of kAE1 to the basolateral membrane seems unlikely, based upon images included in the manuscript. Additionally, there is no in vivo work to demonstrate that these kAE1 variants alter intracellular pH, including the R607H mouse, which is available to the authors. The western blots are of varying quality, and it is often unclear which of the bands are being quantified. For example, LAMP1 is reported at 100 kDa, the authors show three bands, and it is unclear which one(s) are used to quantify protein abundance. Strikingly, the authors report a nonsensical value for their quantification of LC3B II in Figure 2, where the ratio of LC3B II to total LC3B (I + II) is greater than one. The control experiments with starvation and bafilomycin are not supportive and significantly reduce enthusiasm for the authors' findings regarding autophagy. There are labeling errors between the manuscript and the figures, which suggest a lack of vigilance in the drafting process.

    1. eLife Assessment

      This study presents the important finding that lysosomal damage triggers inflammatory signaling through ubiquitination and the TAB-TAK1-IKK-NF-kB axis. The data obtained from the unbiased transcriptomic and proteomic analyses are convincing and provide invaluable information to the field. Although further experiments will be required to clarify how TAB2/3 are recruited after various types of lysosome damage, this work will be of interest to researchers in the fields of organelle biology and inflammation.

    2. Reviewer #1 (Public review):

      Summary:

      Lysosomal damage is commonly found in many diseases including normal aging and age-related disease. However, the transcriptional programs activated by lysosomal damage have not been thoroughly characterized. This study aims to investigate lysosome damage-induced major transcriptional responses and the underlying signaling basis. The authors have convincingly shown that lysosomal damage activates a ubiquitination-dependent signaling axis involving TAB, TAK1, and IKK, which culminates in the activation of NF-kB and subsequent transcriptional upregulation of pro-inflammatory genes and pro-survival genes. Overall, the major aims of this study are successfully achieved.

      Strengths:

      This study is well-conceived and strictly executed, leading to clear and well-supported conclusions. Through unbiased transcriptomics and proteomics screens, the authors identify NF-kB as a major transcriptional program activated upon lysosome damage. TAK1 activation by lysosome damage-induced ubiquitination is found to be essential for NF-kB activation and MAP kinase signaling. The transcriptional and proteomic changes are shown to be largely driven by TAK1 signaling. Finally, the TAK1-IKK signaling is shown to provide resistance to apoptosis during lysosomal damage response. The main signaling axis of this pathway has been convincingly demonstrated.

      Overall, this study identifies major transcriptional responses following lysosomal damage through unbiased approaches. It is important to consider the impact of these pathways in disease settings where lysosomal integrity is compromised.

      Comments on revisions:

      The authors have adequately addressed all previous comments. I have no further recommendations.

    3. Reviewer #2 (Public review):

      Summary:

      Endo et al. investigate the novel role of ubiquitin response upon lysosomal damage in activating cellular signaling for cell survival. The authors provide a comprehensive transcriptome and proteome analysis of aging-related cells experiencing lysosomal damage, identifying transcription factors involved in transcriptome and proteome remodeling with a focus on the NF-κB signaling pathway. They further characterized the K63-ubiquitin-TAB-TAK1-NF-κB signaling axis in controlling gene expression, inflammatory responses, and apoptotic processes.

      Strengths:

      In the aging-related model, the authors provide a comprehensive transcriptome and characterize the K63-ubiquitin-TAB-TAK1-NF-κB signaling axis. Through compelling experiments and advanced tools, they elucidate its critical role in controlling gene expression, inflammatory responses, and apoptotic processes.

      Weaknesses:

      The study lacks deeper connections with previous research, particularly:

      • The established role of TAB-TAK1 in AMPK activation during lysosomal damage

      • The potential significance of TBK1 in NF-κB signaling pathways

      Comments on revisions:

      The authors have successfully addressed all the raised questions and the manuscript is now significantly improved.

    4. Reviewer #3 (Public review):

      Summary:

      The response to lysosomal damage is a fast-moving and timely field. Besides repair and degradation pathways, increasing interest has been focusing on damage-induced signaling. The authors conducted both transcriptomics and proteomics to characterize the cellular response to lysosomal damage. They identify a signaling pathway leading to activation of NFkappaB. Based on this and supported by Western blot and microscopy data, the authors nicely show that TAB2/3 and TAK1 are activated at damaged lysosomes and kick off the pathway to alter gene expression, which induces cytokines and protects from cell death. TAB2/3 activation is proposed to occur through K63 ubiquitin chain formation. Generally, this is a careful and well conducted study that nicely delineates the pathway under lysosomal stress. The "omics" data serves as a valuable resource for the field. More work should be invested into how TAB2/3 are activated at the damaged lysosomes, also to increase novelty in light of previous reports.

      Strengths:

      Generally, this is a careful and well-conducted study that nicely delineates how the NFkB pathway is activated under lysosomal stress and modulates cell behavior. The "omics" data serves as a valuable resource for the field.

      Weaknesses:

      While activation of TAB2/3 by K63-linked Ub chains is convincing, more work needs to be done on how they are recruited by distinct damage types to probe relevance for different pathophysiological conditions.

      Comments on revisions:

      The authors have addressed much of my criticism. Specifically, they have put (with new experiments) the data on the TAB2/3-TAK1 pathway in perspective to the previously reported LUBAC-mediated activation of NFkB. They also addressed the question about the significance of K63-linked chains for TAB2/3 activation with new complementation experiments (a K63-specific NZF mutant failed to rescue).

      The third point (types of damage as triggers) raises more questions, though. The authors find that, in contrast to LLOMe, GPN or DC661-induced damage does not activate TAK1 (consistent with lower damage levels). However, the authors still observe K63 ubiquitylation. This goes along with their finding that TAB2 is recruited in the absence of any ubiquitylation (blocked by TAK-243). It argues that TAB2 is recruited by an unknown cue (that may be damage-specific) and then activated by K63. The authors need to clarify whether TAB2 is or is not recruited in the GPN/DC661 conditions (in which K63 occurs, but TAK1 is not activated). The point about the effects of other damage types was also raised by reviewer #1 and should be solved. The fact that TAB2 is recruited independently of K63 should also be visualized in the model. The manuscript will then be an important contribution to the field.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Lysosomal damage is commonly found in many diseases including normal aging and age-related disease. However, the transcriptional programs activated by lysosomal damage have not been thoroughly characterized. This study aimed to investigate lysosome damage-induced major transcriptional responses and the underlying signaling basis. The authors have convincingly shown that lysosomal damage activates a ubiquitination-dependent signaling axis involving TAB, TAK1, and IKK, which culminates in the activation of NF-kB and subsequent transcriptional upregulation of pro-inflammatory genes and pro-survival genes. Overall, the major aims of this study were successfully achieved.

      Strengths:

      This study is well-conceived and strictly executed, leading to clear and well-supported conclusions. Through unbiased transcriptomics and proteomics screens, the authors identified NF-kB as a major transcriptional program activated upon lysosome damage. TAK1 activation by lysosome damage-induced ubiquitination was found to be essential for NF-kB activation and MAP kinase signaling. The transcriptional and proteomic changes were shown to be largely driven by TAK1 signaling. Finally, the TAK1-IKK signaling was shown to provide resistance to apoptosis during lysosomal damage response. The main signaling axis of this pathway was convincingly demonstrated.

      Weaknesses:

      One weakness was the claim of K63-linked ubiquitination in lysosomal damage-induced NF-kB activation. While it was clear that K63 ubiquitin chains were present on damaged lysosomes, no evidence was shown in the current study to demonstrate the specific requirement of K63 ubiquitin chains in the signaling axis being studied. Clarifying the roles of K63-linked versus other types of ubiquitin chains in lysosomal damage-induced NF-kB activation may improve the mechanistic insights and overall impact of this study.

      Another weakness was that the main conclusions of this study were all dependent on an artificial lysosomal damage agent. It will be beneficial to confirm key findings in other contexts involving lysosomal damage.

      We would like to thank Reviewer #1 for the positive and constructive comments on our study. Regarding the main concern about the molecular mechanism by which TAB proteins are activated in response to lysosomal damage, we have added experimental results supporting the conclusion that the lysosomal accumulation of K63 ubiquitin chains serves as a trigger to activate the TAB-TAK1 pathway. We also investigated and discussed the role of LUBAC-mediated M1 ubiquitin chains in NF-kB activation and the effects of other lysosomal-damaging compounds. Please see the response to “Reviewer #3 (Public review): Suggestions:”.

      Reviewer #2 (Public review):

      Summary:

      Endo et al. investigate the novel role of ubiquitin response upon lysosomal damage in activating cellular signaling for cell survival. The authors provide a comprehensive transcriptome and proteome analysis of aging-related cells experiencing lysosomal damage, identifying transcription factors involved in transcriptome and proteome remodeling with a focus on the NF-κB signaling pathway. They further characterized the K63-ubiquitin-TAB-TAK1-NF-κB signaling axis in controlling gene expression, inflammatory responses, and apoptotic processes.

      Strengths:

      In the aging-related model, the authors provide a comprehensive transcriptome and characterize the K63-ubiquitin-TAB-TAK1-NF-κB signaling axis. Through compelling experiments and advanced tools, they elucidate its critical role in controlling gene expression, inflammatory responses, and apoptotic processes.

      Weaknesses:

      The study lacks deeper connections with previous research, particularly:

      • The established role of TAB-TAK1 in AMPK activation during lysosomal damage

      • The potential significance of TBK1 in NF-κB signaling pathways

      We would like to thank Reviewer #2 for the helpful comments on our study. To achieve a more comprehensive understanding of the signaling pathways involved in the lysosomal damage response, we investigated additional related signal mediators, such as TBK1 and LUBAC. The citations related to AMPK have been incorporated.

      Reviewer #3 (Public review):

      Summary:

      The response to lysosomal damage is a fast-moving and timely field. Besides repair and degradation pathways, increasing interest has been focusing on damage-induced signaling. The authors conducted both transcriptomics and proteomics to characterize the cellular response to lysosomal damage. They identify a signaling pathway leading to activation of NFkappaB. Based on this and supported by Western blot and microscopy data, the authors nicely show that TAB2/3 and TAK1 are activated at damaged lysosomes and kick off the pathway to alter gene expression, which induces cytokines and protects from cell death. TAB2/3 activation is proposed to occur through K63 ubiquitin chain formation. Generally, this is a careful and well conducted study that nicely delineates the pathway under lysosomal stress. The "omics" data serves as a valuable resource for the field. More work should be invested into how TAB2/3 are activated at the damaged lysosomes, also to increase novelty in light of previous reports.

      Strengths:

      Generally, this is a careful and well-conducted study that nicely delineates the pathway under lysosomal stress. The "omics" data serves as a valuable resource for the field.

      Weaknesses:

      More work should be invested into how TAB2/3 are activated at the damaged lysosomes, also to increase novelty in light of previous reports. Moreover, different damage types should be tested to probe relevance for different pathophysiological conditions.

      We would like to thank Reviewer #3 for the valuable comments on our study. We have added the experimental results to address two concerns raised by Reviewer #3. Please see the response to “Reviewer #3 (Public review): Suggestions:”.

      Suggestions:

      (1) A recent paper claims that NFkappaB is activated by Otulin/M1 chains upon lysosome damage through TBK1 (PMID: 39744815). In contrast, Endo et al. nicely show that ubiquitylation is needed (shown by TAK-243) for NFkB activation but only have correlative data to link it specifically to K63 chains. On page 15, line 11, the authors even argue a "potential" involvement of K63. This point should be better dealt with. Can the authors specifically block K63 formation? K63R overexpression or swapping would be one way. Is the K63 ligase ITCH involved (PMID: 38503285) or any other NEDD4-like ligase? This could be compared to LUBAC inhibition. Also, the point needs to be dealt with more controversially in the discussion as these are alternative claims (M1 vs K63, TAB vs TBK1).

      It is well-characterized that the NZF domain of TAB proteins preferentially associates with K63-linked ubiquitin chains. Therefore, we performed an add-back experiment using siRNA-resistant TAB2 WT and mutants incapable of binding to K63-linked ubiquitin chains, dNZF and E685A, to elucidate the requirement of K63 ubiquitin chains for TAK1 activation. We investigated whether the add-back of TAB2 mutants rescues the activation of TAK1 in TAB2-depleted cells (Fig. 2E). TAB2 WT, but not dNZF and E685A, rescued TAK1 activation in response to LLOMe, suggesting that the specific interaction of TAB proteins and K63 ubiquitin chains is a key mechanism to activate TAK1. We also found that treatment with the E1 inhibitor TAK-243 effectively prevented the lysosomal accumulation of K63 ubiquitin chains, but TAB2 was still recruited to damaged lysosomes (Fig. S2B). This suggests that the recruitment of TAB proteins to damaged lysosomes is independent of the association with K63 ubiquitin chains. Collectively, we postulate that TAB proteins require interaction with K63 ubiquitin chains for TAK1 activation, but not for recruitment to damaged lysosomes. We have added the corresponding sentences (p9, lines 7-20, and p10, lines 8-10).

      Next, we confirmed that LUBAC functions are essential for NF-kB activation in the lysosomal damage response. RNF31/HOIP is a component of LUBAC that catalyzes M1 ubiquitination. The depletion of RNF31 showed no significant effects on TAK1 activation, but abolished IKK activation (Fig. S4G). It is well characterized that LUBAC-mediated M1 ubiquitin chains recruit IKK subunits and transduce the signal downstream in the canonical pathway. We assume that K63 ubiquitin chains in damaged lysosomes initially activate TAB-TAK1 and trigger LUBAC-mediated M1 ubiquitination, and subsequently, M1 ubiquitination functions to recruit the IKK complex. Consequently, activated TAK1 phosphorylates IKK subunits in damaged lysosomes, leading to NF-kB activation. We also examined whether TBK1 is involved in the activation of NF-kB. TBK1 was phosphorylated upon LLOMe treatment, and depletion of TAB and TAK1 resulted in a slight reduction of TBK1 phosphorylation (Fig. S4D, E). Treatment with the TBK1 inhibitor BX-795 had little or no effect on TAK1 activation, but abolished phosphorylation of IKK and IkBa (Fig. S4F). These results suggest that TBK1 is required for the activation of NF-kB. We have added the corresponding sentences (p13, line 13-p14, line 10).

      As mentioned by Reviewer #3, it is important to identify the E3 ligase responsible for K63 ubiquitination in the lysosomal damage response. We have been aiming to identify such E3 ligase(s). However, depletion of ITCH and of the other E3 ligases tested so far had little or no effect on K63 ubiquitination and TAK1 activation. We would like to explore the responsible E3 ligase(s) in a future study.

      (2) It would be interesting to know what the trigger is that induces the pathway. Lipid perturbation by LLOMe is a good model, but does activation also occur with GPN (osmotic swelling) or lipid peroxidation (oxidative stress) that may be more broadly relevant in a pathophysiological way? Moreover, what damage threshold is needed? Does loss of protons suffice? Can activation be induced with a Ca2+ agonist in the absence of damage?

      To further clarify the initial trigger that induces TAB-TAK1 activation coupled with lysosomal damage, we examined other damage sources, GPN and DC661, which induce hyperosmotic stress and lipid peroxidation in lysosomes, respectively, thereby resulting in lysosomal membrane damage. Under our experimental conditions, treatment with these compounds did not result in significant accumulation of Gal-3, indicating a reduced level of lysosomal membrane permeabilization compared with LLOMe (Fig. S2C, D), and little or no TAK1 activation was observed (Fig. S2E). TAB proteins require their association with K63 ubiquitin chains for TAK1 activation. We therefore postulate that severe lysosomal membrane permeabilization, which triggers the formation and cytosolic exposure of K63 ubiquitin chains, may be a determinant of TAB-TAK1 activation. In our future work, we would like to examine a broader range of lysosomal damage stimuli and further elucidate the initial mechanism of TAB-TAK1 activation. We have added the corresponding sentences (p9, line 21-p10, line 7).

      (3) The authors nicely define JNK and p38 activation. This should be emphasized more, possibly also in the abstract, as it may contribute to the claim of increased survival fitness.

      We further tested whether the inhibition of JNK affects the anti-apoptotic effect (Fig. S5B). The inhibition of JNK resulted in an increase in the cleaved caspase-3. This suggests that the anti-apoptotic action in the lysosomal damage response requires JNK as well as IKK. We have added the sentences in results to emphasize the pivotal role of stress-induced MAPKs (p15, lines 7-11).

      Reviewer #1 (Recommendations for the authors):

      (1) Although the ubiquitination-TAB-TAK1-IKK axis was previously characterized in other contexts, specific evidence supporting lysosomal recruitment of these components by ubiquitination during lysosome damage would be beneficial.

      We found that treatment with the E1 inhibitor TAK-243 abolished the lysosomal accumulation of K63 ubiquitin chains, but TAB2 and TAK1 were still recruited to damaged lysosomes (Fig. S2B). This suggests that the recruitment of TAB proteins to damaged lysosomes is independent of the association with K63-linked ubiquitin chains. Next, we investigated whether the add-back of TAB2 mutants incapable of binding K63 ubiquitin chains rescues the activation of TAK1 in TAB2-depleted cells (Fig. 2E). K63 ubiquitin binding of TAB2 was essential for TAK1 activation in response to LLOMe. Taken together, these results suggest that TAB proteins require their interaction with K63 ubiquitin chains for TAK1 activation, but not for recruitment to damaged lysosomes. We have added the corresponding sentences (p9, lines 7-20, and p10, lines 8-10). Please also see the response to “Reviewer #3 (Public review): Suggestions:”.

      (2) The activation of p38 and JNK by lysosomal damage does not fit well into the main conclusions of the paper, since IKK knockdown was sufficient to block cellular resistance to apoptosis (caspase cleavage in Fig. 5f). Are p38 and JNK also important for cell survival during lysosomal damage?

      We found that the inhibition of JNK resulted in an increase in the cleaved caspase-3, suggesting that the anti-apoptotic action in the lysosomal damage response requires both IKK and JNK (Fig. S5B). We have added the sentences (p15, lines 7-11).

      (3) Cell death tests are recommended to support the conclusions related to apoptosis.

      As suggested by Reviewer #1, we performed the cell death assay using propidium iodide (PI) and confirmed that HeLa cells co-treated with LLOMe and TAK-243 or HS-276 exhibited increased cell death (Fig. 5E). This indicates a direct correlation between the degree of caspase-3 cleavage and cell death, possibly apoptosis.

      (4) Page 8, line 19-21, gal3 is not exposed upon lysosomal damage. It is recruited from the cytosol by the exposed beta-galactoside-containing glycans on lysosomal membrane proteins.

      We have corrected the corresponding sentence (p7, lines 17-20).

      (5) Carefully checking grammar throughout the text is recommended. Below are a few examples:

      a) Page 4, line 10, remove "that".

      b) "K63 ubiquitin" shall be replaced with "K63 ubiquitination" or "K63 ubiquitin chains".

      c) Page 8, line 9, "remain" should be "remains".

      We have carefully checked the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Despite the novelty and significance of these findings in advancing the field, several technical and experimental limitations require further clarification:

      We have responded to each comment. Please see below.

      The manuscript should introduce or discuss previous research showing that TAB-TAK1 facilitates AMPK activation during lysosomal damage and TAK1's increased association with damaged lysosomes (PMID: 31995728).

      We have added the reference (PMID: 31995728) and the sentences (p17, lines 15-20).

      Figure 2A: The differential LAMP1 staining intensity between control and LLOMe-treated cells needs explanation. The weaker LAMP1 signal in control cells and the changes in puncta, especially during the 5-minute LLOMe treatment, require detailed clarification.

      We have added the explanation (p8, lines 17-21).

      Recent literature (PMID: 34585663) reports TBK1 activation during lysosomal damage. The authors should investigate or discuss whether TBK1 potentially contributes to NF-κB signaling in this context.

      We experimentally investigated whether TBK1 is involved in the TAB-TAK1 pathway. We confirmed that TBK1 was activated upon LLOMe treatment (Fig. S4D). Depletion of TAB and TAK1 caused a modest decrease in TBK1 phosphorylation (Fig. S4E). The inhibition of TBK1 by BX-795 did not affect TAK1 activation, but abolished phosphorylation of IKK and IkBa (Fig. S4F). This suggests that TBK1 is required for NF-kB activation. We have added the reference (PMID: 34585663) and the corresponding sentences (p13, lines 13-21, p14, lines 8-10, and p18, lines 15-20).

      The introduction of lysosomal damage response lacks comprehensive mechanistic information. For example, while ESCRT is discussed, other critical mechanisms such as lipid transfer and stress granule formation in lysosomal repair should be incorporated. Moreover, mTOR and AMPK signaling pathways undergo significant changes upon lysosomal damage.

      We have added the sentences (p3, lines 16-18, and p3, line 21-p4, line 1).

      The statement "lysosomal permeabilization causes the dissociation of mTORC1 from lysosomes" should explicitly reference PMID: 29625033.

      We have added the suggested reference (PMID: 29625033, p4, line 19).

      The claim that "The elimination of damaged lysosomes through lysophagy requires a period of more than half a day" needs a specific publication citation.

      We have added the reference (PMID: 23921551) to claim the time-scale of lysosomal clearance (p4, line 21).

      Figure 1G: The label "WO after 2h" lacks explanation in the figure legend and requires detailed interpretation.

      To simplify the figures, we have deleted the label “WO after 2 h” (Fig. 1G, 3F, 5D, F-J, S4G, S5A). Instead, we have added the explanation in the figure legends (Fig. 1G).

      Reviewer #3 (Recommendations for the authors):

      (1) page 8, line 13: it is recommended to phrase colocalisation "at" damaged lysosomes rather than "in" damaged lysosomes as the resolution does not allow the claim of influx into lysosomes.

      We have corrected the word (p8, line 17).

      (2) page 11, line 22: why is "whereas" used to link two events driven by the same mechanism.

      We have corrected the word (p13, line 8).

    1. eLife Assessment

      This important work describes the adaptation and evaluation of two red-shifted anion channelrhodopsins (RubyACRs) for optogenetic inhibition in Drosophila. The study provides convincing evidence for the effectiveness of RubyACRs in fly neurons, including electrophysiology, calcium imaging, and behavioral analysis. With minor revisions to address potential toxicity and compatibility with 2-photon imaging, this paper and the publicly available fly lines it describes will be resources that are of value to the neuroscience community.

    2. Reviewer #1 (Public review):

      Summary:

      This study by Bushey et al. focuses on two newly released red-shifted anion channelrhodopsins (A1ACR1 and HfACR1, referred to as RubyACRs) in Drosophila. Here, the authors use a combination of electrophysiology, calcium imaging, and behavioral analyses to demonstrate the advantages of RubyACRs over previous optogenetic silencers like the green-shifted GtACR1 and the blue-shifted GtACR2: higher photocurrent, faster kinetics, and operation in a spectral range that prevents unwanted behavioral effects in the fly. The availability of these new red-shifted silencers constitutes a great addition to the Drosophila genetic toolkit.

      Strengths:

      (1) The authors generate both UAS and LexAop RubyACR reagents and test them in a variety of preparations (electrophysiological recordings, calcium imaging, different behavioral paradigms) that cover the breadth of the fly research environment.

      (2) The optical stimulation parameters are carefully measured and characterized. Especially impressive is that they managed to titrate over both wavelength and intensity across their various assays. This provides a comprehensive dataset to the community.

      (3) Tools are made available to the community through the stock center.

      Weaknesses:

      (1) The authors could better describe their construct and choice of parameters for the chosen construct. I am specifically wondering about the following points:

      a) Why use that particular backbone? It is not the most commonly used one across recent literature (pJFRC7 is more common).

      b) Why do the CsChrimson and GtACR1 constructs have a Kir sequence in them, and why did the authors not put this in the RubyACRs? I would also prefer that the authors not refer to GtACR1 as GTACR-Kir in the text (e.g., in line 72); instead, they should refer to it either as GtACR1 or as GtACR1-kir-mVenus (based on the full genotype mentioned in their table at the end). The same applies to CsChrimson-kir. From what I understand, this is just a Kir trafficking sequence and not the entire Kir sequence, which could confuse readers.

      c) Finally, I would also encourage authors to deposit plasmids on Addgene.

      (2) Figure 2 is interesting, but it is a bit unfortunate that there is a YFP baseline in most of the samples here (except Chrimson88; this should also be mentioned). I wonder how the YFP baseline impacts this data. Could the high intensity stimulation (red light) lead to bleaching of YFP or tdTomato that reduces the baseline in the green channel? All this also makes me wonder if authors tried tagging the RubyACRs with other fluorophores or non-fluorescent tags and how that impacted their functioning. Non-YFP-tagged versions would be more useful for applications involving GCaMP imaging.

      (3) Another point for Figure 2: Since RubyACRs seem to have such a broad activation range, I wonder how much the imaging light (920nm) impacts the baseline in these experiments. If there were plots without the red light stimulation and just varying imaging light intensity, that could be useful to the research community.

      (4) Also, for Figures 2C - D, in the methods authors indicate that the stimulation light intensities were progressively increased. Could this lead to desensitization of opsin? Wouldn't randomized intensities be a better way to do this? Perhaps it should be mentioned as a caveat.

      (5) In Figure 3E the bottom middle panel Vglut-Gal4,GtACR1 shows a major increase in walking at light onset. This seems very different than all other conditions, and I could not find any discussion of this. It would help if some explanation were provided for this.
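      The randomization suggested in point (4) could be sketched as follows. This is a minimal Python illustration, not the authors' protocol: the intensity values, the number of repeats, and the seed are hypothetical placeholders.

```python
import random


def randomized_schedule(intensities, repeats, seed=0):
    """Return a shuffled stimulation order in which each intensity appears `repeats` times.

    Shuffling avoids a monotonic ramp, so any cumulative desensitization is not
    systematically confounded with stimulus intensity.
    """
    rng = random.Random(seed)  # fixed seed so the order can be reported and reproduced
    schedule = [level for level in intensities for _ in range(repeats)]
    rng.shuffle(schedule)
    return schedule


# Hypothetical intensity levels (arbitrary units), not values from the study.
example_intensities = [10, 50, 100, 250]
example_schedule = randomized_schedule(example_intensities, repeats=3, seed=1)
```

      Reporting the seed alongside the data would let readers reconstruct the exact presentation order.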

    3. Reviewer #2 (Public review):

      Summary:

      Bushey et al. investigate the feasibility of using RubyACRs, specifically A1ACR1 and HfACR1 (described previously in (Govorunova et al., 2020)) as red-shifted inhibitory opsins in Drosophila melanogaster. The study employs a wide range of techniques to demonstrate successful neuronal inhibition. Electrophysiology experiments established that HfACR1 was most effective at hyperpolarizing cells, compared to A1ACR1 and GtACR1; both RubyACRs also appeared to be more effective than GtACR1 when the latter was actuated by green light. The authors further demonstrate successful neuronal inhibition using calcium imaging. RubyACRs were also shown to be useful in in vivo behavioral setups, specifically in spontaneous locomotion, associative learning, and courtship paradigms. In the courtship assay, in particular, the authors test multiple wavelengths of light at various light intensities, thus providing a rigorous analysis of the RubyACRs' efficacy under different light conditions.

      Strengths:

      The work provides the Drosophila field with a promising new tool. Red-shifted opsins are particularly advantageous in behavioral assays as red light penetrates the cuticle better than green or blue light, and provides less visual stimulation to the fly. It is also ideal for imaging as it allows for simultaneous optogenetic stimulation and GCaMP imaging. A particular strength of the paper is the direct demonstration of the RubyACRs' capacity to inhibit neurons via electrophysiology and calcium imaging. Furthermore, inhibition effects in the three behavioral assays are strong and convincing. Given the apparent efficacy of RubyACRs and the advantages of a red-sensitive anion channelrhodopsin, this tool has great potential.

      Weaknesses:

      This work convincingly demonstrates the efficacy and potential utility of RubyACRs in Drosophila for imaging and behavior. However, the lethality/toxicity of RubyACRs is a relevant concern that should be addressed in-depth rather than glossed over, as it may pose a major obstacle to use. Discussing this issue in the present study will also help guide potential users and will set the stage for potential future efforts to ameliorate RubyACRs as optogenetic inhibitors.

      Major concerns:

      (1) Table 1 demonstrates high lethality in the RubyACRs compared to GtACR1. For example, in the MI04979-VGlut driver, GtACR1 expression resulted in 32.9% lethality, while HfACR1 expression resulted in 98.7% lethality. This lethality presents an obstacle to the potential adoption of this tool, and should be discussed in detail, rather than in passing. The authors might like to present "% lethality" rather than "% survived", as the former is more relevant when discussing the relative yield and health of flies that can be used in experiments.

      (2) In Figure 3D, driver>opsin flies have lower locomotion during the baseline (i.e., dark) phase, compared to opsin-only controls or GtACR1 flies. For some comparisons, flies are walking around 10-fold slower. For example, in the case of VGlut-GAL4>HfACR1, test flies are walking at <1 mm/s, while "Empty" test flies are walking at ~10 mm/s. This suggests that, for these drivers, neuronal and/or network function is affected. It opens the possibility that the lethality and locomotor defects could be due to cell-autonomous toxicity. We ask the authors to provide a description of this effect in the Results and to discuss it in the Discussion. Relatedly, VGlut-GAL4>GtACR1 flies in red light exhibit a locomotion increase, but this data is not mentioned in the text. The use of differing scales for the Y-axes in these panels can be confusing when the reader is expected to compare velocity across different panels. It would be best if the y-axes were set to a single range, e.g., 0 to 12 mm/s.

      (3) Lethality in broad drivers could result from cell-autonomous toxicity or neuronal dysfunction resulting from RubyACR expression. Ideally, the authors would address or even investigate the possible mechanisms of toxicity of the RubyACRs. Do cells and/or synapses expressing RubyACRs have normal morphology and function? For example, the authors could compare cell survival between flies with RubyACR expression and flies with a fluorescent protein with no opsin. The authors may also want to present lethality data for other, less broad drivers (such as MB320C, which was used for the associative memory assay) in order to demonstrate whether this problem is confined to broad drivers such as VGlut-GAL4, or if this is a problem with narrow drivers as well. If new experiments are not possible, these issues should at least be mentioned in the Discussion.
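      For concern (1), the conversion between "% survived" and "% lethality" can be made explicit. The Python sketch below assumes that survival is scored against a sibling control class and that the two classes are expected in a fixed ratio; that assumption, the function names, and the counts in the example are illustrative and are not taken from the manuscript.

```python
def percent_survival(n_experimental, n_sibling_control, expected_ratio=1.0):
    """Survival of the experimental genotype relative to its sibling control class.

    `expected_ratio` is the assumed experimental:control ratio in the absence of
    lethality (1.0 for a hypothetical 1:1 cross).
    """
    if n_sibling_control <= 0 or expected_ratio <= 0:
        raise ValueError("control count and expected ratio must be positive")
    return 100.0 * n_experimental / (n_sibling_control * expected_ratio)


def percent_lethality(survival_pct):
    """Lethality is the complement of survival on a 0-100 scale."""
    return 100.0 - survival_pct


# Hypothetical counts: 50 experimental adults versus 100 sibling controls.
example_survival = percent_survival(50, 100)             # 50.0
example_lethality = percent_lethality(example_survival)  # 50.0
```

      Because this estimate is relative, any lethality caused by the opsin or the driver alone in the control class would make it an underestimate, which is the limitation raised in minor concern (1).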

      Minor concerns

      (1) The specific method used for quantifying lethality is mentioned briefly in Table 1 but is not detailed in the Methods. The authors derive lethality by comparing to a sibling control group with either the opsin or the driver alone, but the opsin alone or driver alone may cause some lethality by themselves. We suggest the use of a viability assay, e.g. (Rockwell et al., 2019), which would give potential users a clearer picture of which developmental stage is most affected by opsin expression, as well as allow opsin-only, driver-only and experimental groups to be assessed separately (lethality would then be reported as the % of embryos that reach each stage of development, and eventually eclosion).

      (2) For the calcium imaging analysis in Figure 2, the U-shaped curve observed for mean ΔF/F0 for A1ACR1 and HfACR1 may not be due to actual desensitization of the channels, as the authors suggest (lines 143-145), but may be due simply to a shifting baseline. The authors use the 5-s period preceding stimulation onset as F0, but in some cases (e.g., HfACR1 at 250 µW/mm²), calcium fluorescence rises above baseline and remains high post-stimulation (ΔF/F0 of +0.5, which we observe is the same magnitude as the ΔF/F0 of -0.5 observed during inhibition), thus affecting the ΔF/F0 for subsequent trials. The authors should discuss this incomplete recovery in the text, or (if available) use a static channel instead to provide a stable F0 for calculating ΔF/F0. Alternatively, if the authors wish to rigorously test the hypothesis that high light intensity indeed results in desensitization of these channels, they may consider using different flies for each light intensity or longer inter-stimulus intervals.

      (3) For Figure 3C (Flybowl assay), the authors mention that "simply expressing the opsins decreased baseline locomotor activity compared to empty driver lines". However, the "Empty" controls in 3C appear to refer to opsin-only controls, not driver-only controls. The driver-only controls are not presented in the figure. The use of "empty" differs between the text and the figure, as the text refers to "empty" driver lines, while the figure uses "empty" to apparently refer to opsin-only controls. We recommend changing the terminology across all figures to be unambiguous, e.g., by using "opsin-only" or "driver-only" as opposed to the ambiguous "empty". In addition, the fact that opsin-only controls move less than driver-only controls may suggest some toxicity as a result of the opsin-only construct; this should be discussed further.

      (4) Figures 4 and 5 lack the reporting of driver-only controls.

      (5) Figures 3 and 4 lack positive controls; that is, the benchmarking of the efficacy of RubyACRs in their respective behavioral paradigms against a known inhibitor, e.g., GtACR1 with green light. To confirm that this GtACR1 transgene is functional, the authors could include GtACR1 with green light as a positive control for these two figures, as they have done for Figure 5-supplement 2 and 3.

      (6) Several citations are missing. In their discussion, the authors highlight that shorter wavelengths of light are more attenuated by tissue (lines 278-281); this should be accompanied by the relevant citations (Inagaki et al., 2014). Similarly, the claim that behavioral experiments exhibit greater sensitivity to shorter wavelengths should be substantiated (lines 281-283).
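      The baseline issue raised in minor concern (2) can be illustrated with a small synthetic example. The Python sketch below is not the authors' analysis pipeline; the traces are invented, and the only element taken from the review is that F0 is the mean of the pre-stimulus window. It shows that the same raw fluorescence during stimulation yields a different ΔF/F0 when the preceding baseline has not recovered.

```python
from statistics import mean


def delta_f_over_f0(trace, onset_idx, baseline_samples):
    """Compute dF/F0 for one trial, with F0 = mean of the samples just before onset."""
    if onset_idx < baseline_samples:
        raise ValueError("not enough pre-stimulus samples for the baseline window")
    f0 = mean(trace[onset_idx - baseline_samples:onset_idx])
    return [(f - f0) / f0 for f in trace]


# Synthetic traces (arbitrary fluorescence units): 5 baseline samples, then 5 samples
# during stimulation. Raw fluorescence during stimulation is identical (50.0) in both.
recovered_trial = [100.0] * 5 + [50.0] * 5  # baseline fully recovered
elevated_trial = [150.0] * 5 + [50.0] * 5   # baseline still elevated from a prior trial

dff_recovered = delta_f_over_f0(recovered_trial, onset_idx=5, baseline_samples=5)[-1]
dff_elevated = delta_f_over_f0(elevated_trial, onset_idx=5, baseline_samples=5)[-1]
# dff_recovered is -0.5; dff_elevated is about -0.667, although the raw signal is the same.
```

      In this toy case an elevated baseline makes the apparent inhibition look larger; the direction and size of the distortion in real data would depend on how the baseline drifts across trials.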

      References:

      Govorunova EG, Sineshchekov OA, Li H, Wang Y, Brown LS, Spudich JL. 2020. RubyACRs, nonalgal anion channelrhodopsins with highly red-shifted absorption. Proc Natl Acad Sci U S A 117:22833-22840.

      Inagaki HK, Jung Y, Hoopfer ED, Wong AM, Mishra N, Lin JY, Tsien RY, Anderson DJ. 2014. Optogenetic control of Drosophila using a red-shifted channelrhodopsin reveals experience-dependent influences on courtship. Nat Methods 11:325-332.

      Rockwell AL, Beaver I, Hongay CF. 2019. A direct and simple method to assess Drosophila melanogaster's viability from embryo to adult. J Vis Exp e59996.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Bushey et al. adapts and evaluates two newly developed red-shifted optogenetic inhibitors, A1ACR1 and HfACR1, collectively referred to as RubyACRs, for neuronal silencing in Drosophila melanogaster. Traditional optogenetic inhibitors such as GtACR1 and GtACR2 are activated by green (~515 nm) and blue (~470 nm) light, respectively, which poses several limitations in Drosophila. Specifically, shorter-wavelength light suffers from reduced tissue penetration and increased absorption, and is visible to flies, potentially confounding behavioral assays, particularly those involving visual processing. In contrast, RubyACRs are activated by red light (~610-660 nm), which penetrates the cuticle more effectively and thus can be more potent in manipulating fly behavior. In the current manuscript, the authors first demonstrate that both A1ACR1 and HfACR1 can be robustly expressed in fly neurons and are properly trafficked to the plasma membrane. Upon red-light stimulation, both opsins produce strong and sustained hyperpolarization in larval motor neurons, outperforming GtACR1 in both magnitude and temporal dynamics. Next, using two-photon calcium imaging in the visual system, the authors further demonstrate that activation of RubyACRs significantly reduces GCaMP6s signal, indicating that they can reliably inhibit neuronal activity. Importantly, unlike what has been reported in some mammalian studies, RubyACRs do not appear to trigger paradoxical depolarization at axon terminals in the fly visual system, as no evidence of aberrant depolarization is observed in motion-detecting Mi1 neurons.

      In the second part of the manuscript, the authors characterize the effects of RubyACRs on fly behavior (walking, learning, and courtship song). Using the inhibition of genetically labelled neurons that regulate these behaviors, the authors demonstrate that stimulation of RubyACRs leads to potent suppression of locomotion, courtship song, or dopamine-dependent associative learning.

      Strengths:

      Altogether, the experiments conducted in this manuscript demonstrate that RubyACRs are powerful tools for optogenetic inhibition in Drosophila, with advantages in spectral compatibility, behavioral specificity, and potential applications in in vivo two-photon calcium imaging.

      Weaknesses:

      The manuscript is strong, but it can be further improved with a few additional analyses and minor revisions. Especially, a more detailed evaluation of RubyACRs with two-photon excitation will help clarify to what extent these opsins can be simultaneously used together with green GECIs, such as GCaMPs.

    5. Author response:

      We thank the reviewers for their thoughtful and thorough consideration of the work. We appreciate the positive reception they give the work, and plan to address several of the comments with further experiments. To outline that work (and ensure that we are on the right track to addressing those concerns), we summarize the core concerns that prompt new experiments:

      (1) Does the YFP tag on the ACRs interfere with simultaneous GCaMP imaging of RubyACR-expressing cells and could bleaching of the YFP complicate interpretation of the experiments here?

      We will test whether 920 nm (2p) and 650 nm (1p) excitation cause YFP bleaching that interferes with interpretation of inhibitory calcium (i.e. GCaMP) signals. Because the YFP tag enhances opsin sensitivity, we prioritized these tagged RubyACRs for initial characterization. FLAG-tagged ACRs are in progress, but will take time to fully characterize. Considering that the RubyACR-EYFP versions work very well, and in many cases people will want the YFP tag, either for visualizing expression or to maximize sensitivity, we feel the current work is a valuable contribution on its own. Indeed, several labs have already requested these lines.

      (2) Are the ACRs activated by two-photon illumination?

      We will examine GCaMP signals at increasing 2p intensities to determine whether imaging unintentionally activates RubyACRs, as well as whether 2p illumination could be used for intentional opsin activation.

      (3) How toxic is the expression of these opsins?

      We will update the quantification of toxicity in Table 1 to include all the drivers we used in this study. In fact, the toxicity we observed was primarily with the vGlut driver, which is why that was the only information in the table. The other drivers we used did not appreciably reduce survival rate, but showing only the one case where expression did have a big effect left a strong, understandable, but inaccurate impression that toxicity was a big pitfall. We note that the widely used CsChrimson has similar % survival to the RubyACRs when expressed with these vGlut drivers.

      We also plan to examine whether ACR expression leads to cell-autonomous perturbations. We will determine whether expression leads to some frequency of neuronal cell death, and we will evaluate whether any morphological effects occur.

      We will also clarify in the Discussion that potential toxicity may be driver-specific (as it is here) and should be evaluated case-by-case by investigators using the tool.

      (4) Use functional imaging to confirm inhibition of the neurons used only for behavioral experiments (pIP10 & PPL1-γ1pedc)

      We will perform these imaging experiments. One caveat is that inhibition may not be readily detectable with GCaMP, as the resting calcium levels in pIP10 and PPL1-γ1pedc neurons may already be quite low. This differs from the non-spiking Mi1 neurons, where inhibition was clearly observed with GCaMP. For this reason, we consider the behavioral results stronger evidence of efficacy, but we agree that imaging could provide useful supporting evidence, recognizing that a negative result would be difficult to interpret.

      (5) Confirm that the GtACR1 will inhibit locomotion in the flybowl when activated with green light, its spectral peak.

      We will perform this benchmark experiment. Please note that our intention with this study was to find an effective red-light activated opto-inhibitor because these wavelengths are much less perturbing to behavior. In that respect, regardless of GtACR1’s performance with green light, the RubyACRs clearly provide important new tools for Drosophila behavioral neuroscience.

    1. eLife Assessment

      This manuscript is useful as it demonstrates that Rv2577, a Fe³⁺/Zn²⁺-dependent metallophosphatase, is secreted by Mycobacterium bovis BCG and localizes to the nucleus of mammalian cells, altering transcriptional and inflammatory responses. However, the study is incomplete as it lacks activity validation in macrophage cells and with virulent Mycobacterium tuberculosis strains. It is necessary to confirm Rv2577 secretion from a virulent strain and to clarify the direct or indirect role of MmpE in modulating host pathways, together with mechanistic insight into how MmpE influences lysosomal biogenesis and trafficking.

    2. Reviewer #1 (Public review):

      Summary:

      Review of the manuscript titled "Mycobacterial Metallophosphatase MmpE acts as a nucleomodulin to regulate host gene expression and promotes intracellular survival".

      The study provides an insightful characterization of the mycobacterial secreted effector protein MmpE, which translocates to the host nucleus and exhibits phosphatase activity. The study characterizes the nuclear localization signal sequences and residues critical for the phosphatase activity, both of which are required for intracellular survival.

      Strengths:

      (1) The study addresses the role of nucleomodulins, an understudied aspect in mycobacterial infections.

      (2) The authors employ a combination of biochemical and computational analyses along with in vitro and in vivo validations to characterize the role of MmpE.

      Weaknesses:

      (1) While the study establishes that the phosphatase activity of MmpE operates independently of its NLS, there is a clear gap in understanding how this phosphatase activity supports mycobacterial infection. The investigation lacks experimental data on specific substrates of MmpE or pathways influenced by this virulence factor.

      (2) The study does not explore whether the phosphatase activity of MmpE is dependent on the NLS within macrophages, which would provide critical insights into its biological relevance in host cells. Conducting experiments with double knockout/mutant strains and comparing their intracellular survival with single mutants could elucidate these dependencies and further validate the significance of MmpE's dual functions.

      (3) The study does not provide direct experimental validation of the MmpE deletion on lysosomal trafficking of the bacteria.

      (4) The role of MmpE as a mycobacterial effector would be more relevant using virulent mycobacterial strains such as H37Rv.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors have characterized Rv2577 as a Fe3+/Zn2+-dependent metallophosphatase and a nucleomodulin protein. The authors have also identified His348 and Asn359 as critical residues for Fe3+ coordination. The authors show that the protein contains two nuclear localization signals. Using C-terminal Flag expression constructs, the authors have shown that the MmpE protein is secretory. The authors have prepared genetic deletion strains and show that MmpE is essential for intracellular survival of M. bovis BCG in THP-1 macrophages, RAW264.7 macrophages, and a mouse model of infection. The authors have also performed RNA-seq analysis to compare the transcriptional profiles of macrophages infected with wild-type and MmpE mutant strains. The relative levels of ~175 transcripts were altered in MmpE mutant-infected macrophages and the majority of these were associated with various immune and inflammatory signalling pathways. Using these deletion strains, the authors proposed that MmpE inhibits inflammatory gene expression by binding to the promoter region of a vitamin D receptor. The authors also showed that MmpE arrests phagosome maturation by regulating the expression of several lysosome-associated genes such as TFEB, LAMP1, LAMP2, etc. These findings reveal a sophisticated mechanism by which a bacterial effector protein manipulates gene transcription and promotes intracellular survival.

      Strength:

      The authors have used a combination of cell biology, microbiology, and transcriptomics to elucidate the mechanisms by which Rv2577 contributes to intracellular survival.

      Weakness:

      The authors should thoroughly check the mice data and show individual replicate values in bar graphs.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "Mycobacterial Metallophosphatase MmpE Acts as a Nucleomodulin to Regulate Host Gene Expression and Promote Intracellular Survival", Chen et al describe biochemical characterisation, localisation and potential functions of the gene using a genetic approach in M. bovis BCG and perform macrophage and mice infections to understand the roles of this potentially secreted protein in the host cell nucleus. The findings demonstrate the role of a secreted phosphatase of M. bovis BCG in shaping the transcriptional profile of infected macrophages, potentially through nuclear localisation and direct binding to transcriptional start sites, thereby regulating the inflammatory response to infection.

      Strengths:

      The authors demonstrate using a transient transfection method that MmpE when expressed as a GFP-tagged protein in HEK293T cells, exhibits nuclear localisation. The authors identify two NLS motifs that together are required for nuclear localisation of the protein. A deletion of the gene in M. bovis BCG results in poorer survival compared to the wild-type parent strain, which is also killed by macrophages. Relative to the WT strain-infected macrophages, macrophages infected with the ∆mmpE strain exhibited differential gene expression. Overexpression of the gene in HEK293T led to occupancy of the transcription start site of several genes, including the Vitamin D Receptor. Expression of VDR in THP1 macrophages was lower in the case of ∆mmpE infection compared to WT infection. This data supports the utility of the overexpression system in identifying potential target loci of MmpE using the HEK293T transfection model. The authors also demonstrate that the protein is a phosphatase, and the phosphatase activity of the protein is partially required for bacterial survival but not for the regulation of the VDR gene expression.

      Weaknesses:

      (1) While the motifs can most certainly behave as NLSs, the overexpression of a mycobacterial protein in HEK293T cells can also result in artefacts of nuclear localisation. This is not unprecedented. Therefore, to prove that the protein is indeed secreted from BCG, and is able to elicit transcriptional changes during infection, I recommend that the authors (i) establish that the protein is indeed secreted into the host cell nucleus, and (ii) the NLS mutation prevents its localisation to the nucleus without disrupting its secretion.

      Demonstration that the protein is secreted: Supplementary Figure 3 - Immunoblotting should be performed for a cytosolic protein, also to rule out detection of proteins from lysis of dead cells. Also, for detecting proteins in the secreted fraction, it would be better to use Sauton's media without detergent, and grow the cultures without agitation or with gentle agitation. The method used by the authors is not a recommended protocol for obtaining the secreted fraction of mycobacteria.

      Demonstration that the protein localises to the host cell nucleus upon infection: Perform an infection followed by immunofluorescence to demonstrate that the endogenous protein of BCG can translocate to the host cell nucleus. This should be done for an NLS1-2 mutant expressing cell also.

      (2) In the RNA-seq analysis, the directionality of change of each of the reported pathways is not apparent in the way the data have been presented. For example, are genes in the cytokine-cytokine receptor interaction or TNF signalling pathway expressed more, or less in the ∆mmpE strain?

      (3) Several of these pathways are affected as a result of infection, while others are not induced by BCG infection. For example, BCG infection does not, on its own, produce changes in IL1β levels. As the authors did not compare the uninfected macrophages as a control, it is difficult to interpret whether ∆mmpE induced higher expression than the WT strain, or simply did not induce a gene while the WT strain suppressed expression of a gene. This is particularly important because the strain is attenuated. Does the attenuation have anything to do with the ability of the protein to induce lysosomal pathway genes? Does induction of this pathway lead to attenuation of the strain? Similarly, for pathways that seem to be downregulated in the ∆mmpE strain compared to the WT strain, these might have been induced upon infection with the WT strain but not sufficiently by the ∆mmpE strain due to its attenuation/ lower bacterial burden.

      (4) ChIP-seq should be performed in THP1 macrophages, and not in HEK293T. Overexpression of a nuclear-localised protein in a non-relevant line is likely to lead to several transcriptional changes that do not inform us of the role of the gene as a transcriptional regulator during infection.

      (5) I would not expect to see such large inflammatory reactions persisting 56 days post-infection with M. bovis BCG. Is this something peculiar for an intratracheal infection with 1×10⁷ bacilli? For images of animal tissue, the authors should provide images of the entire lung lobe with the zoomed-in image indicated as an inset.

      (6) For the qRT-PCR based validation, infections should be performed with the MmpE-complemented strain in the same experiments as those for the WT and ∆mmpE strain so that they can be on the same graph, in the main manuscript file. Supplementary Figure 4 has three complementary strains. Again, the absence of the uninfected, WT, and ∆mmpE infected condition makes interpretation of these data very difficult.

      (7) The abstract mentions that MmpE represses the PI3K-Akt-mTOR pathway, which arrests phagosome maturation. There is not enough data in this manuscript in support of this claim. Supplementary Figure 5 does provide qRT-PCR validation of genes of this pathway, but the data do not indicate that higher expression of these pathways, whether by VDR repression or otherwise, is driving the growth restriction of the ∆mmpE strain.

      (8) The relevance of the NLS and the phosphatase activity is not completely clear in the CFU assays and in the gene expression data. Firstly, there needs to be immunoblot data provided for the expression and secretion of the NLS-deficient and phosphatase mutants. Secondly, CFU data in Figure 3A, C, and E must consistently include both the WT and ∆mmpE strain.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Review of the manuscript titled "Mycobacterial Metallophosphatase MmpE acts as a nucleomodulin to regulate host gene expression and promotes intracellular survival".

      The study provides an insightful characterization of the mycobacterial secreted effector protein MmpE, which translocates to the host nucleus and exhibits phosphatase activity. The study characterizes the nuclear localization signal sequences and residues critical for the phosphatase activity, both of which are required for intracellular survival.

      Strengths:

      (1) The study addresses the role of nucleomodulins, an understudied aspect in mycobacterial infections.

      (2) The authors employ a combination of biochemical and computational analyses along with in vitro and in vivo validations to characterize the role of MmpE.

      Weaknesses:

      (1) While the study establishes that the phosphatase activity of MmpE operates independently of its NLS, there is a clear gap in understanding how this phosphatase activity supports mycobacterial infection. The investigation lacks experimental data on specific substrates of MmpE or pathways influenced by this virulence factor.

      We thank the reviewer for this insightful comment and agree that identification of the substrate of MmpE is important to fully understand its role in mycobacterial infection.

      MmpE is a putative purple acid phosphatase (PAP) and a member of the metallophosphoesterase (MPE) superfamily. Enzymes in this family are known for their catalytic promiscuity and broad substrate specificity, acting on phosphomonoesters, phosphodiesters, and phosphotriesters (Matange et al., Biochem J., 2015). In bacteria, several characterized MPEs have been shown to hydrolyze substrates such as cyclic nucleotides (e.g., cAMP) (Keppetipola et al., J Biol Chem, 2008; Shenoy et al., J Mol Biol, 2007), nucleotide derivatives (e.g., AMP, UDP-glucose) (Innokentev et al., mBio, 2025), and pyrophosphate-containing compounds (e.g., Ap4A, UDP-DAGn) (Matange et al., Biochem J., 2015). Although the binding motif of MmpE has been identified, determining its physiological substrates remains challenging due to the low abundance and instability of potential metabolites, as well as the limited sensitivity and coverage of current metabolomic technologies in mycobacteria.

      (2) The study does not explore whether the phosphatase activity of MmpE is dependent on the NLS within macrophages, which would provide critical insights into its biological relevance in host cells. Conducting experiments with double knockout/mutant strains and comparing their intracellular survival with single mutants could elucidate these dependencies and further validate the significance of MmpE's dual functions.

      We thank the reviewer for the comment. In our study, we demonstrate that both the nuclear localization and phosphatase activity of MmpE are required for full virulence (Figure 3D–E). Importantly, deletion of the NLS motifs did not impair MmpE’s phosphatase activity in vitro (Figure 2F), indicating that its enzymatic function is structurally independent of its nuclear localization. These findings suggest that MmpE functions as a bifunctional protein, with distinct and non-overlapping roles for its nuclear trafficking and phosphatase activity. We have expanded on this point in the Discussion section “MmpE Functions as a Bifunctional Protein with Nuclear Localization and Phosphatase Activity”.

      (3) The study does not provide direct experimental validation of the MmpE deletion on lysosomal trafficking of the bacteria.

      We thank the reviewer for the comment. The role of Rv2577/MmpE in phagosome maturation has been demonstrated in M. tuberculosis, where its deletion increases colocalization with lysosomal markers such as LAMP-2 and LAMP-3 (Forrellad et al., Front Microbiol, 2020). In our study, we found that mmpE deletion in M. bovis BCG led to upregulation of lysosomal genes, including TFEB, LAMP1, LAMP2, and v-ATPase subunits, compared to the wild-type strain. These results suggest that MmpE may regulate lysosomal trafficking by interfering with phagosome–lysosome fusion.

      To further validate MmpE’s role in phagosome maturation, we will perform fluorescence colocalization assays in THP-1 macrophages infected with BCG/wt, ∆mmpE, complemented, and NLS-mutant strains. Co-staining with LAMP1 and LysoTracker will allow us to assess whether the ∆mmpE mutant is more efficiently trafficked to lysosomes.

      (4) The role of MmpE as a mycobacterial effector would be more relevant using virulent mycobacterial strains such as H37Rv.

      We thank the reviewer for the comment. The role of Rv2577/MmpE as a virulence factor was previously demonstrated in M. tuberculosis CDC 1551, where its deletion significantly reduced bacterial replication in mouse lungs at 30 days post-infection (Forrellad et al., Front Microbiol, 2020). However, that study did not explore the underlying mechanism of MmpE function. In our work, we found that MmpE enhances M. bovis BCG survival in both macrophages (THP-1 and RAW264.7) and mice (Figure 2A-B, Figure 6A), consistent with its proposed role in virulence. To investigate the molecular mechanism by which MmpE promotes intracellular survival, we used M. bovis BCG as a biosafe surrogate, a model that is widely accepted for studying mycobacterial pathogenesis (Wang et al., Nat Immunol, 2015; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017).

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors have characterized Rv2577 as a Fe3+/Zn2+-dependent metallophosphatase and a nucleomodulin protein. The authors have also identified His348 and Asn359 as critical residues for Fe3+ coordination. The authors show that the protein encodes two nuclear localization signals. Using C-terminal Flag expression constructs, the authors have shown that the MmpE protein is secretory. The authors have prepared genetic deletion strains and show that MmpE is essential for intracellular survival of M. bovis BCG in THP-1 macrophages, RAW264.7 macrophages, and a mouse model of infection. The authors have also performed RNA-seq analysis to compare the transcriptional profiles of macrophages infected with wild-type and MmpE mutant strains. The relative levels of ~175 transcripts were altered in MmpE mutant-infected macrophages and the majority of these were associated with various immune and inflammatory signalling pathways. Using these deletion strains, the authors proposed that MmpE inhibits inflammatory gene expression by binding to the promoter region of a vitamin D receptor. The authors also showed that MmpE arrests phagosome maturation by regulating the expression of several lysosome-associated genes such as TFEB, LAMP1, LAMP2, etc. These findings reveal a sophisticated mechanism by which a bacterial effector protein manipulates gene transcription and promotes intracellular survival.

      Strength:

      The authors have used a combination of cell biology, microbiology, and transcriptomics to elucidate the mechanisms by which Rv2577 contributes to intracellular survival.

      Weakness:

      The authors should thoroughly check the mice data and show individual replicate values in bar graphs.

      We thank the reviewer for the advice. We will update the relevant mouse data in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "Mycobacterial Metallophosphatase MmpE Acts as a Nucleomodulin to Regulate Host Gene Expression and Promote Intracellular Survival", Chen et al describe biochemical characterisation, localisation and potential functions of the gene using a genetic approach in M. bovis BCG and perform macrophage and mice infections to understand the roles of this potentially secreted protein in the host cell nucleus. The findings demonstrate the role of a secreted phosphatase of M. bovis BCG in shaping the transcriptional profile of infected macrophages, potentially through nuclear localisation and direct binding to transcriptional start sites, thereby regulating the inflammatory response to infection.

      Strengths:

      The authors demonstrate using a transient transfection method that MmpE when expressed as a GFP-tagged protein in HEK293T cells, exhibits nuclear localisation. The authors identify two NLS motifs that together are required for nuclear localisation of the protein. A deletion of the gene in M. bovis BCG results in poorer survival compared to the wild-type parent strain, which is also killed by macrophages. Relative to the WT strain-infected macrophages, macrophages infected with the ∆mmpE strain exhibited differential gene expression. Overexpression of the gene in HEK293T led to occupancy of the transcription start site of several genes, including the Vitamin D Receptor. Expression of VDR in THP1 macrophages was lower in the case of ∆mmpE infection compared to WT infection. This data supports the utility of the overexpression system in identifying potential target loci of MmpE using the HEK293T transfection model. The authors also demonstrate that the protein is a phosphatase, and the phosphatase activity of the protein is partially required for bacterial survival but not for the regulation of the VDR gene expression.

      Weaknesses:

      (1) While the motifs can most certainly behave as NLSs, the overexpression of a mycobacterial protein in HEK293T cells can also result in artefacts of nuclear localisation. This is not unprecedented. Therefore, to prove that the protein is indeed secreted from BCG, and is able to elicit transcriptional changes during infection, I recommend that the authors (i) establish that the protein is indeed secreted into the host cell nucleus, and (ii) the NLS mutation prevents its localisation to the nucleus without disrupting its secretion.

      We thank the reviewer for the advice and will include the relevant experiments in the revised manuscript. The localization of WT MmpE and the NLS-mutated MmpE will be tested in BCG-infected macrophages.

      Demonstration that the protein is secreted: Supplementary Figure 3 - Immunoblotting should be performed for a cytosolic protein, also to rule out detection of proteins from lysis of dead cells. Also, for detecting proteins in the secreted fraction, it would be better to use Sauton's media without detergent, and grow the cultures without agitation or with gentle agitation. The method used by the authors is not a recommended protocol for obtaining the secreted fraction of mycobacteria.

      We agree with the reviewer and will further validate the secretion of MmpE using the recommended protocol.

      Demonstration that the protein localises to the host cell nucleus upon infection: Perform an infection followed by immunofluorescence to demonstrate that the endogenous protein of BCG can translocate to the host cell nucleus. This should be done for an NLS1-2 mutant expressing cell also.

      We will add this experiment in the revised manuscript.

      (2) In the RNA-seq analysis, the directionality of change of each of the reported pathways is not apparent in the way the data have been presented. For example, are genes in the cytokine-cytokine receptor interaction or TNF signalling pathway expressed more, or less in the ∆mmpE strain?

      We thank the reviewer for pointing this out and fully agree that conventional KEGG pathway enrichment diagrams do not convey the directionality of individual gene expression changes within each pathway. While KEGG enrichment analysis identifies pathways that are statistically overrepresented among differentially expressed genes, it does not indicate whether individual genes within those pathways are upregulated or downregulated.

      To address this, we re-analyzed the expression trends of DEGs within each significantly enriched KEGG pathway. The results show that key immune-related pathways, including cytokine–cytokine receptor interaction, TNF signaling, NF-κB signaling, and chemokine signaling, are collectively upregulated in THP-1 macrophages infected with ∆mmpE strain compared to those infected with the wild-type BCG strain. The full list of DEGs will be provided in the supplementary materials. The complete RNA-seq dataset has been deposited in the GEO database, and the accession number will be included in the revised manuscript.
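      The re-analysis described above can be illustrated with a minimal sketch, assuming each gene has a log2 fold change (∆mmpE versus wild type) and each enriched pathway is given as a list of member genes. The function name, the threshold, the gene labels, and all numbers below are hypothetical placeholders for illustration only; they are not values from the manuscript or its dataset.

```python
def pathway_direction(log2fc, pathways, threshold=1.0):
    """Summarise the direction of change of DEGs inside each pathway.

    log2fc    -- dict mapping gene -> log2 fold change (mutant vs wild type)
    pathways  -- dict mapping pathway name -> list of member genes
    threshold -- absolute log2 fold change needed to count as a DEG (assumed)
    """
    summary = {}
    for name, genes in pathways.items():
        up = sorted(g for g in genes if log2fc.get(g, 0.0) >= threshold)
        down = sorted(g for g in genes if log2fc.get(g, 0.0) <= -threshold)
        if len(up) > len(down):
            direction = "up"
        elif len(down) > len(up):
            direction = "down"
        else:
            direction = "mixed"
        summary[name] = {"up": up, "down": down, "direction": direction}
    return summary


# Hypothetical placeholder data, not taken from the study.
example_log2fc = {"geneA": 2.1, "geneB": 1.4, "geneC": -1.3, "geneD": 0.2}
example_pathways = {
    "pathway_1": ["geneA", "geneB", "geneD"],
    "pathway_2": ["geneC", "geneD"],
}

example_summary = pathway_direction(example_log2fc, example_pathways)
for pathway, info in example_summary.items():
    print(pathway, info["direction"], len(info["up"]), len(info["down"]))
```

      Reporting the up and down counts alongside the enrichment statistic lets a reader see at a glance whether a pathway is predominantly induced or repressed, which an enrichment p-value alone does not convey.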

      (3) Several of these pathways are affected as a result of infection, while others are not induced by BCG infection. For example, BCG infection does not, on its own, produce changes in IL1β levels. As the authors did not compare the uninfected macrophages as a control, it is difficult to interpret whether ∆mmpE induced higher expression than the WT strain, or simply did not induce a gene while the WT strain suppressed expression of a gene. This is particularly important because the strain is attenuated. Does the attenuation have anything to do with the ability of the protein to induce lysosomal pathway genes? Does induction of this pathway lead to attenuation of the strain? Similarly, for pathways that seem to be downregulated in the ∆mmpE strain compared to the WT strain, these might have been induced upon infection with the WT strain but not sufficiently by the ∆mmpE strain due to its attenuation/ lower bacterial burden.

      We thank the reviewer for the comment. We will update qRT-PCR data with the uninfected macrophages as a control in the revised manuscript.

      The wild-type Mycobacterium bovis BCG strain retains the ability to inhibit phagosome maturation (Branzk et al., Nat Immunol, 2014; Weng et al., Nat Commun, 2022). Forrellad et al. previously identified Rv2577/MmpE as a virulence factor in M. tuberculosis, and disruption of the mmpE gene impairs the ability of M. tuberculosis to arrest phagosome maturation (Forrellad et al., Front Microbiol, 2020). In our study, transcriptomic and qRT-PCR data (Figures 4C and G, S4C) show that deletion of mmpE in M. bovis BCG leads to upregulation of lysosomal biogenesis and acidification genes, including TFEB, LAMP1, and v-ATPase. To further validate MmpE’s role in phagosome maturation, we will perform fluorescence colocalization assays in THP-1 macrophages infected with BCG/wt, ∆mmpE, complemented, and NLS-mutant strains. Co-staining with LAMP1 and LysoTracker will assess whether the ∆mmpE mutant is more efficiently trafficked to lysosomes.

      Furthermore, CFU assays demonstrated that the ∆mmpE strain exhibits markedly reduced bacterial survival in both human THP-1 and murine RAW264.7 macrophages, as well as in mice, compared to the wild-type strain (Figures 4A and C, 6A). These findings suggest that the loss of MmpE compromises bacterial survival, likely due to enhanced lysosomal trafficking and acidification. This supports previous studies showing that increased lysosomal activity promotes mycobacterial clearance (Gutierrez et al., Cell, 2004; Pilli et al., Immunity, 2012).

      (4) ChIP-seq should be performed in THP1 macrophages, and not in HEK293T. Overexpression of a nuclear-localised protein in a non-relevant line is likely to lead to several transcriptional changes that do not inform us of the role of the gene as a transcriptional regulator during infection.

      We thank the reviewer for the comment. We performed ChIP-seq in HEK293T cells because this cell line is widely used in ChIP-based assays owing to its high transfection efficiency, robust nuclear protein expression, and well-annotated genome (Lampe et al., Nat Biotechnol, 2024; Marasco et al., Cell, 2022). These features make HEK293T an ideal system for the initial identification of genome-wide chromatin binding profiles of novel nuclear effectors such as MmpE.

      Furthermore, we validated the major observations in THP-1 macrophages: (i) RNA-seq of THP-1 cells infected with either the WT BCG or the ∆mmpE strain revealed significant transcriptional changes in immune and lysosomal pathways (Figure 4A); and (ii) integrated analysis of CUT&Tag and RNA-seq data identified 298 genes in infected THP-1 cells that exhibited both MmpE binding and corresponding expression changes. Among these, VDR was validated as a direct transcriptional target of MmpE using EMSA and ChIP-PCR (Figures 5E-J, S5D-F). Notably, the signaling pathways associated with MmpE-bound genes, including PI3K-Akt-mTOR signaling and lysosomal function, substantially overlap with those transcriptionally modulated in infected THP-1 macrophages (Figures 4B-G, S4B-C, S5C-D), further supporting the biological relevance of the ChIP-seq data obtained from HEK293T cells.
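      As a rough illustration of what an integrated binding-plus-expression analysis involves, the sketch below intersects a set of genes with a binding peak against genes whose expression changes beyond a cutoff. It is only a schematic under stated assumptions: the function name, the cutoff, and the gene labels are hypothetical, and the response does not describe the actual pipeline or thresholds used.

```python
def bound_and_changed(bound_genes, log2fc, threshold=1.0):
    """Return genes that are both bound and differentially expressed.

    bound_genes -- iterable of genes with a binding peak near the TSS (assumed input)
    log2fc      -- dict mapping gene -> log2 fold change (mutant vs wild type)
    threshold   -- absolute log2 fold change cutoff (assumed)
    """
    hits = {}
    for gene in set(bound_genes):
        fold_change = log2fc.get(gene)
        if fold_change is not None and abs(fold_change) >= threshold:
            hits[gene] = "up" if fold_change > 0 else "down"
    return hits


# Hypothetical placeholder data, not taken from the study.
example_bound = ["geneA", "geneC", "geneE"]
example_fc = {"geneA": 1.6, "geneB": 2.0, "geneC": -1.1, "geneE": 0.3}

example_hits = bound_and_changed(example_bound, example_fc)
print(sorted(example_hits.items()))
```

      Keeping the direction of change with each overlapping gene makes it possible to ask whether binding is associated mainly with repression or activation of the target.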

      (5) I would not expect to see such large inflammatory reactions persisting 56 days post-infection with M. bovis BCG. Is this something peculiar for an intratracheal infection with 1×10⁷ bacilli? For images of animal tissue, the authors should provide images of the entire lung lobe with the zoomed-in image indicated as an inset.

      We thank the reviewer for the comment. The lung inflammation peaked at days 21–28 and had clearly subsided by day 56 across all groups (Figure 6B), consistent with the expected resolution of immune responses to an attenuated strain like M. bovis BCG. This temporal pattern is in line with previous studies using intravenous or intratracheal BCG vaccination in mice and macaques, which also demonstrated robust early immune activation followed by resolution over time (Smith et al., Nat Microbiol, 2025; Darrah et al., Nature, 2020).

      In this study, the infectious dose (1×10⁷ CFU intratracheally) was selected based on previous studies in which intratracheal delivery of 1×10⁷ CFU produced consistent and measurable lung immune responses and pathology without causing overt illness or mortality (Xu et al., Sci Rep, 2017; Niroula et al., Sci Rep, 2025). We will provide whole-lung lobe images with zoomed-in insets in the revised manuscript.

      (6) For the qRT-PCR based validation, infections should be performed with the MmpE-complemented strain in the same experiments as those for the WT and ∆mmpE strain so that they can be on the same graph, in the main manuscript file. Supplementary Figure 4 has three complementary strains. Again, the absence of the uninfected, WT, and ∆mmpE infected condition makes interpretation of these data very difficult.

      We thank the reviewer for the comment. As suggested, we will conduct the qRT-PCR experiment including the uninfected, WT, ∆mmpE, Comp-MmpE, and the three complementary strains infecting THP-1 cells. The updated data will be provided in the revised manuscript.

      (7) The abstract mentions that MmpE represses the PI3K-Akt-mTOR pathway, which arrests phagosome maturation. There is not enough data in this manuscript in support of this claim. Supplementary Figure 5 does provide qRT-PCR validation of genes of this pathway, but the data do not indicate that higher expression of these pathways, whether by VDR repression or otherwise, is driving the growth restriction of the ∆mmpE strain.

      We thank the reviewer for the comment. The role of MmpE in phagosome maturation was previously characterized. Disruption of mmpE impairs the ability of M. tuberculosis to arrest lysosomal trafficking (Forrellad et al., Front Microbiol, 2020). In this study, we further found that MmpE suppresses the expression of key lysosomal genes, including TFEB, LAMP1, LAMP2, and ATPase subunits (Figure 4G), suggesting MmpE is involved in arresting phagosome maturation. As noted, the genes in the PI3K–Akt–mTOR pathway are upregulated in ∆mmpE-infected macrophages (Figure S5C).

      To functionally validate this, we will conduct two complementary experimental approaches:

      (i) Immunofluorescence assays: We will assess phagosome maturation and lysosomal fusion in THP-1 cells infected with BCG/wt, ∆mmpE, Comp-MmpE, and NLS mutant strains. Colocalization of intracellular bacteria with LAMP1 and LysoTracker will be quantified to determine whether the ∆mmpE strain is more efficiently trafficked to lysosomes.

      (ii) CFU assays: We will perform CFU assays in THP-1 cells infected with BCG/wt or ∆mmpE in the presence or absence of PI3K-Akt-mTOR pathway inhibitors (e.g., Dactolisib), to assess whether activation of this pathway contributes to the intracellular growth restriction observed in the ∆mmpE strain.

      (8) The relevance of the NLS and the phosphatase activity is not completely clear in the CFU assays and in the gene expression data. Firstly, there needs to be immunoblot data provided for the expression and secretion of the NLS-deficient and phosphatase mutants. Secondly, CFU data in Figure 3A, C, and E must consistently include both the WT and ∆mmpE strain.

      We thank the reviewer for the comment. We will provide immunoblot data for the expression and secretion of the NLS-deficient and phosphatase mutants. Additionally, we will revise Figure 3A, 3C, and 3E to consistently include both the WT and ΔmmpE strains in the CFU assays.

      References

      Branzk N, Lubojemska A, Hardison SE, Wang Q, Gutierrez MG, Brown GD, Papayannopoulos V (2014) Neutrophils sense microbe size and selectively release neutrophil extracellular traps in response to large pathogens Nat Immunol 15:1017-25.

      Darrah PA, Zeppa JJ, Maiello P, Hackney JA, Wadsworth MH 2nd, Hughes TK, Pokkali S, Swanson PA 2nd, Grant NL, Rodgers MA, Kamath M, Causgrove CM, Laddy DJ, Bonavia A, Casimiro D, Lin PL, Klein E, White AG, Scanga CA, Shalek AK, Roederer M, Flynn JL, Seder RA (2020) Prevention of tuberculosis in macaques after intravenous BCG immunization Nature 577:95-102.

      Forrellad MA, Blanco FC, Marrero Diaz de Villegas R, Vázquez CL, Yaneff A, García EA, Gutierrez MG, Durán R, Villarino A, Bigi F (2020) Rv2577 of Mycobacterium tuberculosis Is a virulence factor with dual phosphatase and phosphodiesterase functions Front Microbiol 11:570794.

      Gutierrez MG, Master SS, Singh SB, Taylor GA, Colombo MI, Deretic V (2004) Autophagy is a defense mechanism inhibiting BCG and Mycobacterium tuberculosis survival in infected macrophages Cell 119:753-66.

      Innokentev A, Sanchez AM, Monetti M, Schwer B, Shuman S (2025) Efn1 and Efn2 are extracellular 5'-nucleotidases induced during the fission yeast response to phosphate starvation mBio 16:e0299224.

      Keppetipola N, Shuman S (2008) A phosphate-binding histidine of binuclear metallophosphodiesterase enzymes is a determinant of 2',3'-cyclic nucleotide phosphodiesterase activity J Biol Chem 283:30942-9.

      Lampe GD, King RT, Halpin-Healy TS, Klompe SE, Hogan MI, Vo PLH, Tang S, Chavez A, Sternberg SH (2024) Targeted DNA integration in human cells without double-strand breaks using CRISPR-associated transposases Nat Biotechnol 42:87-98.

      Marasco LE, Dujardin G, Sousa-Luís R, Liu YH, Stigliano JN, Nomakuchi T, Proudfoot NJ, Krainer AR, Kornblihtt AR (2022) Counteracting chromatin effects of a splicing-correcting antisense oligonucleotide improves its therapeutic efficacy in spinal muscular atrophy Cell 185:2057-2070.e15.

      Matange N, Podobnik M, Visweswariah SS (2015) Metallophosphoesterases: structural fidelity with functional promiscuity Biochem J 467:201-16.

      Niroula N, Ghodasara P, Marreros N, Fuller B, Sanderson H, Zriba S, Walker S, Shury TK, Chen JM (2025) Orally administered live BCG and heat-inactivated Mycobacterium bovis protect bison against experimental bovine tuberculosis Sci Rep 15:3764.

      Péan CB, Schiebler M, Tan SW, Sharrock JA, Kierdorf K, Brown KP, Maserumule MC, Menezes S, Pilátová M, Bronda K, Guermonprez P, Stramer BM, Andres Floto R, Dionne MS (2017) Regulation of phagocyte triglyceride by a STAT-ATG2 pathway controls mycobacterial infection Nat Commun 8:14642.

      Pilli M, Arko-Mensah J, Ponpuak M, Roberts E, Master S, Mandell MA, Dupont N, Ornatowski W, Jiang S, Bradfute SB, Bruun JA, Hansen TE, Johansen T, Deretic V (2012) TBK-1 promotes autophagy-mediated antimicrobial defense by controlling autophagosome maturation Immunity 37:223-34.

      Shenoy AR, Capuder M, Draskovic P, Lamba D, Visweswariah SS, Podobnik M (2007) Structural and biochemical analysis of the Rv0805 cyclic nucleotide phosphodiesterase from Mycobacterium tuberculosis J Mol Biol 365:211-25.

      Smith AA, Su H, Wallach J, Liu Y, Maiello P, Borish HJ, Winchell C, Simonson AW, Lin PL, Rodgers M, Fillmore D, Sakal J, Lin K, Vinette V, Schnappinger D, Ehrt S, Flynn JL (2025) A BCG kill switch strain protects against Mycobacterium tuberculosis in mice and non-human primates with improved safety and immunogenicity Nat Microbiol 10:468-481.

      Wang J, Ge P, Qiang L, Tian F, Zhao D, Chai Q, Zhu M, Zhou R, Meng G, Iwakura Y, Gao GF, Liu CH (2017) The mycobacterial phosphatase PtpA regulates the expression of host genes and promotes cell proliferation Nat Commun 8:244.

      Wang J, Li BX, Ge PP, Li J, Wang Q, Gao GF, Qiu XB, Liu CH (2015) Mycobacterium tuberculosis suppresses innate immunity by coopting the host ubiquitin system Nat Immunol 16:237-245.

      Weng Y, Shepherd D, Liu Y, Krishnan N, Robertson BD, Platt N, Larrouy-Maumus G, Platt FM (2022) Inhibition of the Niemann-Pick C1 protein is a conserved feature of multiple strains of pathogenic mycobacteria Nat Commun 13:5320.

      Xu X, Lu X, Dong X, Luo Y, Wang Q, Liu X, Fu J, Zhang Y, Zhu B, Ma X (2017) Effects of hMASP2 on the formation of BCG infection-induced granuloma in the lungs of BALB/c mice Sci Rep 7:2300.

    1. eLife Assessment

      This important study applies a novel signal decomposition method to disentangle distinct signals contributing to the decision-making process, and provides convincing evidence for the operation of separate sensory encoding, attentional orienting, and ramping evidence accumulation signals. These findings are consistent with previous work, except for the absence of a motor component, which may relate to limitations of the analysis approach.

    2. Reviewer #1 (Public review):

      From my reading, this study aimed to achieve two things:

      (1) A neurally-informed account of how Pieron's and Fechner's laws can apply in concert at distinct processing levels.

      (2) A comprehensive map in time and space of all neural events intervening between stimulus and response in an immediately-reported perceptual decision.

      I believe that the authors achieved the first point, mainly owing to a clever contrast comparison paradigm, but with good help also from a new topographic parsing algorithm they created. With this, they found that the time intervening between an early initial sensory evoked potential and an "N2" type process associated with launching the decision process varies inversely with contrast according to Pieron's law. Meanwhile, the interval from that second event up to a neural event peaking just before response increases with contrast, fitting Fechner's law, and a very nice finding is that a diffusion model whose drift rates are scaled by Fechner's law, fit to RT, predicts the observed proportion of correct responses very well. These are all strengths of the study.
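      To make the two laws in this paragraph concrete, the following is a minimal sketch, not the authors' model. It assumes Piéron's law as a power function of contrast, Fechner's law as a log ratio of the two grating contrasts, and the standard closed-form accuracy and mean decision time of an unbiased diffusion process. All parameter values (`alpha`, `beta`, `gamma`, `k`, `bound`, `noise`, `delta`) are hypothetical and chosen only for illustration.

```python
import math

def pieron_latency(contrast, alpha=0.05, beta=0.5, gamma=0.1):
    """Piéron's law: latency falls as a power function of stimulus intensity."""
    return gamma + alpha * contrast ** (-beta)

def fechner_drift(c_low, c_high, k=1.0):
    """Fechner's law: perceived difference scales with the log ratio of intensities."""
    return k * math.log(c_high / c_low)

def ddm_accuracy(drift, bound=1.0, noise=1.0):
    """P(correct) for a diffusion starting midway between 0 and `bound`."""
    return 1.0 / (1.0 + math.exp(-drift * bound / noise ** 2))

def ddm_mean_decision_time(drift, bound=1.0, noise=1.0):
    """Mean decision time for the same unbiased diffusion (drift must be > 0)."""
    return (bound / (2.0 * drift)) * math.tanh(drift * bound / (2.0 * noise ** 2))

# Hypothetical design: the two gratings always differ by the same contrast step.
delta = 0.05
rows = []
for c in (0.1, 0.3, 0.6, 0.9):
    v = fechner_drift(c, c + delta)
    rows.append((c, pieron_latency(c), v, ddm_accuracy(v), ddm_mean_decision_time(v)))

for c, latency, v, acc, dt in rows:
    print(f"contrast={c:.1f} pieron={latency:.3f} drift={v:.3f} acc={acc:.3f} dt={dt:.3f}")
```

      Under these assumptions, raising overall contrast shortens the Piéron-governed interval while shrinking the Fechner-scaled drift, so accuracy falls and decision time grows, which is the opposing pattern the paradigm is designed to expose.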

      The second, generally stated aim above is, in the opinion of this reviewer, unconvincing and ill-defined. Presumably, the full sequence of neural events is massively task-dependent, and surely it is more in number than just three. Even the sensory evoked potential typically observed for average ERPs, even for passive viewing, would include a series of 3 or more components - C1, P1, N1, etc. So are some events being missed? Perhaps the authors are identifying key events that impressively demarcate Pieron- and Fechner-adherent sections of the RT, but they might want to temper the claim that they are finding ALL events. In addition, the propensity for topographic parsing algorithms to potentially lump together distinct processes that partially co-evolve should be acknowledged.

      To take a salient example, the last neural event seems to blend the centroparietal positivity with a more frontal midline negativity, some of which would capture the CNV and some motor-execution related components that are more tightly time-locked to, of course, the response. If the authors plotted the traditional single-electrode ERP at the frontal focus and centroparietal focus separately, they are likely to see very different dynamics and contrast- and SAT-dependency. What does this mean for the validity of the multivariate method? If two or more components are being lumped into one neural event, wouldn't it mean that properties of one (e.g., frontal burstiness at response) are being misattributed to the other (centroparietal signal that also peaks but less sharply at response)?

      Also related to the method, why must the neural events all be 50 ms wide, and what happens if that is changed? Is it realistic that these neural events would be the same duration on every trial, even if their duration was a free parameter? This might be reasonable for sensory and motor components, but unlikely for cognitive.

      In general, I wonder about the analytic advantage of the parsing method - the paradigm itself is so well-designed that the story may be clear from standard average event-related potential analysis, and this might sidestep the doubts around whether the algorithm is correctly parsing all neural events.

      In particular, would the authors consider plotting CPP waveforms in the traditional way, across contrast levels? The elegant design is such that the C1 component (which has similar topography) will show up negative and early, giving way to the CPP, and these two components will show opposite amplitude variations (not just temporal intervals as is this paper's main focus), because the brighter the two gratings, the stronger the aggregate early sensory response but the weaker the decision evidence due to Fechner. I believe this would provide a simple, helpful corroborating analysis to back up the main functional interpretation in the paper.

      The first component is picking up on the C1 component (which is negative for these stimulus locations), not a "P100". Please consult any visual evoked potential study (e.g., Luck, Hillyard, etc).

      It is unexpected that this does not vary in latency with contrast (see, for example, Gebodh et al., 2017, Brain Topography), and there is little discussion of this. Could it be that nonlinear trends were not correctly tested for?

      There is very little analysis or discussion of the second stage linked to attention orientation - what would the role of attention orientation be in this task? Is it spatial attention directed to the higher contrast grating (and if so, should it lateralise accordingly?), or is it more of an alerting function the authors have in mind here?

    3. Reviewer #2 (Public review):

      Summary:

      The authors decomposed response times into component processes and manipulated the duration of these processes in opposing directions by varying contrast, and overall by manipulating speed-accuracy tradeoffs. They identify different processes and their durations by identifying neural states in time and validate their functional significance by showing that their properties vary selectively as expected with the predicted effects of the contrast manipulation. They identify 3 processes: stimulus encoding, attention orienting, and decision. These map onto classical event-related potentials. The decision-making component matched the CPP, and its properties varied with contrast and predicted decision-accuracy, while also exhibiting a burst not characteristic of evidence accumulation.

      Strengths:

      The design of the experiment is remarkable and offers crucial insights. The analysis techniques are beyond state-of-the-art, and the analyses are well motivated and offer clear insights.

      Weaknesses:

      It is not clear to me that the results confirm that there are only 3 processes, since e.g., motor preparation and execution were not captured. While the authors discuss this, this is a clear weakness of the approach, as other components may also have been missed. It is also unclear to what extent topographies map onto processes, since, e.g., different combinations of sources can lead to the same scalp topography.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors examine the processing stages involved in perceptual decision-making using a new approach to analysing EEG data, combined with a critical stimulus manipulation. This new EEG analysis method enables single-trial estimates of the timing and amplitude of transient changes in EEG time-series, recurrent across trials in a behavioural task. The authors find evidence for three events between stimulus onset and the response in a two-spatial-interval visual discrimination task. By analysing the timing and amplitude of these events in relation to behaviour and the stimulus manipulation, the authors interpret these events as related to separable processing stages for stimulus encoding, attention orientation, and decision (deliberation). This is largely consistent with previous findings from both event-related potentials (across trials) and single-trial estimates using decoding techniques and neural network approaches.

      Strengths:

      This work is not only important for the conceptual advance, but also in promoting this new analysis technique, which will likely prove useful in future research. For the broader picture, this work is an excellent example of the utility of neural measures for mental chronometry.

      Weaknesses:

      The manuscript would benefit from some conceptual clarifications, which are important for readers to understand this manuscript as a stand-alone work. This includes clearer definitions of Piéron's and Fechner's laws, and a fuller description of the EEG analysis technique. The manuscript, broadly, but the introduction especially, may be improved by clearly delineating the multiple aims of this project: examining the processes for decision-making, obtaining single-trial estimates of meaningful EEG-events, and whether central parietal positivity reflects ramping activity or steps averaged across trials. A fuller discussion of the limitations of the work, in particular, the absence of motor contributions to reaction time, would also be appreciated.

      At times, the novelty of the work is perhaps overstated. Rather, readers may appreciate a more comprehensive discussion of the distinctions between the current work and previous techniques to gauge single-trial estimates of decision-related activity, as well as previous findings concerning distinct processing stages in decision-making. Moreover, a discussion of how the events described in this study might generalise to different decision-making tasks in different contexts (for example, in auditory perception, or even value-based decision-making) would also be appreciated.

    1. eLife Assessment

      This important report describes the changing antiviral activity of IFIT1 across mammals and in response to distinct viruses, likely as a result of past arms races. One of the main strengths of the manuscript is the breadth of mammalian IFIT1 orthologs and viruses that were tested, as well as the thoroughness of the positive selection analysis. Overall the evidence is convincing, and the discussion conveys well the limitations due to physical interactions with other IFITs that are not accounted for.

    2. Reviewer #2 (Public review):

      McDougal et al. describe the surprising finding that IFIT1 proteins from different mammalian species inhibit replication of different viruses, indicating that evolution of IFIT1 across mammals has resulted in host species-specific antiviral specificity. Before this work, research into the antiviral activity and specificity of IFIT1 had mostly focused on the human ortholog, which was described to inhibit viruses including vesicular stomatitis virus (VSV) and Venezuelan equine encephalitis virus (VEEV) but not other viruses including Sindbis virus (SINV) and parainfluenza virus type 3 (PIV3). In the current work, the authors first perform evolutionary analyses on IFIT1 genes across a wide range of mammalian species and reveal that IFIT1 genes have evolved under positive selection in primates, bats, carnivores, and ungulates. Based on these data, they hypothesize that IFIT1 proteins from these diverse mammalian groups may show distinct antiviral specificities against a panel of viruses. By generating human cells that express IFIT1 proteins from different mammalian species, the authors show a wide range of antiviral activities of mammalian IFIT1s. Most strikingly, they find several IFIT1 proteins that have completely different antiviral specificities relative to human IFIT1, including IFIT1s that fail to inhibit VSV or VEEV, but strongly inhibit PIV3 or SINV. These results indicate that there is potential for IFIT1 to inhibit a much wider range of viruses than human IFIT1 inhibits. Electrophoretic mobility shift assays (EMSAs) suggest that some of these changes in antiviral specificity can be ascribed to changes in direct binding of viral RNAs. Interestingly, they also find that chimpanzee IFIT1, which is >98% identical to human IFIT1, fails to inhibit any tested virus. Replacing three residues from chimpanzee IFIT1 with those from human IFIT1, one of which has evolved under positive selection in primates, restores activity to chimpanzee IFIT1. Together, these data reveal a vast diversity of IFIT1 antiviral specificity encoded by mammals, consistent with an IFIT1-virus evolutionary "arms race".

      Overall, this is a very interesting and well-written manuscript that combines evolutionary and functional approaches to provide new insight into IFIT1 antiviral activity and species-specific antiviral immunity. The conclusion that IFIT1 genes in several mammalian lineages are evolving under positive selection is supported by the data. The virology results, which convincingly show that IFIT1s from different species have distinct antiviral specificity, are the most surprising and exciting part of the paper. As such, this paper will be interesting for researchers studying mechanisms of innate antiviral immunity, as well as those interested in species-specific antiviral immunity. Moreover, it may prompt others to test a wide range of orthologs of antiviral factors beyond those from humans or mice, which could further the concept of host-specific innate antiviral specificity. Additional areas for improvement, which are mostly to clarify the presentation of data and conclusions, are described below.

      Strengths:

      (1) This paper is a very strong demonstration of the concept that orthologous innate immune proteins can evolve distinct antiviral specificities. Specifically, the authors show that IFIT1 proteins from different mammalian species are able to inhibit replication of distinct groups of viruses, which is most clearly illustrated in Figure 4G. This is an unexpected finding, as the mechanism by which IFIT1 inhibits viral replication was assumed to be similar across orthologs. While the molecular basis for these differences remains unresolved, this is a clear indication that IFIT1 evolution functionally impacts host-specific antiviral immunity and that IFIT1 has the potential to inhibit a much wider range of viruses than previously described.

      (2) By revealing these differences in antiviral specificity across IFIT1 orthologs, the authors highlight the importance of sampling antiviral proteins from different mammalian species to understand what functions are conserved and what functions are lineage- or species-specific. These results might therefore prompt similar investigations with other antiviral proteins, which could reveal a previously undiscovered diversity of specificities for other antiviral immunity proteins.

      (3) The authors also surprisingly reveal that chimpanzee IFIT1 shows no antiviral activity against any tested virus despite only differing from human IFIT1 by eight amino acids. By mapping this loss of function to three residues on one helix of the protein, the authors shed new light on a region of the protein with no previously known function.

      (4) Combined with evolutionary analyses that indicate that IFIT1 genes are evolving under positive selection in several mammalian groups, these functional data indicate that IFIT1 is engaged in an evolutionary "arms race" with viruses, which results in distinct antiviral specificities of IFIT1 proteins from different species.

      Weaknesses:

      (1) Some of the results and discussion text could be more focused on the model of evolution-driven changes in IFIT1 specificity. In particular, the majority of the residue mapping is on the chimpanzee protein, where it would appear that this protein has lost all antiviral function, rather than changing its antiviral specificity like some other examples in this paper. As such, the connection between the functional mapping of individual residues with the positive selection analysis and changes in antiviral specificity is not present. While the model that changes in antiviral specificity have been positively selected for is intriguing, it is not supported by data in the paper.

      (2) The strength of the differences in antiviral specificity could be highlighted to a greater degree. Specifically, the text describes a number of interesting examples of differences in inhibition of viruses from Figure 3C and 3D, and 4C-F. The revised version has added some clarity by at least providing raw data for 3C and 3D for the reader to make their own comparisons, but it is still difficult to quickly assess which are the most interesting comparisons to make (e.g. for future mapping of residues that might be important).

    3. Reviewer #3 (Public review):

      Summary:

      This manuscript by McDougal et al. demonstrates species-specific activities of diverse IFIT1 orthologs, and seeks to utilize evolutionary analysis to identify key amino acids under positive selection that contribute to antiviral activity of this host factor. While the authors identify amino acid residues important for antiviral activity of some orthologs, and propose a possible mechanism by which these residues may function, the significance or applicability of these findings to other orthologs is unclear. However, the subject matter is of interest to the field, and these findings contribute to the body of knowledge regarding IFIT1 evolution.

      Strengths:

      Assessment of multiple IFIT1 orthologs shows the wide variety of antiviral activity of IFIT1, and identification of residues outside of the known RNA binding pocket in the protein suggests additional novel mechanisms which may regulate IFIT1 activity.

      Weaknesses:

      Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs, or whether these are unique to human and chimpanzee IFIT1. While additional molecular studies of the impact of these mutations on IFIT1 function (e.g. impact on IFIT complex formation) would lend further insight, as it stands, these findings demonstrate a role for these residues in IFIT1 activity.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      McDougal et al. aimed to characterize the antiviral activity of mammalian IFIT1 orthologs. They first performed three different evolutionary selection analyses within each major mammalian clade and identified some overlapping positive selection sites in IFIT1. They found that one site that is positively selected in primates is in the RNA-binding exit tunnel of IFIT1 and is tolerant of mutations to amino acids with similar biochemical properties. They then tested 9 diverse mammalian IFIT1 proteins against VEEV, VSV, PIV3, and SINV and found that each ortholog has distinct antiviral activities. Lastly, they compared human and chimpanzee IFIT1 and found that the determinant of their differential anti-VEEV activity may be partly attributed to their ability to bind Cap0 RNA. 

      Strengths: 

      The study is one of the first to test the antiviral activity of IFIT1 from diverse mammalian clades against VEEV, VSV, PIV3, and SINV. Cloning and expressing these 39 IFIT1 orthologs in addition to single and combinatorial mutants is not a trivial task. The positive connection between anti-VEEV activity and Cap0 RNA binding is interesting, suggesting that differences in RNA binding may explain differences in antiviral activity. 

      Weaknesses: 

      The evolutionary selection analyses yielded interesting results, but were not used to inform follow-up studies except for a positively selected site identified in primates. Since positive selection is one of the two major angles the authors proposed to investigate mammalian IFIT1 orthologs with, they should integrate the positive selection results with the rest of the paper more seamlessly, such as discussing the positive selection results and their implications, rather than just pointing out that positively selected sites were identified. The paper should elaborate on how the positive selection analyses PAML, FUBAR, and MEME complement one another to explain why the tests gave them different results. Interestingly, MEME which usually provides more sites did not identify site 193 in primates that was identified by both PAML and FUBAR. The authors should also provide the rationale for choosing to focus on the 3 sites identified in primates only. One of those sites, 193, was also found to be positively selected in bats, although the authors did not discuss or integrate that finding into the study. In Figure 1A, they also showed a dN/dS < 1 from PAML, which is confusing and would suggest negative selection instead of positive selection. Importantly, since the authors focused on the rapidly evolving site 193 in primates, they should test the IFIT1 orthologs against viruses that are known to infect primates to directly investigate the impact of the evolutionary arms race at this site on IFIT1 function. 

      We thank the reviewer for their assessment and for acknowledging the breadth of our dataset regarding diverse IFIT1s, number of viruses tested, and the functional data that may correlate biochemical properties of IFIT1 orthologous proteins with antiviral function. We have expanded the introduction and results sections to better explain and distinguish between PAML, FUBAR, and MEME analyses. Furthermore, we have expanded the discussion to incorporate the observation that site 193 is rapidly evolving in bats, as well as the observation that sites near the TPR4 loop were identified as rapidly evolving in all clades of mammals tested. We do observe an overall gene dN/dS of <1; however, this is simply the average across all codons of the entire gene and does not rule out positive selection at specific sites. This is observed for other restriction factors, as many domains are undergoing purifying selection to retain core functions (e.g., enzymatic function, structural integrity) while other domains (e.g., interfaces with viral antagonists or viral proteins) show strong positive selection. Specific examples include the restriction factors BST-2/Tetherin (PMID: 19461879) and MxA (PMID: 23084925). Furthermore, we agree that testing more IFIT1-sensitive viruses that naturally infect primates with our IFIT1 193 mutagenesis library would shed light on the influence of host-virus arms races at this site. However, VEEV does naturally infect humans as well as at least one other species of primate (PMID: 39983680).
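      The point about a gene-wide dN/dS below 1 coexisting with positively selected sites can be illustrated with a toy calculation. This is only a sketch: the per-codon values in `site_omega` are hypothetical, and treating the gene-wide estimate as a plain mean of per-site values is a simplification of what PAML actually fits.

```python
from statistics import mean

# Hypothetical per-codon omega (dN/dS) values, for illustration only:
# 90 codons under strong purifying selection, 10 under positive selection.
site_omega = [0.05] * 90 + [3.0] * 10

# Simplified gene-wide value: the average over all codons.
gene_wide = mean(site_omega)

# Codons (1-indexed) whose omega exceeds 1, i.e. candidate positively selected sites.
selected_sites = [i for i, w in enumerate(site_omega, start=1) if w > 1.0]

print(f"gene-wide omega = {gene_wide:.3f}")
print(f"sites with omega > 1: {len(selected_sites)}")
```

      In this toy gene the average is well below 1 even though a tenth of the codons have omega well above 1, which is why site-level tests are needed in addition to the whole-gene estimate.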

      Below we individually address the reviewers' claims of inaccurate data interpretation.

      Some of the data interpretation is not accurate. For example: 

      (1) Lines 232-234: "...western blot analysis revealed that the expression of IFIT1 orthologs was relatively uniform, except for the higher expression of orca IFIT1 and notably lower expression of pangolin IFIT1 (Figure 4B)." In fact, most of the orthologs are not expressed in a "relatively uniform" manner e.g. big brown bat vs. shrew are quite different. 

      We have now included quantification of the western blots to allow the reader to compare protein expression with the infection data (Updated Figure 4B and 4G). We have also removed the phrase “relatively uniform” from the text and have instead included text describing the quantified expression differences.

      (2) Line 245: "...mammalian IFIT1 species-specific differences in viral suppression are largely independent of expression differences." While it is true that there is no correlation between protein expression and antiviral activity in each species, the authors cannot definitively conclude that the species-specific differences are independent of expression differences. Since the orthologs are clearly not expressed in the same amounts, it is impossible to fully assess their true antiviral activity. At the very least, the authors should acknowledge that the protein expression can affect antiviral activity. They should also consider quantifying the IFIT1 protein bands and normalizing each to GAPDH for readers to better compare protein expression and antiviral activity. The same issue is in Line 267. 

      We have now included quantification and normalization of the western blots to allow the reader to compare protein expression with the infection data (Updated Figure 4B and 4G). Furthermore, we acknowledge in the text that expression differences may affect antiviral potency in infection experiments.
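      The normalization the reviewer asks for (each IFIT1 band divided by its GAPDH loading control, then expressed relative to one reference lane) can be sketched as follows. The band intensities and the helper name `normalize_bands` are hypothetical; they are not values from Figure 4B.

```python
def normalize_bands(target, loading, reference):
    """Divide each target band by its loading control, then scale to a reference lane."""
    ratios = {sample: target[sample] / loading[sample] for sample in target}
    ref_ratio = ratios[reference]
    return {sample: ratio / ref_ratio for sample, ratio in ratios.items()}

# Hypothetical densitometry readings (arbitrary units), for illustration only.
ifit1 = {"human": 1200.0, "orca": 2600.0, "pangolin": 300.0}
gapdh = {"human": 1000.0, "orca": 1040.0, "pangolin": 1000.0}

relative = normalize_bands(ifit1, gapdh, reference="human")
for sample, value in relative.items():
    print(f"{sample}: {value:.2f} x human")
```

      Expressing each ortholog relative to human IFIT1 puts expression on the same scale as the infection readout, so a reader can judge whether a weak antiviral effect coincides with low protein levels.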

      (3) Line 263: "SINV... was modestly suppressed by pangolin, sheep, and chinchilla IFIT1 (Figure 4E)..." The term "modestly suppressed" does not seem fitting if there is 60-70% infection in cells expressing pangolin and chinchilla IFIT1. 

      We have modified the text to say “significantly suppressed” rather than “modestly suppressed.”

      (4) The study can be significantly improved if the authors can find a thread to connect each piece of data together, so the readers can form a cohesive story about mammalian IFIT1. 

      We appreciate the reviewer’s suggestion and have tried to make the story more cohesive through commentary on positive selection and by using the computational analysis to inform potential evolutionary consequences of IFIT1 functionality, first by an intraspecies (human) approach and then by an interspecies approach with diverse mammals that have great sequence diversity. Furthermore, we point out that almost all IFIT1s tested in the ortholog screen were also included in our computational analysis, allowing for the potential to connect functional observations with those seen in the evolutionary analyses.

      Reviewer #2 (Public review): 

      McDougal et al. describe the surprising finding that IFIT1 proteins from different mammalian species inhibit the replication of different viruses, indicating that the evolution of IFIT1 across mammals has resulted in host species-specific antiviral specificity. Before this work, research into the antiviral activity and specificity of IFIT1 had mostly focused on the human ortholog, which was described to inhibit viruses including vesicular stomatitis virus (VSV) and Venezuelan equine encephalitis virus (VEEV) but not other viruses including Sindbis virus (SINV) and parainfluenza virus type 3 (PIV3). In the current work, the authors first perform evolutionary analyses on IFIT1 genes across a wide range of mammalian species and reveal that IFIT1 genes have evolved under positive selection in primates, bats, carnivores, and ungulates. Based on these data, they hypothesize that IFIT1 proteins from these diverse mammalian groups may show distinct antiviral specificities against a panel of viruses. By generating human cells that express IFIT1 proteins from different mammalian species, the authors show a wide range of antiviral activities of mammalian IFIT1s. Most strikingly, they find several IFIT1 proteins that have completely different antiviral specificities relative to human IFIT1, including IFIT1s that fail to inhibit VSV or VEEV, but strongly inhibit PIV3 or SINV. These results indicate that there is potential for IFIT1 to inhibit a much wider range of viruses than human IFIT1 inhibits. Electrophoretic mobility shift assays (EMSAs) suggest that some of these changes in antiviral specificity can be ascribed to changes in the direct binding of viral RNAs. Interestingly, they also find that chimpanzee IFIT1, which is >98% identical to human IFIT1, fails to inhibit any tested virus. Replacing three residues from chimpanzee IFIT1 with those from human IFIT1, one of which has evolved under positive selection in primates, restores activity to chimpanzee IFIT1. Together, these data reveal a vast diversity of IFIT1 antiviral specificity encoded by mammals, consistent with an IFIT1-virus evolutionary "arms race".

      Overall, this is a very interesting and well-written manuscript that combines evolutionary and functional approaches to provide new insight into IFIT1 antiviral activity and species-specific antiviral immunity. The conclusion that IFIT1 genes in several mammalian lineages are evolving under positive selection is supported by the data, although there are some important analyses that need to be done to remove any confounding effects from gene recombination that has previously been described between IFIT1 and its paralog IFIT1B. The virology results, which convincingly show that IFIT1s from different species have distinct antiviral specificity, are the most surprising and exciting part of the paper. As such, this paper will be interesting for researchers studying mechanisms of innate antiviral immunity, as well as those interested in species-specific antiviral immunity. Moreover, it may prompt others to test a wide range of orthologs of antiviral factors beyond those from humans or mice, which could further the concept of host-specific innate antiviral specificity. Additional areas for improvement, which are mostly to clarify the presentation of data and conclusions, are described below. 

      Strengths: 

      (1) This paper is a very strong demonstration of the concept that orthologous innate immune proteins can evolve distinct antiviral specificities. Specifically, the authors show that IFIT1 proteins from different mammalian species are able to inhibit the replication of distinct groups of viruses, which is most clearly illustrated in Figure 4G. This is an unexpected finding, as the mechanism by which IFIT1 inhibits viral replication was assumed to be similar across orthologs. While the molecular basis for these differences remains unresolved, this is a clear indication that IFIT1 evolution functionally impacts host-specific antiviral immunity and that IFIT1 has the potential to inhibit a much wider range of viruses than previously described. 

      (2) By revealing these differences in antiviral specificity across IFIT1 orthologs, the authors highlight the importance of sampling antiviral proteins from different mammalian species to understand what functions are conserved and what functions are lineage- or species-specific. These results might therefore prompt similar investigations with other antiviral proteins, which could reveal a previously undiscovered diversity of specificities for other antiviral immunity proteins. 

      (3) The authors also surprisingly reveal that chimpanzee IFIT1 shows no antiviral activity against any tested virus despite only differing from human IFIT1 by eight amino acids. By mapping this loss of function to three residues on one helix of the protein, the authors shed new light on a region of the protein with no previously known function. 

      (4) Combined with evolutionary analyses that indicate that IFIT1 genes are evolving under positive selection in several mammalian groups, these functional data indicate that IFIT1 is engaged in an evolutionary "arms race" with viruses, which results in distinct antiviral specificities of IFIT1 proteins from different species. 

      Weaknesses: 

      (1) The evolutionary analyses the authors perform appear to indicate that IFIT1 genes in several mammalian groups have evolved under positive selection. However, IFIT1 has previously been shown to have undergone recurrent instances of recombination with the paralogous IFIT1B, which can confound positive selection analyses such as the ones the authors perform. The authors should analyze their alignments for evidence of recombination using a tool such as GARD (in the same HyPhy package along with MEME and FUBAR). Detection of recombination in these alignments would invalidate their positive selection inferences, in which case the authors need to either analyze individual non-recombining domains or limit the number of species to those that are not undergoing recombination. While it is likely that these analyses will still reveal a signature of positive selection, this step is necessary to ensure that the signatures of selection and sites of positive selection are accurate. 

      (2) The choice of IFIT1 homologs chosen for study needs to be described in more detail. Many mammalian species encode IFIT1 and IFIT1B proteins, which have been shown to have different antiviral specificity, and the evolutionary relationship between IFIT1 and IFIT1B paralogs is complicated by recombination. As such, the assertion that the proteins studied in this manuscript are IFIT1 orthologs requires additional support than the percent identity plot shown in Figure 3B. 

      (3) Some of the results and discussion text could be more focused on the model of evolution-driven changes in IFIT1 specificity. In particular, the chimpanzee data are interesting, but it would appear that this protein has lost all antiviral function, rather than changing its antiviral specificity like some other examples in this paper. As such, the connection between the functional mapping of individual residues with the positive selection analysis is somewhat confusing. It would be more clear to discuss this as a natural loss of function of this IFIT1, which has occurred elsewhere repeatedly across the mammalian tree. 

      (4) In other places in the manuscript, the strength of the differences in antiviral specificity could be highlighted to a greater degree. Specifically, the text describes a number of interesting examples of differences in inhibition of VSV versus VEEV from Figure 3C and 3D, but it is difficult for a reader to assess this as most of the dots are unlabeled and the primary data are not uploaded. A few potential suggestions would be to have a table of each ortholog with % infection by VSV and % infection by VEEV. Another possibility would be to plot these data as an XY scatter plot. This would highlight any species that deviate from the expected linear relationship between the inhibition of these two viruses, which would provide a larger panel of interesting IFIT1 antiviral specificities than the smaller number of species shown in Figure 4. 

      We thank the reviewer for their fair assessment of our manuscript. As the reviewer requested, we performed GARD analysis on our alignments used for PAML, FUBAR, and MEME (New Supp Fig 1). By GARD, we found 1 or 2 predicted breakpoints in each clade. However, much of the sequence was after or between the predicted breakpoints. Therefore, we were able to reanalyze for sites undergoing positive selection in the large regions of the sequence that do not span the breakpoints. We were able to validate that almost all sites originally identified as undergoing positive selection still exhibit signatures of positive selection when these breakpoints are taken into account: primates (11/12), bats (14/16), ungulates (30/37), and carnivores (2/4). To further validate our positive selection analysis, we used Recombination Detection Program 4 (RDP4) to remove inferred recombinant sequences from the primate IFIT1 alignment and performed PAML, FUBAR, and MEME. Once again, the sites in our original analysis were largely validated by this method. Importantly, sites 170, 193, and 366 in primates, which are discussed in our manuscript, were found to be undergoing positive selection in 2 of the 3 analyses using alignments after the indicated breakpoint in GARD and after removal of recombinant sequences by RDP4. We have updated the text to acknowledge IFIT1/IFIT1B recombination more clearly and include the GARD analysis as well as PAML, FUBAR, and MEME reanalysis taking into account predicted breakpoints by GARD and RDP4. Furthermore, to increase evidence that the sequences used in this study for both computational and functional analysis are IFIT1 orthologs rather than IFIT1B, we have included a maximum likelihood tree after aligning coding sequences on the C-terminal end (corresponding to bases 907-1437 of IFIT1). In Daugherty et al. 2016 (PMID: 27240734) this strategy was used to distinguish between IFIT1 and IFIT1B. All sequences used in our study grouped with IFIT1 sequences (including many confirmed IFIT1 sequences used in Daugherty et al.) rather than IFIT1B sequences or IFIT3. These new data, including the GARD, RDP4, and maximum likelihood tree analyses, are included as a new Supplementary Figure 1.

      We also agree with the reviewer that it is possible that chimpanzee IFIT1 has lost antiviral function due to the residues 364 and 366 that differ from human IFIT1. We have updated the discussion sections to include the possibility that chimpanzee IFIT1 is an example of a natural loss of function that has occurred in other species over evolution as well as the potential consequences of this occurrence. Regarding highlighting the strength of differences in antiviral activity between IFIT1 orthologs, we have included several updates to strengthen the ability of the reader to assess these differences. First, we have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory) giving the reader a visual representation in the figure of top antiviral orthologs during our screen. We have also updated the figure legend to inform the reader of this information.

      Reviewer #3 (Public Review):  

      Summary: 

      This manuscript by McDougal et al. demonstrates species-specific activities of diverse IFIT1 orthologs and seeks to utilize evolutionary analysis to identify key amino acids under positive selection that contribute to the antiviral activity of this host factor. While the authors identify amino acid residues as important for the antiviral activity of some orthologs and propose a possible mechanism by which these residues may function, the significance or applicability of these findings to other orthologs is unclear. However, the subject matter is of interest to the field, and these findings could be significantly strengthened with additional data.

      Strengths:

      Assessment of multiple IFIT1 orthologs shows the wide variety of antiviral activity of IFIT1, and identification of residues outside of the known RNA binding pocket in the protein suggests additional novel mechanisms that may regulate IFIT1 activity.

      Weaknesses:

      Alternative hypotheses that might explain the variable and seemingly inconsistent antiviral activity of IFIT1 orthologs were not really considered. For example, studies show that IFIT1 activity may be regulated by interaction with other IFIT proteins, but this was not assessed in this study.

      Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs, or whether these are unique to human and chimpanzee IFIT1. Similarly, while the hypothesis that these residues impact IFIT1 activity in an allosteric manner is an attractive one, there is no data to support this.  

      We thank the reviewer for their fair assessment of our manuscript. To address the weaknesses that the reviewer has pointed out we have expanded the discussion to more directly address alternate hypotheses, such as the possibility of IFIT1 activity being regulated by interaction with other IFIT proteins. Furthermore, we expanded the discussion to include an alternate hypothesis for the role of residues 364 and 366 in primate IFIT1 besides allosteric regulation. In addition, we did not intend to claim or imply that residues 364/6 are the key drivers of antiviral activity for all IFITs tested. However, we speculate that within primates these residues may play a key role as these residues differ between chimpanzee IFIT1 (which lacks significant antiviral activity towards the viruses tested in this study) and human IFIT1 (which possesses significant antiviral activity). In addition, these residues seem to be generally conserved in primate species, apart from chimpanzee IFIT1. We have included changes to the text to more clearly indicate that we highlight the importance of these residues specifically for primate IFIT1, but not necessarily for all IFIT1 proteins in all clades.

      Reviewer #1 (Recommendations for the authors): 

      (1) The readers would benefit from a more detailed background on the concept and estimation of positive selection, including the M7/8 models in PAML. 

      We have included more information in the text to provide a better background for the concepts of positive selection and how PAML tests for this using M7 and M8 models.
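
      As a rough illustration of the test referred to here: PAML's site models M7 (beta-distributed ω, constrained to ω ≤ 1) and M8 (M7 plus an extra site class in which ω may exceed 1) are nested, and they are conventionally compared with a likelihood ratio test against a χ² distribution with two degrees of freedom. The sketch below assumes that convention; the function name is hypothetical and the log-likelihood values are placeholders, not results from this study.

```python
import math

def m7_vs_m8_lrt(lnL_m7, lnL_m8, alpha=0.05):
    """Likelihood ratio test for nested codon models (M7 null, M8 alternative).

    Assumes the statistic 2*(lnL_M8 - lnL_M7) is compared with a chi-square
    distribution with 2 degrees of freedom, whose survival function has the
    closed form exp(-x / 2).
    """
    stat = 2.0 * (lnL_m8 - lnL_m7)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimizer noise
    p_value = math.exp(-stat / 2.0)
    return stat, p_value, p_value < alpha

# Hypothetical log-likelihoods, for illustration only.
stat, p, reject = m7_vs_m8_lrt(lnL_m7=-5230.4, lnL_m8=-5221.9)
print(f"2*dlnL = {stat:.2f}, p = {p:.2e}, M8 favored: {reject}")
```

      A significant result only indicates that a model allowing ω > 1 fits better; identifying which sites are under selection is a separate step.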

      (2) Presentation of data 

      a) Figure 3C and 3D: is there a better way to present the infection data so the readers can tell the ranked antiviral activity of the species that suppress VEEV? 

      We have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory). We have updated the figure legend to inform the reader of this information as well.

      b) Figure 4C and 4D: consider putting the western blot in Supplementary Figure 1 underneath the infection data or with the heatmap so readers can compare it with the antiviral activity. 

      We have also included quantification of the western blots performed to evaluate IFIT1 expression during the experiments shown in Figure 4C and 4D in an updated Figure 4B. We have also included normalized expression values with the heatmap shown in an updated Figure 4G so the reader can evaluate potential impact of protein expression on antiviral activity for all infection experiments shown in figure 4.

      (3) Line 269-270: as a rationale for narrowing the species to human, black flying fox, and chimp IFIT1, human and black flying fox were chosen because they strongly inhibit VEEV, but pangolin wasn't included even though it had the strongest anti-VEEV activity? 

      The rationale for narrowing the species to human, black flying fox, and chimpanzee IFIT1 was related to the availability of biological tools, high quality genome/transcriptome sequencing databases, and other factors. Specifically, human and chimp IFIT1 are closely related but have variable antiviral activities, making their comparison highly relevant. Bats are well established as reservoirs for diverse viruses, whereas the reservoir status of many other mammals is less well defined. Furthermore, the difficulty of purifying large amounts of high quality IFIT1 protein after bacterial expression was another limitation to functional studies. We have added this information into the manuscript text.

      (4) Figure 5A: to strengthen the claim that "species-specific antiviral activities of IFIT1s can be partly explained by RNA binding potential", it would be good to include one more positive and one more negative control. In other words, test the cap0 RNA binding activity of an IFIT1 ortholog that strongly inhibits VEEV and an ortholog that does not. It would also be good to discuss why chimp IFIT1 still shows dose-dependent RNA binding yet it is one of the weakest at inhibiting VEEV. 

      We appreciate the reviewer's suggestion to include more controls and expand the dataset. While we understand the potential value of expanding the dataset, we believe that human IFIT1 serves as a robust positive control and human IFIT1 R187H (RNA-binding deficient) serves as an established negative control. Future experiments with other purified IFITs from other species will indeed strengthen evidence linking IFIT1 species-specific activity and RNA-binding.

      Regarding chimpanzee IFIT1, we acknowledge there appears to be some dose-dependent Cap0 RNA-binding. However, the binding affinity is much weaker than that of human or black flying fox IFIT1. We speculate that during viral infection reduced binding affinity could impair the ability of chimpanzee IFIT1 to efficiently sequester viral RNA and inhibit viral translation. This reduction in binding affinity may, therefore, allow the cell to be overwhelmed by the exponential increase in viral RNA during replication, resulting in an ineffective antiviral IFIT1. In the literature, a similar phenomenon is observed by Hyde et al. (PMID: 24482115). In that study, the authors test mouse Ifit1 Cap0 RNA binding by EMSA of the 5’ UTR sequence of VEEV RNA containing an A or G at nucleotide position 3. EMSA shows binding of both the A3 and G3 Cap0 VEEV RNA sequences; however, stronger Ifit1 binding is observed for the A3 Cap0 RNA sequence. The consequences of the reduced Ifit1 binding of the G3 Cap0 VEEV RNA are observed in vitro by a substantial increase in viral titers produced from cells as well as an increase in protein produced in a luciferase-based translation assay. The authors also show in vivo relevance of this reduction of Ifit1 binding, as WT B6 mice infected with VEEV containing the A3 UTR exhibited 100% survival, while WT B6 mice infected with VEEV containing the G3 UTR survived at a rate of only ~25%. Therefore, the literature supports that a decrease in Cap0 RNA binding by an IFIT protein (while still exhibiting Cap0 RNA binding) observed by EMSA can result in considerable alterations of viral infection both in vitro and in vivo.

      Minor: 

      (1) Line 82: "including 5' triphosphate (5'-ppp-RNA), or viral RNAs..." having a comma here will make the sentence clearer. 

      We have improved the clarity of this sentence. It now reads, “IFIT1 binds uncapped 5′-triphosphate RNA (5′-ppp-RNA) and capped but unmethylated RNA (Cap0, an m⁷G cap lacking 2′-O methylation).”

      (2) Line 100: "...similar mechanisms have been at least partially evolutionarily conserved in IFIT proteins to restrict viral infection by IFIT proteins". 

      We have updated the text to improve clarity by revising the sentence to “VEEV TC-83 is sensitive to human IFIT1 and mouse Ifit1B, indicating at least partial conservation of antiviral function by IFIT proteins."

      (3) Line 109: "signatures of rapid evolution or positive selection" would put positive selection second because that is the more technical term that can benefit from the more layperson term (rapid evolution). 

      We have updated this sentence incorporating this suggestion. “Positive selection, or rapid evolution, is denoted by a high ratio of nonsynonymous to synonymous substitutions (dN/dS >1).”

      (4) Lines 116-117: "However, this was only assessed in a few species" would benefit from a citation. 

      We have inserted the citation.

      (5) Line 127 heading: "IFIT1 is rapidly evolving in mammals" would be more accurate to say "in major clades of mammals". 

      We have updated the text to include this suggestion.

      (6) Line 165: "IFIT1 L193 mutants". 

      We have updated the text to rephrase this for clarity.

      (7) Line 170: two strains of VEEV were mentioned in the Intro, so it would be good to specify which strain of VEEV was used?

      We have updated the text to clarify the VEEV strain. In this study, all experiments were performed using the VEEV TC-83 strain.

      (8) Line 174: "Indeed, all mutants at position 193, whether hydrophobic or positively charged, inhibited VEEV similarly to the WT..." It should read "all hydrophobic and positively charged mutants inhibited VEEV similarly to the WT...". 

      We corrected as suggested. 

      (9) Line 204: what are "control cells"? Cells that are mock-infected, or cells without IFIT1? 

      We have updated the text to improve clarity. What we refer to as control cells were cells expressing an empty vector control rather than an IFIT1.

      (10) Need to clarify n=2 and n=3 replicates throughout the manuscript. Does that refer to three independent experiments? Or an experiment with triplicate wells/samples? 

      We have updated the text to say “independent experiments” instead of “biological replicates” to prevent any confusion.  All n=2 or n=3 replicates denote independent experiments.

      (11) Line 254: "dominant antiviral effector against the related human parainfluenza virus type 5..." 

      We have updated the text to improve clarity.

      (12) Line 271: "The black flying fox (Pteropus alecto), is a model megabat species..." scientific name was italicized here but not elsewhere. Remove comma.

      We have updated the text accordingly.

      (13) Line 293: "...chimpanzee IFIT1 lacked these properties" but chimp IFIT1 can bind cap0 RNA, just at a lower level. 

      We have updated the text to acknowledge that chimpanzee IFIT1 can bind cap0 RNA, albeit at a lower level than human IFIT1.

      (14) Figure 6B: please fix the x-axis labels. They're very cramped. 

      We have updated the x-axis labels for figure 6B and figure 6D to improve clarity.

      (15) Line 609: "...trimmed and aligned"? 

      Our phrasing is to indicate that coding sequences were aligned, and gaps were removed to reduce the chance of false positive signal by underrepresented codons such as gaps or short insertions. We have removed “trimmed” from the text and changed the text to say “aligned sequences” to increase clarity.

      Reviewer #2 (Recommendations for the authors): 

      (1) Numbers less than 10 should be spelled out throughout the manuscript (e.g. line 138). 

      We have updated the text to reflect the request.

      (2) Line 165: "expression of IFIT1 193 mutants" should be rephrased. 

      We have updated the text to rephrase this sentence for clarity.

      (3) A supplemental table or file should be included that contains the accession number and species names of sequences used for evolutionary analyses and for functional testing. In addition, the alignments that were used for positive selection can be included.  

      We have included a supplemental file containing accession numbers, species names for evolutionary analysis and functional studies. In addition, this table includes the infection data for each IFIT1 homolog for the screen performed in figure 3.

      (4) The discussion of potential functions of the C-terminus of IFIT1 should include possible interactions with other proteins. In particular, the C-terminus of IFIT1 has been shown to interact with IFIT3 in a way that modulates its activity (PMID: 29525521). Although residues 362-366 were not shown in that paper to interact with a fragment of IFIT3, it is possible that these residues may be important for interaction with full-length IFIT3 or some other IFIT1 binding partner. 

      We thank the reviewer for their suggestion. We have expanded the discussion to explore the possibility that residues 364 and 366 of IFIT1 may be involved in IFIT1-IFIT3 interactions and consequently Cap0 RNA-binding and antiviral activity.

      (5) The quantification of the EMSAs should be described in more detail. In particular, from looking at the images shown in Figure 5A, it would appear that human and chimpanzee IFIT1 show similar degrees of probe shift, while the human R187H panel shows no shifting at all. However, the quantification shows chimpanzee IFIT1 as being statistically indistinguishable from human R187H. Additional information on how bands were quantified and whether they were normalized to unshifted RNA would be helpful in attempting to resolve this visual discordance. 

      EMSAs were quantified by determining Adj. Vol. Intensity in ImageLab (BioRad), which subtracts background signal, after imaging at the same exposure and SYBR Gold staining time. To determine Adj. Vol. Intensity, we drew a box (same size for each gel and lane for each replicate) for each lane above the free probe. These values were not normalized to unshifted RNA; however, equal RNA was loaded. While the ANOVA shows no significant difference between human R187H and chimpanzee IFIT1 band shift intensity, this is potentially due to the between-group variance in the ANOVA. The AUC value for chimpanzee IFIT1 is 36.4% higher than that of R187H.

      The AUC of Adj. Vol. Intensity of the human IFIT1 band shift is roughly 2-fold higher than that of chimpanzee IFIT1. We believe this matches the visual representation as well, as human IFIT1 has a darker “upper” band in the shift, as well as a clear dark “lower” band that is not well defined in the chimpanzee shift. Furthermore, the upper band of the chimpanzee IFIT1 shift in the 400 nM lane appears to be as intense as the upper band in the 240 nM human IFIT1 lane, without taking into account the lower band seen for human IFIT1 as well. We included this quantification because Kd could not be calculated due to no clear probe disappearance. We do not intend for this quantification to act as a substitute for binding affinity calculations, but rather to aid the reader in data interpretation.
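
      The comparison described above (area under the curve of band-shift intensity across protein concentrations, followed by a percent difference between proteins) can be sketched as follows. This is a minimal illustration that assumes trapezoidal integration, which the response does not specify; the function names are hypothetical, and the concentrations and intensities are invented placeholders rather than the measured Adj. Vol. Intensity values.

```python
def trapezoid_auc(x, y):
    """Area under y(x) by the trapezoidal rule; x must be strictly increasing."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two or more paired points")
    area = 0.0
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]
        if width <= 0:
            raise ValueError("x must be strictly increasing")
        area += 0.5 * (y[i] + y[i - 1]) * width
    return area

def percent_higher(a, b):
    """How much higher a is than b, expressed as a percentage of b."""
    return 100.0 * (a - b) / b

# Placeholder titration: protein concentration (nM) vs. background-subtracted
# shifted-band intensity (arbitrary units). These are NOT the study's data.
conc_nM = [0, 80, 160, 240, 400]
shift_a = [0, 10, 20, 30, 50]   # hypothetical protein A
shift_b = [0, 5, 10, 15, 25]    # hypothetical protein B

auc_a = trapezoid_auc(conc_nM, shift_a)
auc_b = trapezoid_auc(conc_nM, shift_b)
print(auc_a, auc_b, percent_higher(auc_a, auc_b))
```

      Because the AUC pools signal over the whole titration, it summarizes relative band-shift intensity but, as noted above, is not a substitute for a fitted Kd.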

      Reviewer #3 (Recommendations for the authors): 

      (1) IFIT1 has been demonstrated to function in conjunction with other IFIT proteins, do you think the absence of antiviral activity is due to isolated expression of IFIT1 without these cofactors, and therefore might explain why there was little overlap observed in orthologs that inhibited the viruses tested (Figure 3, lines 209-210). 

      We do not believe that isolated expression of IFIT1 without cofactors (such as orthologous IFIT proteins) would fully explain the disparities in antiviral activity, as many IFIT1s that were expressed inhibited either VSV or VEEV in our screen. However, we acknowledge that the expression of IFIT1 alone does create a limitation in our study, as IFIT1 antiviral activity and RNA-binding can be modulated by interactions with other IFIT proteins. Therefore, we do believe that it is possible that co-expression of IFIT1 with other IFITs from a given species might enhance antiviral activity. Future studies may shed light on this.

      (2) Figure 5 - Calculating the Kd for each protein would be more informative. How does the binding affinity of these IFIT1 proteins compare to that which has previously been reported? 

      We are unable to accurately determine Kd as the signal of the free probe is not substantially diminished. Therefore, we are only able to compare IFIT1 protein binding between species without accurate mathematical calculation of binding affinity. Our result does appear similar to that of mouse Ifit1 binding to VEEV RNA (PMID: 24482115), in which the authors also do not calculate a Kd for their RNA EMSA.

      (3) Mutants 364 and 366 may not have direct contact with RNA, but RNA EMSA data presented suggest that the binding affinity may be different (though this is hard to conclude without Kd data). Additional biochemical data with these mutants might provide more insight here. 

      We agree that further studies using 364 and 366 double mutant human and chimpanzee protein in EMSAs would provide additional biochemical data and provide insight into the role of these residues in direct RNA binding. We acknowledge this is a limitation of our study as we provide only genetic data demonstrating the importance of these residues.

      (4) Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs. A more systematic assessment of the role of these mutations across multiple diverse orthologs would provide more insight here. Do other antiviral proteins show this trend (ie exhibit little overlap in orthologs that inhibit these viruses). What do you think might be driving this? 

      We agree that other residues outside of 364 and 366 may be key drivers of antiviral activity across the IFIT1 orthologs tested. We do not hypothesize that this will broadly apply across IFIT1 from diverse clades of mammals as overall amino acid identity can differ by over 30%. However, based on the chimpanzee and human IFIT1 data, as well as sequence alignment within primates specifically, we believe these residues may be key for primate (but not necessarily other clades of mammals) IFIT1 antiviral activity.

      Regarding whether other antiviral proteins show little overlap in orthologs that inhibit a given virus, to our knowledge such a functional study with this large and divergent dataset of orthologs has not been performed. However, there are many examples of restriction factors exhibiting species-specific antiviral activity when ortholog screens have been performed. For example, HIV was reported to be suppressed by MX2 orthologs from human, rhesus macaque, and African green monkey, but not sheep or dog MX2 (PMID: 24760893). In addition, foamy virus was inhibited by the human and rhesus macaque orthologs of PHF11, but not the mouse and feline orthologs (PMID: 32678836). Furthermore, studies from our lab have shown variability in the antiviral activity of RTP4 orthologs towards viruses such as hepatitis C virus (HCV), West Nile virus (WNV), and Zika virus (ZIKV) (PMID: 33113352).

    1. eLife Assessment

      In this valuable contribution, the authors present a novel and versatile probabilistic tool for classifying tracking behaviors and understanding parameters for different types of single-particle motion. The software package will be broadly applicable to single-particle tracking studies. The methodology has been convincingly tested by computational comparisons and experimental data, although the mathematical foundation for the hypothesis testing method can be further strengthened.

    2. Reviewer #1 (Public review):

      Summary:

      Weiss and co-authors presented a versatile probabilistic tool. aTrack helps in classifying tracking behaviors and understanding important parameters for different types of single particle motion types: Brownian, Confined, or Directed motion. The tool can be used further to analyze populations of tracks and the number of motion states. This is a stand-alone software package, making it user-friendly for a broad group of researchers.

      Strengths:

      This manuscript presents a novel method for trajectory analysis.

      Comments on revisions:

      The authors have strengthened and improved the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data.

      Strengths:

      Although many tools exist in the single-particle tracking (SPT) field, this particular software package is developed using an innovative mathematical model and a probabilistic approach. It also provides inference of motion types, which are critical to answer biological questions in SPT experiments.

      (1) The authors adopt a novel mathematical framework, which is unique in the SPT field.

      (2) The authors have validated their method extensively using simulated tracks and compared to existing methods when appropriate.

      (3) The code is freely available.

      Weaknesses:

      The authors did a good job during the revision to address most of the weaknesses in my (as well as the other reviewers') first round of review. Nevertheless, the following issue is still not fully addressed.

      The hypothesis testing method presented here lacks a rigorous statistical foundation. The authors improved on this point after the revision, but in their newly added SI section "Statistical Test", they only justified their choices using "hand-waving" arguments (i.e. there is not a single reference to proper statistical textbooks or earlier works in this important section). I understand that sometimes mathematical rigor comes later, after some intuition-guided choices of critical parameters seem to work, but I nevertheless need to point it out as a remaining weakness.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Weiss and co-authors presented a versatile probabilistic tool. aTrack helps in classifying tracking behaviors and understanding important parameters for different types of single particle motion types: Brownian, Confined, or Directed motion. The tool can be used further to analyze populations of tracks and the number of motion states. This is a stand-alone software package, making it user-friendly for a broad group of researchers. 

      Strengths: 

      This manuscript presents a novel method for trajectory analysis. 

      Weaknesses: 

      (1) In the results section, is there any reason to choose the specific range of track length for determining the type of motion? The starting value is fine, and would be short enough, but do the authors have anything to report about how much is too long for the model? 

      We chose to test the range of track lengths (five to hundreds of steps) to cover the broad range of scenarios arising from single proteins or fluorophores to brighter objects with more labels. While there is no upper limit per se, the computation time of our method scales linearly with track length; a track of 100 time points takes ~2 minutes to run on a standard consumer-level desktop CPU. We have added the following sentence to note the time cost with trajectory length:  

      “The recurrent formula enables our model computation time to scale linearly with the number of time points.”

      (2) Robustness to model mismatches is a very important section that the authors have uplifted diligently. Understanding where and how the model is limited is important. For example, the authors mentioned the limitation of trajectory length, do the authors have any information on the trajectory length range at which this method works accurately? This would be of interest to readers who would like to apply this method to their own data. 

      We agree that limitations are important to estimate, and trajectory length is an important consideration when choosing how to analyze a dataset. We report the categorization certainty, i.e. the likelihood differences, for a range of track lengths (Fig. 2 a,c, Fig. 3c-d, and Fig. 4 c,g.).

      For example, here are the key plots from Fig. 2 quantifying the relative likelihoods, where being within the light region is necessary. The light areas represent a useful likelihood ratio.

      We only performed analysis up to track lengths of 600 time steps, but parameter estimates and significance can only improve with increasing track length as long as the model assumptions hold. The broader limitations and future opportunities for new methods are now expanded upon in the discussion, for example switching between states, and ambiguities between models and states (bound vs very slow diffusion vs very slow motion).

      (3) aTrack extracts certain parameters from the trajectories to determine the motion types. However, it is not very clear how certain parameters are calculated. For example, is the diffusion coefficient D calculated from fitting, and how is the confinement factor defined and estimated, with equations? This information will help the readers to understand the principles of this algorithm.

      We apologize for the confusion. All the model parameters are fit using the maximum likelihood approach. To make this point clearer in the manuscript, we have made three changes:

      (1) We modified the following sentence to replace “determined” with "fit”:

      “Finally, Maximum Likelihood Estimation (MLE) is used to fit the underlying parameter value”

      (2) We added the following sentence in the main text :

      “In our model, the velocity is the characteristic parameter of directed motion and the confinement factor represents the force within a potential well. More precisely, the confinement factor $l$ is defined such that at each time step the particle position is updated by $l$ times the distance particle/potential well center (see the Methods section for more details).”.

      (3) We have added a new section in the methods, called Fitting Method, where we have added the explanation below:

      “For the pure Brownian model, the parameters are the diffusion coefficient and the localization error. For the confinement model, the parameters are the diffusion coefficient, the localization error, the confinement factor, and the diffusion coefficient of the potential well. For the directed model, the parameters are the diffusion coefficient, the localization error, the initial velocity and the acceleration variance.

      These parameters are estimated using the maximum likelihood approach which consists in finding the parameters that maximize the likelihood. We realize this fitting step using gradient descent via a TensorFlow model. All the estimates presented in this article are obtained from a single set of initial parameters to demonstrate that the convergence capacity of aTrack is robust to the initial parameter values.”
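
      As a minimal sketch of the confinement factor defined above, the following 1D simulation moves the particle toward the well center by $l$ times its current distance at each step and then adds a Brownian step. It is an illustration under simplifying assumptions (fixed well center, no localization error), and the function name and arguments are hypothetical rather than taken from the aTrack package.

```python
import random


def simulate_confined_track(n_steps, d, l, center=0.0, start=0.0, seed=0):
    """Simulate a 1D confined track (hypothetical helper, not the aTrack API).

    d: per-step diffusion length; l: confinement factor in [0, 1].
    The well center is held fixed here, a simplification of the model above.
    """
    rng = random.Random(seed)
    x = start
    positions = [x]
    for _ in range(n_steps):
        # Pull toward the well center by l times the current distance,
        # then add a Gaussian diffusion step.
        x = x + l * (center - x) + rng.gauss(0.0, d)
        positions.append(x)
    return positions
```

      Setting `l = 0` recovers pure Brownian motion, while `d = 0` gives a deterministic geometric decay toward the well center.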

      (4) The authors mentioned the scenario where a particle may experience several types of motion simultaneously. How are these motions simulated and what do they mean in terms of motion types? Are they mixed motion (a particle switches motion types in the same trajectory) or do they simply present features of several motion types? It is not intuitive to the readers that a particle can be diffusive (Brownian) and directed at the same time. 

      In the text, we present an example where one can observe this type of motion to help the reader understand when this type of motion can be met: “Sometimes, particles undergo diffusion and directed motion simultaneously, for example, particles diffusing in a flowing medium (Qian 1991).”

      This is simulated by the addition of two terms affecting the hidden position variable before adding a localization term to create the observed variable. In the analysis, this manifests as non-zero values for the diffusion coefficient and the linear velocity. For example, Figure 4g and the associated text, where a single particle moves with a directed component and a Brownian diffusion component at each step.
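
      The simulation scheme described above can be sketched in 1D as follows: at each step the hidden position receives a constant velocity term and a Brownian term, and the observed position is the hidden position plus a localization error. The function name and parameters are hypothetical, and a constant velocity is assumed for simplicity.

```python
import random


def simulate_directed_diffusive_track(n_steps, d, velocity, sigma, seed=0):
    """Simulate a 1D track with simultaneous diffusion and directed motion.

    Hypothetical helper, not the aTrack API.
    d: per-step diffusion length; velocity: displacement per step;
    sigma: localization error added to each observed position.
    """
    rng = random.Random(seed)
    hidden = 0.0
    observed = [hidden + rng.gauss(0.0, sigma)]
    for _ in range(n_steps):
        # Two terms act on the hidden position: directed and diffusive.
        hidden += velocity + rng.gauss(0.0, d)
        # The observation adds localization error to the hidden position.
        observed.append(hidden + rng.gauss(0.0, sigma))
    return observed
```

      With `d = 0` and `sigma = 0` the output is a straight line, which makes the contribution of each term easy to inspect separately.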

      We did not simulate transitions between types of motion. Switching is not treated by this current model; however, this limitation is described in the discussion and our team and others are currently working on addressing this challenge.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data. 

      Strengths: 

      A potential advantage of the presented method is its wide applicability to different motion types. 

      Weaknesses: 

      (1) There has been a lot of similar work in this field. Even though the authors included many relevant citations in the introduction, it is still not clear what this work uniquely offers. Is it the first time that direct MLE of the time-series data was developed? Suggestions to improve would include (a) better wording in the introduction section, (b) comparing to other popular methods (based on MSD, step-size statistics (Spot-On, eLife 2018;7:e33125), for example) using the simulated dataset generated by the authors, (c) comparing to other methods using data set in challenges/competitions (Nat. Comm (2021) 12:6253).  

      We thank the reviewer for this suggestion and agree that the explanation of the innovative aspects of our method in the introduction was not clear enough. We have now modified the introduction to better explain what is improved here compared to previous approaches.

      “The main innovations of this model are: 1) it uses analytical recurrence formulas to perform the integration step for complex motion, improving speed and accuracy; 2) it handles both confined and directed motion; 3) anomalous parameters, such as the center of the potential well and the velocity vector are allowed to change through time to better represent tracks with changing directed motion or confinement area; and lastly 4) for a given track or set of tracks, aTrack can determine whether tracks can be statistically categorized as confined or directed, and the parameters that best describe their behavior, for example, diffusion coefficient, radius of confinement, and speed of directed motion.”

      Regarding alternatives, we compare our method in the text to the best-performing algorithm of the 2021 Anomalous Diffusion (AnDi) Challenge mentioned by the reviewer in Figure 6 (RANDI, Argun et al, arXiv, 2021, Muñoz-Gil et al, Nat Com. 2021). Notably, both methods performed similarly on fBm, but ours was more robust in cases where there were small differences between the process underlying the data and the model assumptions, a likely scenario in real datasets. Regarding Spot-On, this was not mentioned as it only deals with multiple populations of Brownian diffusers, preventing a quantitative comparison.

      (2) The Hypothesis testing method presented here has a number of issues: first, there is no definition of testing statistics. Usually, the testing statistics are defined given a specific (Type I and/or Type II) error rate. There is also no discussion of the specificity and sensitivity of the testing results (i.e. what's the probability of misidentification of a Brownian trajectory as directed? etc).

      We now explain our statistical approach and how to perform hypothesis testing with our metric in a new supplementary section, Statistical test. 

      We use the likelihood ratio as a more conservative alternative to the p-value. In Fig S2, we show that our metric is an upper bound of the p-value and can be used to perform hypothesis testing with a chosen type I error rate. 

      Related, it is not clear what Figure 2e (and other similar plots) means, as the likelihood ratio is small throughout the parameter space. Also, for likelihood ratio tests, the authors need to discuss how model complexity affects the testing outcome (as more complex models tend to be more "likely" for the data) and also how the likelihood function is normalized (normalization is not an issue for MLE but critical for ratio tests). 

      We present the likelihood ratio as an upper bound of the p-value. Therefore, we can reject the null hypothesis if it is smaller than a given threshold, e.g. 0.05, but this number should be decreased if multiple tests are performed. The colorscale we show in the figure is meant to highlight the working range (light), and ambiguous range (dark) of the method.
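
      The decision rule described above can be sketched as follows. The likelihood ratio is computed from the maximized log-likelihoods of the null (Brownian) and alternative models and compared with a threshold. Dividing the threshold by the number of tests (a Bonferroni-style correction) is an assumption made here for illustration, since the text only states that the threshold should be decreased; the function names are hypothetical.

```python
import math


def likelihood_ratio(log_lik_null, log_lik_alt):
    """Ratio L_null / L_alt from maximized log-likelihoods, capped at 1."""
    return min(1.0, math.exp(log_lik_null - log_lik_alt))


def reject_null(log_lik_null, log_lik_alt, alpha=0.05, n_tests=1):
    """Reject the null model if the likelihood ratio is below the threshold.

    Hypothetical helper. The alpha / n_tests correction is an assumption,
    not necessarily the correction used by the authors.
    """
    return likelihood_ratio(log_lik_null, log_lik_alt) < alpha / n_tests
```

      For example, a log-likelihood difference of 5 gives a ratio of exp(-5), roughly 0.0067, which passes a 0.05 threshold for a single test but not after correcting for ten tests.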

      As the reviewer mentions, we expect the alternative hypothesis to result in higher likelihoods than the simpler null hypothesis for null hypothesis tracks, but, as seen in the Fig S2, the likelihood ratio of a dataset corresponding to the null hypothesis is strongly skewed toward its upper limit 1. This means that for most of the tracks, the likelihood is not (or little) affected by the model complexity. The likelihoods of all the models are normalized so their integrals over the data equals 1/A with A the area of the field of view which is independent of the model complexity.

      (3) Relating to the mathematical foundation (Figure 1b). The measured positions are drawn as direct arrows from the real position states: this infers instantaneous localization. In reality, there is motion blur which introduces a correlation of the measured locations. Motion blur is known to introduce bias in SPT analysis, how does it affect the method here? 

      The reviewer raises an important point as our model does not explicitly consider motion blur. We have now added a paragraph that presents how our model performs in case of motion blur in the section called Robustness to model mismatches. This section and the corresponding new Supplemental Fig. S7 demonstrate that the estimated diffusion length is accurate so long as the static localization error is higher than the dynamic localization error. If the dynamic localization error is higher, our model systematically underestimates the diffusion length by a factor of 0.81 ≈ $(2/3)^{0.5}$, which can be corrected for with an added post-processing step.

      (4) The authors did not go through the interpretation of the figure. This may be a matter of style, but I find the figures ambiguous to interpret at times.  

      We thank the reviewer for their feedback on improving the readability. To avoid overly repetitive and lengthy sections of text, we have opted for a concise approach. This allows us to present closely related panels at the same point in the text, while not ignoring important variations and tests. Considering this feedback from the reviewers, we have added more information and interpretation throughout our manuscript to improve interpretability.

      (5) It is not clear to me how the classification of the 5 motion types was accomplished. 

      We have modified the specific text related to this figure to describe an illustrative example to show how one could use aTrack on a dataset where not that much is known: First, we present the method to determine the number of states; second, we verify the parameter estimates correspond to the different states.  

      Classifying individual tracks is possible. While not done in the section corresponding to Fig. 5, this is done in Fig. 7 and a new supplementary plot, Fig. S9b (shown below). In brief, this is accomplished with our method by computing the likelihood of each track given each state. The probability that a given track is in state k equals the likelihood of the track given the state divided by the sum of the likelihoods given the different states. 
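
      The per-track classification rule described above can be sketched as follows. Given the log-likelihood of one track under each state, the probability of each state is its likelihood divided by the sum over states. The function name is hypothetical, and equal weighting of the states follows the sentence above rather than any additional detail of the aTrack implementation. Subtracting the maximum log-likelihood before exponentiating avoids numerical underflow for long tracks.

```python
import math


def state_probabilities(log_likelihoods):
    """Probability of each state for one track from per-state log-likelihoods.

    Hypothetical helper, not the aTrack API. States are weighted equally.
    """
    # Shift by the maximum so that the largest exponent is exactly 0.
    shift = max(log_likelihoods)
    weights = [math.exp(v - shift) for v in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]
```

      For instance, a track that is three times more likely under state 2 than under state 1 is assigned probabilities of 0.25 and 0.75.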

      (6) Figure 3. Caption: what is ((d_{est}-0.1)/0.1)? Also panel labeled as "d" should be "e". 

      Thank you for bringing these errors to our attention, the panel and caption have been corrected.

      Reviewer #3 (Public Review): 

      Summary: 

      In this work, Simon et al present a new computational tool to assess non-Brownian single-particle dynamics (aTrack). The authors provide a solid groundwork to determine the motion type of single trajectories via an analytical integration of multiple hidden variables, specifically accounting for localization uncertainty, directed/confined motion parameters, and, very novel, allowing for the evolution of the directed/confined motion parameters over time. This last step is, to the best of my knowledge, conceptually new and could prove very useful for the field in the future. The authors then use this groundwork to determine the motion type and its corresponding parameter values via a series of likelihood tests. This accounts for obtaining the motion type which is statistically most likely to be occurring (with Brownian motion as null hypothesis). Throughout the manuscript, aTrack is rigorously tested, and the limits of the methods are fully explored and clearly visualised. The authors conclude with allowing the characterization of multiple states in a single experiment with good accuracy and explore this in various experimental settings. Overall, the method is fundamentally strong, well-characterised, and tested, and will be of general interest to the single-particle-tracking field. 

      Strengths: 

      (1) The use of likelihood ratios gives a strong statistical relevance to the methodology. There is a sharp decrease in likelihood ratio between e.g. confinement of 0.00 and 0.05 and velocity of 0.0 and 0.002 (figure 2c), which clearly shows the strength of the method - being able to determine 2nm/timepoint directed movement with 20 nm loc. error and 100 nm/timepoint diffusion is very impressive. 

      We apologize for the confusion: the directed tracks in Fig. 2 have no Brownian-motion component, i.e. D=0. We have made this clearer in the main text. Specifically, this section of the text refers to a track in linear motion with 2 nm displacements per step. With 70 time points (69 steps), a single particle that moved 138 nm with a localization error of 20 nm (95% uncertainty range of 80 nm) can be statistically distinguished from slow diffusive motion.

      In Fig. 4g, we explore the capabilities of our method to detect if a diffusive particle also has a directed motion component. 

      (2) Allowing the hidden variables of confinement and directed motion to change during a trajectory (i.e. the q factor) is very interesting and allows for new interpretations of data. The quantifications of these variables are, to me, surprisingly accurate, but well-determined. 

      (3) The software is well-documented, easy to install, and easy to use. 

      Weaknesses: 

      (1) The aTrack principle is limited to the motions incorporated by the authors, with, as far as I can see, no way to add new analytical non-Brownian motion. For instance, being able to add a dynamical state-switching model (i.e. quick on/off switching between mobile and non-mobile, for instance, repeatable DNA binding of a protein), could be of interest. I don't believe this necessarily has to be incorporated by the authors, but it might be of interest to provide instructions on how to expand aTrack.  

      We agree that handling dynamic state switching is very useful and highlight this potential future direction in the discussion. The revised text reads:

      “An important limitation of our approach is that it presumes that a given track follows a unique underlying model with fixed parameters. In biological systems, particles often transition from one motion type to another; for example, a diffusive particle can bind to a static substrate or molecular motor (46). In such cases, or in cases of significant mislinkings, our model is not suitable. However, this limitation can be alleviated by implicitly allowing state transitions with a hidden Markov Model (15) or alternatives such as change-point approaches (30, 47, 48), and spatial approaches (49).”

      (2) The experimental data does not very convincingly show the usefulness of aTrack. The authors mention that SPBs are directed in mitosis and not in interphase. This can be quantified and studied by microscopy analysis of individual cells and confirming the aTrack direction model based on this, but this is not performed. Similarly, the size of a confinement spot in optical tweezers can be changed by changing the power of the optical tweezer, and this would far more strongly show the quantitative power of aTrack. 

      We agree with the reviewer and have revised the biological experiment section significantly to better illustrate the potential of aTrack in various use cases.

      Now, we show an experiment to quantify the effect of LatA, an actin inhibitor, on the fraction of directed tracks obtained with aTrack. We find that LatA significantly decreases directed motion while a LatA-resistant mutant is not affected (Fig7a-c).

      As suggested by the reviewer, we have expanded the optical tweezer experiment by varying the laser power. As expected, increasing the laser power decreases the confinement radius.

      (3) The software has a very strict limit on the number of data points per trajectory, which is a user input. Shorter trajectories are discarded, while longer trajectories are cut off to the set length. It is not explained why this is necessary, and I feel it deletes a lot of useful data without clear benefit (in experimental conditions).

      We thank the reviewer for this recommendation; we have now modified the architecture of our model to enable users to consider tracks of multiple lengths. Note that the computation time is proportional to the longest track length times the number of tracks.  

      Reviewer #2 (Recommendations For The Authors): 

      Develop a better mathematical foundation for the likelihood ratio tests. 

      We added more explanation of the likelihood ratio tests and their interpretation in a new section entitled Statistical test in the supplementary information to address this recommendation.

      Place this work in clearer contexts. 

      We have now revised the introduction to better contextualize this work.

      Improve manuscript clarity. 

      Based on reviewer feedback and input from others, we have addressed this point throughout the article to improve readability.

      Make the code available. 

      The code is available on https://github.com/FrancoisSimon/aTrack, now including code for track generation.

      Reviewer #3 (Recommendations For The Authors): 

      (1) I believe the underlying model presented in Figure 1 is of substantial impact, especially when considering it as a simulation tool. I would suggest the authors make their method also available as a simulator (as far as I can tell, this is not explicitly done in their code repository, although logically the code required for the simulator should already be in the codebase somewhere). 

      Thank you for this suggestion, the simulation scripts are now on the Github repository together with the rest of the analysis method. https://github.com/FrancoisSimon/aTrack

      (2) The authors should explore and/or discuss the effects of wrong trajectory linking to their method. Throughout the text, fully correct trajectory linking is assumed and assessed, while in real experiments, it is often the case that trajectory linking is wrong, e.g. due to blinking emitters, imaging artefacts, high-density localizations, etc etc. This would have a major impact on the accuracy of trajectories, and it is extremely relevant to explore how this is translated to the output of aTrack. 

      As the reviewer notes, our current model does not account for track mislinking. This limits the method to data with lower fluorophore-densities, which is the typical use-case for SPT. We have added a brief description of the issue into the discussion of limitations.  

      (3) aTrack only supports 2D-tracking, but I don't believe there is a conceptual reason not to have this expanded to three dimensions. 

      The stand-alone software is currently limited to 2D tracks, however, the aTrack Python package works for any number of dimensions (i.e. 1-3). Note that since the current implementation assumes a single localization error for all axes, more modifications may be required for some types of 3D tracking. See https://github.com/FrancoisSimon/aTrack for more details about aTrack implementations.

      (4) Crucial information is missing in the experimental demonstrations. Especially in the NP-bacteria dataset, I miss scalebars, and information on the number of tracks. It is not explained why 5 different states are obtained - especially because I would naively expect three states: immobile NPs (e.g. stuck to glass), diffusing NPs, and NPs attached to bacteria, and thus directed. Figure 7e shows three diffusive states (why more than one?), no immobile states (why?), and two directed states (why?). 

      We thank the reviewer for pointing out these issues. We have now added scalebars and more experimental details to the figure and text as well as modifying the plot to more clearly emphasize the directed nanoparticles that are attached to cells from the diffusive nanoparticles.  

      Likely, our focal plane was too high to see the particles stuck on glass. The multiple diffusive states may be caused by different sizes of nanoparticle complexes, the multiple directed states can be caused by the fact that directed motion of the cell-attached-nanoparticles occasionally shows drastic changes of orientations. We have also clarified in the text how multiple states can help handle a heterogeneous population as was shown by Prindle et al. 2022, Microbiol Spectr. The characterization and phenotyping of microbial populations by nanoparticle tracking was published in Zapata et al. 2022, Nanoscale. 

      (5) I don't think I agree that 'robustness to model mismatches' is a good thing. Very crudely, the fact that aTrack finds fractional Brownian motion to be normal Brownian motion is technically a downside - and this should be especially carefully positioned if (in the future) a fractional Brownian motion model would be added to aTrack. I think that the author's point can be better tested by e.g. widely varying simulated vs fitted loc precision/diffusion coefficient (which are somewhat interchangeable).

      In this context, our intention in describing the robustness to “model mismatches” refers to classifying subdiffusion as subdiffusive irrespective of the exact subdiffusion motion physics (as well as superdiffusion), that is, to use aTrack how MSD analysis is often deployed. This is important in the context of real-world applications where simple mathematical models cannot perfectly represent real tracks with greater complexity. 

      Inevitably, some fraction of tracks with a pure Brownian motion may appear to match with a fractional Brownian motion, and thus statistical tests are needed to determine if this is significant. In general, aTrack finds fBm to be normal Brownian motion only when the anomalous coefficient is near 1, i.e. when the two models are indeed the same. When analysing fBm tracks with anomalous coefficients of 0.5 or 1.5, aTrack finds that these tracks are better explained by our confined diffusion model or directed motion model, respectively (Please see Fig. 6a, copied below). 

      To better clarify our objective, the section now has a brief introduction that reads:

      “One of the most important features of a method is its robustness to deviations from its assumptions. Indeed, experimental tracking data will inevitably not match the model assumptions to some degree, and models need to be resilient to these small deviations.”  

      Smaller points: 

      (1) It is not clear what a biological example is of rotational diffusion. 

      We modified the text to better explain the use of rotational diffusion.

      (2) The text in the section on experimental data should be expanded and clarified, there currently are multiple 'floating sentences' that stop halfway, and it does not clearly describe the biological relevance and observed findings.  

      We thank the reviewer for pointing out this issue. We have reworked the experimental section to better and more clearly explain the biological relevance of the findings.

      (3) Caption of figure 3: 'd' should be 'e'. 

      (4) Caption of Figure 7: log-likelihood should be Lconfined - Lbrownian, I believe. 

      (5) Equation number missing in SI first sentence. 

      (6) Supplementary Figure 1 top part axis should be Lc-Lb instead of Ld-Lb. 

      We have made these corrections, thank you for bringing them to our attention.

    1. eLife Assessment

      The paper reports valuable findings about the mechanism of regulation of the heat shock response in plants that acts as a brake to prevent hyperactivation of the stress response, which have theoretical or practical implications for a subfield. The study presented by the authors provides solid methods, data, and analysis that broadly support the claims. This report presents helpful information regarding new spliced HSFs forms in Arabidopsis that highlights key information in the understanding of heat stress and plant growth.

    2. Reviewer #2 (Public review):

      Summary:

      The authors report that Arabidopsis short HSFs S-HsfA2, S-HsfA4c, and S-HsfB1 confer extreme heat. They have truncated DNA binding domains that bind to a new heat-regulated element. Considering Short HSFA2, the authors have highlighted the molecular mechanism by which S-HSFs prevent HSR hyperactivation via negative regulation of HSP17.6B. The S-HsfA2 protein binds to the DNA binding domain of HsfA2, thus preventing its binding to HSEs, eventually attenuating HsfA2-activated HSP17.6B promoter activity. This report adds insights to our understanding of heat tolerance and plant growth.

      Strengths:

      (1) The manuscript represents ample experiments to support the claim.

      (2) The manuscript covers a robust number of experiments and provides specific figures and graphs in support of their claim.

      (3) The authors have chosen a topic to focus on stress tolerance in changing environment.

      (4) The authors have summarized the probable mechanism using a figure.

      Weaknesses:

      Quite minimum

      (1) Fig. 3. the EMSA to reveal binding

      (2) Alignment of supplementary figures 6-7.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the present work, Chen et al. investigate the role of short heat shock factors (S-HSF), generated through alternative splicing, in the regulation of the heat shock response (HSR). The authors focus on S-HsfA2, an HSFA2 splice variant containing a truncated DNA-binding domain (tDBD) and a known transcriptional-repressor leucine-rich domain (LRD). The authors found a two-fold effect of S-HsfA2 on gene expression. On the one hand, the specific binding of S-HsfA2 to the heat-regulated element (HRE), a novel type of heat shock element (HSE), represses gene expression. This mechanism was also shown for other S-HSFs, including HsfA4c and HsfB1. On the other hand, S-HsfA2 is shown to interact with the canonical HsfA2, as well as with a handful of other HSFs, and this interaction prevents HsfA2 from activating gene expression. The authors also identified potential S-HsfA2 targets and selected one, HSP17.6B, to investigate the role of the truncated HSF in the HSR. They conclude that S-HsfA2-mediated transcriptional repression of HSP17.6B helps avoid hyperactivation of the HSR by counteracting the action of the canonical HsfA2.

      The manuscript is well written and the reported findings are, overall, solid. The described results are likely to open new avenues in the plant stress research field, as several new molecular players are identified. Chen et al. use a combination of appropriate approaches to address the scientific questions posed. However, in some cases, the data are inadequately presented or insufficient to fully support the claims made. As such, the manuscript would highly benefit from tackling the following issues:

      (1) While the authors report the survival phenotypes of several independent lines, thereby strengthening the conclusions drawn, they do not specify whether the presented percentages are averages of multiple replicates or if they correspond to a single repetition. The number of times the experiment was repeated should be reported. In addition, Figure 7c lacks the quantification of the hsp17.6b-1 mutant phenotype, which is the background of the knock-in lines. This is an essential control for this experiment.

      For the seedling survival rates and gene expression levels, we added statistical analysis based on at least two independent experiments. Figure 6E of the revised manuscript shows the phenotypes of the WT, hsp17.6b-1, HSP17.6B-KI, and HSP17.6B-OE plants and the statistical analysis of their seedling survival rates after heat exposure.

      (2) In Figure 1c, the transcript levels of HsfA2 splice variants are not evident, as the authors only show the quantification of the truncated variant. Moreover, similar to the phenotypes discussed above, it is unclear whether the reported values are averages and, if so, what is the error associated with the measurements. This information could explain the differences observed in the rosette phenotypes of the S-HsfA2-KD lines. Similarly, the gene expression quantification presented in Figures 4 and 5, as well as the GUS protein quantification of Figure 3F, also lacks this crucial information.

      RT‒qPCR analysis of the expression of these genes from at least two independent experiments was performed. We also added this missing information to the figure legends.

      (3) The quality of the main figures is low, which in some cases prevents proper visualization of the data presented. This is particularly critical for the quantification of the phenotypes shown in Figure 1b and for the fluorescence images in Figures 4f and 5b. Also, Figure 9b lacks essential information describing the components of the performed experiments.

      We apologize; owing to the limitations of equipment and technology, we will attempt to obtain high-quality images in the future. A detailed description of Figure 9b is provided in the methods section.

      (4) Mutants with low levels of S-HsfA2 yield smaller plants than the corresponding wild type. This appears contradictory, given that the proposed role of this truncated HSF is to counteract the growth repression induced by the canonical HSF. What would be a plausible explanation for this observation? Was this phenomenon observed with any of the other tested S-HSFs?

      We found that the constitutive expression of S-HsfA2 inhibits Arabidopsis growth. Considering this, Arabidopsis plants do not produce S-HsfA2 under normal conditions to avoid growth inhibition. However, under heat stress, Arabidopsis plants generate S-HsfA2, which contributes to heat tolerance and growth balance. In the revised manuscript, we provided supporting data indicating that S-HsfA4c-GFP or S-HsfB1-RFP constitutive expression confers Arabidopsis extreme heat stress sensitivity but inhibits root growth (Supplemental Figure S8). Therefore, this phenomenon is also observed in S-HsfA4c-GFP or S-HsfB1-RFP.

      (5) In some cases, the authors make statements that are not supported by the results:

      (i) the claim that only the truncated variant expression is changed in the knock-down lines is not supported by Figure 1c;

      In three S-HsfA2-KD lines, RT‒PCR splicing analysis revealed that HsfA2-II but not HsfA2-III is easily detected. In the revised manuscript, we added RT‒qPCR analysis, and the results revealed that the abundance of HsfA2-III and HsfA2-II but not that of the full-length HsfA2 mRNA significantly decreased under extreme heat (Figure 1C). Considering that HsfA2-III but not HsfA2-II is a predominant splice variant under extreme heat (Liu et al., 2013), S-HsfA2-KD may lead to the knockdown of alternative HsfA2 splicing transcripts, especially HsfA2-III.

      (ii) the increase in GUS signal in Figure 3a could also result from local protein production;

      We included this possibility in the results analysis.

      (iii) in Figure 6b, the deletion of the HRE abolishes heat responsiveness, rather than merely altering the level of response; and

      In the revised manuscript, we added new data concerning the roles of HREs and HSEs in the response of the HSP17.6B promoter to heat stress (Figure 6A). These results suggest that the HRE and HSE elements are responsible for the response of the HSP17.6B promoter to heat stress and that the HRE negatively regulates the HSP17.6B promoter at 37°C, whereas the HSE is positive at 42°C.

      (iv) the phenotypes in Figure 8b are not clear enough to conclude that HSP17.6B overexpressors exhibit a dwarf but heat-tolerant phenotype.

      When grown in soil, the HSP17.6B-OE seedlings presented a dwarf phenotype compared with the WT control. Heat stress resulted in browning of the WT leaves, but the leaves of the HSP17.6B-OE plants remained green, suggesting that the HSP17.6B-OE seedlings also presented a heat-tolerant phenotype in the soil. These results are qualitative rather than quantitative; therefore, the conclusions have been adjusted in the abstract and results sections.

      Reviewer #2 (Public review):

      Summary:

      The authors report that Arabidopsis short HSFs S-HsfA2, S-HsfA4c, and S-HsfB1 confer extreme heat. They have truncated DNA binding domains that bind to a new heat-regulated element. Considering Short HSFA2, the authors have highlighted the molecular mechanism by which S-HSFs prevent HSR hyperactivation via negative regulation of HSP17.6B. The S-HsfA2 protein binds to the DNA binding domain of HsfA2, thus preventing its binding to HSEs, eventually attenuating HsfA2-activated HSP17.6B promoter activity. This report adds insights to our understanding of heat tolerance and plant growth.

      Strengths:

      (1) The manuscript presents ample experiments to support the claim.

      (2) The manuscript covers a robust number of experiments and provides specific figures and graphs in support of their claim.

      (3) The authors have chosen a topic to focus on stress tolerance in a changing environment.

      Weaknesses:

      (1) One S-HSF, S-HsfA2, is used to represent all the other S-HSFs (S-HsfA4c and S-HsfB1). S-HSFs can be functionally different, and their regulation may be positive or negative. The other S-HSFs may positively regulate height and be suppressed by the activity of other S-HSFs.

      In this study, we used S-HsfA2, S-HsfA4c, and S-HsfB1 data to support the view that “splice variants of HSFs generate new plant HSFs”. We also noted that S-HsfA2 cannot represent a traditional S-HSF. S-HsfA4c and S-HsfB1 may have functions different from those of S-HsfA2 because of their different C-terminal motifs or domains. Different S-HSFs might participate in the same biological process, such as heat tolerance, through the coregulation of downstream genes. We added this information to the discussion section.

      (2) Previous reports on gene regulations by hsfs can highlight the mechanism.

      In the introduction section, we included these references concerning HSFs and S-HSFs.

      (3) The Materials and Methods section could be rearranged so that it is based on the correct flow of the procedure performed by the authors.

      The materials and methods and results sections are arranged in logical order.

      (4) Graphical representation could explain the days after sowing data, to provide information regarding plant growth.

      The days after sowing (DAS) for the age of the Arabidopsis seedlings are stated in the Materials and Methods section and figure legends.

      (5) Clear images concerning GFP and RFP data could be used.

      We provided high-quality images of S-HsfA2-GFP and the GFP control (Figure 3 in the revised manuscript).

      Reviewing Editor comments:

      The EMSAs shown in Figures 2, 3, 4, and 5, which are critical to support the manuscript's claims, are of poor quality and have no repeats to support them. In addition, there is not much information about how these EMSAs were done. I suggest including better EMSAs in a new version of this manuscript.

      Thank you for your suggestion. We added the missing information, including the detailed EMSA method and experiment repeat times in the methods section and figure legends. We provide high-quality images of HRE probes binding to nuclear proteins (Figure 4E).

      Reviewer #1 (Recommendations for the authors):

      (1) The paper is overall well-written, but it could greatly benefit from reorganizing the results subsections. Currently, there are entire subsections dedicated to supplementary figures (e.g., lines 177-191) and main figures split into different subsections (e.g., lines 237-246). It is recommended to organize all the information related to a main figure into a single subsection and to incorporate the description of the corresponding supplementary figures. This would imply a general reorganization of the figures, moving some information to the supplementary data (for instance, the data in Figure 4 could be supplementary to Figure 5) and vice versa (Supplementary Figure 4 should be incorporated into main Figure 2, as it presents very important results). Also, Figures 7 and 8 would be better presented if merged into a single figure/subsection.

      Thank you for your suggestion. We have merged some figures into a single figure according to the main information. In the current version, there are 8 main figures, which include a new figure.

      (2) Survival phenotypes vary widely, making reliable statistical analysis challenging. The chlorophyll and fresh weight quantifications presented in figures such as Figure 5 appear to effectively describe the phenomenon and allow for statistical comparisons. Figures 1 and 7 would benefit from including these measurements if the variability in survival percentages is too high to calculate statistical differences reliably. Also, in Figure 8, all chlorophyll measurements should be normalized to fresh weight rather than seedling number due to the dwarfism observed in the overexpressor lines.

      Thank you for pointing out your concerns. We added statistical analysis based on at least two independent experiments, including Figures 1 and 7, to the original manuscript. In Figure 8 in the original manuscript, chlorophyll measurements were normalized to fresh weight.

      (3) Typos: in Figure 3a it should be "min" not "mim"; in Supplementary Figure 3, the GFP and merge images are swapped.

      We apologize for these errors, and we have corrected them. Supplementary Figure 3 was replaced with new images and was included in Figure 3 in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract states "How this process is prevented to ensure proper plant growth has not been determined." The authors can be the first to do this, by adding graphical data on the height difference between HsfA2 Arabidopsis and wild-type Arabidopsis.

      Thank you; we agree with you. We have added this information to the new working model (Figure 8).

      (2) The authors make claims about Arabidopsis S-HsfA2, S-HsfA4c, and S-HsfB1 but have used only S-HsfA2 to understand the mechanism of action. Because the mechanisms of S-HsfA4c and S-HsfB1 are unknown, S-HsfA2 cannot be taken to represent them.

      Thank you for your valuable comments. In this study, we used S-HsfA2, S-HsfA4c, and S-HsfB1 data to support the view that “splice variants of HSFs generate new plant HSFs”. We also noted that S-HsfA2 cannot represent a traditional S-HSF because S-HsfA4c and S-HsfB1 may have functions different from those of S-HsfA2. Therefore, we deleted “representative S-HSF” from the revised manuscript. In the future, we will conduct in-depth research on the relevant mechanisms of S-HsfA4c and S-HsfB1 under your guidance.

      (3) The authors could indicate which of the HSFs reported by other researchers to interact with other Arabidopsis genes positively or negatively regulate the heat response, growth, or the balance between them.

      In the introduction section, we included these genes. AtHsfA2, AtHsfA3, and BhHsf1 confer heat tolerance in Arabidopsis but also result in a dwarf phenotype in plants (Ogawa et al., 2007; Yoshida et al., 2008; Zhu et al., 2009).

      (4) The authors have started from the subsection plant materials and growth conditions. It is unclear from where the authors have found these HSF mutant Arabidopsis? Is it a continuation of some other work? As a reader, I am utterly confused because of the arrangement of the materials and methods section.

      We apologize for the lack of detailed information in the Materials and methods section. These mutants were purchased from AraShare (Fuzhou, China) and verified via PCR and RT‒qPCR. We added the missing information.

      (5) Is the DAS - Days After Sowing - represented as a graph or table? This will add data to the plant growth section to clearly state the difference between the mutants and the wild-type.

      In this study, the age of the Arabidopsis seedlings was calculated as days after sowing (DAS), as stated in the Materials and Methods section and figure legends.

      (6) Placing the heat stress treatment after GUS staining looks absurd. Should it not follow plant materials and growth conditions, which should ideally come after the plant transformation and cloning section? The initial step is definitely plasmid construction. Kindly rearrange.

      Thank you for your valuable suggestions. We have rearranged the logical order of the materials and methods.

      (7) The expression of GFP and RFP was not clearly seen in the images. This could be because of the poor resolution of the images added.

      We obtained high-quality images of S-HsfA2-GFP (Figure 3 in the revised manuscript).

      (8) We live in an age where it is widely known that genes are not functioning independently but are coregulated and coregulate other proteins. The authors can address the role of these spliced variants on gene regulation and compare them with the HSFs.

      We agree with your suggestion. In this study, HSP17.6B was identified as a direct target gene of S-HsfA2 and HsfA2, which can partly explain the role of S-HsfA2 in heat resistance and growth balance. However, the mechanism by which S-HsfA2 regulates heat tolerance and growth balance may not be limited to HSP17.6B. On the basis of the current data, we propose that the putative S-HsfA2-DREB2A-HsfA3 module might be associated with the roles of S-HsfA2 in heat tolerance and growth balance. Please refer to the discussion section for a detailed explanation.

      (9) Regulatory elements can be validated in relation to their interaction with proven HSFs.

      Supplemental Figure S3 shows that His6-HsfA2 failed to bind to the HRE in vitro.

      (10) The authors seem to be biased toward heat stress and have not worked enough on plant growth. Biochemical data and images on plant growth could be added to bring out the novelty of this manuscript.

      Thank you for your suggestion. We added new data indicating that, compared with the wild-type control, S-HsfA2-GFP, S-HsfA4c-GFP, or S-HsfB1-GFP overexpression inhibited root length (Supplemental Figure 8).

      (11) Line 251 on page 11 of the submitted manuscript says that the S-HSFs were previously identified by Liu et al. (2013), yet in the abstract the authors claim that these S-HSFs are NEW kinds of HSF with a unique truncated DNA-binding domain (tDBD) that binds a NEW heat-regulated element (HRE).

      In our previous report, several S-HSFs, including S-HsfA2, S-HsfA4, S-HsfB1, and S-HsfB2a, were identified primarily in Arabidopsis (Liu et al., 2013). In this study, we further characterized S-HsfA2, S-HsfA4, and S-HsfB1 and revealed several features of S-HSFs. Therefore, we claim that these S-HSFs are new kinds of HSFs.

      (12) What are these NEW kinds of HRE? Which genes have these HREs? Was an in silico study conducted to examine them, or can any reports be cited?

      HREs, i.e., heat-regulated elements, are heat-responsive elements newly identified in this study. The sequences of HREs are partially related to traditional heat shock elements (HSEs). Because we did not identify the essential nucleotides required for tDBD binding to the HRE, we did not perform an in silico study.

      (13) S-HSFs may interact with existing HSFs. Have the authors thought in this direction? It can have a role in positively regulating other sHSFs or regulating multiple expressing genes related to plant growth and other functions. This needs to be explored.

      Thank you for this point. Given that the overexpression of Arabidopsis HsfA2 or HsfA3 inhibits growth under nonstress conditions, we discussed this direction from the perspective of the interaction of S-HsfA2 with HsfA2 or HsfA3 in the revised manuscript.

      (14) The authors need to concentrate on the presentation and arrangement of both their materials and methods and result section and write them in a systematic manner (or following a workflow).

      The materials, methods and results sections are arranged in logical order.

      (15) The authors have used references in the results section which can be added to the discussion section to make it more accurate.

      Thank you for your suggestions. We have moved some references to the discussion section, but the necessary references remain in the results section.

    1. eLife Assessment

      This manuscript reports the development and characterization of iGABASnFR2, a genetically encoded GABA sensor that demonstrates substantially improved performance compared to its predecessor, iGABASnFR1. The work is comprehensive and methodologically rigorous, combining high-throughput mutagenesis, functional screening, structural analysis, biophysical characterization, and in vivo validation. The significance of the findings is fundamental, and the supporting evidence is compelling. iGABASnFR2 represents a notable advance in GABA sensor engineering, enabling enhanced imaging of GABA transmission both in brain slices and in vivo, and constitutes a timely, technically robust addition to the molecular toolkit for neuroscience research.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Kolb and Hasseman et al. introduces a significantly improved GABA sensor, building on the pioneering work of the Janelia team. Given GABA's role as the main inhibitory neurotransmitter and the historical lack of effective optical tools for real-time in vivo GABA dynamics, this development is particularly impactful. The new sensor boasts an enhanced signal-to-noise ratio (SNR) and appropriate kinetics for detecting GABA dynamics in both in vitro and in vivo settings. The study is well-presented, with convincing and high-quality data, making this tool a valuable asset for future research into GABAergic signaling.

      Strengths:

      The core strength of this work lies in its significant advancement of GABA sensing technology. The authors have successfully developed a sensor with higher SNR and suitable kinetics, enabling the detection of GABA dynamics both in vitro and in vivo. This addresses a critical gap in neuroscience research, offering a much-needed optical tool for understanding the most important inhibitory neurotransmitter. The clear representation of the work and the convincing, high-quality data further bolster the manuscript's strengths, indicating the sensor's reliability and potential utility. We anticipate this tool will be invaluable for further investigation of GABAergic signaling.

      Weaknesses:

      Despite the notable progress, a key limitation is that the current generation of GABA sensors, including the one presented here, still exhibits inferior performance compared to state-of-the-art glutamate sensors. While this work is a substantial leap forward, it highlights that further improvements in GABA sensors would still be highly beneficial for the field to match the capabilities seen with glutamate sensors.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents the development and characterization of iGABASnFR2, a genetically encoded GABA sensor with markedly improved performance over its predecessor, iGABASnFR1. The study is comprehensive and methodologically rigorous, integrating high-throughput mutagenesis, functional screening, structural analysis, biophysical characterization, and in vivo validation. iGABASnFR2 represents a significant advancement in GABA sensor engineering and application in imaging GABA transmission in slice and in vivo. This is a timely and technically strong contribution to the molecular toolkit for neuroscience.

      Strengths:

      The authors apply a well-established sensor optimization pipeline and iterative engineering strategy from single-site to combinatorial mutants to engineer iGABASnFR2. The development of both positive and negative going variants (iGABASnFR2 and iGABASnFR2n) offers experimental flexibility. The structure and interpretation of the key mutations provide insights into the working mechanism of the sensor, which also suggest optimization strategies. Although individual improvements in intrinsic properties are incremental, their combined effect yields clear functional gains, enabling detection of direction-selective GABA release in the retina and volume-transmitted GABA signaling in somatosensory cortex, which were challenging or missed using iGABASnFR1.

      Weaknesses:

      With minor revisions and clarifications, especially regarding membrane trafficking, this manuscript will be a valuable resource for probing inhibitory transmission.

    1. eLife Assessment

      This paper performs a valuable critical reassessment of anatomical and functional data, proposing a reclassification of the mouse visual cortex in which almost all the higher visual areas are consolidated into a single area V2. However, the evidence supporting this unification is incomplete, as the key experimental observations that the model attempts to reproduce do not accurately reflect the literature. This study will likely be of interest to neuroscientists focused on the mouse visual cortex and the evolution of cortical organization.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors argue that defining higher visual areas (HVAs) based on reversals of retinotopic tuning has led to an over-parcellation of secondary visual cortices. Using retinotopic models, they propose that the HVAs are more parsimoniously mapped as a single area V2, which encircles V1 and exhibits complex retinotopy. They reanalyze functional data to argue that functional differences between HVAs can be explained by retinotopic coverage. Finally, they compare the classification of mouse visual cortex to that of other species to argue that our current classification is inconsistent with those used in other model species.

      Strengths:

      This manuscript is bold and thought-provoking, and is a must-read for mouse visual neuroscientists. The authors take a strong stance on combining all HVAs, with the possible exception of area POR, into a single V2 region. Although I suspect many in the field will find that their proposal goes too far, many will agree that we need to closely examine the assumptions of previous classifications to derive a more accurate areal map. The authors' supporting analyses are clear and bolster their argument. Finally, they make a compelling argument for why the classification is not just semantic, but has ramifications for the design of experiments and analysis of data.

      Weaknesses:

      Although I enjoyed the polemic nature of the manuscript, there are a few issues that weaken their argument.

      (1) Although the authors make a compelling argument that retinotopic reversals are insufficient to define distinct regions, they are less clear about what would constitute convincing evidence for distinct visual regions. They mention that a distinct area V3 has been (correctly) defined in ferrets based on "cytoarchitecture, anatomy, and functional properties", but elsewhere argue that none of these factors are sufficient to parcellate any of the HVAs in mouse cortex, despite some striking differences between HVAs in each of these factors. It would be helpful to clearly define a set of criteria that could be used for classifying distinct regions.

      (2) On a related note, although the authors carry out impressive analyses to show that differences in functional properties between HVAs could be explained by retinotopy, they glossed over some contrary evidence that there are functional differences independent of retinotopy. For example, axon projections to different HVAs originating from a single V1 injection - presumably including neurons with similar retinotopy - exhibit distinct functional properties (Glickfeld LL et al, Nat Neuro, 2013). As another example, interdigitated M2+/M2- patches in V1 show very different HVA connectivity and response properties, again independent of V1 location/retinotopy (Meier AM et al., bioRxiv). One consideration is that the secondary regions might be considered a single V2 with distinct functional modules based on retinotopy and connectivity (e.g., V2LM, V2PM, etc).

      (3) Some of the HVAs, such as AL, AM, and LI, appear to have redundant retinotopic coverage with other HVAs, such as LM and PM. Moreover, these regions have typically been found to have higher "hierarchy scores" based on connectivity (Harris JA et al., Nature, 2019; D'Souza RD et al., Nat Comm, 2022), though unfortunately, the hierarchy levels are not completely consistent between studies. Based on existing evidence, there is a reasonable argument to be made for a hybrid classification, in which some regions (e.g., LM, P, PM, and RL) are combined into a single V2 (though see point #2 above) while other HVAs are maintained as independent visual regions, distinct from V2. I don't expect the authors to revise their viewpoint in any way, but a more nuanced discussion of alternative classifications is warranted.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Rowley and Sedigh-Sarvestani presents modeling data suggesting that map reversals in mouse lateral extrastriate visual cortex do not coincide with areal borders, but instead represent borders between subregions within a single area V2. The authors propose that such an organization explains the partial coverage in higher-order areas reported by Zhuang et al. (2017). The scheme revisits an organization proposed by Kaas et al. (1989), who interpreted the multiple projection patches traced from V1 in the squirrel lateral extrastriate cortex as subregions within a single area V2. Kaas et al.'s interpretation was challenged by Wang and Burkhalter (2007), who used a combination of topographic mapping of V1 connections and receptive field recordings in mice. Their findings supported a different partitioning scheme in which each projection patch mapped a specific topographic location within single areas, each containing a complete representation of the visual field. The area map of mouse visual cortex by Wang and Burkhalter (2007) has been reproduced by hundreds of studies and has been widely accepted as ground truth (CCF) (Wang et al., 2020) of the layout of rodent cortex. In the meantime, topographic mappings in marmoset and tree shrew visual cortex made a strong case for map reversals in lateral extrastriate cortex, which represent borders between functionally diverse subregions within a single area V2. These findings from non-rodent species raised doubts about whether, during evolution, different mammalian branches have developed diverse partitioning schemes of the cerebral cortex. Rowley and Sedigh-Sarvestani favor a single master plan in which, across evolution, all mammalian species have used a similar blueprint for subdividing the cortex.

      Strengths:

      The story illustrates the enduring strength of science in search of definitive answers.

      Weaknesses:

      To me, it remains an open question whether Rowley and Sedigh-Sarvestani have written the final chapter of the saga. A key reason for my reservation is that the area maps used in their model are cherry-picked. The article disregards published complementary maps, which show that the entire visual field is represented in multiple areas (i.e., LM and AL) of lateral extrastriate cortex and that the map reversal between LM and AL coincides precisely with the transition in m2AChR expression and cytoarchitecture (Wang and Burkhalter, 2007; Wang et al., 2011). Evidence from experiments in rats supports the gist of the findings in the mouse visual cortex (Coogan and Burkhalter, 1993).

      (1) The selective use of published evidence, such as the complete visual field representation in higher visual areas of lateral extrastriate cortex (Wang and Burkhalter, 2007; Wang et al., 2011) makes the report more of an opinion piece than an original research article that systematically analyzes the area map of mouse visual cortex we have proposed. No direct evidence is presented for a single area V2 with functionally distinct subregions.

      (2) The article misrepresents evidence by commenting that m2AChR expression is mainly associated with the lower field. This is counter to published findings showing that m2AChR spans across the entire visual field (Gamanut et al., 2018; Meier et al., 2021). The utility of markers for delineating areal boundaries is discounted, without any evidence, in disregard of evidence for distinct areal patterns in early development (Wang et al., 2011). The point that markers can be distributed non-uniformly within an area is well known. m2AChR is non-uniformly expressed in mouse V1, LM and LI (Ji et al., 2015; D'Souza et al., 2019; Meier et al., 2021). Recently, it has been found that the patchy organization within V1 plays a role in the organization of thalamocortical and intracortical networks (Meier et al., 2025). m2AChR-positive patches and m2AChR-negative interpatches organize the functionally distinct ventral and dorsal networks, notably without obvious bias for upper and lower parts of the visual field.

      (3) The study has adopted an area partitioning scheme, which is said to be based on anatomically defined boundaries of V2 (Zhuang et al., 2017). The only anatomical borders used by Zhuang et al. (2017) are those of V1 and barrel cortex, identified by cytochrome oxidase staining. In reality, the partitioning of the visual cortex was based on field sign maps, which are reproduced from Zhuang et al. (2017) in Figure 1A. It is unclear why the maps shown in Figures 2E and 2F differ from those in Figure 1A. It is possible that this is an oversight. But maintaining consistent areal boundaries across experimental conditions that are referenced to the underlying brain structure is critical for assigning modeled projections to areas or sub-regions. This problem is evident in Figure 2F, which is presented as evidence that the modeling approach recapitulates the tracings shown in Figure 3 of Wang and Burkhalter (2007). The dissimilarities between the modeling and tracing results are striking, unlike what is stated in the legend of Figure 2F.

      (4) Rowley and Sedigh-Sarvestani find that the partial coverage of the visual field in higher order areas shown by Zhuang et al. (2017) is recreated by the model. It is important to caution that Zhuang et al.'s (2017) maps were derived from incomplete mappings of the visual field, which was confined to -25-35 deg of elevation. This underestimates the coverage we have found in LM and AL. Receptive field mappings show that LM covers 0-90 deg of azimuth and -30-80 deg of elevation (Wang and Burkhalter, 2007). AL covers at least 0-90 deg of azimuth and -30-50 deg of elevation (Wang and Burkhalter, 2007; Wang et al., 2011). These are important differences. Partial coverage in LM and AL underestimates the size of these areas and may map two projection patches as inputs to subregions of a single area rather than inputs to two separate areas. Complete, or nearly complete, visual representations in LM and AL support that each is a single area. Importantly, both areas are included in a callosal-free zone (Wang and Burkhalter, 2007). The surrounding callosal connections align with the vertical meridian representation. The single map reversal is marked by a transition in m2AChR expression and cytoarchitecture (Wang et al., 2011).

      (5) The statement that the "lack of visual field overlap across areas is suggestive of a lack of hierarchical processing" is predicated on the full acceptance of the mappings by Zhuang et al. (2017). Based on the evidence reviewed above, the reclassification of visual areas proposed in Figure 1C seems premature.

      (6) The existence of lateral connections is not unique to rodent cortex and has been described in primates (Felleman and Van Essen, 1991).

      (7) Why the mouse and rat extrastriate visual cortex differ from those of many other mammals is unclear. One reason may be that mammals with V2 subregions are strongly binocular.

    4. Reviewer #3 (Public review):

      Summary:

      The authors review published literature and propose that a visual cortical region in the mouse that is widely considered to contain multiple visual areas should be considered a single visual area.

      Strengths:

      The authors point out that relatively new data showing reversals of visual-field sign within known, single visual areas of some species require that a visual field sign change by itself should not be considered evidence for a border between visual areas.

      Weaknesses:

      The existing data are not consistent with the authors' proposal to consolidate multiple mouse areas into a single "V2". This is because the existing definition of a single area is that it cannot have redundant representations of the visual field. The authors ignore this requirement, as well as the data and definitions found in published manuscripts, and make an inaccurate claim that "higher order visual areas in the mouse do not have overlapping representations of the visual field". For quantification of the extent of overlap of representations between 11 mouse visual areas, see Figure 6G of Garrett et al. 2014. [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014). Topography and areal organization of mouse visual cortex. The Journal of Neuroscience 34, 12587-12600. 10.1523/JNEUROSCI.1124-14.2014.]

    5. Author response:

      eLife Assessment:

      This paper performs a valuable critical reassessment of anatomical and functional data, proposing a reclassification of the mouse visual cortex in which almost all the higher visual areas are consolidated into a single area V2. However, the evidence supporting this unification is incomplete, as the key experimental observations that the model attempts to reproduce do not accurately reflect the literature. This study will likely be of interest to neuroscientists focused on the mouse visual cortex and the evolution of cortical organization.

      We do not agree, nor do we understand which 'key experimental observations' that the model attempts to reproduce do not accurately reflect the literature. The model reproduces a complete map of the visual field, with overlap in certain regions. When reversals are used to delineate areas, as is the current custom, multiple higher order areas are generated, and each area has a biased and overlapping visual field coverage. These are the simple outputs of the model, and they are consistent with the published literature, including recent publications such as Garrett et al. 2014 and Zhuang et al. 2017, a paper published in this journal. The area boundaries produced by the model are not identical to area boundaries in the literature, because the model is a simplification.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors argue that defining higher visual areas (HVAs) based on reversals of retinotopic tuning has led to an over-parcellation of secondary visual cortices. Using retinotopic models, they propose that the HVAs are more parsimoniously mapped as a single area V2, which encircles V1 and exhibits complex retinotopy. They reanalyze functional data to argue that functional differences between HVAs can be explained by retinotopic coverage. Finally, they compare the classification of mouse visual cortex to that of other species to argue that our current classification is inconsistent with those used in other model species.

      Strengths:

      This manuscript is bold and thought-provoking, and is a must-read for mouse visual neuroscientists. The authors take a strong stance on combining all HVAs, with the possible exception of area POR, into a single V2 region. Although I suspect many in the field will find that their proposal goes too far, many will agree that we need to closely examine the assumptions of previous classifications to derive a more accurate areal map. The authors' supporting analyses are clear and bolster their argument. Finally, they make a compelling argument for why the classification is not just semantic, but has ramifications for the design of experiments and analysis of data.

      Weaknesses:

      Although I enjoyed the polemic nature of the manuscript, there are a few issues that weaken their argument.

      (1) Although the authors make a compelling argument that retinotopic reversals are insufficient to define distinct regions, they are less clear about what would constitute convincing evidence for distinct visual regions. They mention that a distinct area V3 has been (correctly) defined in ferrets based on "cytoarchitecture, anatomy, and functional properties", but elsewhere argue that none of these factors are sufficient to parcellate any of the HVAs in mouse cortex, despite some striking differences between HVAs in each of these factors. It would be helpful to clearly define a set of criteria that could be used for classifying distinct regions.

      We agree the revised manuscript would benefit from a clear discussion of updated rules of area delineation in the mouse. In brief, we argue that retinotopy alone should not be used to delineate area boundaries in mice, or any other species. Although there is some evidence for functional property, architecture, and connectivity changes across mouse HVAs, area boundaries continue to be defined primarily, and sometimes solely (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017), based on retinotopy. We acknowledge that earlier work (Wang and Burkhalter, 2007; Wang et al., 2011) did consider cytoarchitecture and connectivity alongside retinotopy, but more recent work has shifted to a focus on retinotopy as indicated by the currently accepted criterion for area delineation.  

      As reviewer #2 points out, the present criteria for mouse visual area delineation can be found in the Methods section of: [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014)].

      Criterion 1: Each area must contain the same visual field sign at all locations within the area.

      Criterion 2: Each visual area cannot have a redundant representation of visual space.

      Criterion 3: Adjacent areas of the same visual field sign must have a redundant representation.

      Criterion 4: An area's location must be consistently identifiable across experiments.

      As discussed in the manuscript, recent evidence in higher order visual cortex of tree shrews and rats led us to question the universality of these criteria across species. Specifically, tree shrew V2, macaque V2, and marmoset DM exhibit reversals in visual field sign within what are defined as single visual areas. This suggests that Criterion 1 should be updated. It also suggests that Criteria 2 and 3 should be updated, because visual field sign reversals often co-occur with retinotopic redundancies: reversing course in the direction of progression along the visual field can easily lead to coverage of visual field regions already traveled.
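      For readers less familiar with the quantity behind Criterion 1, the following is a minimal sketch of how a visual field sign can be computed from a pair of retinotopic maps. It assumes the common definition of field sign as the sine of the angle between the local gradients of the elevation and azimuth maps; the function name `field_sign_at`, the sign convention, and the toy 3x3 maps are illustrative assumptions, not the analysis pipeline of Garrett et al. (2014) or of the manuscript under review.

```python
import math

def field_sign_at(azimuth, elevation, r, c):
    """Field sign at interior pixel (r, c) of two 2-D maps given as lists of rows.

    Hypothetical helper: returns sin(angle_elevation - angle_azimuth), which is
    close to +1 for a non-mirrored map and close to -1 for a mirrored one.
    """
    # Central differences approximate each map's gradient (x = columns, y = rows).
    daz_dx = (azimuth[r][c + 1] - azimuth[r][c - 1]) / 2.0
    daz_dy = (azimuth[r + 1][c] - azimuth[r - 1][c]) / 2.0
    del_dx = (elevation[r][c + 1] - elevation[r][c - 1]) / 2.0
    del_dy = (elevation[r + 1][c] - elevation[r - 1][c]) / 2.0
    angle_az = math.atan2(daz_dy, daz_dx)
    angle_el = math.atan2(del_dy, del_dx)
    return math.sin(angle_el - angle_az)

# Synthetic toy maps (not data): azimuth grows along columns, elevation along rows.
azimuth = [[float(c) for c in range(3)] for _ in range(3)]
elevation = [[float(r) for _ in range(3)] for r in range(3)]
# Flipping the azimuth axis mimics a mirror-image representation.
mirrored = [[-v for v in row] for row in azimuth]

sign_normal = field_sign_at(azimuth, elevation, 1, 1)
sign_mirror = field_sign_at(mirrored, elevation, 1, 1)
```

      In this toy case the two maps yield field signs of opposite polarity, which is the kind of reversal that the current criteria treat as an areal border.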

      More broadly, we argue that topography is just one of several criteria that should be considered in area delineation. We understand that few visual areas in any species meet all criteria, but we emphasize that topography cannot consistently be the sole satisfied criterion – as it currently appears to be for many mouse HVAs. Inspired by a recent perspective on cortical area delineation (Petersen et al., 2024), we suggest the following rules, which will be worked into the revised version of the manuscript. Topography is a criterion, but it comes after considerations of function, architectonics and connectivity.

      (1) Function—Cortical areas differ from neighboring areas in their functional properties  

      (2) Architectonics—Cortical areas often exhibit distinctions from neighboring areas in multiple cyto- and myeloarchitectonic markers

      (3) Connectivity—Cortical areas are characterized by a specific set of connectional inputs and outputs from and to other areas

      (4) Topography—Cortical areas often exhibit a distinct topography that balances maximal coverage of the sensory field with minimal redundancy of coverage within an area.

      As we discuss in the manuscript, although there are functional, architectonic, and connectivity differences across mouse HVAs, they typically vary smoothly across multiple areas – such that neighboring areas share the same properties and there are no sharp borders. For instance, sharp borders in cytoarchitecture are generally lacking in the mouse HVAs. A notable exception to this is the clear and sharp change in m2AChR expression that occurs between LM and AL (Wang et al., 2011).

      (2) On a related note, although the authors carry out impressive analyses to show that differences in functional properties between HVAs could be explained by retinotopy, they glossed over some contrary evidence that there are functional differences independent of retinotopy. For example, axon projections to different HVAs originating from a single V1 injection - presumably including neurons with similar retinotopy - exhibit distinct functional properties (Glickfeld LL et al, Nat Neuro, 2013). As another example, interdigitated M2+/M2- patches in V1 show very different HVA connectivity and response properties, again independent of V1 location/retinotopy (Meier AM et al., bioRxiv). One consideration is that the secondary regions might be considered a single V2 with distinct functional modules based on retinotopy and connectivity (e.g., V2LM, V2PM, etc).

      Thank you for the correction. We will revise the text to discuss (Glickfeld et al., 2013), as it remains some of the strongest evidence in favor of retinotopy-independent functional specialization of mouse HVAs. However, one caveat of this study is the size of the V1 injection that is the source of axons studied in the HVAs. As apparent in Figure 1B, the large injection covers nearly a quarter of V1. It is worth noting that (Han et al., 2018) found, using single-cell reconstructions and MAPseq, that the majority of V1 neurons project to multiple nearby HVA targets. In this experiment the tracing does not suffer from the problem of spreading over V1’s retinotopic map, and it suggests that (presumably retinotopically matched) locations in each area receive shared inputs from the V1 population rather than a distinct but spatially interspersed subset. In fact, the authors conclude “Interestingly, the location of the cell body within V1 was predictive of projection target for some recipient areas (Extended Data Fig. 8). Given the retinotopic organization of V1, this suggests that visual information from different parts of visual field may be preferentially distributed to specific target areas, which is consistent with recent findings (Zhuang et al., 2017)”. Given an injection covering a large portion of the retinotopic map, and the fact that feed-forward projections from V1 to HVAs carry coarse retinotopy, it is difficult to prove that functional specializations noted in the HVA axons are retinotopy-independent. This would require measurement of receptive field location in the axonal boutons, which the authors did not perform (possibly because the SNR of calcium indicators prevented such measurements at the time).

      Another option would be to show that adjacent neurons in V1, that project to far-apart HVAs, exhibit distinct functional properties on par with differences exhibited by neurons in very different parts of V1 due to retinotopy. In other words, the functional specificity of V1 inputs to HVAs at retinotopically identical locations is of the same order as those that might be gained by retinotopic biases. To our knowledge, such a study has not been conducted, so we have decided to measure the data in collaboration with the Allen Institute. As part of the Allen Institute’s pioneering OpenScope project, we will make careful two-photon and electrophysiology measurements of functional properties, including receptive field location, SF, and TF in different parts of the V1 retinotopic map. Pairing this data with existing Allen Institute datasets on functional properties of neurons in the HVAs will allow us to rule in, or rule-out, our hypotheses regarding retinotopy as the source of functional specialization in mouse HVAs. We will update the discussion in the revised manuscript to better reflect the need for additional evidence to support or refute our proposal.

      Meier AM et al., bioRxiv 2025 (Meier et al., 2025) was published after our submission, but we are thankful to the reviewers for guiding our attention to this timely paper. Given the recent findings on the influence of locomotion on rodent and primate visual cortex, it is very exciting to see clearly specialized circuits for processing self-generated visual motion in V1. However, it is difficult to rule out the role of retinotopy, because the HVAs (LM, AL, RL) participating in the M2+ network, which is less responsive to self-generated visual motion, exhibit a bias for the medial portion of the visual field, whereas the HVA (PM) involved in the M2- network, which is responsive to self-generated visual motion, exhibits a bias for the lateral (or peripheral) parts of the visual field. For instance, a peripheral bias in area PM has been shown using retrograde tracing as in Figure 6 of (Morimoto et al., 2021), single-cell anterograde tracing as in Extended Data Figure 8 of (Han et al., 2018), and functional imaging studies (Zhuang et al., 2017). Recent findings in the marmoset also point to visual circuits in the peripheral, but not central, visual field being significantly modulated by self-generated movements (Rowley et al., 2024).

      However, a visual field bias in area PM that selectively receives M2- inputs is at odds with the clear presence of modular M2+/M2- patches across the entire map of V1 (Ji et al., 2015). One possibility supported by existing data is that neurons in M2- patches, as well as those in M2+ patches, in the central representation of V1 make fewer or significantly weaker connections with area PM compared to areas LM, AL and RL. Evidence to the contrary would support retinotopy-independent and functionally specialized inputs from V1 to HVAs.

      (3) Some of the HVAs, such as AL, AM, and LI, appear to have redundant retinotopic coverage with other HVAs, such as LM and PM. Moreover, these regions have typically been found to have higher "hierarchy scores" based on connectivity (Harris JA et al., Nature, 2019; D'Souza RD et al., Nat Comm, 2022), though unfortunately, the hierarchy levels are not completely consistent between studies. Based on existing evidence, there is a reasonable argument to be made for a hybrid classification, in which some regions (e.g., LM, P, PM, and RL) are combined into a single V2 (though see point #2 above) while other HVAs are maintained as independent visual regions, distinct from V2. I don't expect the authors to revise their viewpoint in any way, but a more nuanced discussion of alternative classifications is warranted.

      We understand that such a proposal, which would combine a subset of areas with matched field sign (LM, P, PM, and RL), would be less extreme and better received by the community. This would create a V2 with a smooth map without reversals or significant redundant retinotopic coverage. However, the intuition we have built from our modeling studies suggests that both these areas, and the other smaller areas with negative field sign (AL, AM, LI), are a byproduct of a complex single map of the visual field that exhibits reversals as it contorts around the triangular and tear-shaped boundaries of V1. In other words, we believe the redundant coverage and field-sign changes/reversals are a byproduct of a single secondary visual field in V2 constrained by the cortical dimensions of V1. That being said, we understand that area delineations are in part based on a consensus by the community. Therefore we will continue to discuss our proposal with community members, and we will incorporate new evidence supporting or refuting our hypothesis, before we submit our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study by Rowley and Sedigh-Sarvestani presents modeling data suggesting that map reversals in mouse lateral extrastriate visual cortex do not coincide with areal borders, but instead represent borders between subregions within a single area V2. The authors propose that such an organization explains the partial coverage in higher-order areas reported by Zhuang et al., (2017). The scheme revisits an organization proposed by Kaas et al., (1989), who interpreted the multiple projection patches traced from V1 in the squirrel lateral extrastriate cortex as subregions within a single area V2. Kaas et al's interpretation was challenged by Wang and Burkhalter (2007), who used a combination of topographic mapping of V1 connections and receptive field recordings in mice. Their findings supported a different partitioning scheme in which each projection patch mapped a specific topographic location within single areas, each containing a complete representation of the visual field. The area map of mouse visual cortex by Wang and Burkhalter (2007) has been reproduced by hundreds of studies and has been widely accepted as ground truth (CCF) (Wang et al., 2020) of the layout of rodent cortex. In the meantime, topographic mappings in marmoset and tree shrew visual cortex made a strong case for map reversals in lateral extrastriate cortex, which represent borders between functionally diverse subregions within a single area V2. These findings from non-rodent species raised doubts about whether during evolution, different mammalian branches have developed diverse partitioning schemes of the cerebral cortex. Rowley and Sedigh-Sarvestani favor a single master plan in which, across evolution, all mammalian species have used a similar blueprint for subdividing the cortex.

      Strengths:

      The story illustrates the enduring strength of science in search of definitive answers.

      Weaknesses:

      To me, it remains an open question whether Rowley and Sedigh-Sarvestani have written the final chapter of the saga. A key reason for my reservation is that the maps used in their model are cherry-picked. The article disregards published complementary maps, which show that the entire visual field is represented in multiple areas (i.e. LM, AL) of lateral extrastriate cortex and that the map reversal between LM and AL coincides precisely with the transition in m2AChR expression and cytoarchitecture (Wang and Burkhalter, 2007; Wang et al., 2011). Evidence from experiments in rats supports the gist of the findings in the mouse visual cortex (Coogan and Burkhalter, 1993).

      We would not claim to have written the final chapter of the saga. Our goal was to add an important piece of new evidence to the discussion of area delineations across species. We believe this new evidence supports our unification hypothesis.  We also believe that there are several missing pieces of data that could support or refute our hypothesis. We have begun a collaboration to collect some of this data.  

      (1) The selective use of published evidence, such as the complete visual field representation in higher visual areas of lateral extrastriate cortex (Wang and Burkhalter, 2007; Wang et al., 2011) makes the report more of an opinion piece than an original research article that systematically analyzes the area map of mouse visual cortex we have proposed. No direct evidence is presented for a single area V2 with functionally distinct subregions.

      This brings up a nuanced issue regarding visual field coverage. Wang & Burkhalter, 2007 Figure 6 shows the receptive field of sample neurons in area LM that cover the full range between 0 and 90 degrees of azimuth, and -40 to 80 degrees of elevation – which essentially matches the visual field coverage in V1. However, we do not know whether these neurons are representative of most neurons in area LM. In other words, while these single-cell recordings along selected contours in cortex show the span of the visual field coverage, they may not be able to capture crucial information about its shape, missing regions of the visual field or potential bias. To mitigate this, visual field maps measured with electrophysiology are commonly produced by even sampling across the two dimensions of the visual area, either by moving a single electrode along a grid pattern (e.g. (Manger et al., 2002)), or using a grid-like multi-electrode probe (e.g. (Yu et al., 2020)). This was not carried out either in Wang & Burkhalter 2007 or Wang et al. 2011. Even sampling of cortical space is time consuming and difficult with electrophysiology, but efficient with functional imaging. Therefore, despite the likely under-estimation of visual field coverage, imaging techniques are valuable in that they can efficiently exhibit not only the span of the visual field of a cortical region, but also its shape and bias.

      Multiple functional imaging studies that simultaneously measure visual field coverage in V1 and HVAs report a bias in the coverage of HVAs, relative to that in V1 (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017). While functional imaging will likely underestimate receptive fields compared to electrophysiology, the consistent observation of an orderly bias for distinct parts of the visual field across the HVAs suggests that at least some of the HVAs do not have full and uniform coverage of the visual field comparable to that in V1. For instance, (Garrett et al., 2014) show that the total coverage in HVAs, when compared to V1, is typically less than half (Figure 6D) and often irregularly shaped.
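      To make the notion of relative coverage concrete, here is a minimal sketch of one way such a ratio could be computed from receptive-field centers. The binning approach, the 10-degree bin size, the function names `coverage_bins` and `relative_coverage`, and the toy coordinates are all assumptions made for illustration; this is not the method used by Garrett et al. (2014), and the numbers are synthetic.

```python
def coverage_bins(rf_centers, bin_deg=10.0):
    """Set of (azimuth, elevation) bins containing at least one receptive-field center."""
    return {(int(az // bin_deg), int(el // bin_deg)) for az, el in rf_centers}

def relative_coverage(hva_centers, v1_centers, bin_deg=10.0):
    """Fraction of the visual-field bins covered by V1 that the other area also covers."""
    v1 = coverage_bins(v1_centers, bin_deg)
    hva = coverage_bins(hva_centers, bin_deg)
    return len(hva & v1) / len(v1)

# Synthetic toy receptive-field centers in degrees (not measured data).
v1_centers = [(float(az), float(el)) for az in range(0, 90, 10) for el in range(0, 40, 10)]
hva_centers = [(float(az), float(el)) for az in range(0, 40, 10) for el in range(0, 40, 10)]

ratio = relative_coverage(hva_centers, v1_centers)
```

      A ratio well below one in such an analysis would indicate coverage biased toward a subset of the field sampled in V1, although the value depends on the chosen bin size and on how evenly each area was sampled.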

      Careful measurements of single-cell receptive fields, using mesoscopic two-photon imaging across the HVAs would settle this question. As reviewer #1 points out, this is technically feasible, though no dataset of this kind exists to our knowledge.

      (2) The article misrepresents evidence by commenting that m2AChR expression is mainly associated with the lower field. This is counter to published findings showing that m2AChR spans across the entire visual field (Gamanut et al., 2018; Meier et al., 2021). The utility of markers for delineating areal boundaries is discounted, without any evidence, in disregard of evidence for distinct areal patterns in early development (Wang et al., 2011). Pointing out that markers can be distributed non-uniformly within an area is well-familiar. m2AChR is non-uniformly expressed in mouse V1, LM and LI (Ji et al., 2015; D'Souza et al., 2019; Meier et al., 2021). Recently, it has been found that the patchy organization within V1 plays a role in the organization of thalamocortical and intracortical networks (Meier et al., 2025). m2AChR-positive patches and m2AChR-negative interpatches organize the functionally distinct ventral and dorsal networks, notably without obvious bias for upper and lower parts of the visual field.

      We wrote that “Future work showed boundaries in labeling of histological markers such as SMI-32 and m2ChR labeling, but such changes mostly delineated area LM/AL (Wang et al., 2011) and seemed to be correlated with the representation of the lower visual field.” The latter statement regarding the representation of the lower visual field is directly referencing the data in Figure 1 of (Wang et al., 2011), which is titled “Figure 1: LM/AL border identified by the transition of m2AChR expression coincides with receptive field recordings from lower visual field.” Similar to Wang et al., we were simply referring to the fact that the border of area LM/AL co-exhibits a change in m2AChR expression as well as lower-visual field representation.

      (3) The study has adopted an area partitioning scheme, which is said to be based on anatomically defined boundaries of V2 (Zhuang et al., 2017). The only anatomical borders used by Zhuang et al. (2017) are those of V1 and barrel cortex, identified by cytochrome oxidase staining. In reality, the partitioning of the visual cortex was based on field sign maps, which are reproduced from Zhuang et al., (2017) in Figure 1A. It is unclear why the maps shown in Figures 2E and 2F differ from those in Figure 1A. It is possible that this is an oversight. But maintaining consistent areal boundaries across experimental conditions that are referenced to the underlying brain structure is critical for assigning modeled projections to areas or sub-regions. This problem is evident in Figure 2F, which is presented as evidence that the modeling approach recapitulates the tracings shown in Figure 3 of Wang and Burkhalter (2007). The dissimilarities between the modeling and tracing results are striking, unlike what is stated in the legend of Figure 2F.

      Thanks for this correction. By “anatomical boundaries of higher visual cortex”, we meant the cortical boundary between V1 and higher order visual areas on one end, and the outer edge of the envelope that defines the functional boundaries of the HVAs in cortical space (Zhuang et al., 2017). The reviewer is correct that we should have referred to these as functional boundaries. The word ‘anatomical’ was meant to refer to cortical space, rather than visual field space.

      More generally though, there is no disagreement between the partitioning of visual cortex in Figures 1 and 2. Rather, the partitioning in Figure 1 is directly taken from Zhuang et al., (2017) whereas those in Figure 2 are produced by mathematical model simulation. As such, one would not expect identical areal boundaries between Figure 2 and Figure 1. What we aimed to communicate with our modeling results is that a single area can exhibit multiple visual field reversals and retinotopic redundancies if it is constrained to fit around V1 and cover a visual field approximately matched to the visual field coverage in V1. We defined this area explicitly as a single area with a single visual field (boundaries shown in Figure 2A). So the point of our simulation is to show that even an explicitly defined single area can appear as multiple areas if it is constrained by the shape of mouse V1, and if visual field reversals are used to indicate areal boundaries. As in most models, different initial conditions and parameters produce a complex visual field which will appear as multiple HVAs when delineated by areal boundaries. What is consistent, however, is the existence of a complex single visual field that appears as multiple HVAs with partially overlapping coverage.

      Similarly, we would not expect a simple model to exactly reproduce the multi-color tracer injections in Wang and Burkhalter (2007). However, we find it quite compelling that the model can produce multiple groups of multi-colored axonal projections beyond V1 that can appear as multiple areas each with their own map of the visual field using current criteria, when the model is explicitly designed to map a single visual field. We will explain the results of the model, and their implications, better in the revised manuscript.

      (4) Rowley and Sedigh-Sarvestani find that the partial coverage of the visual field in higher order areas shown by Zhuang et al (2017) is recreated by the model. It is important to caution that Zhuang et al's (2017) maps were derived from incomplete mappings of the visual field, which was confined to -25 to 35 deg of elevation. This underestimates the coverage we have found in LM and AL. Receptive field mappings show that LM covers 0 to 90 deg of azimuth and -30 to 80 deg of elevation (Wang and Burkhalter, 2007). AL covers at least 0 to 90 deg of azimuth and -30 to 50 deg of elevation (Wang and Burkhalter, 2007; Wang et al., 2011). These are important differences. Partial coverage in LM and AL underestimates the size of these areas and may map two projection patches as inputs to subregions of a single area rather than inputs to two separate areas. Complete, or nearly complete, visual representations in LM and AL support that each is a single area. Importantly, both areas are included in a callosal-free zone (Wang and Burkhalter, 2007). The surrounding callosal connections align with the vertical meridian representation. The single map reversal is marked by a transition in m2AChR expression and cytoarchitecture (Wang et al., 2011).

      This is a good point. We do not expect that expanding the coverage of V1 will change the results of the model significantly. However, for the revised manuscript, we will update V1 coverage to be accurate, repeat our simulations, and report the results.  

      (5) The statement that the "lack of visual field overlap across areas is suggestive of a lack of hierarchical processing" is predicated on the full acceptance of the mappings by Zhuang et al (2017). Based on the evidence reviewed above, the reclassification of visual areas proposed in Figure 1C seems premature.

      The reviewer is correct. In the revised manuscript, we will be careful to distinguish bias in visual field coverage across areas from presence or lack of visual field overlap.  

      (6) The existence of lateral connections is not unique to rodent cortex and has been described in primates (Felleman and Van Essen, 1991).

      (7) Why the mouse and rat extrastriate visual cortex differ from those of many other mammals is unclear. One reason may be that mammals with V2 subregions are strongly binocular.

      This is an interesting suggestion, and careful visual topography data from rabbits and other lateral eyed animals would help to evaluate it. For what it’s worth, tree shrews are lateral eyed animals with only 50 degrees of binocular visual field and also show V2 subregions.

      Reviewer #3 (Public review):

      Summary:

      The authors review published literature and propose that a visual cortical region in the mouse that is widely considered to contain multiple visual areas should be considered a single visual area.

      Strengths:

      The authors point out that relatively new data showing reversals of visual-field sign within known, single visual areas of some species require that a visual field sign change by itself should not be considered evidence for a border between visual areas.

      Weaknesses:

      The existing data are not consistent with the authors' proposal to consolidate multiple mouse areas into a single "V2". This is because the existing definition of a single area is that it cannot have redundant representations of the visual field. The authors ignore this requirement, as well as the data and definitions found in published manuscripts, and make an inaccurate claim that "higher order visual areas in the mouse do not have overlapping representations of the visual field". For quantification of the extent of overlap of representations between 11 mouse visual areas, see Figure 6G of Garrett et al. 2014. [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014). Topography and areal organization of mouse visual cortex. The Journal of neuroscience 34, 12587-12600. 10.1523/JNEUROSCI.1124-14.2014.]

      Thank you for this correction; we admit we should have chosen our words more carefully. In the revised manuscript, we will emphasize that higher order visual areas in the mouse do have some overlap in their representations but also exhibit bias in their coverage. This is consistent with our proposal, and in fact our model simulations in Figure 2E also show overlapping representations along with differential bias in coverage. However, we also note that Figure 6 of Garrett et al. 2014 provides several pieces of evidence in support of our proposal that higher order areas are sub-regions of a single area V2. First, the visual field coverage of each area is significantly less than that in V1 (Garrett et al. 2014, Figure 6D). While the imaging methods used in Garrett et al. likely under-estimate receptive fields, one would assume they would similarly impact measurements of coverage in V1 and HVAs. Second, each area exhibits a bias towards a different part of the visual field (Figure 6C and E), and this bias is distinct for different areas but proceeds in a retinotopic manner around V1, with adjacent areas exhibiting biases for nearby regions of the visual field (Figure 6E). Thus, the biases in the visual field coverage across HVAs appear to be related and not independent of each other. As we show in our modeling and in Figure 2, such orderly and inter-related biases can be created from a single visual field constrained to share a border with mouse V1.

      With regards to the existing definition of a single area: we did not ignore the requirement that single areas cannot have redundant representations of the visual field. Rather, we believe that this requirement should be relaxed considering new evidence collected from other species, where multiple visual field reversals exist within the same visual area. We understand this issue is nuanced and was not made clear in the original submission.  

      In the revised manuscript, we will clarify that visual field reversals often exhibit redundant retinotopic representation on either side of the reversal. We will also clarify that our argument that multiple reversals can exist within a single visual area in the mouse is an argument that some retinotopic redundancy can exist within single visual areas. Such a re-classification would align how we define visual areas in mice with existing classification in tree shrews, ferrets, cats, and primates – all of whom have secondary visual areas with complex retinotopic maps exhibiting multiple reversals and redundant retinotopic coverage.

    1. Author response:

      We sincerely thank the reviewers for the time and care they have invested in evaluating our manuscript. We greatly appreciate their thoughtful feedback, which highlights both the strengths and the areas where the work can be improved. We recognize the importance of the concerns raised, particularly regarding the TMS analyses and interpretation, as well as aspects of the manuscript structure and clarity. The authors are committed to transparency and a rigorous scientific process, and we will therefore carefully consider all reviewer comments. In the coming months, we will revise the manuscript to incorporate additional analyses, provide clearer methodological detail, and refine the interpretation of the stimulation results.

    2. Reviewer #4 (Public review):

      Summary:

      Several behavioral experiments and one TMS experiment were performed to examine adaptation to room reverberation for speech intelligibility in noise. This is an important topic that has been extensively studied by several groups over the years. And the study is unique in that it examines one candidate brain area, dlPFC, potentially involved in this learning, and finds that disrupting this area by TMS results in a reduction in the learning. The behavioral conditions are in many ways similar to previous studies. However, they find results that do not match previous results (e.g., performance in anechoic condition is worse than in reverberation), making it difficult to assess the validity of the methods used. One unique aspect of the behavioral experiments is that Ambisonics was used to simulate the spaces, while headphone simulation was mostly used previously. The main behavioral experiment was performed by interleaving 3 different rooms and measuring speech intelligibility as a function of the number of words preceding the target in a given room on a given trial. The findings are that performance improves on the time scale of seconds (as the number of words preceding the target increases), but also on a much larger time scale of tens to hundreds of seconds (corresponding to multiple trials), while for some listeners it is degraded for the first couple of trials. The study also finds that the performance is best in the room that matches the T60 most commonly observed in everyday environments. These are potentially interesting results. However, there are issues with the design of the study and analysis methods that make it difficult to verify the conclusions based on the data.

      Strengths:

      (1) Analysis of the adaptation to reverberation on multiple time scales, for multiple reverberant and anechoic environments, and also considering contextual effects of one environment interleaved with the other two environments.

      (2) TMS experiment showing reduction of some of the learning effects by temporarily disabling the dlPFC.

      Weaknesses:

      While the study examines the adaptation for different carrier lengths, it keeps multiple characteristics (mainly talker voice and location) fixed in addition to reverberation. Therefore, it is possible that the subjects adapt to other aspects of the stimuli, not just to reverberation. A condition in which only reverberation would switch for the target would allow the authors to separate these confounding alternatives. Now, the authors try to address the concerns by indirect evidence/analyses. However, the evidence provided does not appear sufficient.

      The authors use terms that are either not defined or that seem to be defined incorrectly. The main issue then is the results, which are based on analysis of what the authors call d', Hit Rate, and Final Hit Rate. First of all, they randomly switch between these measures. Second, it's not clear how they define them, given that their responses are either 4-alternative or 8-alternative forced choice. d', Hit Rate, and False Alarm Rate are defined in signal detection theory for the detection of the presence of a target. The framework can be easily extended to a 2-alternative forced choice. But how does one define a Hit, and, in particular, a False Alarm, in a 4/8-alternative task? The authors do not state how they did it, and without that, the computation of d' based on HR and FAR is dubious. Also, what the authors call Hit Rate is presumably the percent correct performance (PCC), but even that is not clear. Then they use FHR and act as if this was the asymptotic value of their HR, even though in many conditions their learning has not ended, and randomly define a variable of ±10 from FHR, which must produce different results depending on whether the asymptote was reached or not. Other examples of strange usage of terms: they talk about "global likelihood learning" (L426) without a definition or a reference, or about "cumulative hit rate" (L1738), where it is not clear to me what "cumulative" means there.
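
      To make the concern concrete, the following is a minimal sketch, not the authors' method (the manuscript does not state how d' was computed). It contrasts the yes/no definition of d' with the standard equal-variance Gaussian model for m-alternative forced choice, in which proportion correct is the integral of pdf(x - d') * cdf(x)^(m - 1). All function names are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def dprime_yes_no(hit_rate, false_alarm_rate):
    """Yes/no detection: d' = z(HR) - z(FAR). Rates must lie strictly in (0, 1)."""
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(false_alarm_rate)


def percent_correct_mafc(dprime, m, lo=-10.0, hi=16.0, steps=5200):
    """Proportion correct in m-AFC under equal-variance Gaussian SDT.

    Pc = integral of pdf(x - d') * cdf(x)**(m - 1) dx, evaluated here
    with the trapezoid rule on a fixed grid (an illustrative choice).
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * width
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * STD_NORMAL.pdf(x - dprime) * STD_NORMAL.cdf(x) ** (m - 1)
    return total * width


def dprime_from_percent_correct(pc, m, tol=1e-6):
    """Invert percent_correct_mafc by bisection on d' in [0, 6].

    Assumes pc is above chance (1/m) and below the value reached at d' = 6.
    """
    lo, hi = 0.0, 6.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if percent_correct_mafc(mid, m) < pc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # The same 70% correct corresponds to a different d' for each m.
    for m in (2, 4, 8):
        print(m, round(dprime_from_percent_correct(0.70, m), 2))
```

      Under these assumptions, the same percent correct maps onto a larger d' in the 8-alternative task than in the 4-alternative task, which is why the mapping from responses to d' needs to be stated explicitly.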

      There are not enough acoustic details about the stimuli. The authors find that reverberant performance is overall better than anechoic in 2 rooms. This goes contrary to previous results. And the authors do not provide enough acoustic details to establish that this is not an artefact of how the stimuli were normalized (e.g., what were the total signal and noise levels at the two ears in the anechoic and reverberant conditions?).

      There are some concerns about the use of statistics. For example, the authors perform a two-way ANOVA (L724-728) in which one factor is room, but that factor does not have the same 3 levels across the two levels of the other factor. Also, in some comparisons, they randomly select 11 out of 22 subjects, even though appropriate tests correct for such imbalances without adding the additional randomness of whether the 11 selected subjects happened to be the good or the bad ones.

      Details of the experiments are not sufficiently described in the methods (L194-205) to be able to follow what was done. It should be stated that 1 main experiment was performed using 3 rooms, and that 3 follow-ups were done on a new set of subjects, each with the room swapped.

    3. Reviewer #3 (Public review):

      Summary:

      This manuscript presents a well-designed and insightful behavioural study examining human adaptation to room acoustics, building on prior work by Brandewie & Zahorik. The psychophysical results are convincing and add incremental but meaningful knowledge to our understanding of reverberation learning. However, I find the transcranial magnetic stimulation (TMS) component to be over-interpreted. The TMS protocol, while interesting, lacks sufficient anatomical specificity and mechanistic explanation to support the strong claims made regarding a unique role of the dorsolateral prefrontal cortex (dlPFC) in this learning process. More cautious interpretation is warranted, especially given the modest statistical effects, the fact that the main TMS result of interest is a null result, the imprecise targeting of dlPFC (which is not validated), and the lack of knowledge about the timescale of TMS effects in relation to the behavioural task. I recommend revising the manuscript to shift emphasis toward the stronger behavioural findings and to present a more measured and transparent discussion of the TMS results and their limitations.

      Strengths:

      (1) Well-designed acoustical stimuli and psychophysical task.

      (2) Comparisons across room combinations are well conducted.

      (3) The virtual acoustic environment is impressive and applied well here.

      (4) A timely study with interesting behavioural results.

      Weaknesses:

      (1) Lack of hypotheses, particularly for TMS.

      (2) Lack of evidence for targeting TMS in [brain] space and time.

      (3) The most interesting effect of TMS is a null result compared to a weak statistical effect for "meta adaptation".

    4. Reviewer #2 (Public review):

      Summary:

      This study investigated how listeners adapt to and utilize statistical properties of different acoustic spaces to improve speech perception. The researchers used repetitive TMS to perturb neural activity in DLPFC, inhibiting statistical learning compared to sham conditions. The authors also identified the most effective room types for the effective use of reverberations in speech in noise perception, with regular human-built environments bringing greater benefits than modified rooms with lower or higher reverberation times.

      Strengths:

      The introduction and discussion sections of the paper are very interesting and highlight the importance of the current study, particularly with regard to the use of ecologically valid stimuli in investigating statistical learning. However, they could be condensed in parts. TMS parameters and task conditions were well-considered and clearly explained.

      Weaknesses:

      (1) The Results section is difficult to follow and includes a lot of detail, which could be removed. As such, it presents as confusing and speculative at times.

      (2) The hypotheses for the study are not clearly stated.

      (3) Multiple statistical models are implemented without correcting the alpha value. This leaves the analyses vulnerable to Type I errors.

      (4) It is confusing to understand how many discrete experiments are included in the study as a whole, and how many participants are involved in each experiment.

      (5) The TMS study is significantly underpowered and not robust. Sample size calculations need further explanation (effect sizes appear to be based on behavioural studies?). I would caution an exploratory presentation of these data, and calculate a posteriori the full sample size based on effect sizes observed in the TMS data.
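
      As an illustration of point (3), a minimal sketch of two standard family-wise corrections (Bonferroni and Holm) is given below. It is not taken from the manuscript; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure.

    Sort p-values in ascending order, compare the i-th smallest (0-based)
    with alpha / (m - i), and stop at the first comparison that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


if __name__ == "__main__":
    made_up_p_values = [0.01, 0.02, 0.03, 0.005]  # illustrative only
    print(bonferroni(made_up_p_values))
    print(holm(made_up_p_values))
```

      Both procedures control the family-wise error rate; Holm is never less powerful than Bonferroni, so it is often a reasonable default when several models are tested on the same data.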

    5. Reviewer #1 (Public review):

      Summary:

      This manuscript describes the results of an experiment that demonstrates a disruption in statistical learning of room acoustics when transcranial magnetic stimulation (TMS) is applied to the dorsolateral prefrontal cortex in human listeners. The work uses a testing paradigm designed by the Zahorik group that has shown improvement in speech understanding as a function of listening exposure time in a room, presumably through a mechanism of statistical learning. The manuscript is comprehensive and clear, with detailed figures that show key results. Overall, this work provides an explanation for the mechanisms that support such statistical learning of room acoustics and, therefore, represents a major advancement for the field.

      Strengths:

      The primary strength of the work is its simple and clear result, that the dorsolateral prefrontal cortex is involved in human room acoustic learning.

      Weaknesses:

      A potential weakness of this work is that the manuscript is quite lengthy and complex.

    6. eLife Assessment:

      This study addresses valuable questions about the neural mechanisms underlying statistical learning of room acoustics, combining robust behavioral measures with non-invasive brain stimulation. The behavioral findings are strong and extend previous work in psychoacoustics, but the TMS results are modest, with methodological limitations and over-interpretation that weaken the mechanistic conclusions. The strength of evidence is therefore incomplete, and a more cautious interpretation of the stimulation findings, alongside strengthened analyses, would improve the manuscript.

    1. eLife Assessment

      This important study evaluates a model for multisensory correlation detection, focusing on the detection of correlated transients in visual and auditory stimuli. Overall, the experimental design is sound and the evidence is compelling. The synergy between the experimental and theoretical aspects of the article is strong, and the work will be of interest to both neuroscientists and psychologists working in the domain of sensory processing and perception.

    2. Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e. spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments including multiple species, tasks, behavioral modality, and pharmacological interventions.

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process - motion perception - and in a much simpler organism - Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular - both in its ability to interface with different inputs/outputs and its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments - especially in the context of cross-experiment predictions.

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly.

      Weaknesses:

      The model boasts incredible versatility across tasks and stimulus configurations, and the overall scope of the model is to understand how and what relevant sensory information is extracted from a stimulus. We still need to exercise care when interpreting its parameters, especially considering the broader context of top-down control of perception and that some multisensory mappings may not be derivable purely from stimulus statistics (e.g., the complementary nature of some phonemes/visemes).

    3. Reviewer #2 (Public review):

      Summary:

      Building on previous models of multisensory integration (including their earlier correlation-detection framework used for non-spatial signals), the author introduces a population-level Multisensory Correlation Detector (MCD) that processes raw auditory and visual data. Crucially, it does not rely on abstracted parameters, as is common in normative Bayesian models, but rather works directly on the stimulus itself (i.e., individual pixels and audio samples). By systematically testing the model against a range of experiments spanning human, monkey, and rat data, the author shows that the MCD population approach robustly predicts perception and behavior across species with a relatively small (0-4) number of free parameters.

      Strengths:

      (1) Unlike prior Bayesian models that used simplified or parameterized inputs, the model here is explicitly computable from full natural stimuli. This resolves a key gap in understanding how the brain might extract "time offsets" or "disparities" from continuously changing audio-visual streams.

      (2) The same population MCD architecture captures a remarkable range of multisensory phenomena, from classical illusions (McGurk, ventriloquism) and synchrony judgments, to attentional/gaze behavior driven by audio-visual salience. This generality strongly supports the idea that a single low-level computation (correlation detection) can underlie many distinct multisensory effects.

      (3) By tuning model parameters to different temporal rhythms (e.g., faster in rodents, slower in humans), the MCD explains cross-species perceptual data without reconfiguring the underlying architecture.

      (4) The authors frame their model as a plausible algorithmic account of the Bayesian multisensory-integration models in Marr's levels of hierarchy.

      Weaknesses:

      What remains unclear is how the parameters themselves relate to stimulus quantities (like stimulus uncertainty), as is often straightforward in Bayesian models. A theoretical missing link is the explicit relationship between the parameters of the MCD models and those of a cue combination model, thereby bridging Marr's levels of hierarchy.

      Likely Impact and Usefulness:

      The work offers a compelling unification of multiple multisensory tasks (temporal order judgments, illusions, Bayesian causal inference, and overt visual attention) under a single, fully stimulus-driven framework. Its success with natural stimuli should interest computational neuroscientists, systems neuroscientists, and machine learning scientists. This paper thus makes an important contribution to the field by moving beyond minimalistic lab stimuli, illustrating how raw audio and video can be integrated using elementary correlation analyses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e., spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments, including multiple species, tasks, behavioral modalities, and pharmacological interventions.

      Thanks for the kind words!

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process - motion perception - and in a much simpler organism - Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular - both in its ability to interface with different inputs/outputs and its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments, especially in the context of cross-experiment predictions (but see some criticism of that below).

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly (but see a related concern below).

      We sincerely thank the reviewer for their thoughtful and generous comments. We are especially pleased that the core strengths of the model—its stimulus-computable architecture, biological grounding, modularity, and cross-domain applicability—were clearly recognized. As the reviewer rightly notes, removing researcher-defined abstractions and working directly from naturalistic stimuli opens the door to uncovering previously overlooked dynamics in complex multisensory signals, such as the spatial and temporal richness of audiovisual speech.

      We also appreciate the recognition of the model’s origins in a simple organism and its generalization across species and behaviors. This phylogenetic continuity reinforces our view that the MCD captures a fundamental computation with wide-ranging implications. Finally, we are grateful for the reviewer’s emphasis on the model’s predictive power across tasks and datasets with few or no free parameters—a property we see as key to both its parsimony and explanatory utility.

      We have highlighted these points more explicitly in the revised manuscript, and we thank the reviewer for their generous and insightful endorsement of the work.

      Weaknesses:

      There is an insufficient level of detail in the methods about model fitting. As a result, it's unclear what data the models were fitted and validated on. Were models fit individually or on average group data? Each condition separately? Is the model predictive of unseen data? Was the model cross-validated? Relatedly, the manuscript mentions a randomization test, but the shuffled data produces model responses that are still highly correlated to behavior despite shuffling. Could it be that any stimulus that varies in AV onset asynchrony can produce a psychometric curve that matches any other task with asynchrony judgements baked into the task? Does this mean all SJ or TOJ tasks produce correlated psychometric curves? Or more generally, is Pearson's correlation insensitive to subtle changes here, considering psychometric curves are typically sigmoidal? Curves can be non-overlapping and still highly correlated if one is, for example, scaled differently. Would an error term such as mean-squared or root mean-squared error be more sensitive to subtle changes in psychometric curves? Alternatively, perhaps if the models aren't cross-validated, the high correlation values are due to overfitting?
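
      The point about scaling can be illustrated with a minimal sketch. It uses synthetic logistic curves rather than any data from the manuscript, and the function names, lag grid, and compression factors are assumptions made only for this example. Because Pearson's correlation is invariant to positive linear rescaling, a compressed copy of a psychometric curve correlates perfectly with the original even though the two curves do not overlap, whereas RMSE registers the difference.

```python
import math


def logistic(lag_ms, midpoint_ms=0.0, slope_ms=50.0):
    """Synthetic psychometric curve: probability of one response vs. lag."""
    return 1.0 / (1.0 + math.exp(-(lag_ms - midpoint_ms) / slope_ms))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


lags = list(range(-300, 301, 50))            # hypothetical lags in ms
curve_a = [logistic(lag) for lag in lags]    # full-range curve
curve_b = [0.1 + 0.8 * p for p in curve_a]   # same shape, compressed to [0.1, 0.9]

if __name__ == "__main__":
    print("Pearson r:", pearson_r(curve_a, curve_b))
    print("RMSE:", rmse(curve_a, curve_b))
```

      In this toy case the correlation is at ceiling while the RMSE is clearly non-zero, which is the sense in which an error-based metric may be more sensitive to differences between psychometric curves.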

      The reviewer is right: the current version of the manuscript only provides limited information about parameter fitting. In the revised version of the manuscript, we included a parameter estimation and generalizability section that includes all information requested by the reviewer.

      To test whether using the MSE instead of Pearson correlation led to a similar estimated set of parameter values, we repeated the fitting using the MSE. The parameters estimated with this method (TauV, TauA, TauBim) closely followed those estimated using Pearson correlation (TauV, TauA, TauBim). Given the similarity of these results, we have chosen not to include further figures; however, this analysis is now included in the new section (pages 23-24).

      Regarding the permutation test, it is expected that different stimuli produce analogous psychometric functions: after all, all studies relied on stimuli containing identical manipulations of lag. As a result, MCD population responses tend to be similar across experiments. Therefore, it is not a surprise that the permuted distribution of MCD-data correlation in Supplementary Figure 1K has a mean as high as 0.97. However, what is important is to demonstrate that the non-permuted dataset has an even higher goodness of fit. Supplementary Figure 1K demonstrates that none of the permuted stimuli could outperform the non-permuted dataset; the mean of the non-permuted distribution is 4.7 standard deviations above the mean of the already high permuted distribution.
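
      For readers unfamiliar with this kind of comparison, the sketch below shows one common way to summarize a permutation test: the distance of the observed statistic from the permuted distribution in standard-deviation units, together with an empirical p-value. It is an illustration under stated assumptions, not the analysis code behind Supplementary Figure 1K; the function names are hypothetical and the numbers in the usage example are made up.

```python
from statistics import mean, stdev


def standardized_distance(observed, permuted):
    """Observed statistic minus permuted mean, in permuted-SD units."""
    return (observed - mean(permuted)) / stdev(permuted)


def empirical_p_value(observed, permuted):
    """One-sided p-value with the usual +1 correction.

    Counts permuted statistics at least as large as the observed one.
    """
    at_least_as_large = sum(1 for value in permuted if value >= observed)
    return (at_least_as_large + 1) / (len(permuted) + 1)


if __name__ == "__main__":
    # Made-up correlations, for illustration only.
    permuted_correlations = [0.960, 0.965, 0.970, 0.975, 0.980]
    observed_correlation = 0.995
    print(standardized_distance(observed_correlation, permuted_correlations))
    print(empirical_p_value(observed_correlation, permuted_correlations))
```

      Note that a high mean in the permuted distribution does not by itself invalidate the test; what matters is where the observed value falls relative to the spread of the permuted values.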

      We believe the new section, along with the present response, fully addresses the legitimate concerns of the reviewer.

      While the model boasts incredible versatility across tasks and stimulus configurations, fitting behavioral data well doesn't mean we've captured the underlying neural processes, and thus, we need to be careful when interpreting results. For example, the model produces temporal parameters fitting rat behavior that are 4x faster than when fitting human data. This difference in slope and a difference at the tails were interpreted as differences in perceptual sensitivity related to general processing speeds of the rat, presumably related to brain/body size differences. While rats no doubt have these differences in neural processing speed/integration windows, it seems reasonable that a lot of the differences in human and rat psychometric functions could be explained by the (over)training and motivation of rats to perform on every trial for a reward - increasing attention/sensitivity (slope) - and a tendency to make mistakes (compression evident at the tails). Was there an attempt to fit these data with a lapse parameter built into the decisional model as was done in Equation 21? Likewise, the fitted parameters for the pharmacological manipulations during the SJ task indicated differences in the decisional (but not the perceptual) process and the article makes the claim that "all pharmacologically-induced changes in audiovisual time perception" can be attributed to decisional processes "with no need to postulate changes in low-level temporal processing." However, those papers discuss actual sensory effects of pharmacological manipulation, with one specifically reporting changes to response timing. Moreover, and again contrary to the conclusions drawn from model fits to those data, both papers also report a change in psychometric slope/JND in the TOJ task after pharmacological manipulation, which would presumably be reflected in changes to the perceptual (but not the decisional) parameters.

      Fitting or predicting behaviour does not in itself demonstrate that a model captures the underlying neural computations—though it may offer valuable constraints and insights. In line with this, we were careful not to extrapolate the implications of our simulations to specific neural mechanisms.

      Temporal sensitivity is, by definition, a behavioural metric, and—as the reviewer correctly notes—its estimation may reflect a range of contributing factors beyond low-level sensory processing, including attention, motivation, and lapse rates (i.e., stimulus-independent errors). In Equation 21, we introduced a lapse parameter specifically to account for such effects in the context of monkey eye-tracking data. For the rat datasets, however, the inclusion of a lapse term was not required to achieve a close fit to the psychometric data (ρ = 0.981). While it is likely that adding a lapse component would yield a marginally better fit, the absence of single-trial data prevents us from applying model comparison criteria such as AIC or BIC to justify the additional parameter. In light of this, and to avoid unnecessary model complexity, we opted not to include a lapse term in the rat simulations.
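
      The dependence of AIC and BIC on trial-level information can be seen in a short sketch. Both criteria require a log-likelihood, which for binary responses needs the number of trials per condition and not just the proportion of responses. The lapse form used here (symmetric lapses around a 0.5 guess rate) is an assumption for illustration and is not necessarily the form of Equation 21; all function names are hypothetical.

```python
import math


def with_lapse(p_no_lapse, lapse_rate):
    """Assumed lapse model: on a lapse trial the observer guesses at 0.5."""
    return lapse_rate / 2 + (1 - lapse_rate) * p_no_lapse


def binomial_log_likelihood(successes, trials, predicted):
    """Log-likelihood of per-condition counts given predicted probabilities.

    Each predicted probability must lie strictly between 0 and 1.
    """
    total = 0.0
    for k, n, p in zip(successes, trials, predicted):
        total += (math.log(math.comb(n, k))
                  + k * math.log(p)
                  + (n - k) * math.log(1.0 - p))
    return total


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_observations) - 2 * log_likelihood
```

      With only average response proportions available, the `trials` argument is unknown, so the likelihood and therefore both criteria cannot be evaluated; this is the constraint described in the response above.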

      With respect to the pharmacological manipulation data, we acknowledge the reviewer’s point that observed changes in slope and bias could plausibly arise from alterations at either the sensory or decisional level—or both. In our model, low-level sensory processing is instantiated by the MCD architecture, which outputs the MCDcorr and MCDlag signals that are then scaled and integrated during decision-making. Importantly, this scaling operation influences the slope of the resulting psychometric functions, such that changes in slope can arise even in the absence of any change to the MCD’s temporal filters. In our simulations, the temporal constants of the MCD units were fixed to the values estimated from the non-pharmacological condition (see parameter estimation section above), and only the decision-related parameters were allowed to vary. From this modelling perspective, the behavioural effects observed in the pharmacological datasets can be explained entirely by changes at the decisional level. However, we do not claim that such an explanation excludes the possibility of genuine sensory-level changes. Rather, we assert that our model can account for the observed data without requiring modifications to early temporal tuning.

      To rigorously distinguish sensory from decisional effects, future experiments will need to employ stimuli with richer temporal structure—e.g., temporally modulated sequences of clicks and flashes that vary in frequency, phase, rhythm, or regularity (see Fujisaki & Nishida, 2007; Denison et al., 2012; Parise & Ernst, 2016, 2025; Locke & Landy, 2017; Nidiffer et al., 2018). Such stimuli engage the MCD in a more stimulus-dependent manner, enabling a clearer separation between early sensory encoding and later decision-making processes. Unfortunately, the current rat datasets—based exclusively on single click-flash pairings—lack the complexity needed for such disambiguation. As a result, while our simulations suggest that the observed pharmacologically induced effects can be attributed to changes in decision-level parameters, they do not rule out concurrent sensory-level changes.

      In summary, our results indicate that changes in the temporal tuning of MCD units are not necessary to reproduce the observed pharmacological effects on audiovisual timing behaviour. However, we do not assert that such changes are absent or unnecessary in principle. Disentangling sensory and decisional contributions will ultimately require richer datasets and experimental paradigms designed specifically for this purpose. We have now modified the results section (page 6) and the discussion (page 11) to clarify these points.

      The case for the utility of a stimulus-computable model is convincing (as I mentioned above), but its framing as mission-critical for understanding multisensory perception is overstated, I think. The line for what is "stimulus computable" is arbitrary and doesn't seem to be followed in the paper. A strict definition might realistically require inputs to be, e.g., the patterns of light and sound waves available to our eyes and ears, while an even more strict definition might (unrealistically) require those stimuli to be physically present and transduced by the model. A reasonable looser definition might allow an abstract and low-dimensional representation of the stimulus, such as the stimulus envelope (which was used in the paper), to be an input. Ultimately, some preprocessing of a stimulus does not necessarily confound interpretations about (multi)sensory perception. And on the flip side, the stimulus-computable aspect doesn't necessarily give the model supreme insight into perception. For example, the MCD model was "confused" by the stimuli used in our 2018 paper (Nidiffer et al., 2018; Parise & Ernst, 2025). In each of our stimuli, the onset and offset drove strong AV temporal correlations across all stimulus conditions (including catch trials), but were irrelevant to participants performing an amplitude modulation detection task. The to-be-detected amplitude modulations, set at individual thresholds, were not a salient aspect of the physical stimulus, and thus only marginally affected stimulus correlations. The model was, of course, able to fit our data by "ignoring" the on/offsets (i.e., requiring human intervention), again highlighting that the model is tapping into a very basic and ubiquitous computational principle of (multi)sensory perception. But it does reveal a limitation of such a stimulus-computable model: that it is (so far) strictly bottom-up.

      We appreciate the reviewer’s thoughtful engagement with the concept of stimulus computability. We agree that the term requires careful definition and should not be taken as a guarantee of perceptual insight or neural plausibility. In our work, we define a model as “stimulus-computable” if all its inputs are derived directly from the stimulus, rather than from experimenter-defined summary descriptors such as temporal lag, spatial disparity, or cue reliability. In the context of multisensory integration, this implies that a model must account not only for how cues are combined, but also for how those cues are extracted from raw inputs—such as audio waveforms and visual contrast sequences.

      This distinction is central to our modelling philosophy. While ideal observer models often specify how information should be combined once identified, they typically do not address the upstream question of how this information is extracted from sensory input. In that sense, models that are not stimulus-computable leave out a key part of the perceptual pipeline. We do not present stimulus computability as a marker of theoretical superiority, but rather as a modelling constraint that is necessary if one’s aim is to explain how structured sensory input gives rise to perception. This is a view that is also explicitly acknowledged and supported by Reviewer 2.

      Framed in Marr’s (1982) terms, non–stimulus-computable models tend to operate at the computational level, defining what the system is doing (e.g., computing a maximum likelihood estimate), whereas stimulus-computable models aim to function at the algorithmic level, specifying how the relevant representations and operations might be implemented. When appropriately constrained by biological plausibility, such models may also inform hypotheses at the implementational level, pointing to potential neural substrates that could instantiate the computation.

      Regarding the reviewer’s example illustrating a limitation of the MCD model, we respectfully note that the account appears to be based on a misreading of our prior work. In Parise & Ernst (2025), where we simulated the stimuli from Nidiffer et al. (2018), the MCD model reproduced participants’ behavioural data without any human intervention or adjustment. The model was applied in a fully bottom-up, stimulus-driven manner, and its output aligned with observer responses as-is. We suspect the confusion may stem from analyses shown in Figure 6 - Supplement Figure 5 of Parise & Ernst (2025), where we investigated the lack of a frequency-doubling effect in the Nidiffer et al. data. However, those analyses were based solely on the Pearson correlation between auditory and visual stimulus envelopes and did not involve the MCD model. No manual exclusion of onset/offset events was applied, nor was the MCD used in those particular figures. We also note that Parise & Ernst (2025) is a separate, already published study and is not the manuscript currently under review. 

      In summary, while we fully agree that stimulus computability does not resolve all the complexities of multisensory perception (see comments below about speech), we maintain that it provides a valuable modelling constraint—one that enables robust, generalisable predictions when appropriately scoped. 

      The manuscript rightly chooses to focus a lot of the work on speech, fitting the MCD model to predict behavioral responses to speech. The range of findings from AV speech experiments that the MCD can account for is very convincing. Given the provided context that speech is "often claimed to be processed via dedicated mechanisms in the brain," a statement claiming a "first end-to-end account of multisensory perception," and findings that the MCD model can account for speech behaviors, it seems the reader is meant to infer that energetic correlation detection is a complete account of speech perception. I think this conclusion misses some facets of AV speech perception, such as integration of higher-order, non-redundant/correlated speech features (Campbell, 2008) and also the existence of top-down and predictive processing that aren't (yet!) explained by MCD. For example, one important benefit of AV speech is interactions on linguistic processes - how complementary sensitivity to articulatory features in the auditory and visual systems (Summerfield, 1987) allow constraint of linguistic processes (Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      We thank the reviewer for their thoughtful comments, and especially for the kind words describing the range of findings from our AV speech simulations as “very convincing.”

      We would like to clarify that it is not our view that speech perception can be reduced to energetic correlation detection. While the MCD model captures low- to mid-level temporal dependencies between auditory and visual signals, we fully agree that a complete account of audiovisual speech perception must also include higher-order processes—including linguistic mechanisms and top-down predictions. These are critical components of AV speech comprehension, and lie beyond the scope of the current model.

      Our use of the term “end-to-end” is intended in a narrow operational sense: the model transforms raw audiovisual input (i.e., audio waveforms and video frames) directly into behavioural output (i.e., button press responses), without reliance on abstracted stimulus parameters such as lag, disparity or reliability. It is in this specific technical sense that the MCD offers an end-to-end model. We have revised the manuscript to clarify this usage to avoid any misunderstanding.

      In light of the reviewer’s valuable point, we have now edited the Discussion to acknowledge the importance of linguistic processes (page 13) and to clarify what we mean by end-to-end account (page 11). We agree that future work will need to explore how stimulus-computable models such as the MCD can be integrated with broader frameworks of linguistic and predictive processing (e.g., Summerfield, 1987; Campbell, 2008; Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      References

      Campbell, R. (2008). The processing of audio-visual speech: empirical and neural bases. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1001-1010. https://doi.org/10.1098/rstb.2007.2155

      Nidiffer, A. R., Diederich, A., Ramachandran, R., & Wallace, M. T. (2018). Multisensory perception reflects individual differences in processing temporal correlations. Scientific Reports, 8(1), 1-15. https://doi.org/10.1038/s41598-018-32673-y

      Parise, C. V., & Ernst, M. O. (2025). Multisensory integration operates on correlated input from unimodal transient channels. eLife, 12. https://doi.org/10.7554/ELIFE.90841

      Peelle, J. E., & Sommers, M. S. (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169-181. https://doi.org/10.1016/j.cortex.2015.03.006

      Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). Lawrence Erlbaum Associates.

      Tye-Murray, N., Sommers, M., & Spehar, B. (2007). Auditory and Visual Lexical Neighborhoods in Audiovisual Speech Perception. Trends in Amplification, 11(4), 233-241. https://doi.org/10.1177/1084713807307409

      Reviewer #2 (Public review):

      Summary:

      Building on previous models of multisensory integration (including their earlier correlation-detection framework used for non-spatial signals), the authors introduce a population-level Multisensory Correlation Detector (MCD) that processes raw auditory and visual data. Crucially, it does not rely on abstracted parameters, as is common in normative Bayesian models, but rather works directly on the stimulus itself (i.e., individual pixels and audio samples). By systematically testing the model against a range of experiments spanning human, monkey, and rat data, the authors show that their MCD population approach robustly predicts perception and behavior across species with a relatively small (0-4) number of free parameters.

      Strengths:

      (1) Unlike prior Bayesian models that used simplified or parameterized inputs, the model here is explicitly computable from full natural stimuli. This resolves a key gap in understanding how the brain might extract "time offsets" or "disparities" from continuously changing audio-visual streams.

      (2) The same population MCD architecture captures a remarkable range of multisensory phenomena, from classical illusions (McGurk, ventriloquism) and synchrony judgments, to attentional/gaze behavior driven by audio-visual salience. This generality strongly supports the idea that a single low-level computation (correlation detection) can underlie many distinct multisensory effects.

      (3) By tuning model parameters to different temporal rhythms (e.g., faster in rodents, slower in humans), the MCD explains cross-species perceptual data without reconfiguring the underlying architecture.

      We thank the reviewer for their positive evaluation of the manuscript, and particularly for highlighting the significance of the model's stimulus-computable architecture and its broad applicability across species and paradigms. Please find our responses to the individual points below.

      Weaknesses:

      (1) The authors show how a correlation-based model can account for the various multisensory integration effects observed in previous studies. However, a comparison of how the two accounts differ would shed light on whether the correlation model is an implementation of the Bayesian computations (different levels in Marr's hierarchy) or makes testable predictions that can distinguish between the two frameworks. For example, that uncertainty in the cue-combined estimate is the harmonic mean of the unimodal uncertainties is a prediction from the Bayesian model. So, how the MCD framework predicts this reduced uncertainty could be one potential difference (or similarity) to the Bayesian model.

      We fully agree with the reviewer that a comparison between the correlation-based MCD model and Bayesian accounts is valuable—particularly for clarifying how the two frameworks differ conceptually and where they may converge.

      As noted in the revised manuscript, the key distinction lies in the level of analysis described by Marr (1982). Bayesian models operate at the computational level, describing what the system is aiming to compute (e.g., optimal cue integration). In contrast, the MCD functions at the algorithmic level, offering a biologically plausible mechanism for how such integration might emerge from stimulus-driven representations.

      In this context, the MCD provides a concrete, stimulus-grounded account of how perceptual estimates might be constructed—potentially implementing computations with Bayesian-like characteristics (e.g., reduced uncertainty, cue weighting). Thus, the two models are not mutually exclusive but can be seen as complementary: the MCD may offer an algorithmic instantiation of computations that, at the abstract level, resemble Bayesian inference.
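      For reference, the textbook maximum-likelihood formulation of two-cue integration, which is the usual source of the "reduced uncertainty" and "cue weighting" properties mentioned above, can be written as follows. This is the standard result for independent Gaussian cues, not a derivation from the MCD equations; the symbols (unimodal estimates \hat{s}_A, \hat{s}_V and their variances) are introduced here purely for illustration.

```latex
% Standard reliability-weighted cue combination (independent Gaussian noise assumed)
\hat{s}_{AV} = w_A \hat{s}_A + w_V \hat{s}_V,
\qquad
w_A = \frac{1/\sigma_A^{2}}{1/\sigma_A^{2} + 1/\sigma_V^{2}},
\qquad
w_V = 1 - w_A

% Variance of the combined estimate: never larger than either unimodal variance
\sigma_{AV}^{2} = \frac{\sigma_A^{2}\,\sigma_V^{2}}{\sigma_A^{2} + \sigma_V^{2}}
\le \min\!\left(\sigma_A^{2}, \sigma_V^{2}\right)
```

      Note that the combined variance equals half the harmonic mean of the two unimodal variances, which is the quantity a correlation-based account would need to reproduce to show Bayesian-like behaviour.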

      We have now updated the manuscript to explicitly highlight this relationship (pages 2 and 11). In the revised manuscript, we also included a new figure (Figure 5) and movie (Supplementary Movie 3), to show how the present approach extends previous Bayesian models for the case of cue integration (i.e., the ventriloquist effect).

      (2) The authors show a good match for cue combination involving 2 cues. While Bayesian accounts provide a direction for extension to more cues (also seen empirically, e.g., in Hecht et al. 2008), a discussion of how the MCD model extends to more cues would benefit readers.

      We thank the reviewer for this insightful comment: extending the MCD model to include more than two sensory modalities is a natural and valuable next step. Indeed, one of the strengths of the MCD framework lies in its modularity. Let us consider the MCDcorr output (Equation 6), which is computed as the pointwise product of transient inputs across modalities. Extending this to include a third modality, such as touch, is straightforward: MCD units would simply multiply the transient channels from all three modalities, effectively acting as trimodal coincidence detectors that respond when all inputs are aligned in time and space.

      By contrast, extending MCDlag is less intuitive, due to its reliance on opponency between two subunits (via subtraction). A plausible solution is to compute MCDlag in a pairwise fashion (e.g., AV, VT, AT), capturing relative timing across modality pairs.

      Importantly, the bulk of the spatial integration in our framework is carried by MCDcorr, which generalises naturally to more than two modalities. We have now formalised this extension and included a graphical representation in a supplementary section of the revised manuscript.
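      The multiplicative extension described above can be illustrated with a toy sketch. It assumes, purely for illustration, that a "transient channel" is the rectified first temporal difference of a signal; this is a simplification and not the filter cascade of the actual model, and the names `transient` and `corr_output` are hypothetical.

```python
import numpy as np

def transient(signal):
    """Toy transient channel (assumption): rectified first temporal difference."""
    signal = np.asarray(signal, dtype=float)
    return np.abs(np.diff(signal, prepend=signal[0]))

def corr_output(*channels):
    """Pointwise product across any number of modality channels."""
    out = np.ones_like(channels[0], dtype=float)
    for ch in channels:
        out = out * ch
    return out

# Step onsets: audio, video and one touch signal switch on at sample 20;
# a second touch signal switches on late, at sample 60.
t = np.arange(100)
audio = (t >= 20).astype(float)
video = (t >= 20).astype(float)
touch_sync = (t >= 20).astype(float)
touch_late = (t >= 60).astype(float)

aligned = corr_output(transient(audio), transient(video), transient(touch_sync))
misaligned = corr_output(transient(audio), transient(video), transient(touch_late))

print(aligned.sum(), misaligned.sum())
```

      In this toy case the trimodal product is non-zero only when all three transients coincide, which is the coincidence-detector behaviour described in the text.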

      Likely Impact and Usefulness:

      The work offers a compelling unification of multiple multisensory tasks- temporal order judgments, illusions, Bayesian causal inference, and overt visual attention - under a single, fully stimulus-driven framework. Its success with natural stimuli should interest computational neuroscientists, systems neuroscientists, and machine learning scientists. This paper thus makes an important contribution to the field by moving beyond minimalistic lab stimuli, illustrating how raw audio and video can be integrated using elementary correlation analyses.

      Reviewer #1 (Recommendations for the authors):

      Recommendations:

      My biggest concern is a lack of specificity about model fitting, which would be assuaged by the inclusion of sufficient detail to replicate the analysis completely or the inclusion of the analysis code. The code availability statement indicates a script for the population model will be included, but it is unclear if this code will provide the fitting details for the whole of the analysis.

      We thank the reviewer for raising this important point. A new methodological section has been added to the manuscript, detailing the model fitting procedures used throughout the study. In addition, the accompanying code repository now includes MATLAB scripts that allow full replication of the spatiotemporal MCD simulations.

      Perhaps it could be enlightening to re-evaluate the model with a measure of error rather than correlation? And I think many researchers would be interested in the model's performance on unseen data.

      The model has now been re-evaluated using mean squared error (MSE), and the results remain consistent with those obtained using Pearson correlation. Additionally, we have clarified which parts of the study involve testing the model on unseen data (i.e., data not used to fit the temporal constants of the units). These analyses are now included and discussed in the revised fitting section of the manuscript (pages 23-24).
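      As a generic illustration of why an error measure complements correlation (this is not the analysis code used in the study, and the numbers below are made up), the following sketch compares Pearson correlation and MSE on predictions that carry a constant offset.

```python
import numpy as np

def pearson_r(pred, obs):
    """Pearson correlation between predictions and observations."""
    return float(np.corrcoef(pred, obs)[0, 1])

def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((pred - obs) ** 2))

# Hypothetical response proportions and two sets of model predictions
obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
pred_exact = obs.copy()
pred_offset = obs + 0.2  # same shape as the data, but systematically too high

print(pearson_r(pred_offset, obs), mse(pred_offset, obs))
```

      Correlation is insensitive to the constant offset, whereas MSE penalises it, so agreement between the two measures is informative about absolute as well as relative fit.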

      Otherwise, my concerns involve the interpretation of findings, and thus could be satisfied with minor rewording or tempering conclusions.

      The manuscript has been revised to address these interpretative concerns, with several conclusions reworded or tempered accordingly. All changes are marked in blue in the revised version.

      Miscellanea:

      Should b0 in equation 10 be bcrit to match the below text?

      Thank you for catching this inconsistency. We have corrected Equation 10 (and also Equation 21) to use the more transparent notation bcrit instead of b0, in line with the accompanying text.

      Equation 23, should time be averaged separately? For example, if multiple people are speaking, the average correlation for those frames will be higher than the average correlation across all times.

      We thank the reviewer for raising this thoughtful and important point. In response, we have clarified the notation of Equation 23 in the revised manuscript (page 20). Specifically, we now denote the averaging operations explicitly as spatial means and standard deviations across all pixel locations within each frame.

      This equation computes the z-score of the MCD correlation value at the current gaze location, normalized relative to the spatial distribution of correlation values in the same frame. That is, all operations are performed at the frame level, not across time. This ensures that temporally distinct events are treated independently and that the final measure reflects relative salience within each moment, not a global average over the stimulus. In other words, the spatial distribution of MCD activity is re-centered and rescaled at each frame, exactly to avoid the type of inflation or confounding the reviewer rightly cautioned against.
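      A minimal sketch of this frame-wise normalisation, assuming the MCD correlation output is stored as an array of shape (frames, rows, columns) and gaze is given as one pixel coordinate per frame. The function name and array layout are illustrative assumptions, not the notation of Equation 23.

```python
import numpy as np

def gaze_zscore(mcd_frames, gaze_rows, gaze_cols):
    """Z-score of the value at gaze, relative to the same frame's spatial distribution.

    Frames with zero spatial variance would divide by zero; they are not handled here.
    """
    frames = np.asarray(mcd_frames, dtype=float)
    n_frames = frames.shape[0]
    mu = frames.mean(axis=(1, 2))  # spatial mean, one value per frame
    sd = frames.std(axis=(1, 2))   # spatial standard deviation, one value per frame
    at_gaze = frames[np.arange(n_frames), gaze_rows, gaze_cols]
    return (at_gaze - mu) / sd

# Two toy frames with the same spatial pattern; the second is ten times stronger.
frame = np.zeros((4, 4))
frame[1, 2] = 1.0
frames = np.stack([frame, 10.0 * frame])
z = gaze_zscore(frames, np.array([1, 1]), np.array([2, 2]))
print(z)
```

      Because the mean and standard deviation are taken within each frame, both toy frames yield the same z-score despite the tenfold difference in overall activity, which is the property that prevents high-correlation moments from dominating the measure.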

      Reviewer #2 (Recommendations for the authors):

      The authors have done a great job of providing a stimulus computable model of cue combination. I had just a few suggestions to strengthen the theoretical part of the paper:

      (1) While the authors have shown a good match between MCD and cue combination, some theoretical justification or equivalence analysis would benefit readers on how the two relate to each other. Something like Zhang et al. 2019 (which is for motion cue combination) would add to the paper.

      We agree that it is important to clarify the theoretical relationship between the Multisensory Correlation Detector (MCD) and normative models of cue integration, such as Bayesian combination. In the revised manuscript, we have now modified the introduction and added a paragraph in the Discussion addressing this link more explicitly. In brief, we see the MCD as an algorithmic-level implementation (in Marr’s terms) that may approximate or instantiate aspects of Bayesian inference.

      (2) Simulating cue combination for tasks that require integration of more than two cues (visual, auditory, haptic cues) would more strongly relate the correlation model to Bayesian cue combination. If that is a lot of work, at least discussing this would benefit the paper

      This point has now been addressed, and a new paragraph discussing the extension of the MCD model to tasks involving more than two sensory modalities has been added to the Discussion section.

    1. eLife Assessment

      This study is a fundamental advance in the field of developmental biology and transcriptional regulation that demonstrates the use of hPSC-derived organoids to generate reproducible organoids to study the mechanisms that drive neural tube closure. The work is exceptional in its development of tools to use CRISPR interference to screen for genes that regulate morphogenesis in human PSC organoids. The additional characterization of the role of specific transcription factors in neural tube formation is solid. The work provides both technical advances and new knowledge on human development through embryo models.

    2. Reviewer #1 (Public review):

      Summary:

      This is a wonderful and landmark study in the field of human embryo modeling. It uses patterned human gastruloids and conducts a functional screen on neural tube closure, and identifies positive and negative regulators, and defines the epistasis among them.

      Strengths:

      The above was achieved following optimization of the micro-pattern-based gastruloid protocol to achieve high efficiency, and then optimized to conduct and deliver CRISPRi without disrupting the protocol. This is a technical tour de force as well as one of the first studies to reveal new knowledge on human development through embryo models, which has not been done before.

      The manuscript is very solid and well-written. The figures are clear, elegant, and meaningful. The conclusions are fully supported by the data shown. The methods are well-detailed, which is very important for such a study.

      Weaknesses:

      This reviewer did not identify any meaningful, major, or minor caveats that need addressing or correcting.

      A minor weakness is that findings from human embryo models can never be revalidated in humans in vivo, for obvious and justified ethical reasons. However, the authors acknowledge this point in the section of the manuscript detailing the limitations of their study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript is a technical report on a new model of early neurogenesis, coupled to a novel platform for genetic screens. The model is more faithful than others published to date, and the screening platform is an advance over existing ones in terms of speed and throughput.

      Strengths:

      It is novel and useful.

      Weaknesses:

      The novelty of the results is limited in terms of biology, mainly a proof of concept of the platform and a very good demonstration of the hierarchical interactions of the top regulators of GRNs.

      The value of the manuscript could be enhanced in two ways:

      (1) by showing its versatility and transforming the level of neural tube to midbrain and hindbrain, and looking at the transcriptional hierarchies there.

      (2) by relating the patterning of the organoids to the situation in vivo, in particular with the information in reference 49. The authors make a statement "To compare our findings with in vivo gene expression patterns, we applied the same approach to published scRNA-seq data from 4-week-old human embryos at the neurula stage" but it would be good to have a more nuanced reference: what stage, what genes are missing, what do they add to the information in that reference?

    1. eLife Assessment

      This useful manuscript reports mechanisms behind the increase in fecundity in response to sub-lethal doses of pesticides in the crop pest, the brown plant hopper. The authors hypothesize that the pesticide works by inducing the JH titer, which through the JH signaling pathway induces egg development, for which the evidence was judged to be solid.

    2. Reviewer #1 (Public review):

      Summary:

      Gao et al. have demonstrated that treatment of the brown planthopper (BPH), a common agricultural pest, with the pesticide emamectin benzoate (EB) leads to increased egg laying. The authors hypothesize that EB upregulates JH titer, resulting in increased fecundity.

      Strengths:

      The finding that a class of pesticide increases fecundity of brown planthopper is interesting.

      Comments on revisions:

      All my concerns have been addressed to a reasonable level of satisfaction.

    1. eLife assessment

      This is a useful study that applies deep transfer learning to assign patient-level disease attributes to single cells of T2D and non-diabetic patients, including obese patients. This analysis identified a single cluster of T2D-associated β-cells; and two subpopulations of obese- β-cells derived from either non-diabetic or T2D donors. The findings were validated at the protein level using immunohistochemistry on islets derived from non-diabetic and T2D organ donors, contributing solid experimental evidence for the computational analyses.

    2. Reviewer #1 (Public review):

      In this manuscript, Roy et al. used the previously published deep transfer learning tool, DEGAS, to map disease associations onto single-cell RNA-seq data from bulk expression data. The authors performed independent runs of DEGAS using T2D or obesity status and identified distinct β-cell subpopulations. β-cells with high obese-DEGAS scores contained two subpopulations derived largely from either non-diabetic or T2D donors. Finally, immunostaining using human pancreas sections from healthy and T2D donors validated the heterogeneous expression and depletion of DLK1 in T2D islets.

      Strengths:

      (1) This meta-analysis of previously published scRNA-seq data uses a deep transfer learning tool.

      (2) Identification of novel beta cell subclusters.

      (3) Identified a relatively innovative role of DLK1 in T2D disease progression.

      Comments on revisions:

      All previous concerns have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Gitanjali Roy et al. applies deep transfer learning (DEGAS) to assign patient-level disease attributes (metadata) to single cells of T2D and non-diabetic patients, including obese patients. This led to the identification of a singular cluster of T2D-associated β-cells; and two subpopulations of obese- β-cells derived from either non-diabetic or T2D donors. The objective was to identify novel and established genes implicated in T2D and obesity. Their final goal is to validate their findings at the protein level using immunohistochemistry of pancreas tissue from non-diabetic and T2D organ donors.

      Strengths:

      This paper is well-written, and the findings are relevant for β-cell heterogeneity in T2D and obesity.

      Weaknesses:

      The validation they provide is not sufficiently strong: no DLK1 immunohistochemistry is shown of obese patient-derived sections. Additional presumptive relevant candidates from this transcriptomic analysis should be screened for, at the protein level.

      Comments on revisions:

      The authors have largely addressed my comments. No further experiments are requested.

    1. eLife Assessment

      This is an important study that takes a key step towards understanding developmental disorders linked to mutations in the O-GlcNAc transferase enzyme by generating a mouse model harboring the C921Y mutation. The study thoroughly examines behavioral and anatomical differences in these mice and finds behavioral hyperactivity and learning/memory deficits, as well as phenotypic differences in skull and brain formation. However, the experimental evidence is incomplete owing to a discrepancy in OGT protein/RNA levels in the C921Y mutant mice between this paper and the previous paper ("Neurodevelopmental defects in a mouse model of O-GlcNAc transferase intellectual disability"). This line of research will benefit from investigation of the differences in associated glycoproteins and mechanistic insights. This study will be of interest to those studying neurodevelopment, learning and behavior, or associated brain mechanisms.

    2. Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). However, critical flaws in the current findings require resolution to ensure scientific rigor.

      The most concerning finding appears in Figure S12. While Supplementary Figure S12 demonstrates decreased OGA expression without significant OGT level changes in C921Y mutants via Western blot/qPCR, previous reports (Florence Authier, et al., Dis Model Mech. 2023) described OGT downregulation in Western blot and an increase in qPCR in the same models. The opposite OGT expression outcomes in supposedly identical mouse models directly challenge the model's reliability. This discrepancy raises serious concerns about either the experimental execution or the interpretation of results. The authors must revalidate the data with rigorous controls or provide a molecular biology-based explanation.

      A few additional comments to the author may be helpful to improve the study.

      Major

      (1) While this study systematically validated multi-dimensional phenotypes (including neuroanatomical abnormalities and behavioral deficits) in OGT C921Y mutant mice, there is a lack of relevant mechanisms and intervention experiments. For example, the absence of targeted intervention studies on key signaling pathways prevents verification of whether proteomics-identified molecular changes directly drive phenotypic manifestations.

      (2) Although MRI detected nodular dysplasia and heterotopia in the cingulate cortex, the cellular basis remains undefined. Spatiotemporal immunofluorescence analysis using neuronal (NeuN), astrocytic (GFAP), and synaptic (Synaptophysin) markers is recommended to identify affected cell populations (e.g., radial glial migration defects or intermediate progenitor differentiation abnormalities).

      (3) While proteomics revealed dysregulation in pathways including Wnt/β-catenin and mTOR signaling, two critical issues remain unresolved: a) O-GlcNAc glycoproteomic alterations remain unexamined; b) The causal relationship between pathway changes and O-GlcNAc imbalance lacks validation. It is recommended to use co-immunoprecipitation or glycosylation sequencing to confirm whether the relevant proteins undergo O-GlcNAc modification changes, identify specific modification sites, and verify their interactions with OGT.

      (4) Given that OGT-ID neuropathology likely originates embryonically, we recommend serial analyses from E14.5 to P7 to examine cellular dynamics during critical corticogenesis phases.

      (5) The interpretation of Figure 8A constitutes overinterpretation. Current data fail to conclusively demonstrate impairment of OGT's protein interaction network and lack direct evidence supporting the proposed mechanisms of HCF1 misprocessing or OGA loss.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull, and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed, and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness identified is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model.

    4. Author response:

      Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). However, critical flaws in the current findings require resolution to ensure scientific rigor.

      The most concerning finding appears in Figure S12. While Supplementary Figure S12 demonstrates decreased OGA expression without significant OGT level changes in C921Y mutants via Western blot/qPCR, previous reports (Florence Authier, et al., Dis Model Mech. 2023) described OGT downregulation in Western blot and an increase in qPCR in the same models. The opposite OGT expression outcomes in supposedly identical mouse models directly challenge the model's reliability. This discrepancy raises serious concerns about either the experimental execution or the interpretation of results. The authors must revalidate the data with rigorous controls or provide a molecular biology-based explanation.

      The referee’s assessment is based on a misunderstanding: these are certainly not the same experiment repeated twice with different answers. In the previous report of the OGT-C921Y mutant mice (Florence Authier, et al., Dis Model Mech. 2023), OGT and OGA mRNA/protein expression were assessed in total brain protein extract from 3-month-old male mice. In that study we observed a significant reduction in OGT protein levels, while OGT mRNA levels were significantly increased in the mutant compared to WT controls. However, in the current study (Figure S12), the OGA and OGT mRNA/protein expression analyses a) were restricted to the pre-frontal cortex and b) used 4-month-old male mice, which does not allow a direct comparison of the two studies. In the pre-frontal cortex, OGT protein levels are not changed while OGT mRNA levels are increased (similarly to the total brain data), albeit not significantly. The different outcomes for OGT protein levels in total brain and pre-frontal cortex could suggest regional differences in OGT protein levels/stability, as OGT mRNA levels are increased in both cases. Three other brain regions (hippocampus, striatum and cerebellum) have now also been assessed for OGT mRNA/protein expression, supporting such regional differences in OGT protein levels, and these data will be included in the new version of the manuscript.

      A few additional comments to the author may be helpful to improve the study.

      Major

      (1) While this study systematically validated multi-dimensional phenotypes (including neuroanatomical abnormalities and behavioral deficits) in OGT C921Y mutant mice, there is a lack of relevant mechanisms and intervention experiments. For example, the absence of targeted intervention studies on key signaling pathways prevents verification of whether proteomics-identified molecular changes directly drive phenotypic manifestations.

      We agree with the referee that these experiments would further strengthen the work. They would, however, result in a 1-5 year delay in sharing this work with the scientific and patient communities. We will continue to work along these lines and report separately in the future.

      (2) Although MRI detected nodular dysplasia and heterotopia in the cingulate cortex, the cellular basis remains undefined. Spatiotemporal immunofluorescence analysis using neuronal (NeuN), astrocytic (GFAP), and synaptic (Synaptophysin) markers is recommended to identify affected cell populations (e.g., radial glial migration defects or intermediate progenitor differentiation abnormalities).

      We are currently performing these experiments so that they can be included in the version of record of this manuscript.

      (3) While proteomics revealed dysregulation in pathways including Wnt/β-catenin and mTOR signaling, two critical issues remain unresolved: a) O-GlcNAc glycoproteomic alterations remain unexamined; b) The causal relationship between pathway changes and O-GlcNAc imbalance lacks validation. It is recommended to use co-immunoprecipitation or glycosylation sequencing to confirm whether the relevant proteins undergo O-GlcNAc modification changes, identify specific modification sites, and verify their interactions with OGT.

      We agree with the referee that these experiments would further strengthen the work and will perform further experiments to explore whether these pathways are functionally affected. However, it is important to note that the inference that these proteins must themselves be O-GlcNAc modified is incorrect – indeed, O-GlcNAcylation of an unknown protein kinase X, E3 ligase/DUB Y or transcription factor Z could indirectly affect these pathways/proteins.

      (4) Given that OGT-ID neuropathology likely originates embryonically, we recommend serial analyses from E14.5 to P7 to examine cellular dynamics during critical corticogenesis phases.

      We agree with the referee that these experiments would further strengthen the work. They would, however, result in a significant delay in sharing this work with the scientific and patient communities. We will continue to work along these lines and report separately in the future.

      (5) The interpretation of Figure 8A constitutes overinterpretation. Current data fail to conclusively demonstrate impairment of OGT's protein interaction network and lack direct evidence supporting the proposed mechanisms of HCF1 misprocessing or OGA loss.

      For clarity, we will remove panel A from Figure 8 in the version of record – this panel was only ever meant to represent a priori hypotheses for OGT-CDG mechanisms, none of which have been either excluded or confirmed.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull, and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed, and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness identified is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model.

      We agree with the referee that these experiments would further strengthen the work. They would, however, result in a 1-5 year delay in sharing this work with the scientific and patient communities. We will continue to work along these lines and report separately in the future.

    1. eLife Assessment

      This useful study uses fiber photometry, implantable lenses, and optogenetics to show that a subset of subthalamic nucleus neurons is active during movement, and that active but not passive avoidance depends in part on STN projections to substantia nigra. The strength of the evidence for these claims is solid, whereas evidence supporting the claims that STN is involved in cautious responding or the speed of avoidance is incomplete. This paper will be of interest to basic and applied behavioural neuroscientists working on avoidance if suitably streamlined to support the strongest claims.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a robust set of experiments that provide new fundamental insights into the role of STN neurons during active and passive avoidance tasks. These forms of avoidance have received comparatively less attention in the literature than the more extensively studied escape or freezing responses, despite being extremely relevant to human behaviour and more strongly influenced by cognitive control.

      Strengths:

      Understanding the neural infrastructure supporting avoidance behaviour would be a fundamental milestone in neuroscience. The authors employ sophisticated methods, including calcium imaging and optogenetics, to delineate the functions of STN neurons during avoidance behaviours. The work is extremely thorough, and the evidence presented is compelling. Experiments are carefully constructed, well-controlled, and the statistical analyses are appropriate.

      Points for Authors' Consideration:

      (1) Motoric role of STN: The authors interpret their findings within the context of active avoidance, a cognitively demanding process. An alternative interpretation is that STN activation enhances global motoric tone, facilitating general movement rather than specifically encoding cautious avoidance. Experimentally, this could be evaluated by examining STN-induced motoric tone in non-avoidance contexts, such as open field tests with bilateral stimulations. Alternatively, or additionally, the authors could explicitly discuss evidence for and against the possibility that increased motoric tone may account for aspects of the observed behaviours.

      (2) Temporal Dynamics in Calcium Imaging (AA2 vs. AA1): Based on previous work by this group, a delay (~1-2 sec) in neuronal response onset was anticipated in AA2 compared to AA1. Although a delay in peak response is observed, there is no clear evidence of a significant delay in response onset or changes in slope of neural activity. The authors could quantify calcium onset latencies and slopes and statistically compare these parameters across conditions.

      (3) Speed Differences (AA2 vs. AA1): Given the increased latency in AA2, and based on previous work from the group, one would expect faster movements following initiation. However, such differences are not evident in the presented data. The authors might want to discuss the absence of an expected speed increase and clarify whether this absence is consistent with previous findings.

      (4) Behavioural Differences Across Neuronal Classes (Figure 7): The manuscript currently does not compare responses of neuronal classes I, II, and III between AA1 and AA2 conditions separately or provide information regarding their activity during AA3.

      (5) Streamlining Narrative and Figures: Given the extensive amount of material presented, the manuscript and figures would benefit from streamlining. Many data points and graphs could be moved to supplementary materials without affecting the core interpretation and simplifying the reading of the work by a non-expert audience. Similarly, the main text could be refined to more clearly emphasise the key findings, which would improve both readability and impact. At the same time, certain aspects would benefit from additional clarification. For example, it would be helpful to explain the key features of the AA1-AA3 tasks at the point of introduction, rather than referring readers to previous literature. Overall, enhancing clarity and accessibility would serve the authors well and broaden the impact of the work.

    3. Reviewer #2 (Public review):

      Summary:

      Zhou, Sajid et al. present a study investigating the STN involvement in signaled movement. They use fiber photometry, implantable lenses, and optogenetics during active avoidance experiments to evaluate this. The data are useful for the scientific community, and the overall evidence for their claims is solid, but many aspects of the findings are confusing and seemingly contradictory. For example, STN activity increases with contraversive turning in the fiber photometry experiments, but optogenetic stimulation of the STN evokes ipsiversive turning. While the authors present a huge collection of data, it is somewhat difficult to extract the key information and the meaningful implications resulting from this data.

      Strengths:

      The study is comprehensive in using many techniques, stimulation powers, frequencies, and configurations.

      Weaknesses:

      Here are the specific weaknesses of the paper.

      (1) Vglut2 isn't a very selective promoter for the STN. Did the authors verify every injection across brain slices to ensure the para-subthalamic nucleus, thalamus, lateral hypothalamus, and other Vglut2-positive structures were never infected?

      (2) The authors say in the methods that the high vs low power laser activation for optogenetic experiments was defined by the behavioral output. This is misleading, and the high vs low power should be objectively stated and the behavioral results divided according to the power used, not according to the behavioral outcome.

      (3) In the fiber photometry experiments exposing mice to the range of tones, it is impossible to separate the STN response to the tone from the STN response to the movement evoked by the tone. The authors should expose the mouse to the tones in a condition that prevents movement, such as anesthetized or restrained, to separate out the two components.

      (4) The claim 'STN activation is ideally suited to drive active avoids' needs more explanation. This claim comes after the fiber photometry experiments during active avoidance tasks, so there has been no causality established yet.

      (5) The statistical comparisons in Figure 7E need some justification and/or clarification. The 9 neuron types are originally categorized based on their response during avoids, then statistics are run showing that they respond differently during avoids. It is no surprise that they would have significantly different responses, since that is how they were classified in the first place. The authors must explain this further and show that this is not a case of circular reasoning.

      (6) The authors show that neurons that have strong responses to orientation show reduced activity during avoidance. What are the implications of this? The author should explain why this is interesting and important.

      (7) It is not clear which conditions each mouse experienced in which order. This is critical to the interpretation of Figure 9 and the reduction of passive avoids during STN stimulation. Did these mice have the CS1+STN stimulation pairing or the STN+US pairing prior to this experiment? If they did, the stimulation of the STN could be strongly associated with either punishment or with the CS1 that predicts punishment. If that is the case, stimulating the STN during CS2 could be like presenting CS1+CS2 at the same time and could be confusing.

      (8) The experiments in Figure 10 are used to say that STN stimulation is not aversive, but they only show that STN stimulation cannot be used as punishment in place of a shock. This doesn't mean that it is not aversive; it just means it is not as aversive as a shock. The authors should do a simpler aversion test, such as conditioned or real-time place preference, to claim that STN stimulation is not aversive. This is particularly surprising as previous work (Serra et al., 2023) does show that STN stimulation is aversive.

      (9) In the discussion, the idea that the STN encodes 'moving away' from contralateral space is pretty vague and unsupported. It is puzzling that the STN activates more strongly to contraversive turns, but when stimulated, it evokes ipsiversive turns; however, it seems a stretch to speculate that this is related to avoidance. In the last experiments of the paper, the axons from the STN to the GPe and to the midbrain are selectively stimulated. Do these evoke ipsiversive turns similarly?

      (10) In the discussion, the authors claim that the STN is essential for modulating action timing in response to demands, but their data really only show this in one direction. The STN stimulation reliably increases the speed of response in all conditions (except maximum speed conditions such as escapes). It seems to be over-interpreting the data to say this is an inability to modulate the speed of the task, especially as clear learning and speed modulation do occur under STN lesion conditions, as shown in Figure 12B. The mice learn to avoid and increase their latency in AA2 vs AA1, though the overall avoids and latency are different from controls. The more parsimonious conclusion would be that STN stimulation biases movement speed (increasing it) and that this is true in many different conditions.

      (11) In the discussion, the authors claim that the STN projections to the midbrain tegmentum directly affect the active avoidance behavior, while the STN projections to the SNr do not affect it. This seems counter to their results, which show STN projections to either area can alter active avoidance behavior. What is the laser power used in these terminal experiments? If it is high (3mW), the authors may be causing antidromic action potentials in the STN somas, resulting in glutamate release in many brain areas, even when terminals are only stimulated in one area. The authors could use low (0.25mW) laser power in the terminals to reduce the chance of antidromic activation and spatially restrict the optical stimulation.

      (12) Was normality tested for data prior to statistical testing?

      (13) Why are there no error bars on Figure 5B, black circles and orange triangles?
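
      The circularity concern raised in point (5) can be illustrated with a minimal sketch. It uses simulated noise only, not any data from the manuscript; the function name, the cell and trial counts, and the median-split classification rule are all illustrative assumptions. Cells with no true response are classified on one half of the trials, and the class difference is then measured both on those same trials and on held-out trials.

```python
import random
import statistics


def split_half_demo(n_cells=300, n_trials=40, seed=0):
    """Classify pure-noise cells on half the trials, then compare the class
    difference on the same half (circular) and on the held-out half."""
    rng = random.Random(seed)
    # Every simulated cell has zero true response: trials are Gaussian noise.
    cells = [[rng.gauss(0.0, 1.0) for _ in range(n_trials)]
             for _ in range(n_cells)]
    half = n_trials // 2
    train = [statistics.fmean(c[:half]) for c in cells]  # used to classify
    test = [statistics.fmean(c[half:]) for c in cells]   # independent trials

    # Hypothetical rule: "activated" if the classification-half mean exceeds
    # the median across cells, "inhibited" otherwise.
    cutoff = statistics.median(train)
    up = [i for i, m in enumerate(train) if m > cutoff]
    down = [i for i, m in enumerate(train) if m <= cutoff]

    # Difference between classes on the data that defined the classes.
    circular_gap = (statistics.fmean([train[i] for i in up])
                    - statistics.fmean([train[i] for i in down]))
    # Difference between the same classes on trials not used for classification.
    heldout_gap = (statistics.fmean([test[i] for i in up])
                   - statistics.fmean([test[i] for i in down]))
    return circular_gap, heldout_gap


circular_gap, heldout_gap = split_half_demo()
print(f"gap on classification trials: {circular_gap:.3f}")
print(f"gap on held-out trials:       {heldout_gap:.3f}")
```

      Under these assumptions the classes differ on the classification trials by construction, while the held-out difference stays near zero. A comparison that survives on trials or conditions not used for classification would address the concern.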

    4. Reviewer #3 (Public review):

      Summary:

      The authors use calcium recordings from STN to measure STN activity during spontaneous movement and in a multi-stage avoidance paradigm. They also use optogenetic excitation, optogenetic inhibition, and lesion approaches to increase or decrease the activity of STN during the avoidance paradigm. The paper reports a large amount of data and makes many claims; some seem well supported to this Reviewer, others not so much.

      Strengths:

      Well-supported claims include data showing that during spontaneous movements, especially contraversive ones, STN calcium activity is increased using bulk photometry measurements. Single-cell measures back this claim but also show that it is only a modest minority of STN cells that respond strongly, with most showing no response during movement, and a similar number showing smaller inhibitions during movement.

      Similar data during cued active avoidance procedures show that STN calcium activity sharply increases in response to auditory cues, and during cued movements to avoid a footshock. Optogenetic and lesion experiments are consistent with an important role for STN in generating cue-evoked avoidance. And a strength of these results is that multiple bi-directional approaches were used.

      Weaknesses:

      I found the experimental design and presentation convoluted and the results over-interpreted.

      (1) I really don't understand or accept this idea that delayed movement is necessarily indicative of cautious movements. Is the distribution of responses multi-modal in a way that might support this idea, or do the authors simply take a normal distribution and assert that the slower responses represent 'caution'? Even if responses are multi-modal and clearly distinguished by 'type', why should readers think that delayed responses imply cautious responding instead of, say, habituation or sensitization to cue/shock; variability in attention, motivation, or stress; or merely uncertainty, which seems plausible given what I understand of the task design, where the same mice are repeatedly tested in changing conditions? This relates to a major claim (i.e., in the work's title).

      (2) Related to the last, I'm struggling to understand the rationale for dividing cells into 'types' based on their physiological responses in some experiments (e.g., Figure 7).

      (3) The description and discussion of orienting head movements were not well supported, but were much discussed in the avoidance datasets. The initial speed peaks to cue seem to be the supporting data upon which these claims rest, but nothing here suggests head movement or orientation responses.

      (4) Similar to the last, the authors note in several places, including abstract, the importance of STN in response timing, i.e., particularly when there must be careful or precise timing, but I don't think their data or task design provides a strong basis for this claim.

      (5) I think that other reports show that STN calcium activity is recruited by inescapable foot shock as well. What do these authors see? Is shock, independent of movement, contributing to sharp signals during escapes?

      (6) In particular, and related to the last point, the following work is very relevant and should be cited: https://elifesciences.org/reviewed-preprints/104643#tab-content. Note that the focus of this other paper is on a subset of VGLUT2+ Tac1 neurons in paraSTN, but using VGLUT2-Cre to target STN will target both STN and paraSTN.

      (7) In multiple other instances, claims that were more tangential to the main claims were made without clearly supporting data or statistics. E.g., claim that STN activation is related to translational more than rotational movement; claim that GCaMP and movement responses to auditory cues were small; claims that 'some animals' responded differently without showing individual data.

      (8) In several figures, the number of subjects used was not described. This is necessary. Also necessary is some assessment of the variability across subjects. The only measure of error shown in many figures relates to trial-to-trial or event variability, which is minimal because, in many cases, it appears that hundreds of trials may have been averaged per animal, but this doesn't provide a strong view of biological variability. When bar/line plots are used to display data, I recommend showing individual animals where feasible.

      (9) Can the authors consider the extent to which calcium imaging may be better suited to identify increases compared to decreases and how this may affect the results, particularly related to the GRIN data when similar numbers of cells show responses in both directions (e.g., Figure 3)?

      (10) Raw example traces are not provided.

      (11) The timeline of the spontaneous movement and avoidance sessions was not clear, nor was the number of events or sessions per animal nor how this was set. It is not clear if there was pre-training or habituation, if many or variable sessions were combined per animal, or what the time gaps between sessions were, or if or how any of these parameters might influence interpretation of the results.

      (12) It is not clear if or how the spread of expression outside of the target STN was evaluated, and if or how many mice were excluded due to spread or fiber placements.
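
      Point (8) distinguishes trial-to-trial variability from variability across animals. A minimal sketch of why the two error estimates differ, using simulated values only; the number of animals and trials, the variance settings, and all names are illustrative assumptions and are not taken from the manuscript:

```python
import math
import random
import statistics


def sem(values):
    """Standard error of the mean for a list of observations."""
    return statistics.stdev(values) / math.sqrt(len(values))


def simulate(n_animals=6, n_trials=200, between_sd=1.0, within_sd=1.0, seed=1):
    """Return one list of trial values per animal; each animal has its own
    true mean, drawn from a between-animal distribution."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_animals):
        animal_mean = rng.gauss(0.0, between_sd)
        data.append([rng.gauss(animal_mean, within_sd) for _ in range(n_trials)])
    return data


data = simulate()

# Pooling treats every trial as an independent observation (n = 1200 here).
pooled_sem = sem([x for animal in data for x in animal])

# Averaging within each animal first uses the animal as the unit (n = 6 here).
animal_sem = sem([statistics.fmean(animal) for animal in data])

print(f"SEM over pooled trials: {pooled_sem:.3f}")
print(f"SEM over animal means:  {animal_sem:.3f}")
```

      With several hundred trials per animal, the pooled estimate shrinks with the trial count regardless of how much the animals differ from one another, which is why error bars computed per animal, or plots of individual animals, are more informative about biological variability.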

    1. eLife Assessment

      This study demonstrates the potential role of 17α-estradiol in modulating neuronal gene expression in the aged hypothalamus of male rats, identifying key pathways and neuron subtypes affected by the drug. While the findings are useful and provide a foundation for future research, the strength of supporting evidence is incomplete due to the lack of female comparison, a young male control group, unclear link to 17α-estradiol lifespan extension in rats, and insufficient analysis of glial cells and cellular stress in CRH neurons.

    2. Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al. aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      (2) It's not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus are linked to the lifespan extension. The manuscript cited in the introduction does not include lifespan data on rats.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well described (Fig. 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportions of oligodendrocytes and microglia were increased by the drug treatment, while the proportion of astrocytes was decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      A more detailed analysis of glial cell types within the hypothalamus in response to drug should be provided.

      (4) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      Revised submission:

      Some of the concerns were addressed in this revised version, and the authors responded and addressed study design limitations in both sexes/ages.

      However, there are still some concerns that were not sufficiently addressed:

      While the term "senescent" was changed to "stressed," some histological/cellular validation of this phenotype is still needed.

      Some discussion on the sex-specific effects of 17α-estradiol in the hypothalamus is still required. Previous studies in mice demonstrated that 17α-estradiol reduced hypothalamic microgliosis and astrogliosis in male but not female UM-HET3 mice.

      Additionally, the provided analysis on astrocytes and microglia is superficial.

    3. Reviewer #2 (Public review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer to, or to the same levels as, those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one changing most significantly after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      • The single nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      • A variety of functions was used, which allowed the differential analysis of a very complex type of data. This led to a better comparison between the different groups at many levels.

      • There were some physiological parameters measured such as circulating hormone levels that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Weaknesses:

      • One main control group is missing from the study, the young males treated with 17α-Estradiol.

      • Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      • Although the authors claim to have several findings, the data fail to support these claims.

      • The study is about improving ageing, but no physiological data from the study support such a claim, with the exception of the testes histology, which was not properly analyzed and was not even significantly different between the groups.

      • Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression is related to metabolic, synaptic or other function.

      Comments on revisions:

      The authors revised part of the manuscript to address some of the reviewers' comments. This improved the language and the text flow to a certain extent. They also added an additional analysis including glial cells. However, they failed to address the main weaknesses brought up by the reviewers and did not add any experimental demonstration of their claims on lifespan extension induced by 17α-estradiol in rats (the cited study does not include lifespan in rats). In addition, they insisted on keeping parts in the discussion that are not directly linked to any of the paper's findings.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al. aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on the hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Thanks.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Thanks.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      This study was originally designed based on previous findings indicating that lifespan extension is only effective in males, leading to the exclusion of females from the analysis. The primary focus of our research was on the transcriptional changes and serum endocrine alterations induced by 17α-estradiol in aged males compared to untreated aged males. We believe that even in the absence of female subjects, the significant effects of 17α-estradiol on metabolism in the hypothalamus, synapses, and endocrine system remain evident, particularly regarding the expression levels of GnRH and testosterone. Notably, lower overall metabolism, increased synaptic activity, and elevated levels of GnRH and testosterone are strong indicators of health and well-being in males, supporting the validity of our primary conclusions. However, including female controls would enhance the depth of our findings. If female controls were incorporated, we propose redesigning the sample groups to include aged male control, aged female control, aged female treated, aged male treated, as well as young male control, young male treated, young female control, and young female treated. We regret that we cannot provide this data in the short term. Nevertheless, we believe this reviewer’s creative idea presents a valuable avenue for future research on this topic. In this study, we emphasize the role of 17α-estradiol in overall metabolism, synaptic function, GnRH, and testosterone in aged males and underscore the importance of supervised clustering of neuropeptide-secreting neurons in the hypothalamus.

      (2) It is not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus are linked to the lifespan extension.

      Thanks for the reminder. 17α-estradiol has been reported to extend lifespan in male rats, similar to male mice (PMID: 33289482). We have added this valuable reference to the Introduction in the new version.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well-described (Figure 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportions of oligodendrocytes and microglia were increased by the drug treatment, while the proportion of astrocytes was decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      We have reviewed reports describing changes in cell numbers following 17α-estradiol treatment in the brain, using the keywords "17α-estradiol," "17alpha-estradiol," and "microglia" or "astrocyte." Only a limited amount of data was obtained. We found one article indicating that 17α-estradiol treatment in Tg (AβPP(swe)/PS1(ΔE9)) model mice resulted in a decreased microglial cell number compared to the placebo (AβPP(swe)/PS1(ΔE9) mice), but this change was not significant when compared to the non-transgenic control (PMID: 21157032). The transgenic AβPP(swe)/PS1(ΔE9) mouse model may differ from our wild-type aging rat model in this context.

      Moreover, the calculation of cell numbers was based on visual observation under a microscope across several brain tissue slices. This traditional method often yields controversial results. For example, oligodendrocytes in the corpus callosum, fornix, and spinal cord have been reported to be 20-40% more numerous in males than in females based on microscopic observations (PMID: 16452667). In contrast, another study found no significant difference in the number of oligodendrocytes between sexes when using immunohistochemistry staining (PMID: 18709647). Such discrepancies arising from traditional observational methods are inevitable.

      We believe the data presented in this article are reliable because the cell number and cell ratio data were derived from high-throughput cell counting of the entire hypothalamus using single-cell suspension and droplet wrapping (10x Genomics).

      (4) A more detailed analysis of glial cell types within the hypothalamus in response to drugs should be provided.

      We have provided additional enrichment analysis of differentially expressed genes between Y, O, and O.T in microglia and astrocytes in Figure 2—figure supplement 3. In this supplemental data, we found that, unlike neurons, microglia (Micro) displayed lower levels of synapse-related cellular processes in O.T compared to O.

      (5) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus, such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      We also noted the inappropriate claim and have changed "senescent phenotype" to "stressed phenotype" and "abnormal phenotype" in both the abstract and results sections. The stressed phenotype could be induced by heightened functional activity in the cells, potentially indicating higher cellular activity. The GnRH and CRH neurons discussed in this paper may represent such a case, as illustrated by the observed high serum GnRH, testosterone, and cortisol levels. This revision suggestion is highly valuable and constructive for our understanding of the unique physiological characteristics revealed by these data.

      Reviewer #2 (Public Review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      (1) Single-nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      Thanks.

      (2) There is a variety of functions used that allow the differential analysis of a very complex type of data. This led to a better comparison between the different groups on many levels.

      Thanks.

      (3) There were some physiological parameters measured, such as circulating hormone levels, that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Thanks.

      Weaknesses

      (1) One main control group is missing from the study, the young males treated with 17α-Estradiol.

      Given that the treatment period lasts six months, which extends beyond the young male rats' age range, we aimed to investigate the perturbation of the normal aging process by 17α-Estradiol. Including data from young males could potentially obscure the treatment's effects in aged males because of age-related effects, though similar effects between young and aged animals may exist. Long-term hormone treatment may exert more developmental effects on young animals than on old ones. Consequently, we decided to exclude this group from our initial sample design. We apologize for this omission.

      (2) Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      The precise targets of 17α-Estradiol within the hypothalamus remain unresolved. Selecting a specific nucleus for study is challenging. The supervised clustering method described in this manuscript allows us to identify the more sensitive neuron subtypes influenced by 17α-Estradiol and aging across the entire hypothalamus, without the need to isolate specific nuclei in a disturbed hypothalamic environment.

      (3) Although the authors claim to have several findings, the data fail to support these claims.

      Thanks. You may be referring to the claim of a senescent phenotype in Crh neurons induced by 17α-estradiol. We have changed "senescent phenotype" to "stressed phenotype" in the abstract and results to avoid such a claim. The stressed phenotype may be induced by heightened functional activity in the cells, potentially indicating higher cellular activity.

      (4) The study is about improving ageing but no physiological data from the study demonstrated such a claim with the exception of the testes histology which was not properly analyzed and was not even significantly different between the groups.

      The primary objective of this study is to elucidate the effects of 17α-Estradiol on the endocrine system in the aging hypothalamus; exploring anti-aging effects is not the main focus. Previous publications have shown that down-regulated GnRH and testosterone levels, along with elevated mTOR signaling, are indicators of aging (PMID: 37886966, PMID: 37048056, PMID: 22884327). The contrasting signaling networks related to metabolism and synaptic processes significantly differentiate young and aging hypothalami, and 17α-Estradiol helps rebalance these networks, suggesting its potential anti-aging effects.

      (5) Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression are related to metabolic, synaptic, or other functions.

      The study focuses on investigating cellular responses and endocrine changes in the aging hypothalamus induced by 17α-estradiol, utilizing single-nucleus RNA sequencing (snRNA-seq) and a novel data mining methodology to analyze various neuron subtypes. It is important to note that this study does not primarily aim to explore anti-aging effects. Consequently, we have revised the claim in the abstract from “the effects of 17α-estradiol in anti-aging in neurons” to “the effects of 17α-estradiol on aging neurons.” We observed that the lower overall metabolism and increased expression levels of synapse-related cellular processes align with previously reported findings regarding 17α-estradiol. To address the lack of physiological data and the challenges in measuring multiple endocrine factors due to their volatile nature, we employed several bidirectional Mendelian randomization (MR) analyses of various genome-wide association study (GWAS) data related to these serum endocrine factors to identify their mutual causal effects.

      Reviewing Editor Comment:

      Based on the Public Reviews and Recommendations for Authors, the Reviewers strongly recommend that revisions include an experimental demonstration of the physiological effects of the treatment on ageing in rats as well as the CRH-senescence link. Additional analysis of the glia would greatly strengthen the study, as would inclusion of females and young male controls. The important point was also raised that the work linking 17α-estradiol to lifespan was performed in mice, and the link with lifespan in rats is not known. Discussion of this point is recommended.

      We thank the reviewers for their constructive feedback. Regarding the recommendations in the Public Reviews and Recommendations for Authors:

      a) Physiological effects & CRH-senescence link:

      We acknowledge that 17α-estradiol has been reported to extend lifespan in male rats, consistent with findings in male mice (PMID: 33289482). This point has now been noted in the Introduction. We regret that further experimental validation of the treatment's physiological effects on aging in rats was beyond the scope of this study.

      b) Phenotype terminology:

      In response to concerns about the "senescent" characterization of CRH neurons, we have revised this terminology to "stressed phenotype" throughout the abstract and results. While we were unable to conduct additional experiments to confirm senescence markers, this revised description better reflects the heightened cellular activity observed (as evidenced by elevated serum GnRH and testosterone levels), without implying confirmed senescence.

      c) Glial cell analysis:

      To address questions about glial cell function during treatment, we have added new enrichment analysis data of differentially expressed genes in microglia and astrocytes from young (Y), old (O), and old treated (O.T) groups in Figure 2—figure supplement 3. This analysis reveals that microglia exhibit contrasting synaptic-related cellular processes compared to total neurons.

      d) Female and young controls:

      We sincerely apologize for the absence of female subjects and young male controls in the current study. The reviewers' suggestion to examine the male-specific effects of 17α-estradiol using female controls represents an excellent direction for future research, which we plan to pursue in upcoming studies.

      Reviewer #2 (Recommendations For The Authors):

      General comments:

      (1) The manuscript is very hard to read. Proofreading and editing by software or a professional seems necessary. The words "enhanced", "extensive" etc. are not always used in the right way.

      Thanks for the suggestion. We have proofread and edited the manuscript. The words "enhanced" and "extensive" were also revised in most sentences.

      (2) The numbers of animals and samples are not well explained. Is it 9 rats overall or per group? If there are 8 testes samples per group, should we assume that there were 4 rats per group? How was the pooling of the hypothalami done? Were all the hypothalami from each group pooled together? A small table with the animals per group and the samples would help.

      We appreciate your reminder regarding the initial mistake in our manuscript preparation. In the preliminary submission, we reported 9 rats based solely on sequencing data and data mining. The revised version (v1) now includes additional experimental data, with an effective total of 12 animals (4 per group). Unfortunately, we overlooked updating this information in the v1 submission. We have since added detailed information in the Materials and Methods sections: Animals, Treatment and Tissues, and snRNA-seq Data Processing, Batch Effect Correction, and Cell Subset Annotation.

      (3) The Clustering is wrong. There are genes in there that do not fall into any of the 3 categories: Neurotransmitters, Receptors, Hormones.

      We acknowledge the error in gene clustering and have implemented the following corrections:

      (a) The description has been updated to state: 'Vast majority of these subtypes were clustered by neuropeptides, hormones, and their receptors among all neurons.'

      (b) Genes not belonging to these three categories have been substantially removed.

      (c) The neuropeptide category (now including several growth hormones) has been expanded to 104 genes, while their corresponding receptors (including several sex hormone receptors) now comprise 105 genes.

      (4) The coloring of groups in the graphs is inconsistent. It must be more homogeneous to make it easier to identify.

      We have changed the colors of groups in Fig. 1D to make the color of cell clusters consistent in Fig. 1A-D.

      (5) The groups c1-c4 are not well explained. How did the authors come up with these?

      We have added more descriptions of c1-c4 in materials and methods in the new version.

      (6) In most cases it's not clear if the authors are talking about cell numbers that express a certain mRNA, the level of expression of a certain mRNA, or both. They need to do a better job using more precise descriptions instead of using general terms such as "signatures", "expression profiles", "affected neurons" etc. It is very hard to understand if the number of neurons is compared between the groups or the gene expression.

      We have changed "signatures" to "gene signatures" to make the meaning more accurate. "Affected neurons" was also changed to "sensitive neurons". However, we were unable to find a better alternative to "expression profiles".

      (7) Sometimes there are claims made without justification or a reference. For example, the claim about the senescence of CRH neurons due to the upregulation of mitochondrial genes and downregulation of adherence junction genes (lines 326-328) should be supported by a reference or own findings.

      The term "senescence" is not appropriate here. We have changed it to "stressed phenotype" or "aberrant changes" in the abstract and results.

      (8) Young males treated with Estradiol as a control group is necessary and it is missing.

      Your suggestion is appreciated; however, the treatment duration for aged rats (O.T) was set at 6 months, while the young rats were only 4 months old. This disparity makes it challenging to align treatment timelines for the young animals. The primary aim of this study is to investigate the perturbation of the aging process by 17α-estradiol, and any distinct age-related effects observed in young males might complicate our understanding of its role in aged males, though similar endocrine effects may exist in the young animals. Long-term hormone treatment may exert more developmental effects on young animals than on old ones. Therefore, we decided to exclude the young samples from our initial study design. We apologize for any confusion this may have caused.

      Specific Comments:

      Line 28: "elevated stresses and decreased synaptic activity": Please make this clearer. Can't claim changes in synaptic activity by gene expression.

      We have changed it to "the expression level of pathways involved in synapse".

      Line 32: "increased Oxytocin": serum Oxytocin.

      We have added the “serum”.

      Line 52 - 54: Any studies from rats?

      Thanks. 17α-estradiol has also been reported to have metabolic roles in rats similar to those in mice (PMID: 33289482), and we have added this study to the references. It is very useful for this manuscript.

      Line 62 - 65: It wasn't investigated thoroughly in this paper so why was it suggested in the introduction?

      We have deleted this sentence as suggested.

      Line 70: "synaptic activity" Same as line 28.

      We have changed it to "pathways involved in synaptic activity".

      Line 79: Why were aged rats caged alone and young by two? Could that introduce hypothalamic gene expression effects?

      The young males could be housed together peacefully, whereas aged males fight and therefore had to be housed alone.

      Lines 78, 99, 109-110: It is not clear how many animals per group were used and how many samples per group were used separately and/or grouped. Please be more specific.

      We have added this information to Materials and methods/Animals, treatment and tissues and Materials and methods/snRNA-seq data processing, batch effect correction, and cell subset annotation.

      Line 205: "in O" please add "versus young.".

      We have changed accordingly.

      Line 207: replace "were" with "was"

      Alternatively, we have changed "proportion" to "proportions".

      Line 208: replace "that" with "compared to" and after "in O.T." add "compared to?"

      We have changed accordingly.

      Line 223: "O.T." compared to what? Figure?

      We have changed it accordingly.

      Line 227: Figure?

      We have added (Figure 1E) accordingly.

      Line 229: "synaptic activity" Same as line 28.

      We have revised it.

      Line 235: "synaptic activity" and "neuropeptide secretion" Same as line 28.

      We have revised it.

      Line 256: "interfered" please revise.

      We changed to "exerted".

      Line 263: "on the contrary" please revise.

      We have changed "on the contrary" to "opposite".

      Line 270: "conversed" did you mean "conserved"?

      We have changed "conversed" to "inversed".

      Line 296-298: Please explain. Why would these be side effects?

      It’s hard to explain, therefore, we deleted the words "side effects".

      Line 308: "synaptic activity" Same as line 28.

      We have changed it to "expression levels of synapse-related cellular processes".

      Line 314: "and sex hormone secretion and signaling" Isn't this expected?

      Yes, it is expected. We have revised the sentence to read "and, as expected, sex hormone secretion and signaling".

      Line 325-328: Why is this senescence? Reference?

      We have added “potent” to it.

      Line 360-361: This doesn't show elevated synaptic activity.

      "Elevated synaptic activity" was changed to "The elevated expression of synapse-related pathways".

      Line 363-364: "Unfortunately" is not a scientific expression and show bias.

      We have changed it to "Notably".

      Line 376: Similar as above.

      Yes, we have changed it to "in contrast".

      Lines 382-385: This is speculation. Please move to discussion.

      We apologize, but we consider the causal effects derived from the MR results to be evidence. As such, we have not changed it.

      Line 389: Please revise "hormone expressing".

      We have changed it accordingly.

      Line 401: Isn't this effect expected due to feedback inhibition of the biochemical pathway? Please comment.

      The binding capability of 17α-estradiol to estrogen receptors and its role in transcriptional activation remain core questions surrounded by controversy. Earlier studies suggest that 17α-estradiol exhibits at least 200-fold lower activity than 17β-estradiol (PMID: 2249627, PMID: 16024755). However, recent data indicate that 17α-estradiol shows genomic binding and transcriptional activation through estrogen receptor α (Esr1) comparable to that of 17β-estradiol (PMID: 33289482). Additionally, there is evidence that 17α-estradiol has anti-estrogenic effects in rats (PMID: 16042770). These findings imply possible feedback inhibition via estrogen receptors. Furthermore, 17α-estradiol likely differs from 17β-estradiol due to its unique metabolic consequences and its potential to slow aging in males, an effect not attributed to 17β-estradiol. For instance, neurons are also targets of 17α-estradiol, with Esr1 not being the sole target (PMID: 38776045). Intriguingly, neurons expressing Ar and Esr1 ranked among the top 20 most perturbed receptor subtypes during aging (O vs Y), but were no longer ranked in this group following treatment (O.T vs Y and O.T vs O comparisons). This indicates that 17α-estradiol administration attenuated age-associated perturbation in these neuronal subtypes, which may be a consequence of potential feedback (Figure 3D). Nevertheless, the precise effective targets of 17α-estradiol are still unresolved.

      Line 409: This conclusion cannot be made because the effect is not statistically significant. Can say "trend" etc.

      Thanks for the recommendation. We have added "potential" in front of the conclusion.

      Line 426: "suggesting" please revise.

      Sorry, it’s a verb.

      Lines 426-428: This is speculation. Please move to discussion.

      The elevated GnRH levels in O.T, observed through EIA analysis, suggest a deduction regarding the direct causal effects of 17α-estradiol on various endocrine factors related to feeding, energy homeostasis, reproduction, osmotic regulation, stress response, and neuronal plasticity through MR analysis. Thus, we have not amended our position. We apologize for any confusion.

      Lines 431-432: improved compared to what?

      The statement has been revised to "The most striking role of 17α-estradiol treatment revealed in this study showed that HPG axis was substantially improved in the levels of serum Gnrh and testosterone".

      Line 435: " Estrogen Receptor Antagonists". Please revise.

      Thanks for the recommendation. We have changed it to "estrogen receptor antagonists".

      Line 438: "Secrete". Please revise.

      Sorry, it is "secret".

      Lines 439-449: None of this has been demonstrated. Please remove these conclusions.

      We appreciate the reviewer's scrutiny regarding lines 439-449. While these statements should not be interpreted as definitive conclusions from our current data, we propose they serve as clinically relevant discussion points worthy of exploration. Our findings demonstrate 17α-estradiol's role in modulating testosterone levels in aged males. This mechanistic insight warrants consideration of its therapeutic potential for age-related hypogonadism - a hypothesis we believe merits discussion given the compound's specific endocrine effects.

      Lines 450-457: No females were included in this study. Why? Also, why is this discussed? It is relevant but doesn't belong in this manuscript since it was not studied here.

      Testosterone levels are crucial for male health, while estradiol levels are essential for the health and fertility of females. Previous studies have demonstrated that 17α-estradiol does not contribute to lifespan extension in females. Given the effects of 17α-estradiol on males—specifically, its role in promoting testosterone and reducing estradiol levels—we believe it is important to discuss the potential sex-biased effects of 17α-estradiol, as this could inform future investigations. We have refined this section to clarify that these points represent mechanistic hypotheses derived from our male data and existing literature, not conclusions about unstudied female physiology. This framing maintains the discussion's scientific value while respecting the study's scope.

      Lines 458-459: This was not demonstrated in this article. Please remove.

      We have restricted the claim to "expression level of energy metabolism in hypothalamic neurons".

      Line 464: "Promoted lifespan extension" Not demonstrated. Please remove.

      The end of the sentence was revised to "which may be a contributing factor in promoting lifespan extension".

      Line 466: "Showed" No.

      The whole sentence was deleted in the new version.

      Line 483: "the sex-based effects". Not studied here.

      Since the changes in testosterone levels are significant in this dataset and this hormone has a sex-biased nature, we find it worthwhile to suggest this as a topic for future investigation. We have added "which needs further verification in the future" at the end of this sentence.

    1. eLife Assessment

      This is a well-done study that provides compelling data from a diverse set of approaches from single cell transcriptome data and network analysis from genetically diverse mouse cells to identify novel driver genes underlying human GWAS associations. The authors present solid evidence that network analysis of scRNA-seq data from genetically diverse mouse bone-marrow derived stromal cells can be informative for identifying human BMD GWAS driver genes. Their approach should be broadly useful and applicable to other GWAS studies.

    2. Reviewer #1 (Public review):

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease.

      Specifically, they utilize a large single cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses.

      The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilized a much larger scRNA-seq dataset from 80 DO BMSC-OBs, inferred co-expression based on Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that co-localized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below.

      Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking.

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3).

      Potential drawbacks of the authors' approach include their focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but reason that eQTLs can be shared across many tissues - this assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type specific eQTLs. Another issue concerns potential model overfitting in the iterativeWGCNA analysis of mesenchymal cell type-specific co-expression, which identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per cell read depth (400-6200 reads/cell) and drop outs, it's surprising that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting is responsible for these results.

      Overall, though, these concerns are minor relative to the many strengths of the study design and results. Indeed, I expect the analytical framework employed by the authors here will be valuable to -- and replicated by -- researchers in other disease areas.

      Comments on revisions:

      Thank you for addressing my concerns. This is an impressive study and manuscript that you should be proud of.