    1. Reviewer #2 (Public review):

      Summary:

      Chang et al. attempted to analyze a large number of ribo-seq datasets through a standardized pipeline, identifying novel non-canonical ORFs and elucidating their evolutionary and expression characteristics.

      Strengths:

      (1) The datasets analyzed by the authors are sufficiently comprehensive, and the use of standardized pipelines ensures excellent analytical consistency.

      (2) Their analyses of ORF evolution and co-expression further deepen our understanding of these ORFs.

      Weaknesses:

      (1) The authors primarily conducted analyses through bioinformatics, lacking sufficient wet-lab experimental evidence.

      (2) Some analytical methods and standards were not clearly presented in the manuscript.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work compiles a comprehensive atlas of ncORFs across mammalian tissues and cell types, derived from reanalysis of ~400 public ribosome profiling datasets. The authors then evaluate cross-species conservation and functional signatures, proposing that evolutionarily ancient ncORFs tend to have higher translation potential, stronger expression, and closer relationships with canonical coding sequences.

      Strengths:

      In general, the study provides a large-scale and timely resource of annotated ncORFs, which could be broadly useful for the community. The authors collected ~400 public ribosome profiling datasets for annotations of ncORFs, which, to the best of my knowledge, is the largest collection of data for such a purpose. The catalog could facilitate future investigations into ncORF biology and broaden understanding of the coding potential of the "non-coding" genome.

      We thank the reviewer for the positive evaluation of our manuscript and for recognizing the significance of our contribution.

      Weaknesses:

      Based on the ncORF catalog, some of the analyses were not properly done. Some of the results are descriptive.

      (1) Bias and representations of the data source. Public ribo-seq datasets are unevenly distributed across tissues and cell lines, raising concerns about heterogeneity and underrepresentation of certain contexts. This may limit the generalizability of the catalog.

      We agree with the reviewer that the uneven distribution of public Ribo-seq datasets across tissues can inevitably introduce bias in the ncORF composition of our catalog. This bias is likely more pronounced in humans due to the narrower tissue coverage. We have addressed this point in the Discussion section of the revised manuscript.

      (2) The discussion on modular domains of ncORFs is unclear, and the claim that they may originate via TE-related mechanisms is not well supported. Stronger evidence or clearer reasoning is needed.

      We thank the reviewer for highlighting this point. We have revised the manuscript to more clearly explain the rationale behind our analysis of ncORF modular domains and have adopted more cautious language regarding their potential transposable element–related origins, limiting interpretations to what is directly supported by the data.

      (3) The conservation comparisons are not fully convincing. Figure S7 shows only mild differences between ncORFs and CDS, and statistical significance is not clearly demonstrated.

      Comparisons with other non-coding RNAs should be added, and overlapping sequences between ncORFs and CDS should be excluded to avoid bias.

      We thank the reviewer for this comment and apologize for the lack of clarity in the original figure. Both CDSs and ncORFs show significant deviation from zero Gnocchi scores (two-sided Wilcoxon signed-rank tests), which is now stated explicitly in the revised legend and text. CDS-overlapping ncORFs were already excluded in the original analysis; this has been clarified to avoid confusion.

      As suggested, we have added lncRNAs for comparison. ncORFs display modestly higher Gnocchi scores than lncRNAs, and this difference persists when restricting the analysis to lncRNA-derived ncORFs and their corresponding full-length lncRNAs (see revised Fig. S7). These additions strengthen the conservation comparison while controlling for transcript context.
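      To make the test named in this response concrete, here is a minimal Python sketch of a two-sided Wilcoxon signed-rank test of whether per-ORF Gnocchi scores deviate from zero. The function name and the toy scores are hypothetical, and the sketch uses the large-sample normal approximation without tie or continuity correction. It is not the authors' code; in practice a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be preferred.

```python
import math


def wilcoxon_signed_rank(scores):
    """Two-sided Wilcoxon signed-rank test of H0: median == 0.

    Hypothetical helper. Uses the normal approximation without tie or
    continuity correction, so it is only a rough large-sample sketch.
    Returns (W+, z, p).
    """
    # Drop exact zeros, as in the classic Wilcoxon zero-handling rule.
    diffs = [x for x in scores if x != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all scores are zero")
    # Rank absolute values (1-based), giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum of ranks belonging to positive scores.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w_plus, z, p


# Toy Gnocchi-like scores, all shifted above zero.
print(wilcoxon_signed_rank([0.5, 1.2, 0.8, 2.0, 1.1, 0.3, 0.9, 1.5, 0.7, 1.3]))
```

      With all ten toy scores positive, W+ reaches its maximum of 55 and the two-sided p-value falls below 0.01; a set of scores symmetric around zero gives z = 0 and p = 1.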

      (4) Figure 3 suggests that some ncORFs are under evolutionary constraint, but this is not unexpected. More granular analyses contrasting "conserved" versus "non-conserved" ncORFs would be informative. Some pretty informative work has been done in Drosophila, worms, mice, and humans; in fact, small ORFs, especially uORFs, have been extensively studied for their functions and cross-species conservation. The authors should explicitly show what is new in their analyses.

      We thank the reviewer for this insightful comment. We agree that cross-species conservation of ncORFs (particularly uORFs) has been extensively investigated in prior studies, including our own.

      However, most prior analyses have focused on conservation of start codons or overall ORF integrity, which does not distinguish selection acting on translational activity from selection acting on the encoded peptide sequence itself. In contrast, our analysis leverages codon-level periodic PhyloP signals across the full ORF. The observed three-nucleotide periodicity is consistent with selective constraint at the amino acid level, rather than merely preservation of initiation sites or translational potential. Furthermore, our newly developed branch-length statistic uncovers lineage-restricted conservation patterns among ncORFs, enabling resolution of evolutionary dynamics not captured by conventional conservation metrics.

      Thus, while the existence of conserved ncORFs is not unexpected, the conceptual advance of our study lies in demonstrating that a subset exhibits coding-like evolutionary constraint consistent with selection on their peptide products, as well as revealing lineage-specific conservation patterns. We have clarified this distinction in the revised Discussion.
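      As an illustration of the codon-level periodicity argument, the sketch below averages per-nucleotide PhyloP scores by codon position. Under selection on the encoded peptide, first and second codon positions are expected to be more constrained than the largely synonymous third positions, which produces a three-nucleotide pattern. The input layout and the `periodicity_score` summary are hypothetical simplifications; they are not the metric, or the branch-length statistic, used in the manuscript.

```python
from statistics import mean


def phylop_by_codon_position(phylop):
    """Average per-nucleotide PhyloP scores by codon position (1, 2, 3).

    `phylop` holds scores for the ORF in reading-frame order, starting at
    the first nucleotide of the start codon (hypothetical input layout).
    """
    if len(phylop) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    # Slicing with step 3 collects every first, second or third position.
    return tuple(mean(phylop[pos::3]) for pos in range(3))


def periodicity_score(phylop):
    """Mean of positions 1 and 2 minus position 3 (hypothetical summary)."""
    p1, p2, p3 = phylop_by_codon_position(phylop)
    return (p1 + p2) / 2 - p3


# Toy ORFs of four codons each.
coding_like = [2.0, 2.0, 0.2] * 4   # constrained positions 1-2, relaxed 3
flat = [1.0] * 12                   # uniform constraint, no periodicity
print(periodicity_score(coding_like), periodicity_score(flat))
```

      A positive score indicates stronger constraint at the first two codon positions; uniform conservation (for example, selection on an RNA element rather than a peptide) gives a score near zero.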

      (5) Translation levels are reported using RPF counts. However, translation efficiency (normalized by RNA expression) is a more appropriate measure to account for expression heterogeneity.

      We agree that translation efficiency (TE), which normalizes ribosome footprint counts by RNA abundance, is in principle an appropriate metric. We initially calculated TE and compared ncORFs with CDSs. However, we found that TE estimates for short ncORFs were substantially inflated by RPF enrichment near start and stop codons, leading to unstable and potentially misleading values.

      For CDSs, this bias is commonly addressed by excluding the first and last 10 to 20 codons when quantifying RPF density. This strategy is not feasible for ncORFs because of their short length. We therefore used RPF counts in the final analysis, applying stringent positional filtering. Only RPFs whose P sites fall within the ORF body, excluding start and stop codons, were counted. RPFs overlapping the ORF but with P sites outside the annotated frame, likely derived from adjacent ORFs or initiation or termination pausing, were excluded.

      TE and RPF counts both measure translation but capture different aspects. TE reflects ribosome density relative to transcript abundance, whereas RPF counts quantify overall ribosome engagement. Given the short lengths of ncORFs, count-based quantification provides a more robust and conservative estimate of their translational activity.
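      The positional filter described in this response can be sketched as follows. The sketch assumes 0-based, half-open transcript coordinates, reads supplied as (5' position, read length) pairs, and a per-read-length P-site offset table; the offsets shown are hypothetical placeholders, since in practice they are calibrated per library. The optional in-frame check is an assumption about what "outside the annotated frame" may additionally imply, not a confirmed detail of the authors' pipeline.

```python
def count_orf_body_psites(reads, orf_start, orf_end, psite_offsets,
                          require_in_frame=False):
    """Count RPFs whose P site falls in the ORF body (hypothetical helper).

    orf_start: first nucleotide of the start codon (0-based).
    orf_end: one past the last nucleotide of the stop codon.
    reads: iterable of (five_prime_pos, read_length) tuples.
    psite_offsets: read length -> P-site offset from the read's 5' end.
    """
    body_start = orf_start + 3  # exclude the start codon
    body_end = orf_end - 3      # exclude the stop codon
    count = 0
    for five_prime, length in reads:
        offset = psite_offsets.get(length)
        if offset is None:
            continue  # no calibrated offset for this read length
        psite = five_prime + offset
        if not body_start <= psite < body_end:
            continue  # P site on start/stop codon or outside the ORF
        if require_in_frame and (psite - orf_start) % 3 != 0:
            continue  # P site not on the first nucleotide of a codon
        count += 1
    return count


# Hypothetical 10-codon ORF spanning transcript positions 100-129.
offsets = {28: 12, 29: 12, 30: 13}
reads = [(88, 28), (91, 29), (95, 30), (115, 28), (100, 31), (110, 28), (97, 28)]
print(count_orf_body_psites(reads, 100, 130, offsets))
print(count_orf_body_psites(reads, 100, 130, offsets, require_in_frame=True))
```

      Here the reads with P sites on the start codon (position 100) or the stop codon (position 127) are dropped, as is the read whose length has no calibrated offset, leaving four body reads, two of which are in frame.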

      (6) The correlation analyses between ncORF translation levels and PhyloCSF are confusing and largely descriptive. These sections need sharper framing and clearer conclusions.

      We thank the reviewer for this comment. We agree that the original presentation lacked clear framing. The relationship between PhyloCSF scores and mean ncORF translation levels across tissues is influenced by both evolutionary age and tissue specificity. Older ncORFs with higher coding potential tend to exhibit stronger tissue-restricted expression. As a result, their mean translation levels across all tissues appear lower, not because they are weakly translated, but because their translation is concentrated in specific tissues. This point is addressed in the revised manuscript.
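      A toy numerical example shows how an ORF translated strongly in a single tissue can have a lower mean translation level across tissues than one translated moderately everywhere. The tissue names and values below are hypothetical.

```python
from statistics import mean


def summarize(levels):
    """Return (mean, max) translation level across tissues."""
    values = list(levels.values())
    return mean(values), max(values)


# Hypothetical normalized translation levels for two ORFs.
tissue_restricted = {"testis": 50.0, "brain": 0.0, "liver": 0.0,
                     "heart": 0.0, "kidney": 0.0}
broadly_translated = {tissue: 12.0 for tissue in tissue_restricted}

print(summarize(tissue_restricted))   # (10.0, 50.0)
print(summarize(broadly_translated))  # (12.0, 12.0)
```

      The tissue-restricted ORF has the higher peak but the lower cross-tissue mean, which is the pattern described for older ncORFs with higher coding potential.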

      (7) Public ribo-seq datasets, generated by different research labs, are known for their strong batch effects. Representations of tissues and cells are also very unbalanced. Therefore, the co-translation analysis between ncORFs and canonical CDS is not well controlled. This should be done by referring to a recent large-scale ribo-seq meta-analysis (Nat Biotechnol. 2025. doi: 10.1038/s41587-025-02718-5).

      We thank the reviewer for highlighting this important study and for raising concerns regarding batch effects and tissue imbalance in public Ribo-seq datasets. We are aware that public Ribo-seq data generated by different laboratories are subject to substantial batch effects. During the ncORF annotation phase, we applied stringent quality-control criteria to minimize technical variability. For the co-translation analysis, inclusion criteria were relaxed to increase tissue and cell-type coverage. To partially mitigate representation bias, libraries derived from the same tissue or cell type were merged when quantifying ORF translation levels, thereby reducing overrepresentation from heavily sampled contexts.

      Nevertheless, we acknowledge that these measures cannot completely eliminate batch effects or imbalance inherent to public datasets. We agree that co-translation analysis would benefit from uniformly processed, high-quality datasets generated under standardized protocols with balanced tissue representation, representing a valuable direction for future research.
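      Pooling libraries from the same tissue or cell type before quantifying ORF translation can be sketched as below. Summing raw counts per tissue is one simple pooling choice assumed here; any library-size normalization applied afterwards is not shown, and the library and ORF identifiers are hypothetical.

```python
from collections import defaultdict


def merge_by_tissue(library_counts, library_tissue):
    """Pool ORF-level RPF counts across libraries of the same tissue.

    library_counts: library ID -> {orf_id: count}
    library_tissue: library ID -> tissue or cell-type label
    Returns tissue -> {orf_id: pooled count}, so a heavily sampled tissue
    contributes one profile rather than one profile per library.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for lib, counts in library_counts.items():
        tissue = library_tissue[lib]
        for orf_id, n in counts.items():
            merged[tissue][orf_id] += n
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {tissue: dict(c) for tissue, c in merged.items()}


# Three liver libraries collapse into one liver profile.
counts = {"lib1": {"orfA": 5, "orfB": 1},
          "lib2": {"orfA": 3},
          "lib3": {"orfB": 4},
          "lib4": {"orfA": 2}}
tissues = {"lib1": "liver", "lib2": "liver", "lib3": "testis", "lib4": "liver"}
print(merge_by_tissue(counts, tissues))
```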

      Reviewer #2 (Public review):

      Summary:

      Chang et al. attempted to analyze a large number of ribo-seq datasets through a standardized pipeline, identifying novel non-canonical ORFs and elucidating their evolutionary and expression characteristics.

      Strengths:

      (1) The datasets analyzed by the authors are sufficiently comprehensive, and the use of standardized pipelines ensures excellent analytical consistency.

      (2) Their analyses of ORF evolution and co-expression further deepen our understanding of these ORFs.

      We thank the reviewer for the positive evaluation of our manuscript. It is encouraging to know that the analytical framework was found to be sound and appropriate.

      Weaknesses:

      (1) The authors primarily conducted analyses through bioinformatics, lacking sufficient wet-lab experimental evidence.

      We thank the reviewer for this comment and acknowledge this limitation. We agree that functional validation through wet-lab experiments would provide important mechanistic insight into individual ncORFs. However, this study was designed as a systematic, genome-wide computational analysis to characterize translated ncORFs across species and tissues. Our objective was to define global patterns of translation, conservation, and structural features using large-scale datasets. Given the breadth and scale of these analyses, experimental validation of specific ncORFs falls beyond the scope of the current study. We have clarified this point in the Discussion and noted that our results provide a framework for future targeted experimental investigation.

      (2) Regarding the evolution of non-canonical ORFs, a considerable amount of prior work already exists. The authors need to further clarify what new insights and discoveries they have made based on the analysis of such a large dataset.

      We thank the reviewer for this suggestion. Similar concerns were also raised by Reviewer #1. In response, we have revised the Discussion to more clearly delineate the conceptual advances enabled by our large-scale dataset.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Several aspects of the downstream analyses would benefit from additional refinement. The heterogeneity and tissue imbalance inherent in public Ribo-seq datasets introduce potential biases in ncORF detection and inferences about co-translation. Given the breadth of the dataset, it would also be informative to quantify how consistently the newly identified ncORFs are detected across samples, distinguishing those observed broadly across tissues, those enriched in specific contexts, and those detected in only a few datasets. Such stratification would help differentiate reproducibly translated ORFs from candidates requiring further validation.

      We thank the editor for the helpful comments. We agree that heterogeneity and tissue imbalance in public Ribo-seq datasets can influence ncORF detection and downstream interpretations. We have added discussion of this limitation in the revised manuscript.

      Detection of ncORF translation depends not only on biological activity but also on sequencing depth and data quality. Although all ncORFs reported here were reproducibly identified by multiple methods across independent libraries, we agree that those detected in a larger number of datasets represent stronger candidates for functional validation. Accordingly, we now report the number of methods and libraries in which each ncORF was detected in the final catalog (Supplementary Table 3). Overall, 22.3–26.3% of ncORFs were detected in more than 10 libraries, whereas more than half were observed in only two to five libraries (Fig. S1B), enabling clearer stratification of broadly translated versus more context-specific candidates.
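      The stratification by detection breadth can be illustrated with a short sketch that bins ORFs by the number of libraries supporting them. The bin edges mirror the ranges quoted above (two to five libraries, more than 10 libraries) plus an intermediate 6 to 10 bin and a below-two bin; the ORF IDs and counts are hypothetical, and the output does not reproduce Fig. S1B.

```python
def stratify_by_detection(n_libraries):
    """Fraction of ORFs per detection-breadth bin (hypothetical helper).

    n_libraries: ORF ID -> number of libraries with detected translation.
    """
    if not n_libraries:
        raise ValueError("no ORFs supplied")
    bins = {"<2": 0, "2-5": 0, "6-10": 0, ">10": 0}
    for n in n_libraries.values():
        if n > 10:
            bins[">10"] += 1      # broadly detected candidates
        elif n >= 6:
            bins["6-10"] += 1
        elif n >= 2:
            bins["2-5"] += 1      # context-specific or low-depth detections
        else:
            bins["<2"] += 1       # single-library detections
    total = len(n_libraries)
    return {label: count / total for label, count in bins.items()}


detections = {"orf1": 2, "orf2": 3, "orf3": 5, "orf4": 8,
              "orf5": 12, "orf6": 40, "orf7": 4, "orf8": 2}
print(stratify_by_detection(detections))
```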

      Some evolutionary and functional interpretations are largely descriptive or consistent with established findings for small ORFs, and the authors should more clearly articulate what is novel in their analyses. The criteria separating "young," "old," and "ancient" ORFs require clearer definition, and conservation analyses would be strengthened by improved statistical rigor and explicit exclusion of regions overlapping annotated coding sequences. Evidence for modular domain features or transposable element-related origins is limited and warrants either stronger support or more cautious framing. Proteomics validation is currently minimal and could be substantially reinforced using existing public MS resources.

      We thank the editor for these constructive comments. In the revised manuscript, we more clearly delineate the novel insights derived from our evolutionary analyses of ncORFs, distinguishing them from established findings on small ORFs.

      We have clarified the criteria used to classify ORFs by evolutionary age in Fig. 6E and refined the terminology describing “young,” “old,” and “ancient” categories to ensure precise definition. The conservation analyses have been strengthened through more rigorous statistical treatment and by explicitly excluding regions overlapping annotated coding sequences.

      With respect to modular domain features and potential transposable element–related origins, we have adopted more cautious language and limited our interpretations to what is directly supported by the data. Finally, we acknowledge that current proteomic validation remains limited and have clarified this point in the manuscript while outlining the potential for future integration of large-scale public mass spectrometry datasets in the Discussion.

      The authors additionally report an interesting observation that many ncORFs on mRNA co-translate with the main CDS of the same gene. Because canonical models often posit that uORF translation suppresses downstream CDS translation, further analysis would be valuable. In particular, it would be useful to determine whether patterns of co-translation differ among ORF types or evolutionary categories and to discuss possible regulatory mechanisms underlying these relationships.

      We thank the editor for this thoughtful comment. As noted in our response to Reviewer #2, uORF–CDS co-translation does not contradict the canonical model in which uORFs repress downstream CDS translation. Co-translation reflects concurrent ribosome occupancy, whereas repression concerns the fraction of initiating ribosomes that ultimately reach and translate the CDS. Following the editor’s suggestion, we further examined whether co-translation patterns differ across ORF types or evolutionary categories. We found that ncORFs co-translating with their corresponding main CDSs are predominantly uORFs. However, these uORFs do not show statistically significant differences in conservation metrics or evolutionary age compared with other non-overlapping uORFs. Thus, we did not detect clear subtype- or age-specific distinctions among co-translating ncORFs. We have clarified these analyses in the revised manuscript.

      Addressing these points would enhance the precision, interpretability, and robustness of the study's conclusions.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors developed and refined a standardized pipeline to analyze nearly 400 ribo-seq datasets, identifying over 10,000 novel non-canonical ORFs in both human and mouse samples. Given the scale of this analysis, it is intriguing to consider how many of the newly identified non-canonical ORFs are consistently detected across multiple sample types (conservatively expressed ORFs), how many are restricted to specific tissues (tissue-specific ORFs), and how many were detected in only a single or very few samples (ORFs requiring further validation). Providing these data could offer new insights into understanding ORF translation.

      Thanks for this constructive suggestion. This information has been presented in the revised Supplementary Table 3 and in a newly added supplementary figure (Fig. S1B), which together provide a clearer overview of ncORF detection consistency and context specificity.

      (2) The authors' validation of MS data lacks specific details in the paper. Regarding the MS-supported ORFs mentioned in line 117, which dataset's MS data are being referenced? Or does it refer to the content in Reference 20? At present, substantial research exists in both public general proteomics studies (e.g., CPTAC) and MS investigations targeting non-canonical ORFs. We recommend the authors incorporate additional MS data or public MS-based databases to strengthen validation in this area (PMID: 34129944, 39794466, 37823596, 39413795).

      We thank the reviewer for this comment and for the helpful suggestions. The MS-supported ORFs mentioned in line 117 refer to the compilation reported in Reference 20, which integrates evidence from multiple independent proteomics studies. In addition, we examined MS-supported ORFs curated by GENCODE and PeptideAtlas, which are shown in Fig. 1E.

      We agree that incorporating additional MS datasets would further strengthen validation of ncORFs. Studies cited by the reviewer and recent community efforts such as the GENCODE and PeptideAtlas analyses (PMID: 39314370) provide valuable examples in this direction. However, performing a comprehensive reanalysis of more than 95,000 public human MS runs is computationally demanding and currently infeasible for our group given resource and funding constraints.

      To our knowledge, ongoing community-wide initiatives are working toward more comprehensive catalogs of translated human ncORFs. Large-scale, exhaustive MS searches will be particularly effective once a community consensus annotation framework for ncORFs is established. We have added discussion of these limitations and future directions in the revised manuscript.

      (3) The authors classified ncORFs into three groups ("Ancient," "Young," and "Old") based on their origin nodes. However, both the "Young" and "Old" groups appear to be "mammalian-specific," yet the specific criteria for their division remain unclear. It is recommended to more clearly define in the figure legend or main text how "Young" and "Old" are categorized (e.g., based on specific evolutionary nodes or distance thresholds from nodes to the end) to avoid reader confusion.

      In Fig. 5, “old” and “young” were intended as qualitative descriptors of relative evolutionary age based on the position of ncORF origination nodes along the phylogeny, as indicated on the x-axis. They were not meant to represent discrete categories. To avoid confusion, we have revised the manuscript to use “older” and “younger” throughout when referring to relative age differences. A binary classification is used only in Fig. 6E, where ncORFs are grouped into ancient (pre-mammalian) and younger (mammalian-specific) categories. This distinction is clearly defined in both the main text and the corresponding figure legend.

      (4) The authors observed an intriguing phenomenon: ncORFs on mRNA tend to co-translate with the main CDS of the same gene. However, the conventional view holds that uORF translation often inhibits the translation of the main CDS. I suggest the authors could refine their analysis in this section further. For instance, do different types of ORFs or ORFs at different evolutionary levels exhibit distinct levels of co-translation with the main CDS? Additionally, while observing this phenomenon, the authors should also propose hypotheses regarding the regulatory mechanisms involved in these processes.

      We thank the reviewer for these constructive suggestions. After excluding CDS-overlapping ORFs, we identified 258 human and 128 mouse ncORFs that co-translate with their corresponding main CDSs. With the exception of 10 human dORFs, all remaining cases were uORFs. We compared these co-translating ncORFs with other non-overlapping uORFs and dORFs but did not detect statistically significant differences in evolutionary age or conservation metrics. Because no clear distinguishing features emerged, we did not include these results in the manuscript.

      Importantly, the observation of uORF–CDS co-translation does not contradict the established repressive role of uORFs. Co-translation reflects concurrent ribosome occupancy, whereas repression concerns the proportion of initiating ribosomes that ultimately translate the CDS. For example, if two ribosomes initiate within a given interval and one translates the uORF while one translates the CDS, CDS output is reduced by 50% relative to a uORF-free transcript. If four ribosomes initiate under the same repressive regime, two may translate the uORF and two the CDS. In this case, absolute translation of both ORFs increases, while the fractional repression remains unchanged. Thus, co-translation is compatible with a regulatory model in which uORFs reduce CDS translation efficiency without abolishing it. This has been clarified in the revised manuscript.
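      The arithmetic in this example can be written as a small model. It assumes, as the example does, that each initiating ribosome translates either the uORF or the CDS (no reinitiation) and that the repressive regime fixes the share of ribosomes reaching the CDS; the function name is hypothetical.

```python
def orf_outputs(initiating_ribosomes, cds_fraction):
    """Split initiating ribosomes between a uORF and the main CDS.

    cds_fraction: share of initiating ribosomes that bypass the uORF and
    translate the CDS; fixed by the repressive regime (assumption).
    Returns (uORF translations, CDS translations, fractional repression
    relative to a uORF-free transcript in which every ribosome reaches
    the CDS).
    """
    if initiating_ribosomes <= 0 or not 0 <= cds_fraction <= 1:
        raise ValueError("need > 0 ribosomes and 0 <= cds_fraction <= 1")
    cds = initiating_ribosomes * cds_fraction
    uorf = initiating_ribosomes - cds
    repression = 1 - cds / initiating_ribosomes
    return uorf, cds, repression


print(orf_outputs(2, 0.5))  # (1.0, 1.0, 0.5)
print(orf_outputs(4, 0.5))  # (2.0, 2.0, 0.5)
```

      Doubling initiation doubles the absolute output of both ORFs, so their ribosome occupancies rise together (co-translation), while fractional repression of the CDS stays at 50%.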

    1. eLife Assessment

      This study offers an important contribution to our understanding of the role of layer 6b cortical neurons in sleep-wake regulation, providing new insight into how this understudied neural population may regulate cortical arousal via orexin signaling. The evidence supporting these findings is solid, although somewhat constrained by limitations in the specificity of the genetic targeting strategy. Nonetheless, the work introduces new avenues for uncovering how the classical wake-promoting peptide, orexin, exerts its effects on the cortex.

    2. Reviewer #1 (Public review):

      Summary:

      Meijer et al. sought to investigate the role of cortical layer 6b (L6b) neurons in modulating sleep-wake states and cortical oscillations under baseline and sleep-deprived conditions and in response to orexin A and B. Using chronic EEG recordings in mice with silencing of Drd1a+ neurons (via constitutive Cre-dependent knockout of SNAP25), the authors report that while effects on overall baseline sleep-wake architecture and on the response to sleep deprivation are minimal or absent, "L6b silencing leads" to a slowing of theta activity during wakefulness and REM sleep, and a reduction in EEG power during NREM sleep. The manuscript is well written with clarity and transparency. Although Drd1a+ neurons are not exclusive to L6b, the authors describe key future studies to identify a causal role for L6b neurons in brain state regulation. These studies contribute to a growing body of evidence that cortex, in addition to subcortical brain regions, plays a role in brain state regulation.

      Strengths:

      (1) The text is well written.

      (2) The authors are transparent about methodological details and study limitations.

      (3) The stated sleep, circadian, and orexin infusion experiments are well designed, executed, and analyzed.

      Weaknesses:

      (1) Outcomes are attributed to silencing cortical L6b neurons, but the genetic manipulation is not specific to L6b neurons or cortex. The authors acknowledge this as a limitation and offer targets for future studies to identify L6b neuron-specific contributions to stated outcomes that include spatially restricted manipulations.

      (2) Experiments use only male mice, which limits generalizability to females.

      Comments on revised version:

      The authors took great care in addressing my previous comments, and I do not have any additional concerns.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Meijer and colleagues investigated the effects of inactivation (conditional silencing) of cortical layer 6b neurons on sleep-wake states and EEG spectral power under the following three conditions: during natural sleep-wake states, after sleep deprivation, or after intracerebroventricular administration of orexin A and B. The authors report that silencing of L6b neurons did not have a significant effect on the total time spent in sleep-wake states, duration or number of state epochs, or the response to sleep deprivation. However, silencing of L6b neurons did slow down theta-frequency (6-9 Hz) during wake and REM sleep, and reduced the total EEG power during NREM sleep. Infusion of orexin A in the mice in which cortical layer 6b neurons were inactivated produced an increase in wakefulness. A similar effect was observed after infusion of orexin A in the mice in which these neurons were not silenced, but the effect (i.e., increase in wakefulness) was of a smaller magnitude. Silencing of cortical layer 6b neurons attenuated the orexin B-induced increase in theta activity that was observed in the control mice. The authors conclude that the cortical neurons in layer 6b play an essential role in state-dependent dynamics of brain activity, vigilance state control and sleep regulation.

      Strengths:

      - A focus on cortical layer 6b neurons, which is an understudied neuronal population, especially in the context of brain and behavioral state transitions.

      - The authors used a well-established mouse model to study the effect of inactivation of cortical layer 6b neurons.

      Weaknesses:

      - Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      - The rationale for using only male mice is not provided.

      Comments on revised version:

      The authors have addressed my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation; it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside L6b. Given this issue, the results are largely overstated throughout the paper.

      We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a-Cre mouse model used (FK164) shows relatively selective expression of Drd1a-Cre in cortex, but some expression is indeed seen subcortically. This is an acknowledged limitation which is now explicitly addressed in the revised manuscript.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre-positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024). This has now been described in the revised manuscript.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      We thank the reviewer for their careful attention to the statistical analyses and for noting the inconsistencies in how the results of the spectral analysis were presented: in the text we described two-way ANOVAs with corresponding post hoc tests, but in the figures significance markers were positioned based on multiple t-tests. We have now carefully revised the spectral results and implemented a consistent approach in statistical reporting and spectral plots. We have updated Supplementary Table T1, Figure 3a and S2 so that all statistics are presented consistently throughout the manuscript, i.e., with two-way ANOVAs and accompanying post hoc tests. Please note that we performed all spectral analyses in the range between 0.5 and 128 Hz (excluding 49-51.5 Hz because of electrical noise from the power grid) but plot only the 0.5-30 Hz range, as the spectral bands most relevant for sleep neurophysiology fall within this range.
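      The frequency range described in this response can be expressed as a bin grid. This sketch assumes bins spaced 0.25 Hz apart (the bin width stated in the response to the next comment) starting at 0.5 Hz, and treats the 49-51.5 Hz exclusion as inclusive at both ends; the actual grid depends on FFT settings that are not specified here, and the function names are hypothetical.

```python
def analysis_bins(f_min=0.5, f_max=128.0, width=0.25, notch=(49.0, 51.5)):
    """Frequencies (Hz) of spectral bins kept for analysis.

    Bins run from f_min to f_max inclusive in steps of `width`; bins inside
    the mains-noise band `notch` (inclusive) are dropped.
    """
    n = round((f_max - f_min) / width) + 1
    freqs = [f_min + i * width for i in range(n)]
    lo, hi = notch
    return [f for f in freqs if not lo <= f <= hi]


def plotted_bins(bins, f_plot_max=30.0):
    """Subset of analysed bins shown in the spectral plots (0.5-30 Hz)."""
    return [f for f in bins if f <= f_plot_max]


bins = analysis_bins()
print(len(bins), len(plotted_bins(bins)))  # 500 119
```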

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      In line with the previous comment, we have adjusted the markers to reflect the results of post hoc tests after two-way ANOVAs. Please note that Figure 6 and the related supplementary figures S5 and S6 have now been removed from the manuscript, as careful re-analysis indicated that the sample size was too low to support a strong conclusion regarding the comparison of orexin effects between genotypes. We stated in the text that we would only report post hoc significance when at least two consecutive bins were significant, but this was indeed not reflected in our figures, where each marker represented one 0.25 Hz bin. We have now adjusted our code so that markers are plotted only when at least two consecutive bins are significant in bin-wise post hoc comparisons.
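      The revised plotting rule (markers only where at least two consecutive 0.25 Hz bins are significant in bin-wise post hoc comparisons) can be sketched as a run-length filter on a boolean significance mask. This is an illustrative reimplementation with a hypothetical function name, not the authors' code. It treats bins as consecutive only when they are adjacent in the list, so bins on either side of the excluded 49-51.5 Hz band would need separate handling.

```python
def consecutive_significance(significant, min_run=2):
    """Keep only significant bins that belong to a run of >= min_run.

    significant: list of booleans, one per frequency bin, from bin-wise
    post hoc tests. Returns a same-length mask in which isolated
    significant bins (runs shorter than min_run) are set to False.
    """
    keep = [False] * len(significant)
    i = 0
    while i < len(significant):
        if significant[i]:
            # Find the end of this run of significant bins.
            j = i
            while j < len(significant) and significant[j]:
                j += 1
            if j - i >= min_run:
                for k in range(i, j):
                    keep[k] = True
            i = j
        else:
            i += 1
    return keep


mask = [False, True, False, True, True, False, True, True, True]
print(consecutive_significance(mask))
```

      In this example the isolated significant bin at index 1 loses its marker, while the runs of two and three bins keep theirs.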

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      Thank you for pointing this out, we have adjusted the colour scale.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      There were 2-4 non-infusion days between infusions. We have added this information to the methods.

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      (a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      (b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      (c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      (d) For relative ORXB occipital EEG, there are significance markers on the plot outside of the stated range in the text.

      We agree with the reviewer, and we decided to exclude this figure from the manuscript, as the sample size for some key comparisons was too low to support any strong conclusions and presenting this analysis would therefore be potentially misleading. We explain the rationale for excluding this analysis in the revised manuscript.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      We have adjusted the wording in the main text to reflect more precisely which comparisons are shown in the figures.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      Please note that the previously named Supplementary Figures S5 and S6 have been removed from the manuscript, and that the Supplementary Figure S7 in this comment refers to the figure currently named Supplementary Figure S5.

      We have added the statistical comparisons for Figure 3e and Supplementary Figures S5a and S5b to the results section. In Figure S5c, there was an overall genotype difference but no significant time × genotype interaction, so we did not perform posthoc tests or plot posthoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer. We have also corrected the erroneous reference to Figure S5c; thank you for your careful attention.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

      We agree with the reviewer and the title of this sub-section has now been changed accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      We thank the reviewer for this important comment. The ICV route of orexin administration cannot guarantee that only cortical Drd1a-Cre–expressing neurons are reached by orexin, and the Drd1a-Cre driver line is highly selective but not entirely specific for layer 6b neurons (see also response to reviewer #1, comment 1). We have therefore changed the wording of the stated effects and addressed this consideration in the Limitations section of the manuscript. Please note that, as mentioned above, Figure 6 has now been excluded from the manuscript.

      (2) The rationale for using only male rats is not provided.

      We thank the reviewer for highlighting this omission. We now provide the rationale for using only male mice in the methods section as follows: “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Better descriptions of L6b connectivity will improve clarity in the second paragraph of the Introduction (pg. 3). For example, it is not explicitly stated that L6b projects to L5 before the authors describe L5. Therefore, the L5 description seems irrelevant.

      We thank the reviewer for this request for clarification. We mention the connectivity between L6b and L5 because L5 pyramidal neurons have recently been found to play a key role in sleep-wake regulation (Krone et al., Nat. Neurosci. 2021; Honjo et al., 2025; Wasilczuk et al., 2025; Krone et al., 2025). We have now amended the corresponding section of the introduction to emphasise the potential functional relevance of this connection as follows:

      “L5, the major output layer of the cortex, is also bidirectionally communicative with higher-order thalamic nuclei (Hoerder-Suabedissen et al., 2018) as well as layer 5 pyramidal neurons (Zolnik et al., 2024). Since several subtypes of L5 pyramidal neurons have recently been shown to play important roles in distinct aspects of sleep-wake regulation (Krone et al., 2021, 2025; Hong et al., 2023; Wasilczuk et al., 2025; Honjo et al., 2025; Chouafeev et al., 2025), depth of anaesthesia (Wasilczuk et al., 2025), and the influence of stress on sleep (Chouafeev et al., 2025), the projections of orexin-sensitive L6b to L5 pyramidal neurons may constitute a key circuit in the top-down regulation of brain states.”

      (2) There are plots where the y-axis tick label appears to be offset from the tick mark (4a, S5b, S6a).

      Thank you for spotting this graphical issue. We have removed the y-axis tick labels from Figure 4a to avoid confusion. Please note that we decided to remove Figure S5 and Figure S6, because after careful re-analysis we concluded that the group size was too small to draw conclusions on orexin spectra and that any results could be potentially misleading.

      (3) The 2-h time constant, I believe, is depicted in Figure 4H (not 4G).

      Thank you for spotting this. We have corrected the figure legends accordingly and double-checked that Figure 4G depicts the 2-h time constant and Figure 4H the 6-h time constant.

      (4) "...although there was an indication of a higher absolute theta-peak power in layer 6b silenced mice (Figure S6)," pg. 10. It is not clear to me how the data lead to this conclusion.

      Thank you for identifying this inconsistency, which resulted from a preliminary statistical analysis that was subsequently corrected. We have now improved the statistical analysis of spectral data (for more details, see our responses to both reviewers' public reviews) and removed this statement, which is no longer supported by the data.

      (5) Exclusion of female mice is not listed as a limitation.

      We now discuss this limitation as follows:

      “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      (6) A brief description of why Cplx3 and Tbr1 antibodies are being used will be helpful to include in the Methods (pg. 21) in addition to what is in the figure caption.

      We have added the following information to the methods section to clarify why we used these two antibodies: “rabbit α-Cplx3 to distinguish between L6a and L6b” and “mouse α-Tbr1 to identify the L5-6 boundary”.

      (7) Including a label/title for the Figure 2c spectral plots will be helpful. It is not immediately clear if these are light period & dark period data or frontal & occipital data.

      Thank you for pointing this out; we have updated the figure legend to clarify what is shown in this figure.

      Similar comments for S2 and S3a plots. Including a state label on the plots will be helpful in addition to the caption description.

      We have now added the state labels for Figure panels S2 and S3a for improved clarity.

      Reviewer #2 (Recommendations for the authors):

      This is a soundly conducted and well-written study that enhances our understanding of the cortical control of states of consciousness. I do not have any major concerns, but would like the authors to consider some alternate possibilities as suggested in my comments below:

      We thank the reviewer for this positive assessment of our manuscript and the helpful suggestions.

      (1) Given that the inactivation of layer 6b neurons did not affect the time spent in sleep-wake states, to me it appears that these neurons likely have a role in creating the background neural conditions/oscillations supportive of an activated state rather than a direct role in behavioral state control.

      We completely agree with the reviewer and have made the wording more consistent throughout the manuscript, now using “brain state control” rather than “behavioural state control” to clarify that the main effect observed in the L6b-silenced mouse model is a change in spectral characteristics reflecting brain oscillations, rather than effects on vigilance states, which were modest.

      (2) Does the observed shift in REM sleep-related theta-peak frequency in the occipital derivation suggest changes in local neural processes, or could it be just a matter of better signal detection, because theta is most prominent at or around the hippocampal region, which is approximately the location of the occipital electrodes in this study?

      The source of the shift in REM sleep–related theta peak frequency in the occipital derivation cannot be established with EEG recordings alone. Additional intracortical or intrahippocampal recordings would be necessary to distinguish between the two possible explanations proposed by the reviewer. We have discussed this further in the revised manuscript.

      (3) The orexinergic system innervates multiple subcortical sites and widely covers the cortex too, because of which the effect of ICV orexins cannot be attributed to just layer 6b neurons as described in the manuscript ("Layer 6b mediates effects of orexin on brain activity.").

      We agree with the reviewer that this is a limitation. We have now adjusted the subtitle of the paragraph describing the results from the ICV administration of orexin and further mention this important consideration in the ‘limitations’ section of the discussion.

      (4) While the current study is focused on sleep-wake mechanisms, the findings reported here have much broader implications for behavioral and/or brain state arousal and provide a mechanistic bridge between different states of consciousness, including general anesthesia. Therefore, the authors may consider tying these findings with the recent work on the role of the prefrontal cortex in arousal from general anesthesia and slow-wave sleep (PMID: 35436248, PMID: 29937348, PMID: 33328847).

      We thank the reviewer for this excellent recommendation. We are now citing these papers in the revised manuscript.

      (5) It's up to the authors, but I do not see the need for the section on Clinical Implications. It's very speculative, and it makes the entire discussion section heavy.

      We have considerably shortened the discussion of potential clinical implications to make the manuscript more concise.

      (6) Figure 1: It's difficult to compare the EEG power the way figures are set up right now. I think it would enhance clarity if the authors separate the plots based on state and show power from the control and silenced neuronal group in the same plot. Also, the colors are too similar (essentially a shade of green/blue) to provide effective visual resolution. This is especially true in panel d. Please consider changing the color scheme.

      This comment seems to refer to Figure 2 and subsequent figures with analysis of vigilance states and EEG spectra (Figure 1 contains histological images). We selected the colour scheme to be accessible to colour-blind readers; therefore, the main difference between groups is in saturation rather than hue. We have tested the visibility of the colour scheme on a high-resolution screen with the original image files and can reassure the reviewer that the genotype differences, which are slightly blurred in the reduced-resolution figures provided within the combined text file for the review process, are easily distinguishable at final figure quality.

      (7) I don't understand the y-axis scale in Figure 1. How can this be 500% and if it is, then 500% of what?

      This comment also seems to refer to the analysis of slow wave activity (SWA) in Figure 2 rather than to Figure 1 (histology figure). The percentage of SWA is normalised to the average SWA across the recording. Since NREM sleep is characterised by considerably higher SWA than wakefulness and REM sleep, the level of SWA during NREM sleep is in the range of 200-300%, and can be even higher after long wake episodes which are followed by a rebound of NREM sleep SWA. Hence, the upper limit of the y-axis in these (and subsequent) plots of SWA is 500% (of the average SWA). We have amended the figure legend to clarify that SWA is presented here as percentage of average SWA across the recording.
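      For illustration only, the normalisation described above can be written as a one-line rescaling. This is a sketch, not the authors' pipeline: it assumes epoch-wise SWA values for the whole recording, and the function name `swa_percent_of_mean` and the toy numbers are hypothetical. It shows why NREM epochs can sit well above 100%: the recording-wide mean is pulled down by low-SWA wake and REM epochs.

```python
import numpy as np

def swa_percent_of_mean(swa):
    """Express each epoch's SWA as a percentage of the mean SWA over the whole recording."""
    swa = np.asarray(swa, dtype=float)
    return 100.0 * swa / swa.mean()

# Toy epochs: wake, REM, NREM (arbitrary power units)
pct = swa_percent_of_mean([1.0, 1.0, 4.0])
```

      Here the mean SWA is 2.0, so the wake and REM epochs come out at 50% and the NREM epoch at 200%.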

    1. eLife Assessment

      In this potentially valuable computational study, the authors conducted extensive atomistic and coarse-grained simulations to probe the temperature-dependent phase behaviors of ELF3, a disordered component of the evening complex in plants. The results aim to highlight the role of polyQ tracts in modulating temperature-responsive structural and condensation behavior. Despite considerable improvements in the revised manuscript, the level of evidence is considered incomplete, since several of the supplementary observables introduced to support the revised claim indicate that the variants studied are not statistically distinguishable within the reported replicate uncertainty.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the role of the Evening Complex (EC), specifically focusing on ELF3, a disordered protein component of the EC, and its temperature-dependent phase behavior. The study highlights the role of polyQ tracts in modulating temperature-sensitive condensate formation and provides a combination of computational approaches, including REST2 simulations and coarse-grained Martini simulations, to investigate how polyQ tract length and sequence context influence this behavior.

      Strengths:

      The study addresses a key question in plant biology - how temperature influences circadian clock-mediated growth regulation through protein phase behavior. The manuscript introduces the novel finding that polyQ tract length modulates the temperature-dependent formation of helices and condensates.

      Weaknesses:

      (1) Coarse-Grained Simulation Results Not Supported by Data:

      The results presented in Figure 6A of the manuscript do not seem to show a clear trend in the number of clusters formed as a function of polyQ tract length. This is particularly evident in the comparison between 0Q and 7Q polyQ lengths, which display statistically similar values in terms of the number of clusters. The lack of distinction between these values raises questions about the sensitivity of the coarse-grained simulations to polyQ tract length, which the authors claim as a key modulator of condensate formation. This discrepancy weakens the argument that polyQ length directly impacts the clustering behavior in the simulations.

      Suggested Analysis:

      a) A more detailed statistical analysis should be performed to assess whether the observed differences between polyQ lengths are significant. This could involve hypothesis testing or the use of error bars in the graphs to better communicate the variability in the data.

      b) Additionally, the authors should examine whether there are other features, such as cluster shape or internal structure, that might differentiate between different polyQ lengths, even if the total number of clusters is similar.

      (2) Inconsistency in Cluster Size Across Temperatures (Figure 6B):

      The results in Figure 6B show a striking difference in the size of the largest cluster between temperatures of 290K and 300K. This abrupt shift in behavior lacks a clear mechanistic explanation. Typically, phase transitions driven by temperature are more gradual, unless there is some underlying structural or chemical shift that the authors have not accounted for. Without a clear explanation, this sudden change in behavior reduces confidence in the simulation results.

      Suggested Analysis:

      a) The authors should explore possible explanations for the dramatic difference in cluster size between 290K and 300K. For example, they could investigate whether specific interactions (such as the breaking or formation of hydrogen bonds or hydrophobic contacts) might explain the behavior at higher temperatures.

      b) It is important to check whether the coarse-grained simulation model has been adequately parameterized and scaled for accurate temperature dependence. Atomistic simulations of monomers and dimers with varying polyQ tract lengths could be used to fine-tune the coarse-grained model, ensuring it accurately reflects molecular behavior. The gross estimate of a 10% scaling factor might be insufficient and could lead to inaccurate representations of cluster formation.

      (3) Scaling of Coarse-Grained Model with Atomistic Simulations:

      As mentioned, the coarse-grained model used in the study may not have been properly scaled against atomistic data. A simple scaling factor of 10% may not be appropriate for accurately capturing the behavior of polyQ tracts across different lengths, especially considering their sensitivity to subtle changes in temperature. Without rigorous validation against atomistic simulations, the coarse-grained model's predictions could be skewed.

      Suggested Analysis:

      a) To address this, the authors should compare the coarse-grained model with atomistic simulations of monomeric and dimeric forms of ELF3 with different polyQ tract lengths. By comparing key structural parameters (e.g., radius of gyration, contact maps, and clustering propensity), the authors could adjust the coarse-grained model to more accurately reflect the atomistic behavior. The authors have a wealth of atomistic simulation data that could support such benchmarking and the identification of a scaling factor.

      b) Additionally, the authors should investigate whether the assumed scaling factor of 10% is appropriate for each polyQ length or whether it needs to be refined based on specific properties, such as the number of hydrophobic interactions or secondary structure stability.

      (4) Lack of Analysis for Liquid-Like Behavior in Phase Separation:

      The simulations presented in the manuscript do not analyze the liquid-like behavior of ELF3 condensates, which is a key characteristic of liquid-liquid phase separation (LLPS). In LLPS systems, condensates are often dynamic, with chains exchanging between clusters, indicating liquid-like rather than solid-like behavior. The authors fail to probe this crucial aspect, which is necessary to support the claim that ELF3 undergoes phase separation.

      Suggested Analysis:

      a) The authors should conduct additional analyses to probe the liquid-like nature of the clusters formed by ELF3. One approach would be to analyze the dynamics of chain exchange between clusters, measuring how frequently chains leave one cluster and join another over time. This analysis would reveal whether the condensates behave as liquid-like, dynamic structures or more static, solid-like aggregates.

      b) Additionally, the temperature dependence of these exchange dynamics should be investigated. In true liquid-liquid phase separation, the rate of chain exchange is often sensitive to temperature. Observing how this rate changes between 290K and 300K, for instance, could help explain the abrupt shift in cluster size seen in Figure 6B.

      c) The authors should also analyze whether the internal structures of the condensates are consistent with a liquid-like phase. For example, radial distribution functions and contact lifetimes could be calculated to reveal whether the clusters exhibit liquid-like organization.

      (5) Lack of justification of polydispersity of polyQ:

      The authors do not provide any rationale for the choice of the different polyQ copy numbers used in their chain-growth simulation studies. It would be more appropriate to motivate this choice with prior experimental observations.

      (6) Lack of initiative to connect to Experiments:

      While the computational models and simulations provide robust theoretical insights, the absence of direct experimental validation weakens the overall impact of the manuscript. For example, experimental data on how specific mutations in the polyQ tract influence ELF3 behavior in vivo would significantly bolster the authors' claims. The manuscript would benefit from either citing existing experimental studies that corroborate these findings or from suggesting future experimental directions.

      Comments on revised version:

      The authors have now adequately addressed the key concerns about the manuscript. The manuscript in its present form is significantly improved.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate how ELF3, a disordered scaffolding protein in the plant circadian Evening Complex, responds to temperature by forming reversible nuclear condensates. They focus on the C-terminal prion-like domain and on a variable polyglutamine tract within it, asking how the tract length and surrounding sequence context tune temperature-responsive structural and condensation behavior. Using a tiered set of computational approaches, including sequence heuristics, hierarchical chain-growth ensembles, all-atom enhanced-sampling simulations, and coarse-grained condensate simulations of 100 monomers, they characterize wild-type, polyQ deletion, polyQ expansion, and an aromatic-disrupting F527A variant. In the revised manuscript, the central claim has been reframed so that polyQ length is now described as tuning condensate material properties rather than driving temperature-sensitive phase separation, with temperature-responsive condensation attributed primarily to a sticker-rich aromatic contact network.

      Strengths:

      The biological question is important and timely, and the multiscale computational strategy provides a fresh view of an intrinsically disordered protein and its variants. The all-atom enhanced sampling analyses identify a temperature-dependent long-range aromatic contact involving F527 and a methionine-tyrosine coordination motif, which are concrete and mechanistically interesting observations beyond what coarse-grained or sequence-only methods could provide. In response to the previous round of review, the authors have added replicate-averaged statistics with error bars on the new condensate analyses; introduced new dynamics observables, including effective diffusivity, an anomalous diffusion exponent, the self van Hove function, shape anisotropy, per-chain radius of gyration in the condensed phase, and a condensate lifetime; provided cluster size time series for transparency; justified the choice of polyQ tract lengths against published Arabidopsis polymorphisms; expanded the Methods with explicit formulas for the new analyses; and included a split-half convergence check for the all-atom ensembles. The reframing toward a sticker-spacer interpretation is consistent with recent experimental work and represents a more cautious and defensible reading of the data.

      Weaknesses:

      Despite these substantive additions, several core concerns from the previous review remain only partially addressed, and, on close reading, the new supplementary analyses do not robustly support the reframed claim that polyQ length tunes condensate material properties. Error bars and replicate-averaged statistics were added to the new condensate panels, but the helical propensity and per-residue analyses throughout the rest of the manuscript still show only a single curve per temperature, so variability for these key observables remains unreported. Several of the newly added dynamics observables show that the variants are essentially indistinguishable within the reported uncertainty: the self van Hove distributions, the shape anisotropy distributions, and the per-chain radius of gyration distributions in the condensed phase overlap almost entirely across variants, and the anomalous diffusion exponent has between-replica spreads at low temperature that exceed the variant-to-variant differences, with variant orderings that change with temperature. The variant-dependent signal that does survive, namely a drop in condensate lifetime for the polyQ expansion and the aromatic mutant at the highest temperature studied, rests on a single temperature point, with replicate spreads spanning most of the metric's dynamic range.

      The cluster size time series at higher temperatures show the dominant cluster oscillating over a wide range across replicas, indicating intermittent dissolution and incomplete convergence in the very temperature regime where the variant-specific claims are made. The only convergence test provided is a split-half radius-of-gyration analysis for the all-atom ensembles, with no slab-geometry or coexistence-density check for the coarse-grained condensate simulations. The polyQ deletion variant forms dominant clusters comparable in size to wild type at low and intermediate temperatures, which on its own argues that variable polyQ presence is not a primary determinant of clustering and supports the earlier concern that the temperature-sensitive behavior is dominated by generic chain length and aromatic sticker effects rather than polyQ-specific sequence effects, a concern that the reframing softens but does not resolve. Statistical significance is not assessed anywhere, and with three replicas and largely overlapping error bars, claims of variant-specific differences would benefit from explicit statistical tests. Minor quality control issues are also visible in the supplementary material, including a mislabeling of the aromatic mutant in two analysis panels and an inconsistent trajectory length for one variant at one temperature.

      Additional Context for Readers:

      Readers should interpret the molecular mechanism proposed here with caution. The reframing from polyQ length driving temperature-sensitive phase separation to polyQ length tuning of condensate material properties is more scientifically measured and aligns with recent experimental work, but several of the supplementary observables introduced to support this revised claim indicate that the variants studied are statistically indistinguishable within the reported replicate uncertainty. The most robust observation in the revised work is that the prion-like domain undergoes a temperature-responsive break of an aromatic contact in all-atom simulations and that aromatic sticker contacts dominate inter-protein interactions in coarse-grained condensate simulations. The mechanistic role of the polyQ tract, beyond generic chain length and hydration effects, remains, as in the original submission, not clearly established by the simulations presented. Independent experimental validation of the proposed aromatic contact and of the predicted material-state differences between polyQ variants will be needed to establish the molecular mechanism, and improved condensate convergence tests, uniformly reported error bars across all simulation-derived figures, and explicit statistical tests of variant-versus-variant differences would substantially strengthen confidence in the conclusions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      In this potentially valuable computational study, the authors conducted atomistic and coarse-grained simulations to probe the temperature-dependent phase behaviors of ELF3, a disordered component of the evening complex in plants. The results aim to highlight the role of polyQ tracts in modulating the temperature sensitivity. The level of evidence is considered incomplete, due to the lack of systematic calibration of the coarse-grained model and limited statistical uncertainty analysis, especially considering the relatively subtle nature of the differences due to temperature change.

      We agree that the subtle temperature dependence of ELF3-PrD condensation requires rigorous uncertainty reporting and careful interpretation of CG predictions. In the revised manuscript we therefore (i) report mean ± SEM across independent replicas for all CG observables and provide full time series in the Supplementary Information, and (ii) expand our CG analysis beyond cluster counting to include condensate stability (size), lifetime, internal mobility (D, α), dynamic heterogeneity (van Hove), and structural descriptors (anisotropy, single-chain compaction/density). These additions strengthen the robustness of the conclusions and even provide physical explanations for recent experimental measurements on ELF3-PrD condensates.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript explores the role of the Evening Complex (EC), specifically focusing on ELF3, a disordered protein component of the EC, and its temperature-dependent phase behavior. The study highlights the role of polyQ tracts in modulating temperature-sensitive condensate formation and provides a combination of computational approaches, including REST2 simulations and coarse-grained Martini simulations, to investigate how polyQ tract length and sequence context influence this behavior.

      Strengths:

      The study addresses a key question in plant biology - how temperature influences circadian clock-mediated growth regulation through protein phase behavior. The manuscript introduces the novel finding that polyQ tract length modulates the temperature-dependent formation of helices and condensates.

      Weaknesses:

      (1) Coarse-Grained Simulation Results Not Supported by Data:

      The results presented in Figure 6A of the manuscript do not seem to show a clear trend in the number of clusters formed as a function of polyQ tract length. This is particularly evident in the comparison between 0Q and 7Q polyQ lengths, which display statistically similar values in terms of the number of clusters. The lack of distinction between these values raises questions about the sensitivity of the coarse-grained simulations to polyQ tract length, which the authors claim as a key modulator of condensate formation. This discrepancy weakens the argument that polyQ length directly impacts the clustering behavior in the simulations.

      Suggested Analysis:

      A more detailed statistical analysis should be performed to assess whether the observed differences between polyQ lengths are significant. This could involve hypothesis testing or the use of error bars in the graphs to better communicate the variability in the data.

      Additionally, the authors should examine whether there are other features, such as cluster shape or internal structure, that might differentiate between different polyQ lengths, even if the total number of clusters is similar.

      We agree that the number of clusters in Fig. 6A does not show a strong or monotonic dependence on polyQ length (e.g., 0Q vs 7Q can overlap within uncertainty). The cluster number is highly sensitive to coarsening kinetics and rapidly approaches a late-time plateau, and therefore is not our primary discriminator of variant-dependent condensation behavior.

      To address the reviewer’s request for statistical rigor and additional differentiating features, we have revised the analysis in two ways. First, we now report mean ± SEM across independent replicas for all key CG observables and provide full replicate time series in the Supplementary Information to make variability and convergence/coarsening explicit.
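As a minimal sketch of the replica statistics described here (and of the hypothesis test the reviewer requests), the Python below computes mean ± SEM over per-replica scalars and Welch's t statistic between two variants. It assumes each independent replica contributes one number (for example, its late-time mean largest-cluster size); the function names and example values are hypothetical and are not the authors' analysis code. A p-value would additionally need the t-distribution CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which is omitted here to keep the sketch dependency-free.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)) over replicas."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicas to estimate an SEM")
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(n)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances between the two sets of replicas.
    """
    va = statistics.variance(a) / len(a)  # squared SEM of group a
    vb = statistics.variance(b) / len(b)  # squared SEM of group b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-replica values for two polyQ variants (illustration only).
q0 = [10.0, 12.0, 11.0]
q19 = [14.0, 15.0, 16.0]
print(mean_sem(q0))      # mean 11.0, SEM ≈ 0.577
print(welch_t(q0, q19))  # t ≈ -4.90, df ≈ 4
```

Welch's form is chosen because replica-to-replica variability need not be the same for different polyQ lengths or temperatures.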

      Second, we shift our main CG conclusions away from “cluster number” and toward more diagnostic observables of condensate robustness and material state, including: (i) stability via the late-time mean largest-cluster size, (ii) persistence/lifetime via the fraction of frames with largest cluster size greater than 50, (iii) internal dynamics via MSD-derived D and anomalous exponent α, (iv) dynamic heterogeneity via self van Hove distributions relative to a Gaussian reference, and (v) morphology/internal structure via κ² and Rg distributions.

      Notably, the κ²/Rg distributions are broadly overlapping at 300 K, indicating that in our system variant differences are expressed more strongly in stability/persistence and internal dynamics (D/α/van Hove) than in a large shift in single-chain compaction at this temperature.

      This revised framing also aligns our interpretation with the experimental picture put forward by Hutin et al.: polyQ length modestly affects onset-like behavior but more strongly tunes condensed-phase regimes and dynamics.

      Relevant revisions have been made in the Results and the Discussion sections.

      (2) Inconsistency in Cluster Size Across Temperatures (Figure 6B):

      The results in Figure 6B show a striking difference in the size of the largest cluster between temperatures of 290K and 300K. This abrupt shift in behavior lacks a clear mechanistic explanation. Typically, phase transitions driven by temperature are more gradual, unless there is some underlying structural or chemical shift that the authors have not accounted for. Without a clear explanation, this sudden change in behavior reduces confidence in the simulation results.

      Suggested Analysis:

      The authors should explore possible explanations for the dramatic difference in cluster size between 290K and 300K. For example, they could investigate whether specific interactions (such as the breaking or formation of hydrogen bonds or hydrophobic contacts) might explain the behavior at higher temperatures.

      It is important to check whether the coarse-grained simulation model has been adequately parameterized and scaled for accurate temperature dependence. Atomistic simulations of monomers and dimers with varying polyQ tract lengths could be used to fine-tune the coarse-grained model, ensuring it accurately reflects molecular behavior. The gross estimate of a 10% scaling factor might be insufficient and could lead to inaccurate representations of cluster formation.

      We agree that the apparently sharp change in largest-cluster size between 290 K and 300 K requires clearer interpretation. In the revised manuscript, we clarify that this behavior does not imply an abrupt thermodynamic phase transition; rather, in a finite (~100-chain) simulation box, the largest cluster size is sensitive to both (i) proximity to a coexistence boundary and (ii) coarsening kinetics. Consistent with this, all systems rapidly coarsen early and then approach a late-time plateau, so the dominant cluster size can change steeply when conditions shift the balance between one system-spanning droplet versus multiple long-lived subclusters.

      To distinguish “true loss of condensation” from “differences in coarsening state,” we added replica-averaged stability and persistence metrics (mean ± SEM) and full time series. Importantly, the condensate lifetime (fraction of frames with largest-cluster size > 50) is ~1 at both 290 K and 300 K, indicating that both temperatures correspond to a persistently condensed regime, not intermittent nucleation/dissolution. We therefore interpret the smaller dominant cluster at 290 K as reflecting slower coarsening / stronger kinetic arrest, where reduced chain mobility delays merger/annealing into a single large droplet within the simulated time window, leaving a larger satellite/dispersed population despite sustained condensation.

      We further support this interpretation with mechanistic and dynamical analyses added in the revision. As temperature increases from 290 K to 300 K, we observe increased internal mobility (higher effective diffusivity, D) that would accelerate rearrangements and coalescence. In parallel, contact/desolvation analyses show progressive loss of protein-water contacts and gain of protein-protein contacts as clusters mature, and a residue-resolved comparison indicates net contact increases at 300 K relative to 290 K concentrated in aromatic-rich “sticker” regions, consistent with a strengthened intermolecular contact network that promotes more complete annealing at 300 K.

      (We address the reviewer’s points regarding Martini temperature scaling/parameterization together with points (3)-(4) below.)

      (3) Scaling of Coarse-Grained Model with Atomistic Simulations:

      As mentioned, the coarse-grained model used in the study may not have been properly scaled against atomistic data. A simple scaling factor of 10% may not be appropriate for accurately capturing the behavior of polyQ tracts across different lengths, especially considering their sensitivity to subtle changes in temperature. Without rigorous validation against atomistic simulations, the coarse-grained model's predictions could be skewed.

      (4) To address this, the authors should compare the coarse-grained model with atomistic simulations of monomeric and dimeric forms of ELF3 with different polyQ tract lengths. By comparing key structural parameters (e.g., radius of gyration, contact maps, and clustering propensity), the authors could adjust the coarse-grained model to more accurately reflect the atomistic behavior. The authors have a wealth of atomistic simulation data that could afford such benchmarking and identification of the scaling factor.

      Additionally, the authors should investigate whether the assumed scaling factor of 10% is appropriate for each polyQ length or whether it needs to be refined based on specific properties, such as the number of hydrophobic interactions or secondary structure stability.

      We agree that temperature-dependent CG predictions must be interpreted carefully and that the interaction balance should be justified. In the revision, we therefore clarify both our calibration choice and the scope of inference.

      We use Martini 3 with a single, literature-motivated adjustment: protein-water Lennard-Jones interactions are strengthened by 10 percent, following an established strategy shown to improve IDP/multidomain protein behavior in Martini 3. This scaling is applied uniformly to all residues, polyQ lengths, and temperatures to avoid introducing construct-specific parameters and to preserve a controlled comparison across variants.

      We emphasize that our CG simulations are used in a comparative manner (how stability/dynamics/structure change with temperature and polyQ length under a fixed model), and we do not claim a quantitatively exact phase boundary or transition temperature for ELF3. In this spirit, and consistent with how Martini 3 has been used in prior work to probe thermally varying properties across temperature windows (while acknowledging documented limits to temperature transferability), we treat the temperature sweep as a comparative probe rather than an absolute calibration (https://doi.org/10.1063/5.0221199, https://doi.org/10.1021/acscentsci.5c00755, https://doi.org/10.1038/s41592-021-01098-3). Accordingly, we report replica uncertainty (mean ± SEM) for all CG observables and restrict conclusions to qualitative trends that are robust to replicate variability.

      Finally, while we do not undertake a full ELF3-specific reparameterization, we include qualitative checks linking atomistic and CG behavior: the CG model reproduces the same qualitative features of single-chain reorganization inferred from atomistic simulations, notably the radius of gyration (Fig. S8) and the rearrangement/exposure of aromatic “sticker” regions that correlate with strengthened intermolecular contacts in the condensate. We emphasize that these comparisons are intended as qualitative sanity checks on trend direction, not as a quantitative validation or calibration of an absolute phase boundary.

      (5) Lack of Analysis for Liquid-Like Behavior in Phase Separation:

      The simulations presented in the manuscript do not analyze the liquid-like behavior of ELF3 condensates, which is a key characteristic of liquid-liquid phase separation (LLPS). In LLPS systems, condensates are often dynamic, with chains exchanging between clusters, indicating liquid-like rather than solid-like behavior. The authors fail to probe this crucial aspect, which is necessary to support the claim that ELF3 undergoes phase separation.

      Suggested Analysis:

      The authors should conduct additional analyses to probe the liquid-like nature of the clusters formed by ELF3. One approach would be to analyze the dynamics of chain exchange between clusters, measuring how frequently chains leave one cluster and join another over time. This analysis would reveal whether the condensates behave as liquid-like, dynamic structures or more static, solid-like aggregates.
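The chain-exchange analysis suggested here could be sketched as below. To sidestep matching arbitrary cluster labels between frames, this hypothetical version only tracks membership of the largest cluster and counts chains that leave or join it between consecutive frames. It assumes per-frame cluster labels (one per chain) have already been produced by some clustering step that is not shown; all names are illustrative. Dividing the counts by the number of frame pairs and chains would give a per-chain exchange rate that could then be compared between, for example, 290 K and 300 K.

```python
from collections import Counter


def largest_cluster_members(labels):
    """Indices of chains in the most populated cluster of one frame.

    labels[i] is the cluster id of chain i; ties go to the id seen first.
    """
    top_id, _ = Counter(labels).most_common(1)[0]
    return {i for i, cid in enumerate(labels) if cid == top_id}


def exchange_events(frames):
    """Count chains leaving and joining the largest cluster between consecutive frames.

    Cluster ids may differ between frames; only largest-cluster membership is compared.
    """
    left = joined = 0
    prev = largest_cluster_members(frames[0])
    for labels in frames[1:]:
        cur = largest_cluster_members(labels)
        left += len(prev - cur)    # members that dropped out
        joined += len(cur - prev)  # newcomers
        prev = cur
    return left, joined


# Hypothetical labels for 4 chains over 3 frames.
frames = [[0, 0, 0, 1], [7, 7, 1, 7], [3, 3, 3, 3]]
print(exchange_events(frames))  # (1, 2)
```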

      Additionally, the temperature dependence of these exchange dynamics should be investigated. In true liquid-liquid phase separation, the rate of chain exchange is often sensitive to temperature. Observing how this rate changes between 290K and 300K, for instance, could help explain the abrupt shift in cluster size seen in Figure 6B.

      The authors should also analyze whether the internal structures of the condensates are consistent with a liquid-like phase. For example, radial distribution functions and contact lifetimes could be calculated to reveal whether the clusters exhibit liquid-like organization.

      We thank the reviewer for highlighting that liquid-like behavior is a key diagnostic for LLPS. We agree that our original manuscript did not explicitly quantify condensate material properties. In the revision, we therefore add several complementary analyses and figures that directly probe whether the condensed state in our simulations is liquid-like versus dynamically arrested, and how this depends on temperature and polyQ length.

      (i) Condensate persistence vs temperature (stability and lifetime).

      We now quantify two replica-averaged metrics with uncertainty (mean ± SEM): (a) stability, defined as the mean largest-cluster size over a late-time analysis window, and (b) lifetime, defined as the fraction of frames in which the dominant cluster exceeds a fixed size threshold. These results are shown in the new figures “Stability (Mean cluster size)” and “Lifetime (P [size > 50])”. In our system, both 290 K and 300 K correspond to a persistently condensed regime (lifetime ≈ 1 across variants), whereas at 340 K the lifetime drops substantially (≈0.3-0.5 depending on variant), indicating intermittent condensation / partial dissolution at high temperature. This directly demonstrates temperature-dependent persistence of the condensed state and clarifies that the key qualitative change at high temperature is loss of stability and intermittency, rather than a purely static cluster-size difference.
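A minimal sketch of these two persistence metrics follows. It assumes each replica provides a per-frame time series of largest-cluster sizes (in chains) and that both metrics are evaluated over the same late-time window; the window start and the function names are illustrative assumptions, while the size threshold of 50 follows the text.

```python
import math
import statistics


def stability_and_lifetime(largest_sizes, start_frame, threshold=50):
    """Stability = mean largest-cluster size; lifetime = fraction of frames above threshold.

    Both are computed over the late-time window largest_sizes[start_frame:].
    """
    window = largest_sizes[start_frame:]
    if not window:
        raise ValueError("late-time analysis window is empty")
    stability = statistics.fmean(window)
    lifetime = sum(1 for s in window if s > threshold) / len(window)
    return stability, lifetime


def replica_summary(replicas, start_frame, threshold=50):
    """Mean ± SEM of stability and lifetime across independent replicas (needs >= 2)."""
    per_rep = [stability_and_lifetime(r, start_frame, threshold) for r in replicas]
    summary = {}
    for name, idx in (("stability", 0), ("lifetime", 1)):
        vals = [p[idx] for p in per_rep]
        summary[name] = (statistics.fmean(vals),
                         statistics.stdev(vals) / math.sqrt(len(vals)))
    return summary


# Hypothetical single-replica trace; late window is [60, 80, 40, 100].
print(stability_and_lifetime([10, 20, 60, 80, 40, 100], start_frame=2))  # (70.0, 0.75)
```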

      (ii) Internal mobility and viscoelasticity (D and α).

      To probe liquid-like dynamics within the condensed state, we compute the internal mean squared displacement (MSD) and extract an effective internal diffusivity D(T) and anomalous exponent α(T) (new figures FIG X). In our system, D increases systematically with temperature for all variants, confirming that internal rearrangements accelerate at higher temperature. At the same time, α remains strongly subdiffusive (α ≈ 0.3-0.5), indicating constrained, non-Fickian motion rather than simple liquid diffusion. Importantly, we also observe variant-dependent mobility: around 300-320 K, 0Q exhibits markedly lower D than 19Q, consistent with stronger kinetic arrest in 0Q even when both variants are condensed. Together, these dynamics metrics show that our condensates are not ideal liquids, but instead occupy a viscoelastic / dynamically slowed regime with clear temperature dependence.
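The MSD computation and the power-law fit for D and α can be sketched as below. This assumes unwrapped coordinates already expressed relative to the condensate center of mass (the "internal" frame), time measured in frames, and the anomalous form MSD(τ) = 2·d·D·τ^α with d = 3, so D carries units of length²/frame^α. The fitting range and function names are assumptions, not the authors' exact protocol.

```python
import math


def msd(traj, max_lag):
    """Time- and particle-averaged MSD for lags 1..max_lag (max_lag < number of frames).

    traj[t][i] is the (x, y, z) position of particle i at frame t.
    """
    out = []
    for lag in range(1, max_lag + 1):
        total, count = 0.0, 0
        for t0 in range(len(traj) - lag):
            for a, b in zip(traj[t0], traj[t0 + lag]):
                total += sum((bk - ak) ** 2 for ak, bk in zip(a, b))
                count += 1
        out.append(total / count)
    return out


def fit_anomalous(lags, msd_values, dim=3):
    """Fit log MSD = log(2*dim*D) + alpha*log(lag) by ordinary least squares."""
    xs = [math.log(t) for t in lags]
    ys = [math.log(m) for m in msd_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    D = math.exp(my - alpha * mx) / (2 * dim)
    return D, alpha


# Synthetic subdiffusive curve with D = 0.5, alpha = 0.4 (illustration only).
lags = list(range(1, 11))
print(fit_anomalous(lags, [6 * 0.5 * t ** 0.4 for t in lags]))  # ≈ (0.5, 0.4)
```

An α well below 1, as reported above, appears directly as a shallow slope in log-log coordinates.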

      (iii) Dynamic heterogeneity (self van Hove).

      We additionally compute the self van Hove displacement distributions (Fig. SX). In our system, the van Hove distributions deviate from a Gaussian reference matched to the MSD, with an excess of near-zero displacements relative to a simple Gaussian model. These non-Gaussian displacement statistics are consistent with heterogeneous/caging-like dynamics inside the condensed phase, further supporting a viscoelastic (gel-like) rather than purely liquid material state at the timescales accessible to simulation.
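One compact way to quantify a departure from a Gaussian reference is sketched below: the non-Gaussian parameter α₂ (unrelated to the MSD exponent α; it is zero for Gaussian displacements) and the excess fraction of near-zero displacements relative to a Gaussian with the same variance. Both are standard diagnostics, but they are offered here as an assumption about how such a comparison could be made, not as the authors' exact procedure. Displacements at a fixed lag are assumed to be zero-mean, and the cutoff `delta` is an illustrative choice.

```python
import math


def non_gaussian_parameter(disp3d):
    """alpha_2 = 3<r^4> / (5<r^2>^2) - 1 for 3D displacements at one lag.

    Zero for Gaussian displacements; positive when a mobile tail coexists
    with many nearly immobile (caged) particles.
    """
    r2 = [dx * dx + dy * dy + dz * dz for dx, dy, dz in disp3d]
    m2 = sum(r2) / len(r2)
    m4 = sum(v * v for v in r2) / len(r2)
    return 3.0 * m4 / (5.0 * m2 * m2) - 1.0


def near_zero_excess(dx, delta):
    """Observed P(|dx| < delta) minus the Gaussian prediction at matched variance.

    For X ~ N(0, s^2), P(|X| < delta) = erf(delta / (s * sqrt(2))).
    """
    var = sum(x * x for x in dx) / len(dx)  # assumes zero-mean displacements
    observed = sum(1 for x in dx if abs(x) < delta) / len(dx)
    expected = math.erf(delta / math.sqrt(2.0 * var))
    return observed - expected


# Hypothetical 1D displacements: 80 caged (zero) plus 20 large jumps.
dx = [0.0] * 80 + [5.0, -5.0] * 10
print(near_zero_excess(dx, delta=0.5))  # clearly positive: excess of small displacements
```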

      (iv) Internal structure and morphology (Rg and anisotropy).

      Finally, we add structural descriptors as requested. The new Rg distribution and shape anisotropy (κ²) violin plots quantify single-chain compaction and heterogeneity in morphology within the condensed phase. In our system these structural distributions are broadly overlapping at 300 K, indicating that differences among variants are more strongly expressed in dynamics (D/α/van Hove) and stability/lifetime, rather than in a large change in single-chain compaction at this temperature. We report these results transparently and include them in the SI as additional mechanistic context.
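Both structural descriptors follow from the gyration tensor of a chain's bead coordinates: Rg² is its trace, and the relative shape anisotropy is κ² = 1 - 3·I₂/I₁², where I₁ = λ₁ + λ₂ + λ₃ and I₂ = λ₁λ₂ + λ₂λ₃ + λ₃λ₁ are invariants of the tensor, so κ² is 0 for spherically symmetric and 1 for rod-like arrangements. The sketch below computes both from the invariants, which avoids an explicit eigendecomposition; it assumes unwrapped coordinates and equal bead masses, and the function names are illustrative.

```python
import math


def gyration_tensor(coords):
    """3x3 gyration tensor S_ab = (1/N) sum_i d_ia d_ib, with d_i = r_i - r_cm."""
    n = len(coords)
    cm = [sum(c[k] for c in coords) / n for k in range(3)]
    s = [[0.0] * 3 for _ in range(3)]
    for c in coords:
        d = [c[k] - cm[k] for k in range(3)]
        for a in range(3):
            for b in range(3):
                s[a][b] += d[a] * d[b] / n
    return s


def rg_and_kappa2(coords):
    """Radius of gyration and relative shape anisotropy from tensor invariants."""
    s = gyration_tensor(coords)
    i1 = s[0][0] + s[1][1] + s[2][2]  # sum of eigenvalues = Rg^2
    tr_s2 = sum(s[a][b] * s[b][a] for a in range(3) for b in range(3))
    i2 = (i1 * i1 - tr_s2) / 2.0      # sum of pairwise eigenvalue products
    return math.sqrt(i1), 1.0 - 3.0 * i2 / (i1 * i1)


rod = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(rg_and_kappa2(rod))         # (1.0, 1.0)
print(rg_and_kappa2(octahedron))  # Rg ≈ 1.0, kappa^2 ≈ 0 (isotropic)
```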

      We now explicitly frame our CG condensed phases as viscoelastic/dynamically slowed condensates rather than assuming fully liquid droplets. This interpretation is consistent with experimental observations on ELF3 PrLD that report very slow recovery/gel-like behavior under some conditions (i.e., condensates can age into low-mobility hydrogel states).

      (6) Lack of justification of polydispersity of polyQ:

      The authors don't provide any rationale for the choice of different polyQ copy numbers used in the manuscript for their chain-growth simulation studies. It would be more apt to motivate this choice with precedent experimental observations.

      We agree and have clarified our rationale in the revised manuscript. ELF3’s polyQ tract is a naturally polymorphic short tandem repeat in Arabidopsis, reported to vary from roughly 7 to 29 glutamines in natural populations, and this variation has been linked to ELF3-dependent phenotypes and temperature-responsive growth (Undurraga et al.; Jung et al.). Importantly, recent ELF3 PrLD thermosensing/condensation experiments explicitly compare multiple polyQ lengths (including Q0, short/WT-like constructs such as Q7, and expanded tracts around ~Q20) and show that polyQ length tunes temperature-responsive phase behavior and condensate properties (Jung et al.; Hutin et al.).

      Accordingly, for our chain-growth ensembles we chose a small, experimentally motivated set that brackets this range (0Q deletion, 7Q WT-like short tract, and the expanded lengths 13Q and 19Q, with 19Q closely matching the ~Q20 construct used experimentally), so that our simulations map onto established constructs and naturally occurring variation rather than arbitrary copy numbers.

      The manuscript draft has been modified in the Results and Methods sections.

      Jung J-H. et al. A prion-like domain in ELF3 functions as a thermosensor in Arabidopsis. Nature (2020).

      Undurraga S. et al. Background-dependent effects of polyglutamine variation in the Arabidopsis thaliana gene ELF3. PNAS (2012), DOI: 10.1073/pnas.1211021109.

      Hutin S. et al. Phase separation and molecular ordering of the prion-like domain of the Arabidopsis thermosensory protein EARLY FLOWERING 3. PNAS (2023).

      (7) Lack of initiative to connect to Experiments:

      While the computational models and simulations provide robust theoretical insights, the absence of direct experimental validation weakens the overall impact of the manuscript. For example, experimental data on how specific mutations in the polyQ tract influence ELF3 behavior in vivo would significantly bolster the authors' claims. The manuscript would benefit from either citing existing experimental studies that corroborate these findings or from suggesting future experimental directions.

      We agree that our original submission did not make the experimental connections explicit enough, and we have strengthened this in the revision by (i) explicitly anchoring our results to published ELF3 thermosensing/condensation measurements and (ii) articulating concrete, experimentally testable mechanistic predictions that follow from the simulations.

      (i) Explicit connection to published experimental benchmarks: We now cite and discuss key experimental studies that directly probe ELF3 temperature responsiveness and polyQ effects. Jung et al. demonstrated temperature-triggered ELF3 condensation/speckle formation in vivo and showed that polyQ length modulates thermoresponsive behavior. More recently, Hutin et al. compared ELF3 PrLD constructs spanning polyQ lengths (e.g., Q0, Q7, and ~Q20) and reported temperature-triggered phase separation, condition-dependent condensed-phase regimes (droplet-like versus more arrested/gel-/hydrogel-like), and reduced mobility/immobile fractions quantified by FRAP in some regimes. In the revised manuscript we explicitly map these observations onto our results: our coarse-grained simulations capture temperature-dependent condensation propensity, while our added condensate dynamics analyses (MSD-derived internal mobility D, anomalous exponent α, and self van Hove displacement statistics) indicate dynamically slowed/heterogeneous condensates rather than ideal liquid droplets. This is consistent with the experimentally observed slow FRAP recovery and arrested behavior under some conditions.

      (ii) Mechanistic connections: While existing experiments establish that ELF3 condensation is temperature-triggered and tuned by polyQ length, they cannot directly resolve the molecular interaction changes that drive these macroscopic readouts. We therefore emphasize that our atomistic and coarse-grained analyses provide a mechanistic interpretation: temperature shifts reorganize and expose “sticker”-rich regions (notably aromatics), strengthening intermolecular contact networks that tune condensate stability and material properties. This framing aligns our conclusions with the experimental picture that polyQ length has modest effects on onset-like behavior but more strongly tunes condensed-phase robustness and dynamics (persistence, internal mobility, and arrest) across temperatures.

      The modifications relevant to this are in the Discussion section.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to explore how a key protein in the circadian clock of plants, ELF3, responds to temperature changes by forming molecular condensates. They focused on understanding the role of a specific region of the protein, a polyQ tract, in promoting temperature-sensitive structural changes and regulating the formation of condensates. Through a series of computational simulations, they sought to uncover the molecular basis for ELF3's temperature responsiveness and its broader implications for plant growth and adaptation to environmental conditions.

      Strengths:

      The study's strength lies in its focus on an important biological question: how plants sense and respond to temperature changes at the molecular level. The authors employed a variety of computational techniques, including coarse-grained simulations, to explore the role of specific molecular features in this process. These methods provide a multi-scale view of protein behavior and offer valuable insights into how molecular structures may influence biological function.

      Weaknesses:

      However, there are notable weaknesses in the evidence provided. While the authors present trends in molecular changes, such as shifts in helical propensity and the formation of condensates, these results seem subtle and are not strongly substantiated by statistical analysis. The lack of error bars in the figures makes it difficult to distinguish between meaningful signals and potential noise in the data. Furthermore, the temperature-sensitive behavior appears to be influenced more by chain length than by sequence-specific effects of the polyQ region, raising questions about whether the findings truly capture the molecular mechanisms responsible for temperature sensing. Additionally, some simulations, particularly those related to the formation of condensates, do not appear fully converged, which casts further doubt on the robustness of the results.

      We appreciate the reviewer’s concerns regarding statistical support, sequence specificity, and convergence. In the revised manuscript we (i) report replicate-averaged means with uncertainty (mean ± SEM) for all key observables and add error bars/shaded bands to the relevant figures, (ii) provide the full time series plots in the Supplementary Information to make variability and equilibration transparent, and (iii) revise our interpretation to emphasize that polyQ length has only modest effects on onset-like metrics but more strongly tunes condensate stability and material state (lifetime, internal mobility (D), subdiffusion exponent (α), and non-Gaussian van Hove signatures). This revised framing is consistent with recent ELF3 PrLD experiments showing that polyQ variation can subtly affect onset while substantially modulating condensed-phase behavior and dynamics. Relevant changes to the main text have been made in the Results and Discussion section.

      Additional Context for Readers:

      Readers should interpret the results with caution, especially regarding the molecular mechanisms proposed for temperature sensing. While the study presents interesting trends, the evidence is not definitive, and the findings may be more reflective of general protein behavior (such as the effect of chain length on condensate formation) than specific sequence-driven responses to temperature. Further experimental studies and more converged simulations will be necessary to fully understand the role of ELF3 in temperature regulation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have already listed my recommendations for revising the manuscript in the review. By addressing these issues, the authors could significantly improve the robustness of their conclusions and provide stronger evidence for ELF3's role in temperature-responsive phase separation.

    1. eLife Assessment

      This is an important study describing 'SPEx', a broadly accessible method that combines cell expansion, laser microdissection, and mass spectrometry to enable subcellular proteomic profiling. The authors provide convincing evidence that this flexible integration of established techniques provides a robust and practical approach for compartment-resolved spatial proteomics. The authors support their main claims with appropriate validation across multiple subcellular compartments and show that the method can recover known markers while also identifying previously uncharacterized components. Overall, the work is likely to be of broad interest to cell and molecular biologists, particularly those seeking scalable and cost-effective strategies for mapping organelle composition.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a novel approach to subcellular spatial proteomics by combining laser microdissection with expansion microscopy and LC-MS/MS analysis (SPEx). They implement two different workflows for LMD and LC-MS/MS quantification:

      (1) The standard approach, where an area of interest is cut out by LMD, subjected to proteomics analysis, and compared to the rest of the cell without the dissected ROI.

      (2) The subtraction approach, where ROIs are removed, and the remaining cellular material is compared to samples containing both the surrounding material and the ROI.

      The authors assess the technique by applying it to subcellular targets of various sizes, volumes, and protein compositions such as the nucleus, nucleoli, and Golgi. They demonstrate that SPEx can identify proteins enriched or reduced in ROIs.

      Strengths:

      The broad, relatively easy, and inexpensive applicability of this approach to potentially many cell types and subcellular areas of interest provides an exciting alternative to subcellular fractionation, native immunoprecipitation, or genetically encoded proximity labeling constructs. Moreover, by visually selecting ROIs for subsequent analysis, subcellular context or organelle morphology can be taken into account, as the authors note in the Discussion.

      Weaknesses:

      While strongly supporting the sharing of this approach, we have a number of comments and questions that will improve the impact of the manuscript:

      (1) General:

      a) The manuscript would benefit from restructuring and language revision. In its current form, the writing is sometimes dense and verbose (in particular, the Results section). This makes it difficult to follow the authors' arguments.

      b) The authors mention the possibility of selecting organelles based on morphology. This is left for the discussion, but it seems like a missed opportunity - the authors could compare individual organelles in different morphological states, e.g., connected vs. fragmented mitochondria.

      (2) Technical:

      a) Why do the authors strive and optimize for a 10x expansion factor? Is SPEx compatible with a more standard 4x expansion, as e.g., used in the classic U-ExM approach (https://www.nature.com/articles/s41592-018-0238-1)? This could be added to the discussion.

      b) The U-ExM approach shows improved ultrastructural preservation when using 3% FA with 0.1% glutaraldehyde (GA) for fixation. Is SPEx compatible with the use of low amounts of GA for fixation?

      c) Related to the above, was the anchoring efficiency reduced only to achieve a 10x expansion factor or does this additionally affect the proteome coverage?

      d) Have the authors considered using alternative anchoring approaches, such as GMA (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291506#pone.0291506.s001), which potentially increase the amount of sample retained in the hydrogel, thus allowing for better proteome coverage? This could be added to the discussion.

      e) The limitation of the approach to near-2D samples should be mentioned, and alternative approaches for more 3D samples could be discussed.

      f) How are peptides that are directly anchored to the hydrogel dealt with during LC-MS/MS analysis? Are they excluded, or can they be identified during the spectral search? The latter would allow us to get a deeper structural understanding of how proteins are actually anchored into hydrogels, which so far has not been assessed.

      An alternative approach to address this question would be to investigate if the peptide coverage of proteins detected by SPEx is enriched for peptides representing the folded core of proteins as opposed to the surface-exposed regions, which likely get more anchored into the hydrogel.

      g) Same question regarding peptides with NHS labeling. Can they be identified, or do they just compete for ionization and thus negatively affect coverage and dynamic range of the LC-MS/MS approach?

      h) How do the primary and secondary antibodies affect the proteomics analysis? Are they identified as contaminants?

      i) Have the authors observed differences in proteomic coverage between antibody-only and NHS labeling? Depending on the answers to the questions above, could purely antibody-based labeling increase proteomic coverage?

    3. Reviewer #2 (Public review):

      Summary:

      This study introduces a method that combines physical expansion of cells, imaging-guided isolation of defined regions, and protein identification to enable compartment-resolved analysis of protein composition at the subcellular scale. The authors aim to address a central limitation in existing approaches, namely the loss of spatial information during sample preparation or the indirect nature of proximity-based labeling methods. Using several cellular compartments as examples, they demonstrate that their approach can recover compartment-enriched protein sets and identify candidate proteins with previously unassigned localization.

      Strengths:

      A major strength of this work is the conceptual simplicity and accessibility of the approach. By combining established techniques in a modular way, the method avoids the need for genetic manipulation or specialized labeling strategies, making it broadly adaptable across experimental systems. The ability to directly select regions of interest based on imaging represents a clear advantage over indirect enrichment strategies and allows flexible targeting of both membrane-bound and non-membrane-bound compartments.

      The experimental design is also a strong aspect of the study. The use of complementary comparison strategies (analyzing isolated compartments alongside matched "subtracted" controls) provides an internal framework for assessing enrichment and depletion, increasing confidence in spatial assignment. The application of the method across multiple organelles of different sizes and properties demonstrates versatility, and the reported specificity for several compartments is encouraging. In particular, the ability to profile small and biochemically challenging structures highlights a potentially important niche for the approach.

      Weaknesses:

      Despite these strengths, several methodological limitations constrain the interpretation of the results. The most important relates to spatial accuracy in three dimensions. While lateral resolution is improved through physical expansion, the lack of depth resolution introduces uncertainty regarding contributions from structures above and below the selected region. Although the authors argue that this does not substantially affect specificity, the current evidence is largely indirect, and a more rigorous quantification of potential contamination would strengthen this conclusion.

      Quantitative interpretation also remains challenging. Because the measurements reflect total protein abundance rather than local concentration, differences in compartment size and protein density can influence enrichment values, particularly for small structures embedded within larger volumes. This issue is evident in the analysis of smaller compartments and complicates direct comparison across conditions. Additional normalization or modeling would help clarify how to interpret these measurements.
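The compartment-size effect raised here can be made concrete with a toy model. Suppose a fraction f of a given protein's total cellular amount lies in the dissected region, that region holds a fraction F of all cellular protein, and each sample is normalized to its own total protein content (an assumption for illustration; the study's actual normalization may differ). The standard comparison (ROI vs. rest of cell) then reports log2[(f/F) / ((1 - f)/(1 - F))], and the subtraction comparison (remainder vs. whole cell) reports log2[(1 - f)/(1 - F)]. Both are zero for a protein distributed like bulk protein (f = F), but for a small compartment the subtraction signal is strongly compressed. The function names are hypothetical.

```python
import math


def standard_log2_enrichment(f, F):
    """ROI sample vs. rest-of-cell sample, each normalized to its total protein."""
    return math.log2((f / F) / ((1 - f) / (1 - F)))


def subtraction_log2_ratio(f, F):
    """Remainder (cell minus ROI) vs. whole cell, each normalized to total protein."""
    return math.log2((1 - f) / (1 - F))


# Hypothetical small compartment holding 5% of cellular protein (F = 0.05)
# and half of one protein's copies (f = 0.5).
print(round(standard_log2_enrichment(0.5, 0.05), 2))  # 4.25
print(round(subtraction_log2_ratio(0.5, 0.05), 2))    # -0.93
```

Under these assumptions, the same strongly enriched protein shows a large positive signal in the standard comparison but less than a two-fold depletion in the subtraction comparison, which is one way compartment size can shape apparent enrichment.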

      Another limitation concerns variability in the expansion process and its downstream consequences. Differences in expansion factor across samples may affect the definition of regions of interest and introduce variability in sampling, yet the impact of this variability is not fully explored. Similarly, the use of a modified chemical treatment to preserve proteins for downstream analysis is central to the workflow but is not extensively validated with respect to preservation of spatial organization.

      While the identification of previously unannotated proteins is an appealing aspect of the study, validation is limited to a small number of examples, and broader support from independent datasets or literature context is lacking. In addition, the study primarily focuses on steady-state measurements in a single cell type, and therefore does not yet demonstrate the ability of the method to capture dynamic or condition-dependent changes in protein localization.

      Finally, the positioning of the method relative to existing approaches could be more clearly articulated. Although qualitative comparisons are provided, a more systematic and quantitative benchmarking against alternative strategies would help readers better understand the specific advantages and trade-offs.

    4. Reviewer #3 (Public review):

      Franziscus et al. describe an elegant approach for spatially specific proteome analysis. To achieve this, they expand fixed cells and subsequently use a laser to micro-dissect a region of interest, which is then analyzed by mass spectrometry.

      They demonstrate the effectiveness of their approach by analyzing the nucleus, nucleolus, and the Golgi, and benchmark their hits against previous datasets for these organelles.

      The manuscript is very well written and nicely guides the reader through the applied methods. The presented data is convincing, and I do not see the need for additional experimental verification of the protocol. The only minor concern is the novelty of the method and the presentation. A combination of expansion, laser microdissection, and proteomics has been applied in the past (PMID: 36450705, PMID: 39477916). In the manuscript, one of these studies is cited, though it is not made clear that this approach has already been described. However, Franziscus et al. describe the approach better and make it more accessible to the reader, especially since the other studies described this methodology in combination with tissue expansion and not in combination with single cell expansion as it is done here. I would ask the authors to be clearer in the introduction about what others have already done and what their contribution is here. In general, I am convinced that the community will benefit from the presented protocol to analyze organelle proteomics in detail.

    5. Author Response:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel approach to subcellular spatial proteomics by combining laser microdissection with expansion microscopy and LC-MS/MS analysis (SPEx). They implement two different workflows for LMD and LC-MS/MS quantification:

      (1) The standard approach, where an area of interest is cut out by LMD, subjected to proteomics analysis, and compared to the rest of the cell without the dissected ROI.

      (2) The subtraction approach, where ROIs are removed, and the remaining cellular material is compared to samples containing both the surrounding material and the ROI.

      The authors assess the technique by applying it to subcellular targets of various sizes, volumes, and protein compositions such as the nucleus, nucleoli, and Golgi. They demonstrate that SPEx can identify proteins enriched or reduced in ROIs.

      Strengths:

      The broad, relatively easy, and inexpensive applicability of this approach to potentially many cell types and subcellular areas of interest provides an exciting alternative to subcellular fractionation, native immunoprecipitation, or genetically encoded proximity labeling constructs. Moreover, by visually selecting ROIs for subsequent analysis, subcellular context or organelle morphology can be taken into account, as discussed by the authors in the discussion section.

      Weaknesses:

      While strongly supporting the sharing of this approach, we have a number of comments and questions that will improve the impact of the manuscript:

      We thank the reviewer for the careful evaluation of our manuscript and the generally positive assessment. We plan on improving our manuscript based on the reviewers’ comments.

      (1) General:

      a) The manuscript would benefit from restructuring and language revision. In its current form, the writing is sometimes dense and verbose (in particular, the Results section). This makes it difficult to follow the authors' arguments.

      We will improve readability and clarity of the results section in the revised manuscript.

      b) The authors mention the possibility of selecting organelles based on morphology. This is left for the discussion, but it seems like a missed opportunity - the authors could compare individual organelles in different morphological states, e.g., connected vs. fragmented mitochondria.

      The authors agree with the reviewer’s assessment that investigating the proteome of organelles based on morphology or cellular state is an exciting application of SPEx. While we plan experiments along these lines in the future, we think that these experiments are beyond the scope of this manuscript, which is meant to describe the method and its general usefulness.

      (2) Technical:

      a) Why do the authors strive and optimize for a 10x expansion factor? Is SPEx compatible with a more standard 4x expansion, as e.g., used in the classic U-ExM approach (https://www.nature.com/articles/s41592-018-0238-1)? This could be added to the discussion.

      We aimed for 10x expansion solely because our ultimate goal is to cut out very small structures. Isolating structures as small as nucleoli would be less reliable at a lower expansion factor such as 4x. We did not assess compatibility with U-ExM. We would expect SPEx to also work with U-ExM as the expansion method, provided that protease treatment is omitted. We did, however, perform pilots with only 4x expansion (using TREx) in the early stages of optimization. We were able to isolate single cells and obtain protein coverage similar to that achieved with 10x expansion. We will further clarify our motivation for using 10x expansion in the discussion.

      We would also like to point out that whether U-ExM is considered the standard method is rather subjective. Even though TREx was published three years later, it is also very widely used, and the original expansion microscopy method was published three years before U-ExM.

      b) The U-ExM approach shows improved ultrastructural preservation when using 3% FA with 0.1% glutaraldehyde (GA) fixation. Is SPEx compatible with the use of low amounts of GA for fixation?

      We tried different fixation methods in the early stages of this study (where expansion was not yet close to 10x). We saw a mild negative effect of GA on the expansion factor, so we avoided it in the later experiments since it also did not seem necessary to preserve the structure of our organelles of interest. However, the use of GA would generally be compatible with SPEx, potentially at the cost of a mild negative effect on expansion factor (see Author response image 1) and proteome coverage. We can add this information to the discussion.

      Author response image 1.

      Fixation methods mini-screen. Cells were fixed with the indicated reagents for 10 minutes at 37°C. After TREx expansion, the diameter of the nucleus was measured (A) and the resulting expansion factor compared to the non-expanded control was determined (B).
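      The expansion factor in this mini-screen is derived from nuclear diameters before and after expansion. A minimal Python sketch of that arithmetic, assuming the factor is the ratio of the mean expanded diameter to the mean non-expanded (control) diameter; the function names and diameters below are hypothetical illustrations, not the authors' analysis or data.

```python
from statistics import mean, stdev


def expansion_factor(expanded_diameters, control_diameters):
    """Ratio of mean expanded to mean non-expanded nuclear diameter.

    Assumption: a simple ratio of mean linear diameters; the exact
    procedure used for Author response image 1 is not specified.
    """
    return mean(expanded_diameters) / mean(control_diameters)


def per_nucleus_factors(expanded_diameters, control_diameters):
    """Factor for each expanded nucleus relative to the mean control
    diameter, giving a rough view of sample-to-sample variability."""
    reference = mean(control_diameters)
    return [d / reference for d in expanded_diameters]


# Hypothetical diameters in micrometres (illustrative only, not measured data).
control = [10.0, 11.0, 9.0]
expanded = [98.0, 105.0, 97.0]

factor = expansion_factor(expanded, control)
spread = stdev(per_nucleus_factors(expanded, control))
print(f"expansion factor about {factor:.1f}x (SD {spread:.2f})")
```

      Reporting the per-nucleus spread alongside the mean is one way to make variability in expansion factor across samples explicit.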

      c) Related to the above, was the anchoring efficiency reduced only to achieve a 10x expansion factor, or does this additionally affect the proteome coverage?

      We lowered the anchoring solely to allow for higher expansion factors. In earlier pilots, we performed proteomic analysis on samples that were expanded only 4x using standard TREx expansion (also using the original anchoring strategy from the TREx publication, 0.2 mg/ml AcX overnight at RT). We presented the results of this pilot in Fig S1A. We still detected over 2,000 proteins from 10 cells, a coverage highly similar to what we found in the final experiments (Figure 2F), in which reduced anchoring yielded 10x expansion. Based on these data, we hypothesize that anchoring (and the expansion factor) has a negligible impact on protein coverage. We will clarify this in the manuscript.

      d) Have the authors considered using alternative anchoring approaches, such as GMA (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291506#pone.0291506.s001), which potentially increase the amount of sample retained in the hydrogel, thus allowing for better proteome coverage? This could be added to the discussion.

      We did not use alternative anchoring approaches. We modified the TREx protocol to fit our purposes, and since this was sufficient, we did not explore alternatives. However, anchoring approaches that retain higher amounts of sample in the gel might benefit proteomic coverage. We will keep this suggestion in mind for future experiments. Thank you for the suggestion!

      e) The limitation of the approach to near-2D samples should be mentioned, and alternative approaches for more 3D samples could be discussed.

      The authors agree that SPEx is currently limited to near-2D samples. We suggest that SPEx could be applied to 3D samples (e.g., tissues) by performing cryosectioning, since TREx has been shown to be compatible with sectioned tissue (Damstra et al., 2022). We will elaborate on this in the discussion.

      f) How are peptides that are directly anchored to the hydrogel dealt with during LC-MS/MS analysis? Are they excluded, or can they be identified during the spectral search? The latter would allow us to get a deeper structural understanding of how proteins are actually anchored into hydrogels, which so far has not been assessed.

      The reviewer raises an interesting point. In general, peptides carrying the anchoring modification are analysed by LC-MS, but we did not include this specific modification in the database search. Overall, we assumed that the labeling would be low and stochastic and hence should, if at all, only minimally affect the detection of peptides. Nevertheless, in response to the reviewer’s comment, we re-searched the MS data for the crosslinking reagent linked to lysine residues. However, we could not obtain any confident hit for a peptide containing this modification. Since we cannot exclude that the modification precludes identification of the corresponding peptides, we compared the numbers of peptides generated by tryptic cleavage after arginine and after lysine. As the human proteome contains similar proportions of both amino acids, one would expect similar numbers of both peptide types to be identified. Any modification of lysine by the anchoring reagent would prevent tryptic cleavage and thus reduce the number of lysine-terminating peptides. As shown in Author response image 2, the number of lysine-terminating peptides is only slightly lower than that of arginine-terminating peptides. Notably, the proteomics results of a different fixed human tissue sample, extracted directly by laser capture microdissection without expansion, showed a very similar lysine-to-arginine peptide ratio. This indicates that the large majority of lysine residues are neither modified nor affected by the hydrogel anchoring.

      Author response image 2.

      Number of peptides identified either terminating with lysine (K) or arginine (R) across all samples shown in Figure 5F.
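      The lysine-versus-arginine check described above amounts to tallying identified tryptic peptides by their C-terminal residue. A minimal Python sketch, assuming a plain list of identified peptide sequences as input; the function names and example sequences are hypothetical and do not reflect the authors' pipeline or data.

```python
from collections import Counter


def c_terminal_counts(peptides):
    """Tally identified peptides by their C-terminal residue.

    Tryptic peptides normally end in K or R; anything else
    (e.g. a protein C-terminal peptide) is counted as "other".
    """
    counts = Counter({"K": 0, "R": 0, "other": 0})
    for seq in peptides:
        last = seq.strip().upper()[-1:]
        counts[last if last in ("K", "R") else "other"] += 1
    return counts


def k_to_r_ratio(counts):
    """Ratio of K- to R-terminating peptides (NaN if no R peptides)."""
    return counts["K"] / counts["R"] if counts["R"] else float("nan")


# Hypothetical peptide sequences, for illustration only.
peptides = [
    "AEFVEVTK", "YLYEIAR", "LVNELTEFAK",
    "DAFLGSFLYEYSR", "QTALVELVK", "ALVLIAFAQYLQQCPF",
]

counts = c_terminal_counts(peptides)
print(dict(counts), k_to_r_ratio(counts))
```

      Because a modified lysine would block tryptic cleavage, a K/R ratio well below that of an unexpanded reference sample processed the same way would point to extensive lysine modification; a similar ratio argues against it.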

      An alternative approach to address this question would be to investigate if the peptide coverage of proteins detected by SPEx is enriched for peptides representing the folded core of proteins as opposed to the surface-exposed regions, which likely get more anchored into the hydrogel.

      Because of the negligible amounts of modified peptides, we did not investigate this potential bias of surface-exposed versus folded-core peptides.

      g) Same question regarding peptides with NHS labeling. Can they be identified, or do they just compete for ionization and thus negatively affect coverage and dynamic range of the LC-MS/MS approach?

      The reviewer raises a similar point as above for another lysine labeling step used in the SPEx protocol. Again, we specifically looked for this modification by re-searching the raw MS data, but could not identify any peptides carrying this modification on a lysine residue. Even though we cannot exclude that this rather large modification prevents detection, given the high number of lysine-terminating peptides in our dataset (see Figure 2), we expect this labeling step to also be stochastic and to affect only a minor proportion of the proteins.

      h) How do the primary and secondary antibodies affect the proteomics analysis? Are they identified as contaminants?

      We thank the reviewer for this comment. Since antibodies bind to proteins non-covalently, they will be released during the denaturing steps of the protocol. Of course, the antibodies will remain in the sample, be digested and analyzed, and could, if very abundant, affect the analysis of the sample proteins. To check this possibility, we re-searched the MS data including the sequences of the antibodies used. To our surprise, we could not detect any peptides from these antibodies. This suggests that the antibody concentrations are much lower than those of the sample proteins and thus should not have any impact on the proteomics results. We also interpret this result as an advantage of our method over organellar IP.

      i) Have the authors observed differences in proteomic coverage between antibody-only and NHS labeling? Depending on the answers to the questions above, could purely antibody-based labeling increase proteomic coverage?

      We did not perform this comparative analysis, since we always used NHS dyes. In the experiments presented in this manuscript, NHS dyes allowed easy visualization of the whole cell without the use of antibodies. NHS staining was essential for this particular setup for sample acquisition: we cut out entire cells, cells lacking the nucleus, and cells lacking the Golgi apparatus, which served as critical controls. However, other ways of detecting cell boundaries could be used to avoid NHS staining. As shown above, both the anchor and the NHS labeling are sparse and stochastic. Moreover, we could not detect any impact of the antibody labeling on our results. Thus, we assume that both labeling procedures could be used.

      Reviewer #2 (Public review):

      Summary:

      This study introduces a method that combines physical expansion of cells, imaging-guided isolation of defined regions, and protein identification to enable compartment-resolved analysis of protein composition at the subcellular scale. The authors aim to address a central limitation in existing approaches, namely the loss of spatial information during sample preparation or the indirect nature of proximity-based labeling methods. Using several cellular compartments as examples, they demonstrate that their approach can recover compartment-enriched protein sets and identify candidate proteins with previously unassigned localization.

      Strengths:

      A major strength of this work is the conceptual simplicity and accessibility of the approach. By combining established techniques in a modular way, the method avoids the need for genetic manipulation or specialized labeling strategies, making it broadly adaptable across experimental systems. The ability to directly select regions of interest based on imaging represents a clear advantage over indirect enrichment strategies and allows flexible targeting of both membrane-bound and non-membrane-bound compartments.

      The experimental design is also a strong aspect of the study. The use of complementary comparison strategies (analyzing isolated compartments alongside matched "subtracted" controls) provides an internal framework for assessing enrichment and depletion, increasing confidence in spatial assignment. The application of the method across multiple organelles of different sizes and properties demonstrates versatility, and the reported specificity for several compartments is encouraging. In particular, the ability to profile small and biochemically challenging structures highlights a potentially important niche for the approach.

      Weaknesses:

      Despite these strengths, several methodological limitations constrain the interpretation of the results. The most important relates to spatial accuracy in three dimensions. While lateral resolution is improved through physical expansion, the lack of depth resolution introduces uncertainty regarding contributions from structures above and below the selected region. Although the authors argue that this does not substantially affect specificity, the current evidence is largely indirect, and a more rigorous quantification of potential contamination would strengthen this conclusion.

      Quantitative interpretation also remains challenging. Because the measurements reflect total protein abundance rather than local concentration, differences in compartment size and protein density can influence enrichment values, particularly for small structures embedded within larger volumes. This issue is evident in the analysis of smaller compartments and complicates direct comparison across conditions. Additional normalization or modeling would help clarify how to interpret these measurements.

      Another limitation concerns variability in the expansion process and its downstream consequences. Differences in expansion factor across samples may affect the definition of regions of interest and introduce variability in sampling, yet the impact of this variability is not fully explored. Similarly, the use of a modified chemical treatment to preserve proteins for downstream analysis is central to the workflow but is not extensively validated with respect to preservation of spatial organization.

      While the identification of previously unannotated proteins is an appealing aspect of the study, validation is limited to a small number of examples, and broader support from independent datasets or literature context is lacking. In addition, the study primarily focuses on steady-state measurements in a single cell type, and therefore does not yet demonstrate the ability of the method to capture dynamic or condition-dependent changes in protein localization.

      Finally, the positioning of the method relative to existing approaches could be more clearly articulated. Although qualitative comparisons are provided, a more systematic and quantitative benchmarking against alternative strategies would help readers better understand the specific advantages and trade-offs.

      We thank the reviewer for the careful evaluation of the manuscript and for the constructive feedback. We think the reviewer raises valid points and will address them in the revised manuscript.

      Reviewer #3 (Public review):

      Franziscus et al. describe an elegant approach for spatially specific proteome analysis. To achieve this, they expand fixed cells and subsequently use a laser to micro-dissect a region of interest, which is then analyzed by mass spectrometry.

      They demonstrate the effectiveness of their approach by analyzing the nucleus, nucleolus, and the Golgi, and benchmark their hits against previous datasets for these organelles.

      The manuscript is very well written and nicely guides the reader through the applied methods. The presented data is convincing, and I do not see the need for additional experimental verification of the protocol. The only minor concern is the novelty of the method and the presentation. A combination of expansion, laser microdissection, and proteomics has been applied in the past (PMID: 36450705, PMID: 39477916). In the manuscript, one of these studies is cited, though it is not made clear that this approach has already been described. However, Franziscus et al. describe the approach better and make it more accessible to the reader, especially since the other studies described this methodology in combination with tissue expansion and not in combination with single cell expansion as it is done here. I would ask the authors to be clearer in the introduction about what others have already done and what their contribution is here. In general, I am convinced that the community will benefit from the presented protocol to analyze organelle proteomics in detail.

      We thank the reviewer for the careful evaluation of our manuscript and the overwhelmingly positive assessment. We apologize for the omission of the mentioned citations and will adjust the introduction to make clearer what has already been done and what advance our method provides.

      References

      Damstra HG, Mohar B, Eddison M, Akhmanova A, Kapitein LC, Tillberg PW. 2022. Visualizing cellular and tissue ultrastructure using Ten-fold Robust Expansion Microscopy (TREx). eLife 11:e73775. DOI: https://doi.org/10.7554/eLife.73775

      Gambarotto D, Hamel V, Guichard P. 2021. Ultrastructure expansion microscopy (U-ExM). Methods in Cell Biology 161:57–81. DOI: https://doi.org/10.1016/bs.mcb.2020.05.006, PMID: 33478697

      Liffner B, Silva TLA e., Vega-Rodriguez J, Absalon S. 2024. Mosquito Tissue Ultrastructure-Expansion Microscopy (MoTissU-ExM) enables ultrastructural and anatomical analysis of malaria parasites and their mosquito. BMC Methods 1:13. DOI: https://doi.org/10.1186/s44330-024-00013-4

  2. May 2026
    1. eLife Assessment

      This valuable study investigates whether high-level physical reasoning is grounded in real-time bodily and vestibular signals using an innovative combination of virtual tool-use tasks and galvanic vestibular stimulation. The evidence is incomplete, as the main claims rely on limited and partially exploratory effects, including uncorrected multiple comparisons and cross-study comparisons that weaken the strength of the conclusions. The work, if it can be supported by clearer statistical support and more cautious interpretation, will be of interest to researchers in embodied cognition and physical reasoning.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates a fundamental question in cognitive science: is our ability to reason about the physical world an abstract mental process, or is it "embodied", that is, directly rooted in our real-time physical interactions with the environment? The authors compared participants' performance in computerized reasoning games with and without Galvanic Vestibular Stimulation (GVS). They suggest that participants failed more often and utilized suboptimal strategies under GVS compared to a sham stimulation condition. Furthermore, they found that this detrimental effect of GVS was reduced when the games were governed by altered gravity (hyper- and hypo-gravity). Consequently, the authors conclude that the physical experience of the body modifies high-level cognitive skills, such as reasoning.

      Strengths:

      The manuscript is well-written, organized, and easy to follow, making complex concepts accessible. Also, combining a specialized physical reasoning task with real-time vestibular disruption (GVS) is an intriguing approach to testing the boundaries of embodied cognition.

      Weaknesses:

      (1) Lack of Overall Effects and Inflated Type I Error for Game-Level Effects

      The study utilizes a within-subject design. Taking Study 1 as an example, each subject participated in a familiarization session (4 games), a baseline session (12 games without stimulation), a GVS session (14 games), and a sham session (14 games). No game was repeated for any single subject. Performance was quantified using three primary measures (success rate, number of attempts, and time per attempt) and two strategy measures (tool switching and the distance between tool placements).

      For Study 1, to identify condition differences at the game level (i.e., Figure 2), the authors effectively conducted 70 independent t-tests (5 measures × 14 games). While 7 significant results were reported, this large number of independent tests invites an inflated Type I error rate, as no multiple-comparison correction appears to have been applied.

      A similar inflation is expected in Study 2, where 50 independent t-tests (5 measures × 10 games) yielded 5 significant comparisons (Figure 4). Although the authors might argue that the direction of the differences is systematic, implying GVS generally impairs performance, at least one significant comparison shows the opposite effect: tool switching indicates that GVS led to better performance for the 'Table_A' game in Study 2 (Figure 4d), whereas the same variable indicated GVS led to worse performance in Study 1 (Figure 2d). I suspect that none of the significant game-level results would survive a proper statistical correction. If possible, the authors could redo the statistical testing with corrections (FDR or Bonferroni) or with an LMM using game as a random effect. Until proper statistical analyses are performed, I strongly encourage the authors to refrain from drawing broad conclusions based on these isolated game-level results.

      Furthermore, when analyzing data across all games, the study found no significant effect of GVS on overall performance or strategy measures in either Study 1 or Study 2. This lack of an aggregate effect contradicts the authors' conclusion that participants failed more often or utilized suboptimal strategies under GVS.
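      To make the correction request concrete, the two procedures the reviewers name (Bonferroni and Benjamini-Hochberg FDR) can be sketched in a few lines of Python. The 70-test family mirrors the 5 measures × 14 games count for Study 1, but the p-values below are hypothetical placeholders chosen only to show how the procedures differ; they are not values from the study, and the function names are illustrative.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical family of 70 tests: 7 nominally significant, 63 null-like.
pvals = [0.0002, 0.0009, 0.012, 0.021, 0.030, 0.044, 0.049] + [0.5] * 63

print("uncorrected:", sum(p < 0.05 for p in pvals))
print("Bonferroni:", sum(bonferroni(pvals)))
print("BH (FDR):", sum(benjamini_hochberg(pvals)))
```

      With these placeholder values, all seven tests pass the uncorrected 0.05 threshold, but only the smallest p-values survive either correction, with BH being the less conservative of the two. In practice, `statsmodels.stats.multitest.multipletests` with `method="bonferroni"` or `method="fdr_bh"` provides the same corrections; the mixed-model alternative with game as a random effect is not sketched here.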

      (2) Missing Rationale for Classification Analysis

      It is puzzling why the authors pursued two exploratory analyses on tool placement after finding that the two related primary measures (tool positioning and switching) showed no significant condition differences in Study 1. These additional analyses (the Dirichlet Process Gaussian Mixture Model and leave-one-out classification) were not pre-registered. In the absence of overall condition differences, the authors appear to be "doubling down" by applying sophisticated classification tools to the raw data without a clear prior rationale.

      (3) Insufficient Evidence for the Reduced Effect of GVS Under Altered Gravity

      To compare Study 1 and Study 2, the authors devised a "gravity-weighted index," but its definition is not sufficiently justified. The index assigns weights of 1, 2, and 3 to low-, medium-, and high-gravity-dependent games, respectively. The choice of these specific weights appears arbitrary, making the quantitative results difficult to interpret. More importantly, there is no citation or explanation regarding how these three levels of "gravity impact" were defined in the first place (Line 468). This index was also not pre-registered.

      The authors state that for the success rate index, a value close to -1 indicates a large negative difference for GVS, 0 indicates no difference, and 1 indicates a large positive difference. These are theoretical bounds; the actual distribution of each index should be examined to validate such claims. However, the paper lacks descriptive statistics for this composite index.

      Notably, the "reduction" of the GVS effect in altered gravity was only demonstrated in one of the five available indices (success rate, p = 0.046). In fact, the success rate in Study 2 was 66.7 (sham) vs. 67.3 (GVS) in Table 2. It is highly debatable whether this marginal result justifies the conclusion that GVS effects "were reduced when the games included reasoning about altered gravity".

      (4) Questionable Assumptions Regarding Strategy

      The authors assume that "big changes in tool positioning and frequent tool switching indicate poor evaluation of the failed outcome". This assumption is questionable. In solving this cognitive task, participants must explore and exploit solutions based on feedback. Large shifts in positioning or frequent tool switching might reflect active, adaptive exploration based on failed outcomes rather than a failure to evaluate them.

      (5) Confounding Factors in GVS Interpretation

      The central theoretical question is whether physical reasoning is grounded in physical experience. GVS is used here to manipulate that experience. However, GVS does not selectively target the vestibular nerve; it also activates distributed fronto-parietal attention networks and hippocampal circuits essential for any reasoning task. Additionally, the vestibular system is linked to the limbic system and the cerebellum, which regulate emotional reactivity and arousal. Because attention and emotion are likely affected by GVS, the authors should be much more cautious in attributing their behavioral findings solely to changes in the "physical experience of the body."

    3. Reviewer #2 (Public review):

      Summary

      The paper investigates whether the real-time physical experience of the body shapes high-level physical reasoning. Participants played a set of computerized tool-use reasoning games (the Virtual Tools paradigm) in which they must use knowledge of physical laws, including gravity, collisions, and inertia, to guide a ball into a target area. In Study 1, participants played the games under terrestrial gravity while receiving either Galvanic Vestibular Stimulation (GVS), which introduces noise into the vestibular organ and disrupts gravitational signalling, or a Sham condition with matched skin sensation. In Study 2, a separate cohort played the same games redesigned under hypogravity (0.5 g, half of Earth's gravity) or hypergravity (2 g, double Earth's gravity), again with concurrent GVS or Sham stimulation. Performance was assessed through success rate, number of attempts, and time per attempt; strategy was assessed through the spatial distance between successive tool placements and the frequency of tool switching across attempts. A post-hoc gravity-weighted index (GWI) was computed to compare the effect of vestibular perturbation across the two studies. The main finding is that GVS impairs performance in gravity-dependent games under terrestrial gravity, yet the same perturbation appears to be neutral or even beneficial when the game environment involves non-terrestrial gravity, a result the authors interpret as evidence for an adaptable, body-grounded internal model of physics.

      Strengths

      One of the most notable strengths of this work is its conceptual positioning at the intersection of embodied cognition and physical reasoning. Rather than treating the human body either as an abstract information-processing device or as a purely biomechanical system, the authors take seriously the idea that cognition is scaffolded by ongoing sensorimotor state, and they test this idea with a paradigm that is both tractable and theoretically motivated. The use of the Virtual Tools paradigm is well-suited to this goal: the games vary systematically in their reliance on gravitational predictions, allowing selective impairment (rather than general disruption) to serve as a signature of embodied physical reasoning.

      The dual-study design is another strength. Testing the same vestibular perturbation under terrestrial and altered game-gravity conditions, and observing a reversal in its effect depending on context, provides a form of internal control that is conceptually compelling. The additional clustering analyses (Dirichlet Process Gaussian Mixture Model and leave-one-out kernel density classification) strengthen the strategy results beyond raw distance measures, confirming that GVS systematically shifts participants' spatial exploration strategies.

      The paper is also clearly written and engages meaningfully with relevant theoretical frameworks (predictive coding, embodied cognition, and stochastic resonance), making it accessible and stimulating for a broad audience.

      Weaknesses

      (1) Absence of multiple-comparisons correction. A large number of game-level pairwise t-tests are conducted in both studies (upwards of twenty per study) without correction for familywise error rate. The game-level effects that anchor the main narrative (in Study 1 alone: Remove, GoalMove, Spiky, Falling_A, Shafts_B, Gap, and Chaining) arise from an uncorrected pool of comparisons. The probability that some of these constitute false positives is non-trivial. The authors should apply a correction (e.g., Benjamini-Hochberg) or at a minimum discuss this limitation explicitly.

      (2) The facilitation claim rests on a post-hoc and arbitrarily parameterized index. The gravity-weighted index (GWI), which drives the central cross-study comparison, uses integer coefficients (1, 2, 3) to weight games by gravity dependency level. These coefficients are entirely arbitrary and bear no principled relationship to the actual gravitational magnitudes used in the study. Why not use the gravity dependency ratings themselves, or the empirically estimated gravity impact scores from the computational modelling mentioned in the Methods? The choice of weights should be either principled or tested across a range of values to demonstrate robustness. Furthermore, the notation in equation (1) as currently typeset reads as "Gravity minus Weighted Index" rather than "Gravity-Weighted Index"; this should be corrected.

      (3) The "facilitation" interpretation exceeds what the data in Study 2 directly support. Across all games in Study 2, GVS versus Sham differences in absolute performance are non-significant in all directions. The facilitation claim derives entirely from the GWI being higher in Study 2 than in Study 1 - a between-subjects comparison involving different participant groups and a non-pre-registered metric. The language of "facilitation" should be tempered accordingly, or the authors should provide additional analyses to support this framing.

      (4) Gravitational manipulation is visual only, and the vestibular system is only one component of the gravity-sensing network. Gravity perception results, as the authors very well know, from a distributed multisensory integration process that involves, in addition to the vestibular system, visual, proprioceptive, and visceral inputs. The present paradigm manipulates gravitational context solely through visual cues and targets the vestibular system through GVS - a point the authors acknowledge but do not discuss in sufficient depth. It is important to distinguish clearly between real gravitational alterations (as achieved in parabolic flight or centrifuge environments, where the entire body is physically subjected to a different gravitational vector) and virtually altered gravity, where only one sensory modality is targeted while others remain anchored to 1 g. The scope of the conclusions should reflect this distinction.

      (5) The choice of 0.5 g and 2 g may lack sensitivity. Combining the two altered-gravity conditions in Study 2, because no significant effect of hypo versus hypergravity was found, is statistically pragmatic but conceptually unsatisfying. There is evidence in the space physiology literature that gravitational processing is not linearly symmetric around 1 g: threshold effects exist below and above terrestrial gravity that may not be captured by modest deviations (half and double g) - see refs below. It is worth discussing whether the absence of a hypo/hyper distinction in Study 2 reflects a genuine equivalence or a lack of sensitivity, and whether more extreme conditions (e.g., near-zero g or 4-5 g) might reveal different processing regimes. Whether 0.5 g and 2 g were sufficient to saturate the system or merely insufficient to perturb it remains an open question with direct implications for the interpretation of the null GWI effects on strategy measures.

      Lee SMC, Ribeiro LC, Martin DS, Zwart SR, Feiveson AH, Laurie SS, Macias BR, Crucian BE, Krieger S, Weber D, Grune T, Platts SH, Smith SM, and Stenger MB. Arterial structure and function during and after long-duration spaceflight. J Appl Physiol (1985) 129: 108-123, 2020.

      de Winkel KN, Clément G, Groen EL, and Werkhoven PJ. The perception of verticality in lunar and Martian gravity conditions. Neurosci Lett 529: 7-11, 2012.

      Clément G, Moore ST, Raphan T, and Cohen B. Perception of tilt (somatogravic illusion) in response to sustained linear acceleration during spaceflight. Exp Brain Res 138: 410-418, 2001.

      Benson AJ, Kass JR, and Vogel H. European vestibular experiments on the Spacelab-1 mission: 4. Thresholds of perception of whole-body linear oscillation. Exp Brain Res 64: 264-271, 1986.

      (6) High-level reasoning is not defined with sufficient precision. The term "high-level reasoning" appears from the title onward and in the heading of the Study 1 results section (line 138), but it is never formally defined. The reader needs a clearer account of what distinguishes high-level physical reasoning from low-level sensorimotor prediction, and where the games used here fall along that continuum. What specific physical competencies - ballistic trajectories, free-fall predictions, collision dynamics, frictional forces, inertial effects - are required across the game set? When describing the subset of games that drive key effects, this information is critical for evaluating whether effects are specific to gravity reasoning or to some other physical concept.

      (7) Performance measures are disconnected from underlying kinematics. The performance measures (success rate, number of attempts, time per attempt) are coarse, high-level summaries. Time per attempt is used as a proxy for performance efficiency, yet participants received no instructions regarding speed, and different individuals may have adopted systematically different speed-accuracy trade-offs. It would be valuable to know whether time per attempt correlates with attempt number within a given game (which would indicate within-game learning) and whether mouse movement data - trajectory, velocity, hesitation - were recorded and could be analysed to provide more mechanistic insight into strategy formation.
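      Point (1) recommends a Benjamini-Hochberg correction for the game-level t-tests. As an illustration only, the step-up procedure can be sketched in Python as below. The function name and the p-values in the usage example are hypothetical placeholders, not values from the manuscript; in practice an established implementation (e.g. in statsmodels) would normally be used instead.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (adjusted p-values, rejection flags) in the input order,
    following the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    if m == 0:
        return [], []
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [p_adj <= q for p_adj in adjusted]
    return adjusted, rejected


if __name__ == "__main__":
    # Hypothetical game-level p-values, for illustration only
    raw = {"GameA": 0.01, "GameB": 0.04, "GameC": 0.03, "GameD": 0.20}
    adj, rej = benjamini_hochberg(list(raw.values()))
    for name, p, flag in zip(raw, adj, rej):
        print(f"{name}: adjusted p = {p:.3f}, significant = {flag}")
```

      In this toy input, three of the four games are nominally significant at p < 0.05, but only one remains significant after correction, which is the kind of attrition point (1) asks the authors to examine.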

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates a theoretically important question in cognitive science: whether higher-level physical reasoning is an abstract, modular process or is grounded in real-time body-environment interactions. To address this question, the authors combine galvanic vestibular stimulation (GVS) with the Virtual Tools task to test whether perturbing vestibular gravity signals affects performance in physical reasoning. The study is conceptually innovative and has the potential to bridge embodied sensory processing and higher-level cognition. However, in its current form, the evidence only partially supports the main claims, and several aspects of the analysis and interpretation limit the strength of the conclusions.

      Strengths:

      A major strength of the manuscript is the originality of the experimental paradigm. The combination of galvanic vestibular stimulation (GVS), which perturbs gravity-related vestibular signals, with computerized game-based tasks that require physical reasoning provides a novel way to test whether ongoing bodily experience influences higher-level cognition. Conceptually, the study is highly original and meaningfully bridges two domains that are often studied separately: sensorimotor processing and higher-level cognition.

      Weaknesses:

      The main weakness of the manuscript is that its central conclusion is not strongly supported by the data. The key finding depends on a marginally significant cross-study comparison, whereas direct GVS-versus-Sham differences in Study 2 are minimal across aggregate measures. In addition, many game-level analyses involve a large number of uncorrected multiple comparisons, raising the possibility that some of the reported effects may reflect chance findings. The manuscript's most important metric, the Gravity-Weighted Index, was not preregistered and is exploratory in nature, yet it is treated as a primary basis for confirmatory conclusions. The cross-study comparison is also difficult to interpret because the two studies differ in participant samples, number of games, and partially in the stimulus set. Finally, the mechanistic claims in the Discussion, particularly those invoking predictive coding, stochastic resonance, or updating of internal gravity models, go well beyond what can be directly inferred from the present behavioral data. Overall, the study provides intriguing but limited evidence that vestibular signals may influence some physical reasoning tasks under specific conditions, rather than strong evidence for a broad account of physical reasoning as grounded in online vestibular processing.

    1. eLife Assessment

      In this solid work, Fukui et al. re-examined the ATP hydrolysis mechanism in GHKL ATPases, revealing a cooperative role for two conserved acidic residues rather than a single one. This useful study used a range of biochemical and structural techniques on various mutants from different members of the GHKL ATPase family to test and validate their proposed mechanism. An updated and extended mechanistic model of ATP hydrolysis by this class of enzymes is proposed.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors study two residues in the GHKL ATPase active site of aqMutL and aqGyrB, and argue that the catalytic base function is shared between two conserved acidic residues located three residues apart.

      They generated mutant versions in MutL and GyrB (both Ala and the appropriate Asn/Gln versions) and performed ATPase analysis. They also determined high-resolution crystal structures of the GyrB NTD with AMPPNP for the WT and for mutants of the two acidic residues. The data show that mutation of either residue does not fully abolish activity, with the exception of the alanine mutation of the first of the two, which interferes with ATP (or AMPPNP) binding. When the acidic residues are mutated to Asn/Gln, the catalytic water can still be positioned, and hence these mutants are more active than the Ala mutants. In both cases, the double mutation is catalytically dead.

      The authors then perform phylogenetic analysis and ancestral gene reconstruction, and on this basis argue that HSP90 forms a distinct class of GHKL ATPases, having acquired this status by losing the second acidic residue rather than the other families gaining it.

      Strengths:

      The biochemical analysis seems solid.

      Weaknesses:

      (1) A major question that remains is why the mutations have so much more detrimental effect in MutL (100-fold lower kcat/KM) than they do in GyrB (3-fold lower). Can the authors explain this? Doesn't this argue against the proposed catalytic conservation?

      (2) The structure figures all show omit maps only for the AMPPNP and the water, whereas the density for the acidic residues and their mutants is not shown.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fukui et al. re-examined the ATP hydrolysis mechanism in GHKL ATPases, revealing a cooperative role of two conserved acidic residues rather than one. The authors have used a range of biochemical and structural techniques on various mutants from different members of the GHKL ATPase family to test and validate their proposed mechanism.

      Through a detailed re-analysis of their previously published structure of the aqMutL NTD (ATPase domain) in complex with AMPPCP, they identified Glu29 and Glu32 as interacting with the nucleophilic water during catalysis. The authors carefully dissected the respective roles of these two acidic residues with a series of site-directed mutations. Mutations at Glu29 impaired ATPase activity without affecting protein secondary structure or ATP binding in the case of the E29Q mutant. Moreover, mutations at Glu32 did not affect secondary structure (except for E32G) but reduced ATPase activity. Activity was abolished when both residues (E29Q/E32Q) were mutated.

      The authors extended their study to another GHKL ATPase, aqGyrB. Their findings further supported the cooperative function of the corresponding acidic residues in aqGyrB (Glu48 and Asp51) during ATP hydrolysis. Mutation of these residues partially impaired ATP hydrolysis without affecting protein secondary structure. ATPase activity was completely lost in the double mutant E48Q/D51M. While the E48Q mutant retained the ability to bind ATP, the E48A mutant did not. High-resolution structures of the WT and E48A, E48Q, D51A, and D51N mutants of the aqGyrB NTD demonstrated that nucleophilic water positioning depended on these residues. E48 played a dominant role in water positioning and is critical for stabilising ATP lid formation and associated conformational changes, whereas D51 contributed cooperatively to catalysis.

      The authors investigated the functional impact of mutating the corresponding residues in the human MutL homologs PMS2 and MLH1. Clinical variants consistently exhibited reduced or abolished ATPase activity, providing a potential molecular basis for Lynch syndrome through impaired DNA mismatch repair.

      Lastly, through evolutionary analysis, the authors inferred that the second acidic residue was likely present in the common ancestor of MutL, GyrB, and MORC proteins, but was lost in the case of Hsp90.

      Strengths:

      (1) This study contains a detailed structural and biochemical analysis of a biologically important set of GHKL ATPases. The authors identify a second acidic residue that is conserved and contributes to catalysis in a large subset of GHKL ATPases. An updated and extended mechanistic model of ATP hydrolysis by this class of enzymes is proposed, which involves cooperative and partially overlapping roles for the catalytic residue pair. This revised mechanistic model is invaluable for the interpretation of clinical variants of GHKL ATPases such as PMS2 and MLH1.

      (2) The work described was performed to an excellent and rigorous technical standard. The structural and biochemical data are sound. The evidence supporting the claims is compelling.

      Weaknesses:

      (1) The identification in this study of a second acidic residue contributing to catalysis but not absolutely essential for catalysis is a useful finding. However, given that many structures of GHKL ATPases have been determined with different nucleotide analogs bound and that the essential role of the first acidic residue is well established, the importance and scope of the advances described here remain focused within the field of study of GHKL ATPases.

      (2) The authors assessed the consequences of variants in the human MutL homologs PMS2 and MLH1, but various other human GHKL ATPases contain clinically relevant variants, some of which have stronger disease associations than the mutations examined in this study. A broader analysis of the effect (or likely effect) of disease-linked mutations in GHKL ATPases would have strengthened this study.

      (3) In MLH1, the E37K mutation completely abolishes ATPase activity, but the corresponding mutations in aqMutL, aqGyrB, and PMS2 do not. It remains unclear why E37K in MLH1 leads to complete loss of activity, as the authors propose that water molecule positioning via the first acidic residue, as well as ATP lid stabilisation and associated conformational changes, should still be possible.

      (4) The authors do not examine ATP binding in the E32 mutants of aqMutL NTD and the D51 mutants of aqGyrB, or AMPPNP binding of the MLH1 and PMS2 mutants. Hence, the relative contributions of the acidic residues to ATP binding and hydrolysis remain partially unclear.

      (5) The ATPase assays for PMS2 and MLH1 (Figure 7 and Table 1) were performed with purification/solubility tags still present. Hence, it cannot be ruled out that these tags influence the measured activities.

      (6) The authors suggest that the two-acidic-residue mechanism proposed in this study could be shared among several GHKL ATPase families, yet they also state that the hydrogen-bonding network was not observed in MutL and MORC family proteins. This raises doubts about how conserved the mechanism actually is in these two families.

    1. eLife Assessment

      This valuable study highlights the key role of NK cells and PD-L1+ neutrophils in worsening sepsis responses in the context of MASH (metabolic dysfunction-associated steatohepatitis). It focuses on neutrophils as mediators of this effect, using a choline-deficient high-fat diet model combined with various knockouts or selective ablation of immune cell types. While the data presented are of great interest, there are concerns about the reliability of the evidence provided, which is currently considered incomplete. The study may be of interest to researchers in immunopathological disease mechanisms once confirmatory studies have been completed.

      [Editors' note: the authors no longer have access to the original flow cytometry data and plan to compile new datasets in the future.]

    2. Reviewer #1 (Public review):

      Summary:

      Using an established NAFLD model, the choline-deficient high-fat diet, Barros et al. show that LPS challenge causes excessive IFN-γ production by hepatic NK cells, which further induces recruitment and polarization of a PD-L1-positive neutrophil subset, leading to massive TNFα production and increased host mortality. Genetic inhibition of IFN-γ or pharmacological blockade of PD-L1 decreases recruitment of these neutrophils and TNFα release, consequently preventing liver damage and decreasing host death.

      Since NAFLD is often accompanied by chronic, low-grade inflammation, it can lead to an overactive but dysfunctional immune response and increase the body's overall susceptibility to infections; this is therefore a very important research question.

      Strengths:

      The biggest strength of the manuscript is the vast number of mouse strains used.

      Weaknesses:

      After the review, there are still some open questions from my side:

      (1) I would like the authors to justify their choice of diet, since this has not been done in the response to reviewers. If they cannot, additional evidence from another model (HFD or WD) is needed.

      (2) Since the authors used the same control groups (chow and HFCD), as required by the animal ethics committee, they should provide a power analysis showing that the number of control animals (and of animals in the other groups) is sufficient to detect the effect.
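      For survival endpoints like those compared here, one common way to answer point (2), though not necessarily the one the authors would choose, is Schoenfeld's approximation for the two-sided log-rank test, which gives the total number of deaths needed to detect a given hazard ratio. The sketch below assumes equal allocation between groups, a two-sided α of 0.05 and 80% power; the function name and the hazard ratio of 2 in the example are hypothetical planning values, not estimates from the study.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Total number of events (deaths) required for a two-sided log-rank
    test, using Schoenfeld's approximation.

    allocation is the fraction of animals in one of the two groups.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z(power)           # quantile corresponding to the desired power
    denom = allocation * (1 - allocation) * math.log(hazard_ratio) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 / denom)


if __name__ == "__main__":
    # Hypothetical planning value: one group has twice the hazard of the other
    print(schoenfeld_events(2.0))  # 66 events in total across both groups
```

      The result counts deaths rather than animals; the number of mice per experiment is then roughly this figure divided by the expected proportion of animals that die during follow-up. Smaller hazard ratios require many more events, which is why shared controls deserve an explicit calculation.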

    3. Reviewer #2 (Public review):

      Summary:

      This is an extremely interesting mouse study, trying to understand how sepsis is tolerated during obesity/NAFLD. The researchers combine a well-established model of NASH (Choline-deficiency with High Fat Diet) with a sepsis model (IP injection of 10mg/kg LPS), leading to dramatic mortality in mice. Using this model, they characterize the complex contributions of immune cells. Specifically, they find that NK cells and neutrophils contribute most to mortality in this model, through IFN-γ and PD-L1+ neutrophils.

      Strengths:

      The biggest strength of the manuscript is how clear the primary phenotypes/endpoints of their model are. Within 6 hours of LPS injection, there is a stark elevation of liver inflammation and damage, which is exacerbated by a High Fat/Choline-Deficient diet (HFCD). And after 1 day, almost all of the mice die. Using these endpoints, the authors were able to identify which cells were critical for mortality in the model and the specific mediators involved.

      Comments on revisions:

      I have no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.

      Duplication of control groups across experiments

      We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified that independent groups of control mice were used for each experiment. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity and to reinforce the rigor of our experimental design (Page 15, Line 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”

      Validation of the MASLD model

      To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters, including liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C).”

      Assessment of liver injury in RagKO and anti-NK1.1 mice

      We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed a HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D).”

      Discussion of limitations

      We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Page 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing MAFLD patients and experimental MAFLD datasets, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 4, the authors show the numbers of IFNγ+ CD4, CD8, and NK1.1+ cells. Could they show what proportion of total IFNγ production comes specifically from NK cells and what proportion from other cell populations, given that NK1.1 marks not only NK cells but also NKT and γδ T cells? Also, in Figure 2E the authors see a substantial increase in the IFNγ signal in T cells.

      While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that NK1.1+CD3+ cells (NKT cells), described on Page 7, Lines 188-192, were essentially absent in the liver tissue of LPS-challenged animals, as shown in Supplementary Figures S3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188-192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E)”.

      This absence was consistent across all experimental conditions, corroborating our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, data demonstrated in Figure 2E illustrate the presence of IFNγ primarily in NK cells. Therefore, the observed IFNγ signal, attributed to NK1.1+ cells, predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.

      (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?

      The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.  

      (3) Since the survival curves and rates are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H, I would like to double-check that the authors used different controls for each experiment.

      The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.

      (4) In Figure 5, the authors state that neutrophils, but not monocytes, mediate the susceptibility of animals with NAFLD to endotoxemia. However, CXCR2 inhibition and CCR2 knockout affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti-Ly6G, b) anti-Ly6G treatment of HFCD mice challenged with LPS does not rescue survival to the levels of chow+LPS mice, and c) anti-Ly6G treatment helps less than CXCR2i. Therefore, from both the knockout and the depletion experiments, the authors can conclude that monocytes (and potentially other cells), together with neutrophils, are most likely important for the development of endotoxemic shock in the choline-deficient high-fat diet model.

      While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B (added to the manuscript, page 8, lines 213–217). The corresponding description has been added to the Results section (Page 8, Lines 213–217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. 5SA and Fig. 5SB).” In contrast, CCR2 deficiency primarily affects monocyte recruitment, yet in our experimental conditions, monocyte depletion or CCR2 knockout did not significantly alter the severity of endotoxemic shock, indicating that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.

      To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.

      These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.

      (6) In Figure 6A (and the other figures showing PD-L1), did the authors include an isotype control? And can they show what proportion of the PD-L1+ population consists of neutrophils and what proportion of all other populations?

      To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11B+ leukocytes. These new results, detailed on Page 9, lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.

      To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245-250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. 6SA, Fig. 6SB, and Fig. 6SD). Furthermore, PD-L1+ neutrophils showed an exacerbated migration of PD-L1+ neutrophils towards the liver (Fig. 6A and 6B).”

      (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?

      The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. Although the absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments, the overall phenotype and trend were consistently maintained, namely that PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.

      (8) In Figure 7, do the authors have an isotype control for TNFα? The gating seems somewhat arbitrary, so an isotype control graph provided as supplementary information would make the figure more persuasive.

      To address the concern regarding gating in Figure 7, we have included the FMO control for TNFα, shown as a histogram, in Supplementary Figure S8G. These controls reaffirm the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272-274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig8SG).”

      (9) The IFNγ+/+ chow + LPS group in Figure 6C looks the same as the chow + LPS group in Figure 8E, just with different numbers. Can the authors explain this?

      Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.

      (10) The chow B6 + LPS data in Figure 1E are the same as the B6 + LPS data in Figure 5D. Shouldn't they differ, since these should be two different experiments?

      We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.

      Reviewer #2 (Recommendations for the authors):

      (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.

      We assessed kidney injury alongside ALT, a marker of liver damage, because the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether the pathophysiological changes in our model were liver-specific or indicative of broader systemic injury.

      (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also call some of your data into question if there are so few NK cells? And the IFNG expression (2E) appears to come mostly from T cells (CD8?).

      The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.

      In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.

      (3) I think something in your Methods is not explained. You said LPS was administered to 5–6-week-old mice, but also that the HFCD diet was started in 5–6-week-old mice and lasted 2 weeks before LPS was administered. So LPS administration happened when the mice were 7–8 weeks old, right?

      We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.

      We have revised the Methods section (pages 15–16, lines 474–480) to clarify this timeline and ensure that it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436-442) as follows: “Lipopolysaccharide (LPS; Escherichia coli (O111:B4), L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2-/-, IFN-γ-/-, and TNFR1R2-/- mice. The HFCD was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration.”

      (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.

      We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic dysfunction-associated fatty liver disease (MAFLD, formerly NAFLD) stage rather than the more advanced metabolic dysfunction-associated steatohepatitis (MASH, formerly NASH).

      Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.

      (6) I am concerned about over interpretation of the publicly available RNA-seq data in Figure 2. This data comes from human NAFLD patients with unknown endotoxemia and mouse models using a traditional high-fat diet model. So it is hard to compare these very disparate datasets to yours. Also, if these datasets have elevated IFNG, why does your model require LPS injection?

      We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.

      Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevated IFNγ in NAFLD derives primarily from NK cells and T cells, and the elevated TNF primarily from myeloid cells.

      In our experimental model, LPS administration was used to evaluate whether these immune populations, particularly NK cells, are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.

      (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.

      We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.  

      (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?

      Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.

      (9) Regarding the Methods for RNA-seq analysis: was the mitochondrial percentage cutoff 0.8%? That seems low. And was there no Padj or FDR cutoff for the differential expression?

      The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.
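      As a rough illustration of how this quality-control metric is computed, the sketch below (plain Python, not the authors' Seurat code) calculates each cell's mitochondrial percentage as 100 × (counts from mitochondrial genes) / (total counts) and keeps cells at or below a cutoff. It assumes that mouse mitochondrial genes can be recognized by the "mt-" prefix; the function names percent_mito and filter_cells are hypothetical, and treating the cutoff as inclusive is also an assumption.

```python
def percent_mito(counts, gene_names, prefix="mt-"):
    """Percentage of one cell's total counts that come from mitochondrial genes.

    Assumption: mitochondrial genes are identified by a name prefix ("mt-" in mouse).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    mito = sum(c for c, g in zip(counts, gene_names) if g.startswith(prefix))
    return 100.0 * mito / total


def filter_cells(cells, gene_names, max_percent_mito, prefix="mt-"):
    """Keep cells whose mitochondrial percentage does not exceed the cutoff.

    cells: dict mapping a cell ID to a list of counts aligned with gene_names.
    """
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if percent_mito(counts, gene_names, prefix) <= max_percent_mito
    }
```

      In Seurat itself, the same per-cell quantity is usually obtained with PercentageFeatureSet and a pattern such as "^mt-"; the sketch only makes explicit what a 0.8% threshold on that quantity keeps or removes.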

      For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
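      The thresholds described above can be sketched as a simple filter over a differential-expression table. This is a minimal plain-Python illustration rather than Seurat code: each row is assumed to be a (gene, adjusted p-value, log2 fold change) tuple, loosely mirroring the p_val_adj and avg_log2FC columns that recent Seurat versions report from FindMarkers, and the function name classify_degs is hypothetical.

```python
def classify_degs(results, padj_cutoff=0.05, lfc_cutoff=0.25):
    """Split DE results into up- and downregulated gene lists.

    results: iterable of (gene, padj, log2fc) tuples (assumed layout).
    Upregulated:   padj < padj_cutoff and log2fc >  lfc_cutoff
    Downregulated: padj < padj_cutoff and log2fc < -lfc_cutoff
    """
    up, down = [], []
    for gene, padj, log2fc in results:
        if padj >= padj_cutoff:
            continue  # not significant after multiple-testing adjustment
        if log2fc > lfc_cutoff:
            up.append(gene)
        elif log2fc < -lfc_cutoff:
            down.append(gene)
    return up, down
```

      Genes that pass the adjusted p-value threshold but fall within ±0.25 log2 fold change are counted as neither up- nor downregulated.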

      (10) Regarding the Methods for flow cytometry: how was the IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? The TNF and IFNG antibodies carry the same fluorophore (PE), so were these stainings and analyses performed separately?

      Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. Detection of IFNγ and TNF was performed via intracellular staining using the Foxp3 staining kit (eBioscience). Because both antibodies were conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490-493) as follows: “As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments.”

      Reviewer #3 (Recommendations for the authors):

      (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?

      Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. These data are shown in Supplementary Figure 1, and the following text was included in the manuscript (Page 5, Lines 116-117): “Mice fed HFCD showed no increase in liver weight or collagen deposition, as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C).”

      (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when the authors analyzed the publicly available RNA-seq data, but they are not the most significantly elevated genes. What is the reasoning behind this cherry-picking? Why are these two genes analyzed but not the other top DEGs?

      We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.

      (3) Figures 3C-3G: I understand that IFNg-/- and TNFR1R2a-/- mice do not show elevated liver damage, but this may simply be because they do not respond to the LPS challenge. I suggest using a different challenge, or rescue experiments with the cytokines, to show that the challenge is successful and that the results are truly caused by NAFLD. The same applies to Figure 6: looking at Figure 6D, one may think that IFNg deficiency alters the LPS response independently of the diet condition (or NAFLD condition).

      We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.

      Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.

      Due to current breeding limitations and a temporary issue in colony maintenance of TNF-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.

      These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.

      (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-induced death even under normal conditions. But if I compare the survival data between Figure 1 and Figure 4, Rag-/- mice on the HFCD diet seem to be doing better than WT mice after LPS treatment (1-day survival vs 2-day survival). How do you explain these different outcomes?

      We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality; variability between independent experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.

      (5) How do you reconcile Figure 4J with the observations presented in Figure 7? The TNFα tissue levels, although significantly different, seem very similar between the conditions.

      We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.

      Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNFα levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.

    1. eLife Assessment

      By using a combination of patch clamp recordings, calcium imaging and computer modeling, the authors analyze the spatial distribution of voltage-gated calcium channels at glutamatergic synapses formed between layer 5 pyramidal neurons (L5PNs) and between layer 2/3 and L5PNs in the prefrontal cortex (PFC) and primary somatosensory cortex (S1); they conclude that the calcium channel-vesicle coupling is looser in the PFC compared to S1, although additional experiments are needed to determine how the distinct functional characteristics of these synapses in different brain regions might affect data interpretation. Overall, these findings are important because they have implications for shaping synaptic plasticity and neural circuit function across brain regions. They are solid because they are based on the use of a multi-pronged approach, although the presentation would benefit from stronger integration of the current findings with the existing literature and a more explicit discussion of potential limitations and confounding factors for data interpretation.

    2. Reviewer #1 (Public review):

      Summary:

      This study asks whether synapses formed by the same broad neuronal class (excitatory pyramidal neurons, PN) adapt their presynaptic organization in a cortex-specific manner, comparing the prefrontal cortex (PFC) with the primary somatosensory cortex (S1). The authors combine sophisticated electrophysiology (paired recordings and extracellular minimal stimulation), pharmacological perturbations of presynaptic Ca²⁺-secretion coupling, bouton Ca²⁺ imaging, and mechanistic modeling. Across two prominent excitatory connections (Layer 5 (L5) PN-L5PN and L2/3-L5PN), they provide convergent evidence that mature PFC synapses operate with looser Ca²⁺ channel-release sensor coupling than their S1 counterparts.

      Overall, the study provides an appealing mechanistic link between synaptic nano/micro-architecture and cortical-area specialization. The idea that PFC synapses retain a more "plasticity-favoring" presynaptic state, while the primary sensory cortex emphasizes reliability and timing precision, is potentially impactful for how we think about circuit computation and plasticity across cortical hierarchies.

      Strengths:

      A major strength is the multi-pronged experimental strategy. The paper first establishes robust, area-dependent differences in synaptic efficacy, reliability, timing, and short-term plasticity (facilitation prevailing in PFC versus depression in S1), using both paired recordings and minimal extracellular stimulation paradigms. The coupling interpretation is then directly supported by differential sensitivity to EGTA (and appropriate positive-control effects of fast chelators). Finally, volume-averaged calcium signals are reported to be similar across areas, arguing against trivial explanations based on gross differences in calcium influx, and the modeling provides a quantitative framework for interpreting the observed chelator effects.

      Weaknesses:

      Limitations are minor and concern interpretation/clarity rather than core results. Some key inferences rely on indirect readouts (chelator sensitivity, fluctuation analysis-derived parameters, bouton-averaged calcium signals), each of which carries assumptions and potential confounds that should be discussed more explicitly. In particular, the repatching paradigm for the paired-recording EGTA experiment, though very impressive, and the limited number of extracellular calcium conditions used for fluctuation analysis (three concentrations), can influence quantitative estimates and the confidence intervals around them.

    3. Reviewer #2 (Public review):

      Schwarze et al. investigated whether synaptic efficacy is brain-region specific. To this end, they compared synaptic connections established by layer 5 (L5) neocortical pyramidal cells and between L5 and L2/3 pyramidal cells. In order to identify the mechanism of this brain region specificity, the authors employed several experimental approaches, including paired electrophysiological recordings, extracellular stimulation, low- and high-affinity intracellular calcium chelators (EGTA and BAPTA), multiple probability fluctuation analysis (MPFA), and intracellular measurements of calcium transients as well as computational modelling. The findings of the present study indicate that synaptic connections in the primary somatosensory cortex (S1) are significantly stronger and more reliable than those in the prefrontal cortex (PFC).

      The study is timely, and the topic is of significant interest to the neuroscience community. Despite the extensive research that has been carried out on the neuroanatomy and receptor distribution of different brain regions, comparatively little attention has been paid to differences in synaptic physiology. The authors' approach is characterised by its elegance and comprehensive nature, and the conclusions drawn are compelling. Nevertheless, there are a number of unresolved issues.

      Major points:

      (1) The authors state that data from the S1 cortex were obtained in a previous study. In the context of an explicitly comparative study (PFC vs. S1 cortex), it would have been advantageous for the authors to perform a subset of experiments in which both cortices were obtained from a single animal. This is a feasible undertaking, given the spatial separation of the PFC and S1 cortex.

      (2) Figure 1A is somewhat misleading because it could suggest that the authors have performed dual recordings in identified PFC pyramidal cells.

      (3) PFC and S1 cortex in rodents differ markedly in their morphological organisation. For example, in all sensory cortices, layer 4 is very pronounced; however, in the rodent PFC no clear layer 4 can be found. On the other hand, the PFC shows a clear separation of layers 2 and 3, which is not visible in the S1 cortex. Furthermore, PFC pyramidal cells in layers 2, 3, and 5 exhibit significant heterogeneity, diverging considerably from those found in layers 5a and 5b of the S1 cortex. Thus, there is no clear correspondence between L5 pyramidal cells in the PFC and the S1 cortex. In order to achieve a meaningful comparison of the data obtained in PFC and S1 cortex, the authors need to determine whether they recorded from similar pyramidal cell populations.

      (3) In addition, PFC pyramidal cells in layers 2, 3, and 5 are highly heterogeneous and differ markedly from those in layers 5a and 5b of the S1 cortex. To achieve a meaningful comparison of the data obtained in the PFC and the S1 cortex, the authors need to determine whether they recorded from similar pyramidal cell populations.

      (4) For the S1 cortex in rats, it has been found that L5 synaptic connections between pairs of L5a pyramidal cells and between pairs of L5b pyramidal cells differ markedly with respect to mean EPSP amplitude, latency and coefficient of variation (CV, a surrogate measure for the synaptic release probability) (cf. Markram et al., 1997; Frick et al., 2008). It is therefore likely that PFC and S1 pre- and postsynaptic pyramidal cells differ not only morphologically and electrophysiologically but also with respect to their synaptic properties. At the least, the authors need to discuss these confounding issues and preferably address them experimentally. For example, it would be helpful to demonstrate that paired recordings were made from the same pyramidal cell types, perhaps by documenting their morphology and/or firing patterns. In addition, they should discuss the marked difference in EPSP amplitude and putative release probability between their data and the earlier studies.

      (5) In order to perform multiple probability fluctuation analysis (MPFA), a parabolic fit with a mere three points is inadequate, particularly because 2 mM and 5 mM Ca2+ are close to the peak of the variance-to-mean parabola, and only 1 mM Ca2+ is on its initial linear part. A more meaningful result would have been obtained with an additional Ca2+ concentration between 1.0 and 2.0 mM, as these are closer to the physiological range. In this context, the authors should have cited the more recent and more detailed papers by the Silver group (Saviane and Silver, 2006; Lanore and Silver, 2016) and not just the Clements and Silver review paper.
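      To make the point about the number of conditions concrete: in the simple binomial model underlying MPFA (as summarized by Clements and Silver), the variance σ² and mean I of the EPSC amplitude across release-probability conditions are related by σ² = q·I − I²/N, where q is the quantal size and N the number of release sites. The sketch below fits this parabola by ordinary least squares without an intercept. It is a minimal illustration, not the authors' analysis: it assumes amplitude magnitudes (inward currents taken as positive), ignores the corrections for quantal variability used in more detailed treatments, and the names fit_mpfa and release_probability are hypothetical. With three Ca2+ conditions and two free parameters, such a fit has only one residual degree of freedom, which is why an additional concentration on the rising limb would constrain q and N more tightly.

```python
def fit_mpfa(means, variances):
    """Least-squares fit of variance = q*mean - mean**2/N (no intercept).

    means, variances: mean and variance of EPSC amplitude magnitude for each
    release-probability condition (e.g. each extracellular Ca2+ level).
    Returns (q, N).
    """
    if len(means) != len(variances) or len(means) < 2:
        raise ValueError("need at least two conditions with matching lengths")
    # Regressors x1 = I and x2 = I**2; sums for the 2x2 normal equations.
    s11 = sum(m ** 2 for m in means)
    s12 = sum(m ** 3 for m in means)
    s22 = sum(m ** 4 for m in means)
    t1 = sum(m * v for m, v in zip(means, variances))
    t2 = sum(m ** 2 * v for m, v in zip(means, variances))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("degenerate design: condition means must differ")
    a = (t1 * s22 - t2 * s12) / det  # coefficient of I   -> q
    b = (s11 * t2 - s12 * t1) / det  # coefficient of I^2 -> -1/N
    if b >= 0:
        raise ValueError("no downward curvature; N cannot be estimated")
    return a, -1.0 / b


def release_probability(mean, q, n):
    """Release probability at one condition, from I = N * Pr * q."""
    return mean / (n * q)
```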

      (6) Methods: The authors should clarify whether their paired recordings from L5 pyramidal cells involved whole-cell recordings from both pre- and postsynaptic neurons. From Figure 1B, it appears as if the presynaptic neurons were not recorded in whole cell mode but rather stimulated in cell-attached mode. This is also reflected in the artefact visible in the current trace recorded in the postsynaptic neuron. The authors should explicitly state their methodological approach and mention how reliable the timing of the presynaptic action potential was under these circumstances. The same holds true for the extracellular stimulation protocol. A significantly more detailed description of the experimental protocol is necessary here.

      (7) Methods: The authors use Student's t-test for data comparison. The authors should verify that the data distribution was indeed normal, e.g. by using a Shapiro-Wilk test. If this is not the case, non-parametric tests should be used.
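      One common (though debated) way to implement the check the reviewer asks for is to test each group with Shapiro-Wilk and fall back to a rank-based test if either group departs from normality. The sketch below does this with SciPy; the function name compare_two_groups and the significance level of 0.05 are assumptions for illustration, and it is not the procedure the authors used.

```python
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Student's t-test if both groups pass Shapiro-Wilk, else Mann-Whitney U.

    Returns (test_name, p_value). Shapiro-Wilk needs at least 3 values per group.
    """
    _, p_a = stats.shapiro(a)
    _, p_b = stats.shapiro(b)
    if p_a > alpha and p_b > alpha:
        # Neither group departs detectably from normality.
        _, p = stats.ttest_ind(a, b)  # equal_var=True by default: Student's t-test
        return "Student's t-test", p
    # Non-parametric alternative for two independent samples.
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", p
```

      With small samples, a non-significant Shapiro-Wilk result is weak evidence of normality, so reporting which test was applied to each comparison is as important as the check itself.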

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Max Schwarze and colleagues examined the coupling distance between presynaptic Ca²⁺ channels and the vesicular release sensor at neocortical synapses in mice. They propose that Ca²⁺ channel-release sensor coupling differs across cortical areas, with relatively loose (microdomain) coupling in prefrontal cortex (PFC) and tighter (nanodomain) coupling in primary somatosensory cortex (S1) for comparable pyramidal-neuron synapse types. To test this, they combine paired recordings and minimal stimulation with chelator manipulations (EGTA/BAPTA), mean-variance/MPFA-style analyses, presynaptic Ca²⁺ imaging, and computational modeling. They conclude that presynaptic coupling organization is area-specific in the mature cortex and contributes to regional differences in synaptic timing, reliability, and short-term plasticity.

      Strengths:

      This study tackles an important question and is strengthened by a cohesive body of evidence assembled from multiple complementary approaches. A major asset is the inclusion of high-value datasets, particularly the paired recordings between L5 pyramidal neurons and the systematic assessment of EGTA sensitivity, which provide a solid functional foundation for the authors' central claims. The work is further distinguished by its genuinely multimodal design: combining electrophysiology with presynaptic calcium imaging (and integrating these observations with quantitative analyses and modeling) offers a more mechanistic view of neurotransmitter release than any single method could provide. Overall, the direct, within-framework comparison of presynaptic release-control mechanisms across cortical areas for comparable synapse types is compelling and gives the conclusions a level of robustness and interpretability that is often difficult to achieve in studies of cortical synaptic diversity.

      Weaknesses:

      Several aspects would benefit from clearer explanation, stronger integration with the existing literature, and a more explicit discussion of limitations and potential confounds. Without these additions, some conclusions remain speculative. Throughout the manuscript, the authors also often imply that different measurements reflect the same underlying synapse population. This is unlikely to be strictly true across all experiments and makes it difficult to integrate results from the various approaches into a single, unified set of functional synaptic properties. In addition, some statements, particularly those linking coupling mode to "higher-order neocortical functions", appear broader than what is directly supported by the experiments and should be tempered or more precisely scoped.

      Below, I list several topics that could help better frame the main findings of the present study and clarify how it relates to previously published work.

      (1) The authors use EGTA sensitivity of EPSCs (together with additional metrics) to argue that S1 and PFC synapses differ in Ca²⁺ channel-release sensor coupling. While this is a plausible interpretation, EGTA effects are not uniquely determined by coupling distance and can also reflect differences in Ca²⁺ entry kinetics, action potential waveform, endogenous buffering/extrusion, or release-sensor/vesicle state. The authors use a constrained modeling approach, but the rationale for the different constraint sets is not fully clear from the current description. It would be helpful to expand and clarify the Methods section to explain how these constraints were defined, justified, and applied (and how alternative constraint choices would affect the results). In this context, the Abstract's broader claim that the study "reveals microdomain coupling as a presynaptic structure-function correlate of higher-order neocortical functions" appears overstated. Given the well-known diversity of cortical synapses even within a single region (e.g., synapses onto different interneuron subclasses or different PN cell types, extracortical sources like thalamus), the authors should clarify the intended scope: is the conclusion meant to apply broadly across synapse classes in S1 and PFC, or only to the specific connection type(s) examined here?
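      The usual intuition for why a slow chelator reports coupling distance is the linearized buffer length constant (Neher, 1998), λ = √(D_Ca / (k_on·[B])): Ca²⁺ ions that must diffuse much farther than λ to reach the release sensor are likely to be captured by the buffer first. The sketch below evaluates this expression. The parameter values in the usage note are illustrative assumptions (published on-rates vary), the function name is hypothetical, and the approximation ignores buffer saturation, endogenous buffers, and channel gating kinetics, which is exactly why the alternative explanations raised in this point need to be addressed.

```python
import math


def buffer_length_constant_nm(d_ca_um2_per_s, kon_per_molar_per_s, buffer_molar):
    """Linearized Ca2+ buffer length constant, lambda = sqrt(D / (kon * [B])).

    Assumes the free buffer is in large excess (not saturated) near the channel.
    Inputs: D in um^2/s, kon in M^-1 s^-1, [B] in M. Returns lambda in nm.
    """
    capture_rate = kon_per_molar_per_s * buffer_molar  # s^-1
    return math.sqrt(d_ca_um2_per_s / capture_rate) * 1000.0  # um -> nm
```

      Assuming D_Ca = 220 µm²/s, 10 mM EGTA with k_on = 1 × 10⁷ M⁻¹s⁻¹ gives λ of about 47 nm, whereas 1 mM BAPTA with k_on = 4 × 10⁸ M⁻¹s⁻¹ gives about 23 nm, illustrating why BAPTA can interrupt even tightly coupled release while EGTA preferentially affects looser coupling.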

      (2) The chelator logic is sound in principle, but the Discussion should more explicitly acknowledge standard caveats and alternative explanations. The authors partly address this by including presynaptic Ca²⁺ imaging and modeling, yet it would help to explain more clearly how the combination of (i) chelator sensitivity, (ii) presynaptic Ca²⁺ signals, and (iii) model constraints rules out (or substantially reduces the likelihood of) changes in AP waveform, Ca²⁺ influx kinetics, buffering/extrusion, or sensor/vesicle state as the primary drivers. In addition, recent hypotheses emphasizing vesicle priming and/or release-site occupancy as contributors to apparent EGTA sensitivity should be discussed as a complementary or alternative interpretation.

      (3) A substantial portion of the S1 comparison appears to rely on previously published datasets. This should be made unambiguous in the Results and Methods, and it would be helpful to summarize this clearly (e.g., in a table indicating which figures/analyses use new data versus reanalysis of published data). If this information is already present, it should be highlighted more prominently.

      (4) The modeling is informative, but the choice of a specific VGCC-release-site geometry and channel arrangement is not sufficiently justified. The manuscript adopts a particular spatial configuration, yet the rationale for selecting this geometry, rather than other plausible architectures discussed in the literature, is not clearly explained, nor is it meaningfully revisited in the Discussion. The authors should justify why the same organization is assumed across two distinct cortical areas and, ideally, include (or at a minimum discuss) a sensitivity analysis showing how key inferences (e.g., coupling distance and channel number) depend on the assumed geometry.

      (5) The calcium imaging data are valuable, but given the diversity of synapses within each cortical layer, it is not clear that imaged boutons can be confidently assigned to the specific connection types being interrogated electrophysiologically. A substantial fraction of boutons likely corresponds to different postsynaptic targets (including interneurons and distinct pyramidal-cell classes), and this heterogeneity could complicate interpretation. This limitation should be discussed explicitly.

      (6) In unitary connections, the authors assess EGTA effects alongside other functional parameters (strength, delay, short-term plasticity), which is a major strength. However, for L2/3 to L5 connections, it appears that EGTA sensitivity was tested primarily using extracellular stimulation. Given anatomical and circuit differences between PFC and S1, extracellular stimulation may recruit different synapse populations across regions, potentially confounding regional comparisons of EGTA sensitivity. This limitation should be acknowledged explicitly. While I am not requesting technically demanding L2/3↔L5 paired recordings in S1, the possibility that different synapse identities are being sampled should be treated as a meaningful source of uncertainty. The Discussion would also benefit from placing the magnitude of EGTA effects in the context of prior "loose coupling" literature, where comparatively large EGTA effects have been reported in some systems. In addition, the reported difference between adult PFC EGTA effects and S1 inhibition appears small (on the order of <10%) and should be interpreted cautiously, especially given that PFC and S1 mature on different timelines and P21-P26 is unlikely to reflect a mature PFC circuit state. The adult cohort (P90-P100) is therefore important, but the age mismatch complicates PFC-S1 comparisons; ideally, S1 should be assessed at matched ages, or this limitation should be discussed explicitly. Finally, for statistical robustness, in panel D of Figure 2, were the comparisons corrected for multiple testing to control Type I error?
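      As one standard way to control the family-wise Type I error when several comparisons are made within one panel, the sketch below implements Holm's step-down adjustment in plain Python. The reviewer does not specify a correction method, so the choice of Holm's procedure and the function name holm_adjust are assumptions for illustration.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by m - rank, capped at 1.
        adj = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: adjusted p-values never decrease with rank.
        running_max = max(running_max, adj)
        adjusted[i] = running_max
    return adjusted
```

      A comparison is then declared significant if its adjusted p-value is below the chosen α (e.g. 0.05); for three raw p-values of 0.01, 0.04 and 0.03, the adjusted values are 0.03, 0.06 and 0.06, so only the first survives.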

      (7) Alterations in initial release probability are often associated with changes in short-term plasticity. In the present manuscript, the authors report similar initial release probability at PFC and S1 synapses, yet observe differences in short-term plasticity profiles. The mechanistic basis for this apparent dissociation is not addressed and should be discussed explicitly, including potential explanations.

      (8) There are multiple instances where the text appears to cite non-existent or misnumbered figure panels (e.g., references to "Figure 4G-I / 4J" when the relevant material appears elsewhere). These should be corrected throughout, as they currently reduce readability and confidence.

      (9) The Methods describe P21-P26 animals, whereas the Results include older cohorts (e.g., P90-P100) and additional regions (e.g., mPFC). The Methods should be updated so that all cohorts and regions analyzed in the Results are fully described.

    5. Author response:

      We will extend and clarify the text of the paper according to the suggestions of the reviewers. In particular, we will extend the description and discussion of the calcium chelator approach, re-patching and multiple probability fluctuation analysis. We will also state in the Results section that volume-averaged calcium signals were measured, and we will extend the description of the measurement of resting calcium and of the variability between boutons. Literature will be included and discussed as suggested.

      In order to avoid any misunderstandings, we will also make it clearer that recordings from L5PN – L5PN synapses in S1 were published in our preceding papers (Bornschein et al., 2019a, b), but that these data were partially reanalyzed for the comparison with recordings from L5PN – L5PN synapses in PFC (this paper). We will also emphasize that the recordings from L2/3 to L5PN synapses in S1 and PFC were made directly in the present study. We will include a supplementary table, which explicitly shows for each figure which data are from Bornschein et al. (2019a, b) and which data were obtained in the present study.

      We will consider all points of the reviewers and the recommendations of the editors in detail in the revised manuscript and/or our pointwise response.

      We recognized one factual error in the public reviews:

      Reviewer 2, point 7: “Methods: The authors use Student's t-test for data comparison. The authors should verify that the data distribution was indeed normal, e.g. by using a Shapiro-Wilk test. If this is not the case, non-parametric tests should be used.”

      A detailed description of the statistics, including the test for normality, is given in the Methods section. In particular, we wrote in the Methods: “Normality was tested using the Shapiro-Wilk Test. (…) To compare pre- and post-treatment data the paired t-test or the Wilcoxon signed rank test (WSR) was used, depending on the distribution of the data. (…)”

      To further emphasize that the data was tested for normal distribution, we have also extended the description of the statistical tests in the figure legends.

      Bornschein G, Brachtendorf S, Schmidt H (2019a) Developmental increase of neocortical presynaptic efficacy via maturation of vesicle replenishment. Front Synaptic Neurosci 11:36.

      Bornschein G, Eilers J, Schmidt H (2019b) Neocortical high probability release sites are formed by distinct Ca2+ channel-to-release sensor topographies during development. Cell Rep 28:1410-1418 e1414.

    1. eLife Assessment

      The findings of this study are important since they cover the repurposing of small molecules as metalloprotease and phospholipase inhibitors for early intervention in the treatment of bothropic envenoming in the Neotropics, and thus provide a strong rationale for the progression of these inhibitors into future preclinical and clinical evaluation for snakebite indications across various ecological zones, although the current evidence casts doubt on the viability of repurposing nafamostat. The strength of the evidence is solid; however, there are some weaknesses, such as a lack of translatability of the in vivo model and insufficient venom characterization. The strength of the evidence could therefore be enhanced by using a mouse model in future studies. The paper remains of interest to ophiologists, biochemists and medicinal chemists.

    2. Reviewer #1 (Public review):

      Very nice and coherent body of work with appropriate in vitro to in vivo transition in methods.

      Lovely and easy to follow figures that can be understood even without the manuscript.

      My recommendation is that a sentence or two be added clearly stating that the authors consider nafamostat to be off the table, and suggesting other approaches/drugs that might be considered instead of just making a general statement. I think all this can be done in a few sentences.

      Gabexate was administered to a snakebite victim in the following case report from about 20 years ago, which is also a good example of the now better-recognized threat to pregnancy.

      Nasu K, Ueda T, Miyakawa I. Intrauterine fetal death caused by pit viper venom poisoning in early pregnancy. Gynecol Obstet Invest. 2004;57(2):114-6. doi: 10.1159/000075676. Epub 2003 Dec 19. PMID: 14691344

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Comments on revisions:

      I would like to thank the authors for answering my questions. The manuscript has gained in quality, and its limitations are now better stated.

    4. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editors and reviewers for their assessment of this manuscript, and for the positive words highlighting the value of evaluating small molecule drugs for snakebite in the neotropics, including the quality of this work and the value of the validated screening pipeline. We completely agree that the next step for this work will be to evaluate the preclinical efficacy of the identified drugs in mouse models, though this considerable undertaking will form the basis of future work. Critically, the pipeline that we describe herein facilitates the selection of the most appropriate candidates to progress into such mouse studies, in line with the 3Rs principles for minimising the need for animal research. The comment around insufficient venom characterisation seems somewhat misplaced – the objective of this project was not to characterise the venoms used, but to evaluate the in vitro inhibition of venom toxin family activities and identify the potential utility of specific repurposed drugs as therapeutics for snakebite in the neotropics. Venom characterisation of the diverse samples used in this project would represent an entire project and manuscript in its own right. We are pleased that the reviewers highlight the gap in research on serine protease inhibitors, and the value of this paper in showing that more research is required in this area to identify a candidate more suitable for future clinical use than nafamostat.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

      We thank the reviewer for their shared opinion of the potential value of small molecules as snakebite envenoming therapeutics and their insight on the gap in focus in the neotropics, which this manuscript aims to address.

      Our work in this manuscript included standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. In our revised manuscript we will make these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models) which, following the in vitro characterisations presented in this paper, are the logical next step for evaluating small molecule drugs for inhibiting neotropical snake venoms.

      Although both marimastat and prinomastat are repurposed drugs that have undergone clinical evaluation for other indications, marimastat has been more extensively characterised preclinically than prinomastat for snakebite, and will soon enter Phase II clinical trial evaluation for this indication (https://www.ddw-online.com/ophirex-to-produce-snake-venom-inhibitor-for-lstm-study-40669-202602/). Marimastat also has a longer half-life in humans of 8-10 hours (Millar et al. 1998), compared to prinomastat (2-5h, Hande et al. 2004). We will more clearly highlight the rationale for selecting marimastat in the revised manuscript.

      Although we appreciate the reviewer’s point regarding the short half-life of nafamostat (which is typically given by continuous iv infusion for this reason), in the manuscript we have already stated that we do not recommend the progression of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

      We appreciate the reviewer’s opinion on the thorough and logical workflow we present in this manuscript and the value this pipeline provides to the field for future and parallel work. We agree with the reviewer that this provides a well-organized comparative screening study applicable to different snake species or therapeutics. Regarding the comment that this manuscript is not a definitive demonstration of a broadly effective, deployable intervention, we agree and are happy to clarify that while the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation.

      Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus; one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      We thank the reviewer for their high regard for the purpose and execution of this work. Their questions are helpful for improving the manuscript and raise useful discussion points for the field.

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      We apologise for this error on our part. While inhibitory molecules have been described for cytotoxic 3FTxs, these are not small molecules as alluded to in the previous version of the manuscript. We have amended this text in our revised manuscript.

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      We thank the reviewer for highlighting this point. We agree that this is highly relevant and would benefit from discussion in the revised manuscript given the nature of our assays and the non-enzymatic mechanism of action of certain Bothrops PLA2s. We have added this to the discussion.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Most matrix metalloproteinase inhibitors will act on endogenous MPs to at least some extent (with variable potency on different MMPs). Marimastat has demonstrated activity against endogenous metalloproteinases, including MMP1, which was hypothesised to cause severe joint pain when used chronically (i.e. frequent dosing over many weeks) for indications such as cancer, though this effect was reversible within 8 weeks of cessation of drug administration (Wojtowicz-Praga, 1998). Thus, long-term use of matrix metalloproteinase inhibitors can raise safety concerns. However, the anticipated duration of dosing for snakebite, which is an acute life-threatening condition, is a few days. It is therefore unlikely that prior safety concerns observed following chronic dosing in cancer studies would apply to its potential use as a snakebite field therapy.

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      It has been observed that certain SVMPs (specifically several PI SVMPs) are not active against this ES010 substrate in vitro. The substrate used in the in vitro SVMP assay is reported by the manufacturer as a substrate for a wide range of MMPs that target the extracellular matrix components mentioned by the reviewer, i.e. collagenases and gelatinases as well as matrilysins, stromelysins and elastases. This in vitro assay, combined with the coagulation assays, is complementary in covering the main targets of SVMPs (ECM and clotting cascade) prior to haemorrhagic assessment in the egg model. Thus, we are confident that activity for the broad range of SVMP isoforms will be captured through the screening pipeline we have developed.

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

      Our current model is based on an earlier egg embryo model (Sells et al. 1997, Sells et al. 1998 and Sells et al. 2000) which described good correlations (p<0.01) with the standard WHO murine preclinical envenoming model. These studies have assessed correlations for minimal haemorrhagic doses (MHDs), LD50s and ED50s in both models for a selection of viper venoms. As chicken embryos at day 6 of development have incomplete neural arcs, the model is not well suited for assessing neurotoxic effects, but can be effectively used for addressing venom-induced haemorrhage and lethality and for testing therapeutics. In addition, a more recent study (Yusuf et al. 2023) reported almost identical LD50s for the venom of Bitis arietans between the two in vivo approaches. The model is also being pursued as a preclinical testing model by an antivenom manufacturer with the focus of reducing the use of rodents in batch release testing (Verity et al. 2021). We will provide further clarification on the rationale for using the egg model, including the supportive references outlined above, in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript provides a useful comparative dataset across multiple Bothrops venoms and supports SVMP inhibition as a broadly effective lever in the authors' in-vitro work. However, the strength of the 'pan-Bothrops' and translational claims is currently limited by insufficient characterization of the exact venom samples tested and by experimental designs that fall short of clinically realistic rescue.

      Major comments:

      (1) The venoms used in this study are historical batches and are not formally characterized beyond SDS-PAGE and literature summaries, despite well-known intra- and inter-population venom variability; this weakens the generalization of the conclusions.

      To address this comment, we have made it clearer that our venom sources are historical. Because of this, source locality is not available beyond country of origin, with the exception of B. lanceolatus, which is endemic to Martinique. Figure 1 also makes clear that we agree with the reviewer that variation within Bothrops species is high. We discuss this variation, and the limitations it places on drawing broad conclusions from our sampling, throughout the first paragraph of the discussion, the final sentence of which states: “Future proteomic characterisations of the specific venom samples used in this study, which were all sourced from a historical collection (except for B. lanceolatus), would be informative in this regard.” Although the venom composition of our samples has not been characterised, the focus of the manuscript is the characterisation of whole venom functional activity through a wide-ranging screening pipeline, and the generalisation of our findings is supported by the diversity of the venom samples (i.e. several species), even though their composition was not characterised (which is not critical for the focus of the study).

      (2) On a technical comment, the venom inhibition assays appear to rely on drug-first or preincubation conditions, which can easily overestimate efficacy compared with real snakebite envenomation, where toxins distribute and engage targets rapidly. Here, a translational gap is the clinical feasibility of the 'repurposed' inhibitors, as it is unclear whether the drugs central to the conclusions (especially marimastat, prinomastat and varespladib) are realistically available or stocked in hospitals or could be deployed in regions where Bothrops envenoming occurs. I think that the manuscript should clearly distinguish this from candidates with a plausible access and delivery pathway.

      Our work in this manuscript includes standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. None of our methods administer drug-first. Throughout the methods and figure legends we have made these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models), which would be the next step for this research programme.

      While the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite, inclusive of the requirement to complete clinical trials, cost-benefit analysis and policy change and manufacturing/distribution feasibility assessments. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within rescue in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation. To further support this point we have included an additional section to the manuscript discussing the current preclinical and clinical progression of prinomastat and marimastat, which also incorporates the public comment on selection of marimastat over prinomastat.

      (3) In my opinion, the Nafamostat results and discussion need reframing, given weak SVSP inhibition and intrinsic anticoagulant behavior at 5 µM. Excluding it from certain analyses undermines interpretability, and it may be more appropriate to include it throughout as an explicit negative control condition (showing its baseline anticoagulant effect) rather than omitting it.

      Although we understand the reviewer's opinion here, we disagree and believe that including nafamostat as a ‘negative control’ may reflect negatively on the benefit that an efficacious serine protease inhibitor could provide. Furthermore, as the intrinsic anticoagulant effect of nafamostat cannot be decoupled from direct SVSP toxin inhibition, we were unable to interpret its activity, which would undermine the results. This can be seen in Figure 3B, which demonstrates that a false positive result would occur. For the serine protease assay, we clearly discuss the lack of efficacy and justify why EC50 testing was not appropriate within the guidance of our screening protocols.

      In the manuscript we have now further justified our approach in relation to the limitations of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      (4) The data presentation needs consistent statistical analyses (currently absent for multiple key figures, including Figures 2, 3, 4, 6 and 7) and a clearer explanation of the venom and drug doses you chose. For example, Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients. Likewise, Bothrops venoms can contain both pro- and anticoagulant activities, so the authors should justify how their framework accounts for anticoagulant components and why the observed plasma phenotypes are interpreted as they are.

      We thank the reviewer for flagging the need for consistent statistical analysis and have now included it in Figures 3, 4, 6 and 7. However, Figure 2 is presented to display the variation between all the venoms and is ultimately used to select the most relevant doses for the later inhibition experiments; statistical analysis is therefore not relevant for this figure. The updated statistical analysis, which has been included in the relevant figure legends and results sections, is as follows:

      Figure 3 - Bars indicate significant results (p < 0.05) identified through one-way ANOVA with Dunnett’s multiple comparisons test to the DMSO control

      Figure 4 - two-way ANOVA with Šídák's multiple comparisons test of each venom control compared to the matched venom treated with inhibitor

      Figure 6 – the CT and MCF data were analysed independently using one-way ANOVA with Tukey’s multiple comparisons test

      Figure 7 - Log-rank test (Mantel-Cox) with Holm-Šídák's multiple comparisons test comparing each treatment against the venom-only control
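      For readers less familiar with the Holm-Šídák correction named for Figure 7, the step-down adjustment can be sketched in a few lines of Python. This is a minimal illustration assuming independent tests and the standard formulation (the k-th smallest of m p-values is adjusted as 1 - (1 - p)^(m - k + 1), then forced to be non-decreasing); the function name `holm_sidak` and the example p-values are hypothetical and are not taken from the study.

```python
def holm_sidak(pvalues):
    """Return Holm-Sidak step-down adjusted p-values in the input order.

    Assumes the tests are independent (the Sidak part of the correction).
    """
    m = len(pvalues)
    # Indices of the raw p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # rank is 0-based, so m - rank hypotheses remain at this step.
        adj = 1.0 - (1.0 - pvalues[idx]) ** (m - rank)
        # Step-down monotonicity: a less significant hypothesis can never
        # receive a smaller adjusted p-value than a more significant one.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical p-values from three treatment-vs-control comparisons.
raw = [0.01, 0.04, 0.03]
adj = holm_sidak(raw)
print([round(a, 4) for a in adj])   # [0.0297, 0.0591, 0.0591]
print([a < 0.05 for a in adj])      # [True, False, False]
```

      With these illustrative inputs only the first comparison remains significant after correction, even though all three raw p-values are below 0.05. Statistical software normally applies this adjustment internally; the sketch only shows what the correction does.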

      We have ensured that all figure legends clearly indicate the venom and drug dose to aid the clarity which the reviewer requested.

      The comment that “Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients” is an understandable query. However, in vitro assessments such as those carried out in this manuscript are not designed to directly inform pharmacokinetic/pharmacodynamic interpretations, largely because they do not replicate real-world envenoming (i.e. preincubation would not occur between a venom and treatment). This is why, as stated, follow-on preclinical and clinical assessments are needed for onward progression of these inhibitors, to inform dosing regimens that might achieve the exposures required for in vivo venom neutralisation. That being said, PK/PD work has been initiated within Phase I trials; for example, for DMPS, Abouyannis et al. (2025) demonstrated a plasma exposure of >10 µg/mL for single doses of 1,200 mg and higher. This is equivalent to 80 µM. Although this is lower than the EC50 for some venoms in the clotting assay (Figure 3J), the venom dose used in vitro (50 to 250 ng/50 µL, i.e. 1,000 to 5,000 ng/mL) is estimated to be >1000 times higher than in a natural envenoming by Bothrops atrox, which gives less than 1 ng/mL in serum (https://doi.org/10.1016/j.toxicon.2022.09.010). These extrapolations therefore indicate that the doses selected in our studies would have human clinical relevance.
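      The unit conversion behind this comparison can be checked with a short Python sketch. It assumes only the figures quoted above (50, 100 and 250 ng of venom in a 50 µL reaction, and a serum level below 1 ng/mL for B. atrox envenoming); the function names are hypothetical. Because the serum value is an upper bound, dividing by 1 ng/mL gives a lower bound on the fold difference.

```python
def venom_concentration_ng_per_ml(dose_ng, volume_ul):
    """Convert an absolute venom dose in a reaction volume to ng/mL."""
    # 1 mL = 1,000 µL, so ng/µL multiplied by 1,000 gives ng/mL.
    return dose_ng / volume_ul * 1000.0


def fold_over_serum(assay_ng_per_ml, serum_ng_per_ml):
    """Ratio of the in vitro venom concentration to a serum concentration."""
    return assay_ng_per_ml / serum_ng_per_ml


# Venom amounts and reaction volume quoted for the Figure 3 clotting assay.
for dose in (50, 100, 250):
    conc = venom_concentration_ng_per_ml(dose, 50)
    # Serum is below 1 ng/mL, so this ratio is a lower bound.
    fold = fold_over_serum(conc, 1.0)
    print(f"{dose} ng / 50 µL = {conc:.0f} ng/mL (at least {fold:.0f}x serum)")
```

      The three doses correspond to 1,000, 2,000 and 5,000 ng/mL, consistent with the >1000-fold difference stated above.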

      Finally, in terms of anticoagulant venom effects, these would be observed in our experimental approach either as reduced kinetic responses in the plasma clotting assay (as observed with nafamostat in Figure 3B) or as a prolonged clotting time in the thromboelastography assay (Figure 6). As stated in the results section “Comparison of coagulation profiles”, all of the venoms tested presented with a procoagulant effect. If underlying anticoagulant activity from PLA2 toxins were to arise after inhibition of the procoagulant toxins (i.e. SVMPs by marimastat), as has been seen for certain other snake venoms previously, this would result in a percentage inhibition far greater than 100% in the plasma assay (Figure 3C to I) or in a prolonged clotting time in the thromboelastography assay. These anticoagulant profiles were not observed with any venom tested in this study.

      (5) Finally, the in vivo evidence is limited to a chicken embryo model. To support your hypothesis, a conventional mouse model with delayed post-envenomation dosing (24-36 h monitoring) is needed to address both safety/toxicity and post-exposure efficacy, and to define a realistic therapeutic window, especially because venom toxins act very quickly and the timing of administration is central to the clinical utility of any small-molecule approach.

      We agree with the reviewer that the next important step for this research activity is utilising murine preclinical models to validate the in vitro and preliminary in vivo findings described in this manuscript. However, as stated above, this study provides the initial evidence base that the promising utility of marimastat, DMPS and varespladib as repurposed snakebite drugs extends to a range of neotropical viper venoms. Evaluating the safety, efficacy (both preincubation and rescue approaches) and PK/PD relationships to inform optimal dosing strategies of these molecules will be crucial next steps for the field. However, these activities are far from trivial and will take several years of additional research, and therefore fall outside the scope of this initial manuscript.

      To address the concern that the in vivo evidence is limited to a chicken embryo model, we have added sentences discussing the wider use of the egg model in snakebite research and its translation to murine studies.

      Minor comments:

      (1) Figure 2D: How do you explain the fact that the "no venom" control has SVSP activity?

      The data for all in vitro assays in Figure 2 are presented as the AUC of the raw data (absorbance or fluorescence), for consistency across assays. Therefore, all assays (B to D) have a background signal in the absence of venom; the SVSP assay simply has a greater background signal.
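      Because the AUC is computed from raw readings rather than background-subtracted ones, a no-venom well with a constant baseline still integrates to a non-zero area. A minimal sketch, assuming trapezoidal integration of timed reads; the readings below are invented for illustration and are not data from the study.

```python
def trapezoid_auc(times, readings):
    """Area under a kinetic read (time vs. absorbance/fluorescence), trapezoidal rule."""
    if len(times) != len(readings) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    return sum(
        (t1 - t0) * (r0 + r1) / 2.0
        for t0, t1, r0, r1 in zip(times, times[1:], readings, readings[1:])
    )


minutes = [0, 5, 10, 15]
no_venom = [50.0, 50.0, 50.0, 50.0]        # flat background signal only
with_venom = [50.0, 80.0, 120.0, 150.0]    # substrate turnover adds to the background

print(trapezoid_auc(minutes, no_venom))    # 750.0
print(trapezoid_auc(minutes, with_venom))  # 1500.0
```

      An assay whose substrate or detection chemistry gives a higher baseline reading will therefore show a proportionally larger AUC for the no-venom control.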

      (2) For better understanding, I would suggest adding a dedicated column in Figure 4A with Nafamostat SVSP data reported as "N/D" where applicable.

      As stated in the Results, the weak inhibitory activity did not justify an EC<sub>50</sub> assessment; adding this column would therefore be redundant.

      (3) The introduction is too long relative to the experimental content and would benefit from tightening to sharpen the motivation and unmet need.

      We thank the reviewer for their opinion and we have reviewed the introductory section again. While we made minor edits throughout, we decided not to make substantial modifications to it.

      Reviewer #3 (Recommendations for the authors):

      I only have some minor comments:

      (1) In line 100, the word "that" is repeated.

      We thank the reviewer for spotting this error, which we have corrected.

      (2) Line 433. I believe the word "compromising" should be substituted by "comprising" here.

      We thank the reviewer for spotting this.

      (3) Figure 1 and supplementary: Bothrops asper venom has been very thoroughly studied, and using only one study from Costa Rica might underestimate the venom variation within the species. I suggest looking at the following study: https://doi.org/10.1016/j.toxicon.2022.106983. Maybe it is not necessary to change anything, but worth looking into.

      We appreciate the reviewer flagging this paper. It has been added to the manuscript (reference 48) and has provided additional data for Figure 1 and Supplementary Table 1.

      (4) Methods: Given the intraspecies variation described for some of these species, I believe it is relevant to add the locality of origin of the venoms, and not only the country. I, of course, understand this is often unknown for historical samples.

      We have included the following sentence in the Methods: "Due to the historic nature of the venom samples, the source locality is not available beyond country of origin, with the exception of B. lanceolatus, which is endemic to Martinique."

      (5) Figure 3: It is not very accurate to show an SD when the sample number is 2. I suggest, when possible, showing the mean and the two data points in the plots. This also applies to other figures where n=2. Also, in Figure 3D, does Marimastat seem to have an anticoagulant effect, or is this just within normal variation?

      We have removed the statement "Standard deviation (SD) for all kinetic reads and standard error for AUC is reported" from the statistics paragraph of the Methods, but kept the reference to Prism v10. The values reported for the HTS assays, including the SVMP, PLA<sub>2</sub> and coagulation experiments, are averages of the means from independent assays (n > 2 within each independent assay). We agree that SD and SE have limited meaning for Figure 3A to I; we have therefore changed the error bars to show the range, as we think that displaying the individual points would reduce visual and analytical clarity.
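      The point about n = 2 can be checked directly: with two values, the sample SD is a fixed rescaling of their difference, so a range bar conveys the same information more transparently. A short sketch with invented duplicate values; the helper name is hypothetical.

```python
import math
import statistics


def mean_and_range(values):
    """Summary for error bars that show the range instead of SD/SE."""
    return statistics.mean(values), min(values), max(values)


replicates = [42.0, 50.0]   # invented duplicate values (n = 2)
mean, low, high = mean_and_range(replicates)
print(mean, low, high)      # 46.0 42.0 50.0

# With n = 2 the sample SD equals |a - b| / sqrt(2), so it carries
# no information beyond the two points themselves.
sd = statistics.stdev(replicates)
print(math.isclose(sd, abs(replicates[0] - replicates[1]) / math.sqrt(2)))  # True
```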

      In relation to the query about a marimastat anticoagulant effect in Fig 4D: as shown in Fig 4B, marimastat has no direct anticoagulant effect. The >100% inhibition for marimastat is likely to reflect normal variation, as this biological assay is highly variable. However, it could also be that strong inhibition of the SVMPs in B. asper, together with limited SVSP activity, has unmasked an anticoagulant effect of the remaining PLA<sub>2</sub> toxins, which have high activity in this venom. That being said, as B. asper and B. atrox have similar venom profiles, we would then have expected to see a similar effect in B. atrox in both the plasma and TEG assays. Therefore, assay variation seems the most likely explanation for this observation.

    1. eLife Assessment

      This study introduces an important method to estimate the probability of malaria importation using a new Bayesian approach that integrates epidemiological data, travel reports, and genetic data. The authors provide convincing evidence for the value of this model in identifying the main sources of malaria transmission and risk factors. This work will be of interest to the area of genomic epidemiology and public health strategies aiming to eliminate malaria.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimating malaria importation probabilities that combines epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing at distances larger than 100 km, no spatial correlation at distances between 10 and 100 km, and again strong spatial correlation at distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows lower importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. No association with importation was found for occupation, sex, or other factors. These data have practical implications for public health strategies aimed at malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strengths of this study lie in the combination of different data sources (epidemiological, travel, and genetic data) to estimate importation probabilities, and in the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

    3. Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data.
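      As a generic illustration of how Bayes' rule lets several evidence sources be combined into a single importation probability, the sketch below uses a naive-Bayes odds update. It is not the authors' model: the prior, the likelihood ratios, the function name and the conditional-independence assumption are all hypothetical.

```python
def importation_probability(prior, likelihood_ratios):
    """Posterior P(imported) from a prior and per-source likelihood ratios.

    Each ratio is P(evidence | imported) / P(evidence | local). Multiplying them
    assumes the evidence sources are conditionally independent (a simplification).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        if ratio < 0:
            raise ValueError("likelihood ratios must be non-negative")
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical case: 50% prior, travel report favours importation 3:1,
# genetic relatedness to the local parasite population is neutral (ratio 1).
print(importation_probability(0.5, [3.0, 1.0]))   # 0.75
```

      Working on the odds scale keeps each evidence source as a separate multiplicative factor, which makes it easy to see how much travel history or genetic relatedness shifts an individual case.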

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the method does not quantify the contribution of each case type to overall transmission, which the authors leave for future study.

    4. Reviewer #3 (Public review):

      This work provides a novel statistical model to identify imported malaria cases, which are an important challenge for elimination, particularly in low-transmission areas. This tool was applied to Plasmodium falciparum populations in Mozambique and revealed differences in importation rates between two low-transmission districts in the south.

      Strengths:

      The study has several strengths, particularly the development of a novel Bayesian model integrating genomic, epidemiological, and travel data to estimate importation probabilities. The findings provided important insights into malaria transmission dynamics, including the identification of importation sources and regional differences in importation rates across Mozambique. These results highlight the potential value of targeted interventions among traveler populations to support malaria elimination efforts. Moreover, this approach could be adapted to other epidemiological settings.

      Weaknesses:

      The study has some limitations, including uneven sample representation across provinces, incomplete metadata for the risk factor analysis, and the use of a proxy for transmission intensity. Future work will include a new sample collection effort and the incorporation of monthly malaria incidence estimates.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimating malaria importation probabilities that combines epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing at distances larger than 100 km, no spatial correlation at distances between 10 and 100 km, and again strong spatial correlation at distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows lower importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. No association with importation was found for occupation, sex, or other factors. These data have practical implications for public health strategies aimed at malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strengths of this study lie in the combination of different data sources (epidemiological, travel, and genetic data) to estimate importation probabilities, and in the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

      We appreciate the review and comment about the manuscript.

      Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      Comments on revisions:

      All my questions and concerns were satisfactorily addressed.

      We appreciate the review and comment about the manuscript. In fact, the approach does not intend to quantify the contribution of each case type to overall transmission. We state this in the Discussion and refer to future work with this scope.

      Reviewer #3 (Public review):

      This work provides a novel statistical model to identify imported malaria cases, which are an important challenge for elimination, particularly in low-transmission areas. This tool was applied to Plasmodium falciparum populations in Mozambique and revealed differences in importation rates between two low-transmission districts in the south.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results provided insights into malaria transmission dynamics, particularly by identifying importation sources and differences in importation rates in Mozambique. Finally, the findings are relevant because they suggest interventions focused on the traveler population to support malaria elimination efforts.

      Weaknesses:

      The study also has some limitations, although the authors have plans to address them. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for the risk factor analysis. Additionally, the authors used a proxy for transmission intensity and assumed some other conditions to calculate the importation probability for specific scenarios. They plan to conduct a new sample collection and include monthly malaria incidence estimates in the future.

      Comments on revisions:

      Delete "We added this text to the discussion" in line 302 (Discussion)

      I recommend adding the plans to address limitations indicated in the Response to Reviewers document in the Discussion. This would really strengthen the limitation section.

      Thank you for pointing out these aspects. We deleted the sentence mentioned. In the discussion section, we now finish the paragraph on limitations with the proposed future work to address them.

    1. eLife Assessment

      This important study presents a theoretically grounded framework for dimensionality reduction in single-cell RNA sequencing data, utilizing the principles of Riemannian manifolds. The proposed method addresses a critical challenge in bioinformatics: extracting highly informative latent dimensions without relying on the heuristic assumptions common in existing workflows. The evidence supporting the method's utility in estimating intrinsic dimensionality and identifying cell types is convincing, though the work would benefit from more rigorous validation against established ground truths and a clearer strategy for addressing prevalent batch effects.

    2. Reviewer #1 (Public review):

      Summary:

      Sidarta-Oliveira et al. present TopOMetry, a novel dimensionality reduction method based on the eigendecomposition of an approximated Laplace-Beltrami operator. In short, TopOMetry is an iterative version of existing spectral methods (e.g., Laplacian Eigenmaps or Diffusion Maps). It approximates the Laplacian operator twice, once in a "phenotypic space" and then again in the space of the resulting eigenbases. By doing this, the approximated operator captures more information about the manifold, which allows for more robust and accurate downstream analyses.
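      The two-pass idea described in this summary can be sketched with a plain diffusion map applied twice: once to the expression matrix and once to the resulting eigenbasis. This is a minimal dense NumPy illustration under assumed choices (Gaussian kernel with a k-nearest-neighbour adaptive bandwidth, alpha = 1 density normalisation, diffusion time t = 1); the function names are hypothetical, and it is neither TopOMetry's implementation nor its API.

```python
import numpy as np


def diffusion_eigenbasis(X, n_components=5, k=10, alpha=1.0):
    """Dense diffusion-map sketch (small n only): returns t = 1 diffusion coordinates."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # squared distances
    order = np.argsort(d2, axis=1)                              # column 0 is the point itself
    sigma = np.sqrt(d2[np.arange(n), order[:, k]]) + 1e-12      # bandwidth: k-th neighbour distance
    K = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
    # Keep only k-nearest-neighbour edges, symmetrised.
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], order[:, : k + 1]] = True
    K = np.where(mask | mask.T, K, 0.0)
    # Density normalisation (alpha = 1 removes the effect of sampling density).
    q = K.sum(axis=1)
    K = K / np.outer(q**alpha, q**alpha)
    # Symmetric conjugate of the Markov matrix P = D^-1 K, for a stable eigensolve.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(evals)[::-1][1 : n_components + 1]        # skip the trivial eigenvalue 1
    psi = evecs[:, top] / np.sqrt(d)[:, None]                   # right eigenvectors of P
    return psi * evals[top]


def two_stage_embedding(X, scaffold_dims=10, out_dims=2, k=10):
    """Operator built on the data, then rebuilt on its own eigenbasis."""
    scaffold = diffusion_eigenbasis(X, n_components=scaffold_dims, k=k)
    return diffusion_eigenbasis(scaffold, n_components=out_dims, k=k)


rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 30))          # stand-in for a cells x genes matrix
print(two_stage_embedding(cells).shape)     # (200, 2)
```

      A practical implementation would use a sparse nearest-neighbour graph and a sparse eigensolver rather than dense n x n matrices.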

      Strengths:

      - Introduces operator-native fidelity scores and Riemannian diagnostics to single-cell analysis, enabling researchers to evaluate and trust embeddings - functionality absent in prior methods.
      - The approach was rigorously tested on synthetic and real single-cell RNA-seq datasets.
      - The package is well made and easily scalable to millions of cells.
      - The comprehensive documentation helps end users run the desired analyses.

      Weaknesses:

      - The method is an extension of current state-of-the-art methods, not a fundamentally new one.

      Comments on revised version:

      The revised manuscript partially addresses the concerns raised in the prior review. The jargon weakness has been substantially mitigated by relocating mathematical derivations to the Methods section and simplifying language in the main text; this weakness has been updated accordingly.

      The introduction of operator-native fidelity scores and Riemannian diagnostics represents a meaningful addition and has been added to the Strengths. The benchmarking scope has also been notably expanded.

      The core weakness - that the method is an extension of existing spectral methods rather than a fundamentally new contribution - remains unchanged, as the authors' rebuttal did not provide a sufficiently precise mathematical argument to overturn it.

    3. Reviewer #2 (Public review):

      Summary:

      This work introduces a novel framework to systematically learn the latent dimensions of single-cell data, grounded in the theory of the Riemannian manifold. The authors demonstrate how this framework can be applied to various important tasks, such as estimating intrinsic dimensionalities, annotating cell types, etc. They did a great job of tackling an important but not yet established problem in the field and approaching it with a theoretically sound and novel approach. I think after a more rigorous and comprehensive validation, this work could be impactful.

      Strengths:

      - Dimensionality reduction is a routine step in analyzing many high-dimensional data, such as molecular data. While the downstream analysis results depend heavily on this step, existing methods rely on strong assumptions and are sometimes heuristic. The authors present a novel, theoretically grounded approach to address this important problem.

      - The authors demonstrated its usability in downstream analysis in a comprehensive manner. In particular, they show evidence suggesting novel T-cell subpopulations.

      - I commend the authors for releasing and maintaining their software well with comprehensive documentation. This significantly increases the usability and accessibility of the method.

      Weaknesses:

      - The paper lacks experiments that validate the results. It would be beneficial to see additional evaluation settings with better-established ground truths to more strongly demonstrate the method's effectiveness.

      - Batch effects are prevalent in single-cell data. The paper does not adequately address how the proposed method handles this issue.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The method is an extension of current state-of-the-art methods, not a fundamentally new one.

      We respectfully disagree with this characterization. While TopoMetry is inspired by the theory of spectral geometry, it is not a simple extension of existing dimensionality reduction methods such as Diffusion Maps. Instead, TopoMetry introduces a new framework for single-cell analysis that:

      Iteratively approximates manifold geometry by constructing refined diffusion operators on spectral scaffolds (“the geometry of the geometry”), a procedure not present in existing methods.

      Provides a unified workflow for dimensionality estimation, clustering, visualization, imputation, lineage inference, and diagnostics, all within the same geometric framework.

      Introduces operator-native fidelity scores and Riemannian diagnostics to single-cell analysis, enabling researchers to evaluate and trust embeddings—functionality absent in prior methods.

      Thus, TopoMetry represents a new paradigm for geometry-aware single-cell analysis, not merely a reimplementation of existing algorithms.

      (2) The paper contains a lot of jargon.

      We have thoroughly simplified the text throughout the manuscript. We now introduce geometric concepts in accessible terms, avoiding technical details where they are not essential for biological interpretation. For example, references to the Laplace–Beltrami operator and its eigenfunctions have been reduced and reframed in terms of “geometry,” “diffusion,” and “spectral scaffolds,” which are more intuitive for a general audience.

      Reviewer #1 (Recommendations for the authors):

      (1) What happens if the LBO is approximated more than twice? As the main idea of the method is an iterative approach to approximating the LBO more precisely, the authors have presumably already considered this. If so, this could be discussed further in the manuscript.

      We thank the reviewer for this important point. Indeed, TopoMetry’s design naturally supports iterating the Laplace–Beltrami operator (LBO) approximation beyond two steps. However, additional iterations (three or more) lead to only marginal improvements in final results while significantly increasing computational cost. In some tested cases, additional iterations could even over-smooth the data, reducing the resolution of fine-scale structure. The revised manuscript avoids an excessive focus on iterative LBO approximations and instead centers the narrative around representing and evaluating the underlying geometry of single-cell data.

      (2) Because the paper describes the method very comprehensively, it contains many mathematical equations and a lot of jargon. This could hinder the accessibility of the manuscript to biologists without a background in mathematics. I therefore strongly recommend that the authors consider moving a considerable amount of text to the supplementary material, so that the main text can focus on the benchmarking results and possible applications.

      We appreciate this recommendation and have substantially revised the manuscript to make it more accessible to a broad biological audience. In the revised version:

      We moved detailed mathematical derivations and operator definitions to the Methods section, keeping only the most essential concepts in the main text.

      We reframed technical terms (e.g., Laplace–Beltrami operator, eigenfunctions) in simpler and more intuitive language in the main text. 

      The Results section now emphasizes benchmarking outcomes and biological applications.

      Reviewer #2 (Public review):

      (1) To encourage the single-cell community to adopt this method, the authors should demonstrate its advantages over existing methods more clearly. Many single-cell analysis algorithms have been proposed for each task, and some of them are widely used by biologists. However, the comparison in this work is somewhat limited. For example, not even all the methods mentioned in the related-work paragraph (2nd paragraph) on page 2 are compared, and the reason for excluding them is not discussed. I am also curious how the number of PC dimensions is determined; the choice of 300 PCs on page 11 seems arbitrary. Furthermore, the usefulness of dimension-reduced data also depends heavily on the preceding processing steps, such as highly variable gene selection. I understand it is hard to control all those factors, but I think there is room for improvement.

      We have substantially expanded the benchmarking and discussion of competing methods. These additions more clearly demonstrate TopoMetry’s advantages and robustness compared to widely adopted alternatives. In the revised manuscript:

      We now benchmark TopoMetry on 68 diverse single-cell datasets, far exceeding the scope of the original version.

      We explicitly compare TopoMetry with PCA→UMAP, standalone UMAP, and scVI. These workflows represent the de facto current standard in single-cell analysis. While numerous other approaches exist, a comprehensive benchmark of every possible workflow lies beyond the scope of this study and would itself warrant a dedicated report.

      We adopt exactly the same preprocessing steps for all evaluated workflows to ensure a fair comparison, except for scVI, which requires raw gene counts and performs its own internal preprocessing.

      We adjust the number of PCs used for each dataset using the commonly adopted, ad hoc "elbow point" criterion.
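      One common way to make an "elbow point" choice reproducible is to take the component farthest from the straight line joining the first and last points of the explained-variance curve. The sketch below assumes that heuristic (the response does not specify which elbow criterion was used), and the function name is hypothetical.

```python
import numpy as np


def elbow_n_pcs(explained_variance):
    """Number of PCs at the 'elbow': the point farthest from the first-to-last chord."""
    y = np.asarray(explained_variance, dtype=float)
    if y.size < 3 or y.max() == y.min():
        raise ValueError("need at least three components with varying variance")
    x = np.linspace(0.0, 1.0, y.size)                 # rescale both axes to [0, 1]
    y = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance from each point to the line through the end points.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * x - dx * y + x[-1] * y[0] - y[-1] * x[0]) / np.hypot(dx, dy)
    return int(np.argmax(dist)) + 1                    # index -> number of components


print(elbow_n_pcs([10.0, 5.0, 1.0, 0.9, 0.8, 0.7, 0.6]))   # 3
```

      Rescaling both axes first keeps the result independent of whether variance is expressed as a fraction or in absolute units.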

      (2) The paper lacks experiments that validate the results. It would be beneficial to see additional evaluation settings with better-established ground truths to more strongly demonstrate the method's effectiveness.

      We agree that validation is crucial and have strengthened this aspect:

      We introduce new geometry-preservation metrics and validate that TopoMetry outperforms current de facto standards.

      We demonstrate that TopoMetry resolves well-established ground-truth structures, such as the cell cycle in pancreas development and T cell proliferation, which PCA→UMAP fails to capture (Suppl. Fig. S3).

      We validate the biological relevance of novel T cell subpopulations by linking them to TCR clonotypes and clonal expansion patterns using datasets with paired VDJ information (ECCITE-TCR, TICA).

      We show that TopoMetry faithfully recovers expected lineage trajectories in atlas-scale datasets (MOCA).

      These analyses demonstrate that TopoMetry not only preserves geometry but also recovers biologically meaningful ground-truth structures. Further experimental investigation of biological insights obtained from the presented examples exceeds the scope of the presented methodological work.

      (3) The effect of various parameters, such as those involved in k-nearest neighbors (KNN) or choosing the appropriate Laplacian operator, is not comprehensively explored. How can we ensure the analysis is not overly sensitive to these parameters?

      We now explicitly address parameter robustness and show that results are stable across a wide range of k values (30–200) in the neighborhood graph (Suppl. Fig. S1e).

      The range of possible Laplacian operators was a design choice aimed at increasing user freedom, but we agree with the reviewer that this option could confuse readers and users. TopoMetry now only uses the appropriate operator (density-normalized graph Laplacian, a.k.a. diffusion operator), reducing variability and improving usability.

      (4) Batch effects are prevalent in single-cell data. The paper does not adequately address this issue.

      Several of the datasets we analyzed include cells from multiple donors and experimental batches, and TopoMetry successfully recovers consistent biological structure across these.

      TopoMetry’s spectral scaffolds can be integrated with data integration methods such as Harmony and Scanorama, which are employed to correct the latent PCA space in current practice.

      Reviewer #2 (Recommendations for the authors):

      (1) The paper repeatedly introduces technical jargon abruptly and without sufficient explanation. This makes it difficult for readers from a biological background to follow. Even I, with a more computational background, struggled to grasp some parts.

      We thank the reviewer for this feedback and have streamlined terminology throughout the manuscript, replacing jargon with more intuitive language and providing brief explanations when technical terms are first introduced. This makes the text more accessible to both computational and biological audiences.

      (2) There is no comparison of the computational cost of this method with existing approaches, which is an important factor for practical adoption. Including a benchmarking section on this would be useful.

      We thank the reviewer for this suggestion and have now included a runtime benchmark against PCA→UMAP, PHATE, and scVI (Suppl. Fig. 1f), showing that while TopoMetry is slightly slower than PCA→UMAP, it scales more favorably than alternative geometry-aware methods (PHATE) and neural networks (scVI).

      (3) TopOMetry allows users to obtain and evaluate dozens of possible representations. However, I wonder if this could introduce a user burden, increasing uncertainty and subjectivity, as users should examine them manually. I think this should be clarified.

      We appreciate this concern and have streamlined the workflow to minimize user burden. As shown in the original manuscript, representations learned with different TopoMetry kernels and Laplacian variants converge to highly similar results. Based on this, TopoMetry now defaults to the best-performing kernel and the most appropriate Laplacian operator, yielding only two scaffold representations (fixed-time and multiscale) and corresponding visualizations rather than dozens of alternatives. This removes the need for manual selection while retaining flexibility for advanced users. In addition, we introduced a single-line command that runs the entire analysis and generates a comprehensive PDF report, allowing users to evaluate results in a standardized and user-friendly way. Together, these changes eliminate unnecessary subjectivity and ensure consistent outputs across analyses.

      (4) Formatting. There are errors in figure numbering within the main text. For instance, it should be Figure 4 instead of Figure 3 on page 11. Some figures are not concise. For example, Figure 2 contains too much text, which detracts from its visibility. I recommend trimming the figures to improve clarity. A color map is missing in Figure 2, which could help better interpret the data.

      We have thoroughly adjusted the manuscript and figures for improved visibility and clarity.

      Broader Impact and Reception

      Since our preprint, TopoMetry has been used by Hale et al. (Science, 2024), where it helped reveal morphological T cell subpopulations, and in a recent preprint by Tedeschi et al. (2025). These independent applications highlight the utility and impact of TopoMetry beyond our group, supporting its relevance to diverse biological contexts. In addition, two independent studies performing multimodal integration of RNA and TCR data (Zhang et al., 2023 and Drost et al., 2024) have identified a diversity of T cell subpopulations that resembles the clusters identified by TopoMetry using only RNA data.

    1. eLife Assessment

      This study reports the relative importance of Tie1 and Tie2 signaling for atrial versus ventricular trabeculation. It is an important study and is one of the few works to date that have carefully and simultaneously analyzed these two processes. In line with a previous study in zebrafish, the authors demonstrate key differences between atrial and ventricular trabeculation. While the imaging and quantitative analyses were conducted with solid and validated methodology throughout the manuscript, the work would benefit from more rigorous approaches in which Tie1/2 signaling is disrupted prior to the onset of atrial/ventricular trabeculation, to allow for a more direct comparison.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ding et al. use genetic mouse models to demonstrate that atrial trabeculation is more dependent on Tie1/Tie2 signaling than ventricular trabeculation. With additional experimentation that would support the current claims, the results may hold significant value, as atrial trabeculation remains an understudied phenomenon in cardiac biology with potential implications for atrial cardiomyopathy and atrial fibrillation.

      Strengths:

      Detailed characterization of atrial versus ventricular trabeculation across different developmental timepoints, and the use of appropriate animal models to address the scientific question at hand.

      Weaknesses:

      The authors have consistently treated mice with tamoxifen after ventricular, but not atrial, trabeculation has already started. As such, the observed cardiac phenotypes - where predominantly atrial trabeculation is affected - might be a mere consequence of the precise time window in which Tie1/2 signaling was impaired, rather than a direct measurement of its relative importance for atrial versus ventricular trabeculation. The conclusions of the paper may thus be significantly strengthened by depleting Tie1/2 signaling prior to the onset of ventricular trabeculation, as is done for atrial trabeculation.

    3. Reviewer #2 (Public review):

      Summary:

      Ding et al. examine the role of TIE1 in cardiac chamber morphogenesis using genetic mouse models targeting Tie1, Tek, or both, and analyzing endocardial cell-mediated chamber formation across multiple embryonic developmental and postnatal stages, supported by analysis of published single-cell datasets and new bulk RNA seq analyses of murine cardiac tissue. The authors find that Tie1 and Tek expression is higher in atrial than ventricular endocardial cells. Notably, endothelial Tie1 is required for atrial trabeculation at E12.5, but is less critical in ventricular trabeculation. TIE1 also acts synergistically with TIE2 during atrial trabeculation. While Tie1 deficiency alone does not cause defects at E10.5, combined heterozygous deletion of Tek disrupts both atrial and ventricular development at E10.5. This synergy is further supported by analyses at later embryonic stages and in postnatal hearts.

      Strengths:

      The study is well-designed, clearly written, and supported by high-quality figures. The performed experiments demonstrate a previously unrecognized role for Tie1 in cardiac development and identify synergistic control of cardiac morphogenesis by Tie1 and Tie2. This synergy is consistent with the previously identified roles of Tie1 and Tek in venous development and with Tie1 involvement in angiopoietin-dependent postnatal vascular and lymphatic remodeling. Together, these findings support a role for Tie1 as a contributor to Ang1-Tie2 signaling during heart development.

      Weaknesses:

      The manuscript does not include direct mechanistic studies; however, RNA seq analysis of atria and ventricles showed reduced expression of Tek, Dll1, and Notch1 upon Tie1 deficiency in developing hearts. Although previously reported mechanisms, such as TIE1-TIE2 heterodimer formation and effects on endothelial junctions, migration, or survival, are discussed, no direct mechanistic experiments are performed. Addressing some of these mechanisms would have clarified the basis of Tie1-Tie2 synergy. As two distinct Tie1 models are used, including one targeting the kinase domain, the authors should state whether phenotypes differed or were similar between models.

    4. Reviewer #3 (Public review):

      Summary:

      Ding et al. investigate the roles of TIE1 and TEK (Tie2) in mouse cardiac development, with a particular focus on atrial trabeculation. The authors employ multiple genetic models, including Tie1ICDflox/flox (with Cdh5-CreERT2), a knockout-first allele (EUCOMM, Tie1 tm1a/tm1a), and a Tek deletion model.

      Based on the dataset from Feng et al. 2022 Nat Commun, the authors report increased expression of Tie1 and Tek transcripts in atrial endocardial cells compared to ventricular cells at embryonic day (E) 14.5. Loss of Tie1 leads to early atrial trabeculation defects detectable at E12.5, whereas ventricular defects appear later and are less pronounced at E14.5. Chamber-specific RNA sequencing reveals stronger transcriptional changes in atrial tissue.

      Conditional deletion of Tek results in a similar phenotype, with more pronounced atrial defects. Combined deletion of Tie1 and Tek (Tie1 ΔICD/ΔICD; Tek+/-) leads to earlier and more severe defects in both atrial and ventricular trabeculation and results in embryonic lethality around E12.5, suggesting a synergistic interaction between the two genes.

      Conditional endothelial deletion of Tie1 combined with heterozygous global Tek at later embryonic stages allows analysis at later time points and again shows more severe defects in atrial trabeculation. Postnatal analysis of this model reveals reduced heart-to-body weight ratios and potential mild atrial abnormalities.

      Strengths:

      (1) The authors address chamber-specific signaling mechanisms underlying atrial versus ventricular trabeculation, an area of high developmental and clinical relevance.

      (2) The study provides a comprehensive temporal analysis across multiple embryonic stages.

      (3) The use of multiple genetic models strengthens the overall conclusions and allows comparative interpretation.

      (4) While focusing on trabeculation, the authors also include observations on coronary vessel development, increasing the broader relevance of the work. The findings are therefore of interest to the wider cardiovascular research community.

      Weaknesses:

      (1) Timing of recombination vs. trabeculation onset

      Ventricular trabeculation begins earlier than atrial trabeculation. Since tamoxifen (in contrast to 4-hydroxytamoxifen) requires metabolic activation, Cre-mediated recombination will occur with a delay. This suggests that atrial trabeculation may be targeted before its onset, whereas ventricular trabeculation may already be underway for 2-3 days at the time of effective gene deletion.

      How do the authors account for this discrepancy in their interpretation?

      Have earlier induction time points been tested to better capture the onset of ventricular trabeculation? This limitation should be explicitly discussed.

      (2) Clarity of genetic models and experimental design

      The study employs several genetic constructs. It would improve clarity if, for each experiment, the specific genetic model and tamoxifen regimen were clearly described before presenting the results.

      (3) Tie1 tm1a/tm1a phenotype vs. known global knockout

      Previous studies (PMID: 8846781, 7596437) show that complete Tie1 loss leads to severe edema, vascular rupture, and embryonic lethality around E13.5-E14.5.

      How does the Tie1 tm1a/tm1a allele differ, given that animals appear to survive longer? Is this allele hypomorphic rather than a full knockout?

      This point requires clarification.

      (4) Limited mechanistic insight

      While the authors aim to investigate underlying mechanisms, the current study is largely descriptive and based on mRNA expression and genetic interaction analyses (Tie1/Tek co-deletion). Direct mechanistic insights into signaling pathways remain limited. However, the dataset provides a valuable foundation for future mechanistic studies, which should be more clearly acknowledged in the discussion.

    5. Author response:

      eLife Assessment

      This study reports the relative importance of Tie1 and Tie2 signaling for atrial versus ventricular trabeculation. It is an important study and is one of the few works to date that have carefully and simultaneously analyzed these two processes. In line with a previous study in zebrafish, the authors demonstrate key differences between atrial and ventricular trabeculation. While the imaging and quantitative data were conducted with solid and validated methodology throughout the manuscript, the work would benefit from more rigorous approaches where Tie1/2 signaling is disrupted prior to the onset of atrial/ventricular trabeculation, to allow for a more direct comparison.

      We thank the editors for the eLife assessment. We would like to request that the following statement be modified: “…the work would benefit from more rigorous approaches where Tie1/2 signaling is disrupted prior to the onset of atrial/ventricular trabeculation, to allow for a more direct comparison”. We request this change for the following reasons:

      We utilized two categories of genetic mouse models in this study (as summarized in Fig. 7I): conventional knockouts (Tie1<sup>tm1a/tm1a</sup>, Tie1<sup>ΔICD/ΔICD</sup> and Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup>) and inducible gene deletion models (Tek<sup>iECKO</sup>, Tie1ICD<sup>iECKO</sup>, and Tie1ICD<sup>iECKO</sup>;Tek<sup>+/-</sup>) [1-3]. The Tie1<sup>tm1a/tm1a</sup> line is equivalent to the previously published Tie1<sup>-/-</sup> mouse line, as demonstrated in our prior work and by others [1, 2, 4-6]. Therefore, in the conventional knockouts, the Tie1 or Tek alleles were inactivated prior to the onset of atrial and ventricular trabeculation, as shown in Fig. 1, Fig. 2, Fig. 3, Fig. 5A-D, and Supplemental Fig. 3. Based on these findings, we propose that TIE1 is differentially required for atrial versus ventricular morphogenesis, and acts synergistically with TIE2 during cardiac trabeculation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ding et al. use genetic mouse models to demonstrate that atrial trabeculation is more dependent on Tie1/Tie2 signaling than ventricular trabeculation. With additional experimentation that would support the current claims, the results may hold significant value, as atrial trabeculation remains an understudied phenomenon in cardiac biology with potential implications for atrial cardiomyopathy and atrial fibrillation.

      Strengths:

      Detailed characterization of atrial versus ventricular trabeculation across different developmental timepoints, and the use of appropriate animal models to address the scientific question at hand.

      Weaknesses:

      The authors have consistently treated mice with tamoxifen after ventricular, but not atrial, trabeculation has already started. As such, the observed cardiac phenotypes - where predominantly atrial trabeculation is affected - might be a mere consequence of the precise time window in which Tie1/2 signaling was impaired, rather than a direct measurement of its relative importance for atrial versus ventricular trabeculation. The conclusions of the paper may thus be significantly strengthened by depleting Tie1/2 signaling prior to the onset of ventricular trabeculation, as is done for atrial trabeculation.

      We thank the reviewer for the comments.

      Regarding the timeline of gene deletion and tamoxifen treatment, we would like to provide the following clarification.

      Fig. 1-3: As described in the Methods and Materials, Tie1<sup>tm1a/tm1a</sup> is a knockout-first mouse model established from EUCOMM embryonic stem cells (EPD0735-3B07) targeting the Tie1 gene. Therefore, the Tie1<sup>tm1a/tm1a</sup> line is equivalent to the previously published Tie1 null mice (Tie1<sup>-/-</sup>). The Tie1<sup>Flox/Flox</sup> mouse line (with exon 8 floxed) was generated by excising the lacZ reporter and neo-cassette using FLPeR mice.

      Fig. 5A-D: To investigate the synergy of TIE1 and TIE2 in cardiac trabeculation, we crossbred the Tek<sup>+/-</sup> and Tie1<sup>ΔICD/+</sup> mouse lines to generate double mutant mice harboring a homozygous Tie1 mutation and a single null Tek allele (Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup>). Although no obvious defects were observed in atrial or ventricular structures following Tie1 deficiency alone at E10.5, both atrial and ventricular development were disrupted in Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup> mutants at the same stage (Fig. 5A-D).

      Supplemental Fig. 3: To verify the role of TIE1 in atrial development, we employed an alternative knockout mouse line targeting the Tie1 intracellular domain by floxing exons 15 and 16 (Tie1ICD<sup>Flox/Flox</sup>). Mutants harboring these null alleles are designated as Tie1<sup>ΔICD/ΔICD</sup>. As detailed in the previous publication [2], this line is also equivalent to the previously published Tie1 null mice (Tie1<sup>-/-</sup>). The cardiac phenotypes shown in Supplemental Fig. 3 are indeed similar to those of Tie1<sup>tm1a/tm1a</sup> mutant mice.

      For the inducible knockouts targeting Tie1, Tek and both, the results are shown in Fig. 4, Fig. 5E-H, Fig. 6, Fig. 7.

      Fig. 4: As mice homozygous for the Tek mutation (Tek<sup>-/-</sup>) die before E10.5 [3, 7], we performed studies using the inducible knockout line targeting Tek (Tek<sup>Flox/-</sup>;Cdh5-Cre<sup>ERT2</sup>, designated Tek<sup>iECKO</sup>), as shown in Fig. 4.

      Fig. 5-7: To investigate the synergy of TIE1 and TIE2 in the cardiac trabeculation at the later stages of embryogenesis (Fig. 5E-H, Fig. 6) and the postnatal stage (Fig. 7), we used the inducible knockout models targeting Tie1/Tek, including Tie1ICD<sup>iECKO</sup> (Tie1ICD<sup>Flox/-</sup>;Cdh5-Cre<sup>ERT2</sup>) and Tie1ICD<sup>iECKO</sup>;Tek<sup>+/-</sup> (Tie1ICD<sup>Flox/-</sup>;Cdh5-Cre<sup>ERT2</sup>;Tek<sup>+/-</sup>).

      Reviewer #2 (Public review):

      Summary:

      Ding et al. examine the role of TIE1 in cardiac chamber morphogenesis using genetic mouse models targeting Tie1, Tek, or both, and analyzing endocardial cell-mediated chamber formation across multiple embryonic developmental and postnatal stages, supported by analysis of published single-cell datasets and new bulk RNA seq analyses of murine cardiac tissue. The authors find that Tie1 and Tek expression is higher in atrial than ventricular endocardial cells. Notably, endothelial Tie1 is required for atrial trabeculation at E12.5, but is less critical in ventricular trabeculation. TIE1 also acts synergistically with TIE2 during atrial trabeculation. While Tie1 deficiency alone does not cause defects at E10.5, combined heterozygous deletion of Tek disrupts both atrial and ventricular development at E10.5. This synergy is further supported by analyses at later embryonic stages and in postnatal hearts.

      Strengths:

      The study is well-designed, clearly written, and supported by high-quality figures. The performed experiments demonstrate a previously unrecognized role for Tie1 in cardiac development and identify synergistic control of cardiac morphogenesis by Tie1 and Tie2. This synergy is consistent with the previously identified roles of Tie1 and Tek in venous development and with Tie1 involvement in angiopoietin-dependent postnatal vascular and lymphatic remodeling. Together, these findings support a role for Tie1 as a contributor to Ang1-Tie2 signaling during heart development.

      Weaknesses:

      The manuscript does not include direct mechanistic studies; however, RNA seq analysis of atria and ventricles showed reduced expression of Tek, Dll1, and Notch1 upon Tie1 deficiency in developing hearts. Although previously reported mechanisms, such as TIE1-TIE2 heterodimer formation and effects on endothelial junctions, migration, or survival, are discussed, no direct mechanistic experiments are performed. Addressing some of these mechanisms would have clarified the basis of Tie1-Tie2 synergy. As two distinct Tie1 models are used, including one targeting the kinase domain, the authors should state whether phenotypes differed or were similar between models.

      We thank the reviewer for the comments. In this study, we have provided genetic evidence that TIE1 is differentially required for atrial versus ventricular trabeculation. Although the precise molecular mechanisms underlying TIE1 function require further investigation, we have provided compelling genetic evidence of its synergistic role with TIE2 during this process. The two genetic models targeting Tie1 (Tie1<sup>tm1a/tm1a</sup>, Tie1<sup>ΔICD/ΔICD</sup>) produced consistent cardiac and vascular phenotypes as shown in this study and our previous work [1, 2].

      Reviewer #3 (Public review):

      Summary:

      Ding et al. investigate the roles of TIE1 and TEK (Tie2) in mouse cardiac development, with a particular focus on atrial trabeculation. The authors employ multiple genetic models, including Tie1ICDflox/flox (with Cdh5-CreERT2), a knockout-first allele (EUCOMM, Tie1 tm1a/tm1a), and a Tek deletion model.

      Based on the dataset from Feng et al. 2022 Nat Commun, the authors report increased expression of Tie1 and Tek transcripts in atrial endocardial cells compared to ventricular cells at embryonic day (E) 14.5. Loss of Tie1 leads to early atrial trabeculation defects detectable at E12.5, whereas ventricular defects appear later and are less pronounced at E14.5. Chamber-specific RNA sequencing reveals stronger transcriptional changes in atrial tissue.

      Conditional deletion of Tek results in a similar phenotype, with more pronounced atrial defects. Combined deletion of Tie1 and Tek (Tie1 ΔICD/ΔICD; Tek+/-) leads to earlier and more severe defects in both atrial and ventricular trabeculation and results in embryonic lethality around E12.5, suggesting a synergistic interaction between the two genes.

      Conditional endothelial deletion of Tie1 combined with heterozygous global Tek at later embryonic stages allows analysis at later time points and again shows more severe defects in atrial trabeculation. Postnatal analysis of this model reveals reduced heart-to-body weight ratios and potential mild atrial abnormalities.

      Strengths:

      (1) The authors address chamber-specific signaling mechanisms underlying atrial versus ventricular trabeculation, an area of high developmental and clinical relevance.

      (2) The study provides a comprehensive temporal analysis across multiple embryonic stages.

      (3) The use of multiple genetic models strengthens the overall conclusions and allows comparative interpretation.

      (4) While focusing on trabeculation, the authors also include observations on coronary vessel development, increasing the broader relevance of the work. The findings are therefore of interest to the wider cardiovascular research community.

      Weaknesses:

      (1) Timing of recombination vs. trabeculation onset

      Ventricular trabeculation begins earlier than atrial trabeculation. Since tamoxifen (in contrast to 4-hydroxytamoxifen) requires metabolic activation, Cre-mediated recombination will occur with a delay. This suggests that atrial trabeculation may be targeted before its onset, whereas ventricular trabeculation may already be underway for 2-3 days at the time of effective gene deletion.

      How do the authors account for this discrepancy in their interpretation?

      Have earlier induction time points been tested to better capture the onset of ventricular trabeculation? This limitation should be explicitly discussed.

      (2) Clarity of genetic models and experimental design

      The study employs several genetic constructs. It would improve clarity if, for each experiment, the specific genetic model and tamoxifen regimen were clearly described before presenting the results.

      We thank the reviewer for the detailed and constructive comments. For studies employing the inducible gene deletion mouse models, the genetic models and tamoxifen treatment schemes are provided in the related figures. For the remaining studies, we used the conventional knockouts targeting Tie1 and Tek (Tie1<sup>tm1a/tm1a</sup>, Tie1<sup>ΔICD/ΔICD</sup> and Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup>), as detailed above.

      (3) Tie1 tm1a/tm1a phenotype vs. known global knockout

      Previous studies (PMID: 8846781, 7596437) show that complete Tie1 loss leads to severe edema, vascular rupture, and embryonic lethality around E13.5-E14.5.

      How does the Tie1 tm1a/tm1a allele differ, given that animals appear to survive longer? Is this allele hypomorphic rather than a full knockout?

      This point requires clarification.

      Tie1<sup>tm1a/tm1a</sup> is equivalent to the full knockout (Tie1<sup>-/-</sup>). As demonstrated in our prior work, the Tie1<sup>ΔICD/ΔICD</sup> model produced lymphatic and blood vascular phenotypes similar to those of Tie1<sup>-/-</sup> mutants [1, 2, 5, 6].

      (4) Limited mechanistic insight

      While the authors aim to investigate underlying mechanisms, the current study is largely descriptive and based on mRNA expression and genetic interaction analyses (Tie1/Tek co-deletion). Direct mechanistic insights into signaling pathways remain limited. However, the dataset provides a valuable foundation for future mechanistic studies, which should be more clearly acknowledged in the discussion.

      We thank the reviewer for the comments. The manuscript will be revised accordingly, and a detailed response will be provided in our final submission.

      Reference

      (1) Cao, X., et al., Endothelial TIE1 Restricts Angiogenic Sprouting to Coordinate Vein Assembly in Synergy With Its Homologue TIE2. Arterioscler Thromb Vasc Biol, 2023. 43(8): p. e323-e338.

      (2) Shen, B., et al., Genetic dissection of tie pathway in mouse lymphatic maturation and valve development. Arterioscler Thromb Vasc Biol, 2014. 34(6): p. 1221-30.

      (3) Chu, M., et al., Angiopoietin receptor Tie2 is required for vein specification and maintenance via regulating COUP-TFII. Elife, 2016. 5:e21032.

      (4) Rodewald, H.R. and T.N. Sato, Tie1, a receptor tyrosine kinase essential for vascular endothelial cell integrity, is not critical for the development of hematopoietic cells. Oncogene, 1996. 12(2): p. 397-404.

      (5) D'Amico, G., et al., Loss of endothelial Tie1 receptor impairs lymphatic vessel development-brief report. Arterioscler Thromb Vasc Biol, 2010. 30(2): p. 207-9.

      (6) Qu, X., et al., Abnormal embryonic lymphatic vessel development in Tie1 hypomorphic mice. Development, 2010. 137(8): p. 1285-95.

      (7) Dumont, D.J., et al., Dominant-negative and targeted null mutations in the endothelial receptor tyrosine kinase, tek, reveal a critical role in vasculogenesis of the embryo. Genes Dev, 1994. 8(16): p. 1897-909.

    1. eLife Assessment

      This Review Article provides a scholarly, clear and well-structured review of intracranial research into the neural correlates of consciousness (NCCs). To our knowledge this is the first such review and is therefore likely to become a must-read for anyone working in the field of consciousness research. The authors discuss the difficulties that researchers must face when studying NCCs and how insights may emerge via intracranial recordings in humans. This no doubt reflects an in-depth, timely, and insightful contribution to the literature.

    2. Reviewer #1 (Public review):

      Summary:

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths:

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The authors also succeed in describing how single-cell recordings can interface with task-design to help mitigate the impact of confounded neural activity when searching for NCCs.

      The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. Additionally, the authors provide a compelling case for single-cell research in consciousness science, despite the dominance of theories situated at the system and circuit level of analysis. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      Weaknesses:

      Overall, I feel positive about this paper. The authors have addressed my comments from my previous review and I see no significant weaknesses in the current version.

      Comment on revised version:

      No comments - congratulations to the authors!

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with their own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This includes population-level measurements in vision and in other sensory modalities, as well as single-neuron studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document both for newcomers to the study of NCCs and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      Weaknesses:

      No major weaknesses.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review, and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      Discussion of the likely impact of the work on the field:

      This work has the potential to become a must-read for anyone working in the field of consciousness research.

      Comment on revised version:

      The authors have addressed all my concerns. Once again, my compliments for a nice piece of work.

    4. Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradictory, evidence. As such, the manuscript is important as a call for a concerted effort to better explore NCCs using iEEG in the future.

      Weaknesses:

      The manuscript discusses extensively the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront, and briefly explaining how states and contents interact, would strengthen the coherence of the manuscript.

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

      Comments on revised version:

      The current version of the manuscript is clear and complete. Kudos to the authors for their thorough revisions.

      My only remaining point concerns the definition of "report": "We define a report as any explicit behavioral response (whether verbal, manual, or otherwise) that communicates a participant's subjective state."

      It would be helpful to clarify whether this definition is intended to exclude purely internal, explicit self-reports that are not externally expressed. As currently formulated, the definition appears to require overt behavioral communication. However, this raises a conceptual issue in relation to the no-report paradigm literature, where the distinction between report, metacognitive access, and overt motor/verbal expression is precisely at stake.

      Could the authors specify whether "report" is meant to (i) be restricted to externally observable, behaviorally expressed reports, or (ii) extend to internally generated, explicit metacognitive judgments even when they are not communicated? Clarifying this point would help situate the manuscript more precisely within ongoing debates on the role of report in identifying neural correlates of consciousness.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      We appreciate the reviewer's positive feedback on our work.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.

      We agree that distinguishing proper NCCs from their prerequisites or consequences is primarily a matter of experimental design and theoretical framework, not merely of recording modality. We did not mean to imply that intracranial recordings inherently solve this dissociation, and this is now explicitly stated at the beginning of this section. Instead, we argued that the high signal-to-noise ratio and spatiotemporal accuracy of sEEG offer a stronger "testing ground" for the null findings often relied on by no-report paradigms. This is now also further clarified in the revised section “Limits of noninvasive measures”.

      We also explicitly acknowledge, as the reviewer noted, that even the most precise recordings require careful task dissociations to distinguish NCCs from their prerequisites and consequences.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      We agree that our interpretation of these studies (lines 265–271 of the previous version of the manuscript) was presented too definitively. We have modified the text (now lines 314-317) to soften this conclusion and align it with the more nuanced discussion later in the manuscript. Specifically, we now frame this as a "suggested dissociation" rather than a conclusive finding (line 730), and we explicitly acknowledge that alternative interpretations remain viable.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al. results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli.'). Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to draw from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      We thank the reviewer for pointing out this inconsistency. We agree that stating ">200 ms" conflicts with the findings of Fisch et al. (2009), who observed dissociations as early as ~150 ms. Our goal was to contrast the very early, stimulus-driven responses with the later responses that reflect consciousness. However, as the reviewer correctly notes, the exact onset of these signals varies across studies and paradigms. To address this, we have removed the specific ">200 ms" latency mentioned in line 245 of the previous version of the manuscript and updated the timing in line 284 to "starting 150 ms" to better reflect the results of Fisch et al. We also clarify that while the exact latency depends on the paradigm, a consistent finding is that activity representing conscious contents in higher-order visual cortex follows an initial wave of unconscious processes (lines 809-810).

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      It is true that many prominent theories of consciousness were developed based on macroscopic observations, largely due to the prevalence of non-invasive recordings in humans. However, we argue that recording single-unit activity (SUA) is important for several reasons, and we have made this clearer in the revised version. First, signals like fMRI, EEG (or even LFP) often conflate multiple distinct neural populations. SUA allows us to dissociate neurons representing the percept from neighboring neurons involved in task-related confounds (e.g., motor preparation or arousal) that would otherwise be blurred together. In addition, some percepts might be represented by sparse coding involving a small, specific population of "concept" or "percept" cells. Electrophysiological studies in animal models reveal that various cognitive processes are encoded within neuronal subspaces that only emerge when single-unit activity is analyzed as lower-dimensional projections of the broader neural activity manifold (Mante et al., 2013; Ebitz & Hayden, 2021; Jazayeri & Afraz, 2017). Importantly, many neural computations are only discernible through the lens of population dynamics, i.e., from activity recorded at single-neuron resolution (Vyas et al., 2020). We believe that the high granularity of SUA recordings prevents over-aggregation of data, ensuring that even system-level theories can build on biologically accurate foundations.

      Moreover, some theories are defined at the cellular level. For instance, the Dendritic Integration Theory (Bachmann et al., 2020) posits that the integration of feedforward and feedback signals occurs at the level of individual pyramidal neurons. Without SUA, these cellular mechanisms remain untestable. Beyond spatial granularity, SUA also provides excellent temporal granularity, which is crucial for testing theories that rely on the precise timing of spikes (e.g., neural synchrony). As LFPs reflect average activity across populations, only SUA can confirm whether individual neurons lock their spikes to a specific phase, a mechanism hypothesized to bind features into a conscious whole.

      We added these points to a new section in the revised manuscript.

      References:

      Bachmann, T., Suzuki, M., & Aru, J. (2020). Dendritic integration theory: A thalamo-cortical theory of state and content of consciousness. Philosophy and the Mind Sciences, 1(II).

      Ebitz, R. B., & Hayden, B. Y. (2021). The population doctrine in cognitive neuroscience. Neuron, 109(19), 3055-3068.

      Jazayeri, M., & Afraz, A. (2017). Navigating the neural space in search of the neural code. Neuron, 93(5), 1003-1014.

      Mante, V., Sussillo, D., Shenoy, K. V., & Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474), 78-84.

      Vyas, S., Golub, M. D., Sussillo, D., & Shenoy, K. V. (2020). Computation Through Neural Population Dynamics. Annual Review of Neuroscience, 43(1), 249-275.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

      We thank the reviewer for this point. We have added a discussion of fMRI-EEG to the "Limits of noninvasive measures" section (lines 167-171). While we acknowledge that fMRI-EEG is a powerful non-invasive tool for bridging spatial and temporal scales, we note that it relies on merging an indirect metabolic signal with a weak electrophysiological one filtered by the skull, which is computationally complex and often noisy. In contrast, intracranial recordings provide direct measures of both local field potentials and spiking activity within the same neural population, offering interpretability and signal-to-noise ratio that non-invasive combinations cannot match. In our view, this is not just an alternative to these methods, but a unique means of accessing the underlying neuronal ground truth.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulties to distinguish between an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This included population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      We thank the reviewer for acknowledging the strength of our work.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulty of distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.

      We agree that the distinction between proper NCCs and their prerequisites or consequences is a fundamental challenge that affects all recording modalities. We did not intend to imply that intracranial recordings are a "silver bullet" for solving this conceptual problem in isolation, and we now explicitly state that at the beginning of this section (line 101).

      We have revised the section on "Distinguishing NCCs from their prerequisites or consequences" to clarify that intracranial recordings are a powerful tool when used in conjunction with appropriate experimental designs, rather than a standalone solution to these conceptual difficulties.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      It is true that a null result in an intracranial study may simply reflect that the relevant neural population was not sampled by the specific electrode implantation scheme. However, we argue that interpreting null results is equally, if not more, complicated in non-invasive methods, albeit for different reasons. While M/EEG offers broader coverage, it is blind to many cortical sources because of their orientation (radial sources in MEG) or their location in deep sulci and subcortical structures. The signal-to-noise ratio of M/EEG is also much lower than that of intracranial EEG, making it more likely that null results obscure the existence of subtle effects (Parvizi & Kastner, 2018).

      To address this, we revised the manuscript to clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We now explicitly emphasize that drawing conclusions from null results based on intracranial recordings requires caution regarding electrode placement. We also point out that these approaches are complementary: M/EEG can identify large regions of interest, while sEEG can then provide high-resolution "ground truth" to confirm whether those regions are part of the NCC.

      Reference: Parvizi, J., & Kastner, S. (2018). Promises and limitations of human intracranial electroencephalography. Nature Neuroscience, 21(4), 474-483. https://doi.org/10.1038/s41593-018-0108-2

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      We agree with the reviewer that the exact spatial scale of the NCC remains a topic of ongoing debate. However, we believe that the advantage of intracranial recordings holds true whether the NCC spans millimeters or centimeters. The main spatial limitation of non-invasive electrophysiology (M/EEG) is not just its spatial resolution but also the inverse problem. Since scalp sensors detect a mixture of signals from across the brain, different cortical configurations can produce identical scalp patterns. This makes it challenging to precisely locate the NCC or distinguish it from nearby activity (e.g., motor or attentional signals). When recording intracortically, a widespread NCC could be captured across multiple adjacent channels with high accuracy. Conversely, if the NCC is focal, it can be isolated with high spatial resolution. In either case, intracranial recordings eliminate the spatial ambiguity inherent in scalp recordings. We have revised the Introduction (lines 158-164) to clarify that the "spatial advantage" of intracranial recordings also pertains to the inverse problem, not merely to the ability to record from smaller cortical areas.

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      We thank the reviewer for raising this point regarding how intracranial data are often aggregated into regions of interest. We agree that if researchers generalize findings to large anatomical regions without accounting for single-channel recordings, some of the spatial benefits of intracranial recordings are indeed lost. We toned down some of the original claims accordingly, and acknowledged more clearly that the clinical constraints of sEEG lead to sparse coverage (lines 245-249).

      However, we maintain that even when using an ROI-based approach, intracranial recordings offer a clear advantage over non-invasive methods, in that they represent a direct measure from a specific patch of tissue, rather than a statistical estimate that may be contaminated by "leakage" from distant sources. To address the reviewer’s concern, we have updated the manuscript (lines 244-245) to emphasize the importance of relying on MNI coordinates and individual anatomy rather than solely on broad ROI labels.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential to become a must-read for anyone working in the field of consciousness research.

      Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradictory evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      We thank the reviewer for stating the importance of our work and its potential contribution to the field.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      We agree that a clear definition of report is essential for the reader to interpret the empirical findings presented. We have added a definition to the Introduction (lines 108-111), specifying that we use "report" to refer to any explicit behavioral response (whether verbal, manual, or otherwise) that communicates a subject’s subjective state.

      Regarding the conceptual distinction between Phenomenal and Access consciousness, we refer to recent work from some of the co-authors (Mudrik et al., 2025), which suggests that P and A should not be seen as two types of consciousness, but rather as two necessary conditions for conscious experience. While a full discussion of this distinction is beyond the scope of this review, we now clearly state that our focus is on identifying neural activity that reflects the subjective experience itself, regardless of the downstream requirements of report.

      Reference: Mudrik, L., Faivre, N., Pitts, M., & Schurger, A. (2025). On a confusion about there being two types of consciousness. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2025.11.012

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      We agree that clarifying the distinction between contents and levels of consciousness early on provides a stronger framework for the paper.

      We have added a brief clarification in the Introduction (lines 63-76): "It is also helpful to distinguish between levels of consciousness, defined as a global level of arousal or wakefulness (e.g., being awake vs. under anesthesia), and the contents of consciousness, defined as the specific subjective experiences one has while conscious (e.g., perceiving a visual stimulus; Bayne et al., 2016; Laureys, 2005). While the majority of this review focuses on 'content-specific' NCCs, the two dimensions are intrinsically linked, as global states typically set the conditions for the occurrence of specific conscious contents."

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

      We thank the reviewer again for this highly positive assessment of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to reiterate that I believe this is a very scholarly piece of writing, and I congratulate the authors on producing such a useful and timely manuscript. Below, I suggest just a few ways the authors may resolve some of the issues I raised in the public review. However, I would like to emphasise that these are merely suggestions - the authors may think of different and better ways to address these comments that are more in line with either their thinking or writing style, and I would certainly encourage the authors to follow their own preferences if they feel they are at odds with my suggestions.

      For the longer comment questioning whether intracranial recordings are really a way to isolate NCCs from their pre- and post-processing, there are two ways the authors could resolve this. One is that they collapse the section distinguishing NCCs from their prerequisites and consequences into the previous section regarding limits of noninvasive measures. For instance, they could make the point that null results are easier to interpret with intracranial recordings in this previous section. Then they could discuss how specific intracranial studies have been able to resolve questions of pre-/post- processing confounds when they introduce studies later in the manuscript. At the moment, the Distinguishing NCCs from their prerequisites and consequences section, at least to me, undermines the argument of why intracranial recordings are important because it spends too much time describing how tasks are the core component of isolating pure NCCs, and not the recording method.

      Alternatively, the authors could keep the structure as it is. In this case, I would urge the authors to emphasise the role of intracortical recordings here and to make the argument that this is a problem that intracortical recordings (rather than novel tasks) can solve more convincingly. Citing specific studies that combined intracortical recordings with no-report paradigms and emphasising how the invasive recording allowed the researchers to reach a conclusion that would not have been possible with noninvasive measures would also be helpful.

      We thank the reviewer for these useful suggestions and agree that we would not want readers to take from this paper that design issues can be fixed by using invasive recordings. Because confounding issues are crucial in research on the NCC, we believe it is important to include a section on this topic in the Introduction. However, as we explained in our response to the public review, we revised the section introducing Human intracranial electrophysiology to reflect that intracranial recordings are a complementary tool that improves the interpretability of no-report paradigms, rather than a “silver bullet” solution for confound issues. We also explicitly say now that this problem is relevant to all techniques in the study of consciousness, including intracranial recordings (line 101). Additionally, based on the reviewer’s suggestion, we have added a more detailed explanation of how studies that pair intracranial recordings with no-report paradigms provide a unique insight in the Temporal Insights section (lines 822-823).

      For my comment: Drawing misleading conclusions from certain studies, I think the public review speaks for itself. I would recommend that the authors make sure they are drawing correct conclusions from the studies they cite, and make clear from the outset where there is ambiguity in interpretation.

      We thank the reviewer for bringing these ambiguities to our attention. As explained in the response to the public review, we have modified the text accordingly.

      Finally, with regard to the single-cell analyses, I would imagine that most readers will share at least some scepticism around single neurons being the appropriate level of analysis for revealing the basis of perceptual experience. As such, I think it would strengthen the manuscript greatly if the authors could provide a brief argument as to how such work can either inform theories of consciousness or contribute more generally to the study of NCCs, given that the field and its theories are mostly biased towards studying system-level neural processes. I think single-cell analyses are extremely valuable to NCC research, and the authors have a good opportunity to frame these studies accordingly.

      We agree. As detailed in the response to the public review, we now specify (1) how a higher level of granularity in electrophysiological measurements can distinguish between awareness-related signals and confounds, (2) that these measurements provide an opportunity to study neuronal population dynamics, in which various cognitive processes have been shown to emerge in animals, and (3) that single-neuron measurements are necessary to test predictions of theories that are defined at the cellular level.

      Reviewer #2 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      My compliments for having written an impressive review. Overall, I think that this is a beautiful piece of work that will be of great use to the community. My only concern is that the advantages of intracranial recordings over non-invasive methods in solving the difficulties faced in the study of NCCs are overstated.

      Here I provide more precise comments for your consideration.

      (1) On page 5, lines 100 to 102, you argue that "Scalp EEG and MEG have limited anatomical resolution due to the overlap of deep and superficial brain signals at the scalp level and, in the case of EEG, the scattering of the adjacent electrical signals through the scalp". It would be good to provide precise estimates of the spatial resolutions of EEG, MEG and intracranial recordings, with accompanying references. Consider also that MEG is relatively insensitive to deep sources. I recommend this paper: Piastra et al. 2020 https://onlinelibrary.wiley.com/doi/10.1002/hbm.25272

      We thank the reviewer once again for their positive evaluation of our work. As detailed in the response to the public reviews, we now clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We thank the reviewer for their additional suggestions and have clarified our concern about the anatomical conclusions that can be drawn from scalp EEG and MEG data (lines 158-164).

      (2) On page 11, you describe work showing that activity in the occipitotemporal cortex might reflect a precursor to consciousness, but not an NCC proper, except for the case of faces, in which the fusiform seems to behave like a true NCC. Could you discuss how these seemingly contradictory results could be reconciled?

      One possibility is that activity in some parts of the occipitotemporal cortex instantiates content-specific NCCs, i.e., correlates that are specific to certain stimulus types only (in this case: faces), while activity in other parts instantiates precursors of the NCCs. Because faces have been extensively studied, we might have uncovered the content-specific NCCs for these stimuli but not for others. This is now discussed in the text on lines 342-344. Based on reviewer 1’s suggestion, we have also toned down our claim about occipitotemporal activity being a precursor to the NCC.

      (3) From line 322, you start to discuss connectivity analyses. Adding a subheading might improve readability.

      We appreciate the suggestion; however, adding a subheading to a single paragraph would require restructuring the entire section, which could disrupt the flow. We believe the current format maintains clarity and cohesion.

      (4) In line 329, you write "It remains unclear to what extent these connectivity patterns reflect post-perceptual processing and how the signals associated with perceptual consciousness in the occipitotemporal cortex interact with frontoparietal regions." But it's not clear why this is the case.

      We meant to make two separate points: (1) these studies did not control for report-related activity using no-report paradigms and (2) there has been no investigation so far of the interaction between occipitotemporal and frontoparietal signals associated with perceptual consciousness. These two points have been clarified in the text (lines 378-381).

      (5) In line 692, it would be good to clarify that Pereira 2021 is a single-neuron study.

      This has been clarified in the text.

      (6) The phrase "more research/work is needed" is repeated several times.

      Thank you for pointing this out. To avoid redundancy, we have deleted the second mention of this phrase.

    6. eLife Assessment

      The authors provide a scholarly review of intracranial research into the neural correlates of consciousness (NCCs). To our knowledge, this is the first such review, and it therefore may become a must-read for anyone working in the field of consciousness research. The review is less persuasive in arguing that intracranial recordings are better suited than other methods to identifying pure NCCs, a problem that appears instead to be solved through novel paradigms and better-developed theories, but it is no doubt an in-depth, timely, and insightful contribution to the literature.

    7. Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research, so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post-processing is a fair motivation for using intracranial over noninvasive measures.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al. results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli.'). Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to make from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al. results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

    8. Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This included population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. These constitute difficult conceptual problems that plague all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential to become a must-read for anyone working in the field of consciousness research.

    9. Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradictory evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

    1. eLife Assessment

      The revised version of the paper demonstrates that genetic code expansion is a useful way to tag two ALS proteins associated with stress granules, and convincingly evaluates its utility in this context. The data are solid and demonstrate the feasibility of using Anap fluorescence for live-cell imaging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors utilize genetic code expansion to tag TDP-43 and G3BP1, compare this protein tagging system (ANAP) with antibody staining, and use it to evaluate protein trafficking and stress granule formation in response to sodium arsenite stress. They find staining similar to that obtained with antibodies in HeLa cells, mouse embryonic stem cells and primary mouse cortical neurons. By incorporating the intrinsically fluorescent noncanonical amino acid Anap at carefully selected sites, the authors enable live-cell and neuronal visualization of protein localization, stress-induced redistribution, and dynamic behavior without the structural and functional compromises often associated with large fluorescent protein tags. The work provides a technical framework that will be useful for live imaging of tagged proteins.

      Strengths:

      A key strength is the demonstration of the specificity of the Anap fluorescence signal through appropriate controls and the agreement between Anap labeling and antibody-based detection across multiple cell types, including primary neurons. The ability to visualize stress-induced redistribution of both G3BP1 and TDP-43 in living cells highlights the practical value of this approach.

      The functional validation of TDP-43-Anap is compelling. The rescue of both cell viability and RNA splicing defects in TDP-43 knockout models provides evidence that Anap incorporation preserves core protein functions. This is important, as functional disruption is a central concern for any alternative tagging strategy applied to aggregation-prone or RNA-binding proteins.

      Weaknesses:

      While some inherent limitations of genetic code expansion remain (e.g., variable amber suppression efficiency and the inability to directly assess endogenous protein behavior), these are acknowledged and discussed appropriately. Importantly, these limitations do not undermine the central contributions of the study.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Chen and colleagues describe a novel means of labeling two RNA binding proteins, G3BP1 and TDP-43, using genetic code expansion. Overexpressed constructs that incorporate the intrinsically-fluorescent non-canonical amino acid Anap redistribute to cytoplasmic granules upon application of external stressors such as sodium arsenite. Similar labeling and redistribution of overexpressed G3BP1 and TDP-43 was observed in cultures of mouse primary neurons.

      Genetic code expansion and non-canonical amino acid labeling have many advantages over traditional fusion proteins for tracking protein redistribution in living cells. The authors show that they are able to label exogenous G3BP1 and TDP-43 with the non-canonical amino acid Anap, and follow labeled proteins in living cells with and without stress.

      I suspect that this method could be incredibly valuable to many investigators studying the dynamics and interactions of proteins that are difficult to label or detect by conventional methods.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Amyotrophic lateral sclerosis (ALS) affects nerve cells in the brain and spinal cord. The authors' approach to use genetic code expansion to tag two ALS proteins associated with stress granules has value and should be useful in the ALS field. Parts of the work are well done, but there are concerns that the evidence is incomplete overall, and additional controls would strengthen the study.

      We thank the editors and reviewers for their thoughtful assessment and for highlighting the potential value of applying genetic code expansion (GCE) to study ALS-associated proteins involved in stress granule biology. Our goal in this work was to establish and validate a minimally perturbative labeling strategy using the noncanonical amino acid Anap to monitor the localization and stress-dependent behavior of TDP-43 and G3BP1.

      We agree that additional controls can further strengthen the conclusions. In the revised manuscript, we have clarified the experimental design and added essential controls to better support the reliability of the Anap labeling approach (Supplementary Fig. 1).

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors utilize genetic code expansion to tag TDP-43 and G3BP1, compare this protein tagging system (ANAP) with antibody staining, and use it to evaluate protein trafficking and stress granule formation in response to sodium arsenite stress. They find staining similar to that obtained with antibodies in HeLa cells, mouse embryonic stem cells, and primary mouse cortical neurons. This is a useful study that demonstrates the utility of ANAP tagging to evaluate ALS proteins.

      We sincerely thank the reviewer for the positive assessment of our work and for recognizing the utility of the Anap-based GCE system for studying ALS-associated proteins.

      Strengths:

      Rescue of cell survival by ANAP-tagged TDP-43 is compelling.

      We appreciate the reviewer’s highlighting of this point. Demonstrating that TDP-43-Anap can rescue cell survival was an important validation in our study, as it indicates that incorporation of the noncanonical amino acid does not substantially disrupt the biological function of TDP-43. We also tested whether TDP-43-Anap restores RNA splicing function. As shown in Fig. 1K and 1L, expression of PFKP, whose transcript acquires a cryptic exon when TDP-43 loses its function [1], recovered when TDP-43-Anap was expressed in TDP-43 knockout HeLa cells.

      Weaknesses:

      While the ANAP-tagged proteins had similar distributions to antibody staining, there were some discrepancies that may be better explained by the technique than by novel findings, as the authors suggested. The inclusion of additional controls to evaluate this would be helpful.

      This is a helpful suggestion. To ensure that the fluorescence signal observed in our experiments was specifically derived from site-specific Anap incorporation rather than background fluorescence, we performed three control conditions. Specifically, we tested: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. These control experiments were performed for both TDP-43 and G3BP1, and no observable fluorescence signal was detected under any of these conditions (Supplementary Fig. 1). We have clarified this control experiment in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Chen and colleagues describe a novel means of labeling two RNA-binding proteins, G3BP1 and TDP-43, using genetic code expansion. Overexpressed constructs that incorporate the intrinsically fluorescent non-canonical amino acid Anap redistribute to cytoplasmic granules upon application of external stressors such as sodium arsenite. Similar labeling and redistribution of overexpressed G3BP1 and TDP-43 were observed in cultures of mouse primary neurons.

      We are grateful for the reviewer’s accurate summary of our study and recognition of the value of the GCE strategy for labeling the RNA-binding proteins G3BP1 and TDP-43.

      Strengths:

      Genetic code expansion and non-canonical amino acid labeling have quite a few advantages over traditional fusion proteins for tracking protein redistribution in living cells. The authors show that they are able to label exogenous G3BP1 and TDP-43 with the non-canonical amino acid Anap and follow labeled proteins in living cells with and without stress.

      We acknowledge the reviewer’s comment on the advantages of GCE-based noncanonical amino acid labeling for studying protein dynamics in living cells.

      Weaknesses:

      The authors do not convincingly leverage the advantages of genetic code expansion in the current study. There is no specific question posed by the authors that can be or is answered using this approach, and several of the experiments lack critical controls. This is also not the first example of TDP-43 labeling by genetic code expansion (see PMID: 38290242). As a result, the study as a whole adds little to our understanding of protein trafficking and behavior under stress.

      We thank the reviewer for raising these important points. Although, as the reviewer mentioned, genetic code expansion has previously been applied to TDP-43 [2], that study mainly employed a photocaged lysine incorporation system for optogenetic control of TDP-43 translocation, and the protein was still labeled with mRuby. Our paper has a different goal: to establish and validate a minimally perturbative labeling strategy using the intrinsically fluorescent noncanonical amino acid Anap to monitor the localization and stress-dependent behavior of both TDP-43 and G3BP1. Our work extends this approach in several important ways.

      First, we demonstrate that Anap incorporation enables visualization of stress-dependent redistribution of both TDP-43 and G3BP1, two key proteins involved in stress granule biology. Importantly, we validate this approach across multiple cellular systems, including HeLa cells, mouse embryonic stem cells, and primary mouse cortical neurons, which broadens the applicability of this labeling strategy.

      Second, we provide functional validation of the Anap-tagged protein, showing that TDP-43-Anap rescues both cell survival and RNA splicing activity in TDP-43 knockout cells, including restoration of PFKP expression, a known cryptic exon target of TDP-43. These results support that Anap incorporation does not substantially disrupt protein function.

      Third, we performed additional control experiments to ensure the specificity of the labeling system. Specifically, we tested three control conditions: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. These control experiments were performed for both TDP-43 and G3BP1, and no observable fluorescence signal was detected under any of these conditions (Supplementary Fig. 1).

      We agree that the manuscript would benefit from clearer articulation of the advantages of genetic code expansion in this context. Accordingly, we have revised the manuscript to more explicitly emphasize how Anap labeling provides a minimally perturbative alternative to large fluorescent protein fusions, which can alter the phase behavior and localization of stress granule proteins.

      “Conventional fluorescent protein tags have enabled visualization of TDP-43 and G3BP1 in living cells; however, these approaches can perturb the native biophysical properties of the proteins being studied. For example, GFP or other fluorescently tagged TDP-43 usually requires additional modifications, such as deletion of the nuclear localization signal (NLS) [3, 4], to induce cytoplasmic inclusion formation. Such manipulations introduce non-physiological conditions that may alter the native trafficking and aggregation behavior of TDP-43. As for G3BP1, tags like GFP may also cause unexpected effects on the phase separation or other dynamics of the protein. In contrast, the Anap-based GCE strategy allows minimally perturbative labeling and visualization of protein localization and stress-induced redistribution while preserving the native architecture and function of both proteins. Importantly, the approach provides a generalizable genetically encoded platform for quantitatively examining the behavior of ALS-associated proteins in living cells. By enabling faithful monitoring of protein trafficking and stress granule dynamics without extensive protein engineering, Anap-based GCE can offer a powerful strategy for probing molecular-scale mechanisms underlying ALS-linked proteinopathies”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1A

      The authors report that the nuclear staining of G3BP1 by ANAP labeling shows the presence of nuclear pools of G3BP1 that aren't detected with antibody staining. However, nonspecific nuclear staining by aminoacylated tRNAs bound to synthetases has been described. It would be important to have a control to evaluate this possibility.

      This is an important point. We agree that the nuclear ANAP signal should be carefully controlled to exclude the possibility of nonspecific staining arising from the Anap incorporation machinery itself, such as aminoacylated tRNAs and/or synthetases.

      To address this concern, we note in the Materials and Methods that, after DPBS washes to remove excess Anap, cells were incubated in fresh medium for 2 hours to allow sufficient time for the decay of unstable aminoacylated tRNAs, which are generally cleared within minutes to tens of minutes [5].

      We also tested three control conditions for both TDP-43 and G3BP1: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with Anap added, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without Anap. Under all three conditions, we observed no detectable fluorescence signal (Supplementary Fig. 1).

      In addition, as shown in Fig. 1I, the nuclear signal of G3BP1-Anap partially colocalizes with the nuclear signal of TIA-1 in several condensate-like structures. This observation further supports that the nuclear Anap signal reflects protein-associated localization rather than nonspecific fluorescence, as it overlaps with a known RNA-binding protein that can form nuclear condensates under certain conditions.

      (2) Figure 1A, 1B

      Anap labeling appears to stain fewer cytoplasmic structures compared to antibody staining for both G3BP1 and TDP-43 after sodium arsenite treatment. Quantification would be useful to address whether this is the case. If so, might this be due to unincorporated/truncated proteins competing with Anap-labeled proteins?

      We appreciate the reviewer’s helpful suggestion. To address this point, we performed quantitative colocalization analysis in Fiji/ImageJ, calculating the Pearson correlation coefficient (R) between the Anap signal and antibody staining for each region of interest. These analyses indicate strong overall agreement between the two detection methods under stress conditions, supporting that Anap labeling reliably reports the localization of both G3BP1 and TDP-43 (see Fig. 1A, B).

      Regarding the possibility that truncated or unincorporated proteins could influence the observed signal, we note that fluorescence from Anap depends on successful amber suppression and incorporation of Anap at the engineered TAG site. Proteins that fail to incorporate Anap, such as truncated products generated by premature termination, would not produce fluorescence, and therefore would not contribute to the Anap signal. Thus, the Anap fluorescence selectively reports the population of successfully labeled full-length proteins, whereas antibody staining detects both labeled and unlabeled protein pools. This difference may partially explain why antibody staining appears to label a larger number of cytoplasmic structures.

      (3) Figure 1F

      FRAP of G3BP1-GFP in stress granules is slower than in previous publications. The underlying reasons for this should also be addressed.

      We thank the reviewer for this important observation. Differences in FRAP recovery kinetics of G3BP1 in stress granules may arise from several experimental variables that are known to influence stress granule dynamics. These include differences in cell type, expression levels of G3BP1-GFP, and imaging or photobleaching parameters. In our experiments, FRAP measurements were performed under specific conditions optimized for our experimental system, which may lead to recovery kinetics that differ from those reported in previous studies.

      (4) Figure 1H

      A full-size Western blot would be useful to evaluate the amount of truncated G3BP1 and TDP-43. Could truncated proteins compete with ANAP-tagged G3BP1 and TDP-43 and alter their localization in response to stress? This should be addressed.

      We acknowledge this important point. Full-size Western blotting can provide information on the overall presence of truncated species in the transfected population; however, it represents a bulk measurement and does not capture cell-to-cell variability in amber suppression efficiency at the single-cell level. We therefore cannot exclude the possibility that truncated products are present at varying levels in individual cells and may contribute, directly or indirectly, to differences between antibody staining and Anap fluorescence.

      Importantly, we observe that cells with successful Anap incorporation consistently exhibit strong antibody staining for TDP-43 or G3BP1, indicating that full-length protein is the predominant species in these cells. Because Anap fluorescence depends on successful amber suppression, it selectively reports the full-length protein population, whereas truncated products are not detected in the imaging assay. The concordance between Anap fluorescence and antibody staining therefore argues against a major contribution of truncated species to the observed localization patterns (Supplementary Fig. 1).

      Accordingly, we interpret the Anap signal as reflecting the localization of successfully labeled full-length protein, while acknowledging that heterogeneity in suppression efficiency is an important limitation of the current approach.

      (5) Figure 3

      This is a well-designed diagram.

      We are grateful for the reviewer’s positive feedback on the diagram and are pleased that the schematic effectively illustrates the experimental design and the principles of the genetic code expansion strategy used in this study.

      Reviewer #2 (Recommendations for the authors):

      The authors present a one-sided viewpoint concerning the connection between stress granules and disease (lines 45-46). A more balanced discussion is recommended, including data arguing against a role for abnormal stress granules in neurodegeneration.

      This is an important suggestion. We agree that the relationship between stress granules and neurodegeneration remains an active area of investigation and that evidence both supporting and questioning a causal role of stress granules in disease has been reported. In the revised manuscript, we have modified the Introduction to provide a more balanced discussion of this topic.

      “Altered stress-granule dynamics have been associated with ALS/FTD [6, 7]; however, whether stress granules directly drive neurodegeneration remains debated, as several studies suggest that stress granules primarily function as protective stress responses [8].”

      (1) A central rationale for the study is missing. The authors state only that G3BP1 and TDP-43 'undergo dynamic stress-dependent redistribution, making them ideal candidates for minimally invasive, site-specific fluorescent labeling.' Is there a controversy or question that can be resolved using these approaches?

      We thank the reviewer for raising this important point. The central motivation of this study is that the dynamic behavior and phase-separation properties of stress-granule proteins are highly sensitive to protein modifications and tagging strategies.

      “Conventional fluorescent protein tags have enabled visualization of TDP-43 and G3BP1 in living cells; however, these approaches can perturb the native biophysical properties of the proteins being studied. For example, GFP or other fluorescently tagged TDP-43 usually requires additional modifications, such as deletion of the nuclear localization signal (NLS) [3, 4], to induce cytoplasmic inclusion formation. Such manipulations introduce non-physiological conditions that may alter the native trafficking and aggregation behavior of TDP-43. As for G3BP1, tags like GFP may also cause unexpected effects on the phase separation or other dynamics of the protein.”

      (2) Related to this, there is little context for how or why genetic code expansion is utilized for these studies.

      We agree that the rationale for using genetic code expansion should be more clearly explained. In this study, genetic code expansion was employed to enable site-specific incorporation of the small fluorescent noncanonical amino acid Anap, allowing minimally perturbative labeling of the proteins of interest.

      “The Anap-based GCE strategy allows minimally perturbative labeling and visualization of protein localization and stress-induced redistribution while preserving the native architecture and function of both proteins. Importantly, the approach provides a generalizable, genetically encoded platform for quantitatively examining the behavior of ALS-associated proteins in living cells. By enabling faithful monitoring of protein trafficking and stress-granule dynamics without extensive protein engineering, Anap-based GCE can offer a powerful strategy for probing the molecular-scale mechanisms underlying ALS-linked proteinopathies.”

      (3) The justification for the criteria for selecting the site for incorporation of non-canonical amino acids in G3BP1 or TDP-43 is missing.

      We acknowledge this important comment and agree that the rationale for selecting the incorporation sites should be stated more clearly.

      “For TDP-43, the incorporation site was selected to avoid the major functional domains involved in RNA binding, nuclear localization, and aggregation-related behavior, thereby reducing the likelihood that Anap incorporation would perturb its native trafficking or function. For G3BP1, the site was chosen to minimize interference with domains important for stress granule assembly, RNA binding, and protein-protein interactions. More generally, we aimed to place the ncAA at positions likely to be solvent-accessible and tolerant of substitution, while avoiding highly conserved or functionally essential residues.”

      (4) Studies in Figures 1 and 2 lack essential controls, including background signal from Anap in non-transfected cells, or those transfected with plasmids lacking the tRNA or tRS.

      This is an important point, also raised by Reviewer 1. To evaluate potential background fluorescence arising from Anap or the labeling system, we performed several control experiments. Specifically, we examined three conditions: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. Under all three conditions, we observed no detectable fluorescence signal (Supplementary Fig. 1).

      (5) Another marker of stress granules should be used for confirming the identity of G3BP1-Anap (+) or TDP-43-Anap (+) structures, including TIA1, TAF15, or polyA RNA.

      We appreciate this helpful suggestion. To further confirm the identity of the stress granule structures observed in our experiments, we performed colocalization analysis with TIA-1, a well-established marker of stress granules. The results have been included in the revised manuscript.

      “Additionally, we examined the colocalization of G3BP1-Anap with TIA-1, another established stress granule marker. Under stress conditions, G3BP1-Anap largely colocalized with TIA-1 within stress granules. Interestingly, under basal conditions, the nuclear signal of G3BP1-Anap, which was not detected by antibody staining, appeared to partially colocalize with TIA-1 in several condensate-like structures (Fig. 1I).”

      (6) There is no information on the number of granules bleached or the number of cells selected for FRAP studies. There is no information on the shaded areas in Figure 1F or 1G, and no information on statistical comparisons between regressions in Figure 1F.

      We thank the reviewer for pointing out these omissions. We have revised the figure legends to clarify these details.

      “One granule from each of three independent cells was selected and photobleached for FRAP analysis.”

      “Here, error bars are shown as shaded areas for clarity of presentation. FRAP recovery curves were compared using two-way ANOVA.”

      (7) Protein dynamics measured by FRAP are highly dependent on the concentration and/or expression level of each protein. Because of this, the authors need to control for expression level in all FRAP studies.

      We agree that protein concentration and expression level can influence FRAP recovery kinetics. Because Anap incorporation relies on amber suppression, whose efficiency varies from cell to cell, it is difficult to control the expression level of Anap-labeled proteins. To minimize this potential effect, however, we performed FRAP measurements on cells exhibiting comparable fluorescence intensities, which served as a proxy for similar expression levels of the labeled proteins. In addition, FRAP analyses were conducted on individual granules within cells expressing moderate levels of the protein, avoiding cells with unusually high fluorescence intensity that might reflect overexpression.

      Furthermore, fluorescence recovery was normalized to the pre-bleach intensity of the selected granules, which reduces variability arising from differences in overall expression levels between cells.
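      A minimal Python sketch of the pre-bleach normalization described above follows. It assumes background-subtracted granule intensities and a known number of pre-bleach frames; the function names, the example trace, and the optional full-scale variant are illustrative assumptions, not the authors' analysis code.

```python
from statistics import mean


def _check(trace, n_prebleach):
    # Need at least one frame before and one frame after the bleach pulse.
    if n_prebleach < 1 or n_prebleach >= len(trace):
        raise ValueError("need >= 1 pre-bleach and >= 1 post-bleach frame")


def normalize_to_prebleach(trace, n_prebleach):
    """Express each frame as a fraction of the mean pre-bleach intensity.

    trace: background-subtracted intensities of one granule over time.
    n_prebleach: number of frames recorded before the bleach pulse.
    """
    _check(trace, n_prebleach)
    f_pre = mean(trace[:n_prebleach])
    return [f / f_pre for f in trace]


def full_scale_normalize(trace, n_prebleach):
    """Rescale so the pre-bleach mean maps to 1 and the first post-bleach frame to 0.

    Alternative convention included as an assumption for illustration; it
    reports recovery as a fraction of the intensity that was bleached away.
    """
    _check(trace, n_prebleach)
    f_pre = mean(trace[:n_prebleach])
    f_post = trace[n_prebleach]
    if f_pre == f_post:
        raise ValueError("no bleaching detected")
    return [(f - f_post) / (f_pre - f_post) for f in trace]


if __name__ == "__main__":
    # Hypothetical trace: two pre-bleach frames, then bleach and recovery.
    trace = [100, 100, 20, 40, 56]
    print(normalize_to_prebleach(trace, 2))  # [1.0, 1.0, 0.2, 0.4, 0.56]
    print(full_scale_normalize(trace, 2))    # [1.0, 1.0, 0.0, 0.25, 0.45]
```

      The two conventions give different "percent recovery" values for the same trace (0.56 versus 0.45 in this hypothetical example), so stating which one is used makes recovery values easier to compare across studies.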

      (8) There is no point of reference for TDP-43-Anap FRAP results in Figure 1G. Additional studies using variants harboring a mutated NLS (mNLS) can be used in place of TDP-43-YFP.

      This is a helpful suggestion. In response, we have performed additional FRAP experiments using TDP-43<sup>ΔNLS</sup>, a commonly used construct that promotes cytoplasmic localization and facilitates analysis of TDP-43 granules. The results from TDP-43<sup>ΔNLS</sup> have now been included as a reference for the FRAP measurements of TDP-43-Anap in the revised manuscript (Fig. 1D, 1G).

      “We then used YFP-tagged nuclear localization signal (NLS)-deleted TDP-43 (TDP-43<sup>ΔNLS</sup>-YFP) as a reference and performed FRAP analysis to compare the mobility of TDP-43-Anap and TDP-43<sup>ΔNLS</sup>-YFP. Fluorescence recovery of TDP-43-Anap reached ~45% within 20 s after photobleaching, consistent with liquid-like dynamics. In contrast, TDP-43<sup>ΔNLS</sup>-YFP showed only ~22% recovery, suggesting more solid-like dynamics (Fig. 1D, 1G). These results are consistent with previous reports describing relatively immobile aggregates formed by TDP-43<sup>ΔNLS</sup> [4] and illustrate the advantage of Anap-based labeling, which preserves native protein properties and enables real-time assessment of protein dynamics without introducing disruptive mutations.”

      (9) There is no point of reference for comparing FRAP results from G3BP1-GFP to G3BP1-Anap. What is the 'gold standard'? Without this, it is difficult to conclude that "... Anap labeling better preserved the native mobility and biophysical properties of G3BP1 than the conventional GFP tag."

      We acknowledge this important point and agree that there is currently no definitive gold standard for measuring the native mobility of endogenous G3BP1 within stress granules in living cells. Our intention was not to claim that the Anap-labeled protein definitively represents the native state, but rather to compare the relative effects of different labeling strategies.

      We have therefore rewritten the sentence as follows: “These results suggest that G3BP1-Anap displays higher mobility compared with G3BP1-GFP, indicating that Anap labeling may provide a less perturbative approach for monitoring G3BP1 dynamics.”

      (10) The WB in Figure 1H is overexposed, making it difficult to compare expression levels between WT and V100Anap-transfected cells. In addition, there is no similar assay for confirming G3BP1-Anap expression.

      Thank you for pointing this out. In the revised manuscript, we have replaced the image with a properly exposed Western blot to allow clearer comparison of protein expression levels.

      In addition, we have now included a corresponding Western blot confirming the expression of G3BP1-Anap in G3BP knockout U2OS cells (Fig. 1H). These results verify that the Anap-labeled proteins are expressed at detectable levels and support the interpretation of the imaging and FRAP experiments.

      (11) Although the survival studies in Figures 1I and 1J are promising, a more convincing demonstration of functional replacement of TDP-43 would involve an assessment of cryptic exon splicing, comparing WT, TDP-43 KO, V100Stop-transfected, and V100Anap-transfected cells.

      This is a valuable suggestion.

      “We also evaluated TDP-43-dependent RNA splicing activity by examining the expression of PFKP, a well-established target that undergoes cryptic exon inclusion upon loss of TDP-43 function [17]. As shown in Figures 1K and 1L, expression of TDP-43-Anap in TDP-43 knockout HeLa cells restored PFKP expression, indicating that the Anap-labeled protein retains functional RNA splicing activity. These results demonstrate that TDP-43-Anap is capable of functionally compensating for endogenous TDP-43, supporting that the incorporation of Anap does not substantially disrupt the protein’s biological function.”

      (12) Tuj1 staining in Figure 2 is inconsistent and often fails to confirm neuronal identity.

      We thank the reviewer for this important comment. We acknowledge that Tuj1 staining in Figure 2 is variable and, in some cases, does not clearly delineate neuronal identity. Notably, the reduced Tuj1 signal is primarily observed in neurons that express Anap-labeled proteins under sodium arsenite treatment, which likely reflects the combined effects of transfection-associated stress and oxidative stress on neuronal morphology and cytoskeletal integrity.

      In addition, transfection efficiency in primary neurons is inherently low and variable, and cells that successfully express the constructs may represent a more stress-sensitive subpopulation, further contributing to variability in staining quality. Despite optimization efforts, these technical constraints limit the consistency of Tuj1 labeling under these experimental conditions.

      (13) Close-up images and correlation scatter plots in Figures 1 and 2 do not add very much information.

      We thank the reviewer for this comment. To address the reviewer’s concern, we have revised the figure legends to better clarify the purpose of these panels and how they support the quantitative analysis presented in the manuscript.

      For the scatter plots: “Colocalization threshold analysis was performed in Fiji/ImageJ to calculate the Pearson correlation coefficient (R) for each region of interest (A, B, I, J). The X- and Y-axes represent the fluorescence intensity values of the red and green channels, respectively. When signals are colocalized, pixels with high intensity in one channel correspond to high intensity in the other, forming a diagonal distribution. In contrast, non-colocalized signals cluster along the axes. A higher R value indicates a greater degree of colocalization. Scale bar, 3 μm.”
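      The Pearson coefficient described in this legend can be sketched in a few lines of Python. This is a minimal illustration assuming the two channels are supplied as equal-length lists of pixel intensities from one region of interest; it computes plain Pearson's R over all pixels and does not reproduce any automatic thresholding that Fiji's colocalization plugins may apply. The function name and example values are hypothetical.

```python
from math import sqrt


def pearson_r(red, green):
    """Pearson's R between paired pixel intensities of two channels in one ROI.

    red, green: equal-length sequences; element i of each is the same pixel.
    Values near +1 mean the pixels fall on a rising diagonal in the scatter
    plot (colocalized signal); values near 0 mean no linear relationship.
    """
    if len(red) != len(green) or len(red) < 2:
        raise ValueError("channels must contain the same number (>= 2) of pixels")
    n = len(red)
    mean_r = sum(red) / n
    mean_g = sum(green) / n
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(red, green))
    var_r = sum((r - mean_r) ** 2 for r in red)
    var_g = sum((g - mean_g) ** 2 for g in green)
    if var_r == 0 or var_g == 0:
        raise ValueError("a channel with constant intensity has undefined R")
    return cov / sqrt(var_r * var_g)


if __name__ == "__main__":
    # Hypothetical pixel intensities, for illustration only.
    print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0 (green = 2 * red + 1)
    print(pearson_r([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```

      Because R only measures linear co-variation, two channels can give a high R even when their absolute intensities differ, which is why it suits comparisons between a fluorescent label and antibody staining of different brightness.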

      The same information was added to the legend of Figure 2.

      For the scheme, please see lines 412-413 in the revised manuscript.

      Reference:

      (1) Rothstein, J.D. et al. Sporadic ALS induced pluripotent stem cell derived neurons reveal hallmarks of TDP-43 loss of function. Nature Communications 16, 7092 (2025).

      (2) Shadish, J.A. & Lee, J.C. Genetically encoded lysine photocage for spatiotemporal control of TDP-43 nuclear import. Biophys Chem 307, 107191 (2024).

      (3) Gasset-Rosa, F. et al. Cytoplasmic TDP-43 De-mixing Independent of Stress Granules Drives Inhibition of Nuclear Import, Loss of Nuclear TDP-43, and Cell Death. Neuron 102, 339–357.e337 (2019).

      (4) Yan, X. et al. Intra-condensate demixing of TDP-43 inside stress granules generates pathological aggregates. Cell 188, 4123–4140.e4118 (2025).

      (5) Walker, S.E. & Fredrick, K. Preparation and evaluation of acylated tRNAs. Methods 44, 81–86 (2008).

      (6) Kassouf, T. et al. Targeting the NEDP1 enzyme to ameliorate ALS phenotypes through stress granule disassembly. Science Advances 9, eabq7585 (2023).

      (7) Van Nerom, M. et al. C9orf72-linked arginine-rich dipeptide repeats aggravate pathological phase separation of G3BP1. Proceedings of the National Academy of Sciences 121, e2402847121 (2024).

      (8) Wolozin, B. & Ivanov, P. Stress granules and neurodegeneration. Nat Rev Neurosci 20, 649–666 (2019).

    1. eLife Assessment

      This important study identifies a physiologically relevant interaction between LRRK2 and drebrin, an actin-binding protein crucial for neuronal structure. A solid body of evidence, including multiple cell models, highlights the complexities of how modifiers like BDNF intersect with LRRK2 kinase-dependent function, and that many modifiers between AKT and LRRK2 in different cellular pathways are yet to be identified and understood.

    2. Reviewer #1 (Public review):

      Summary:

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that confer a gain-of-function effect on LRRK2 kinase activity.

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and brain tissue from genetically modified mice. They examine a LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, other kinase phosphorylation status, and measures of synaptic structure and function.

      Strengths:

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health.

      They employ a range of models and techniques to convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation at S935 and binding to many proteins. Several independent data sets lead to some exciting conclusions.

      In this re-revised manuscript, some aspects are very convincing and well validated, e.g., drebrin binding to LRRK2, increased by BDNF, and reduced LRRK2 protein levels in young (but not mature) drebrin KO mice. A phosphoproteomic analysis of PD mutant knock-in mouse brain is included. Overall, the links between LRRK2, LRRK2 activity, and the changes to synaptic molecules, structures, and activity are intriguing.

      Weaknesses:

      Enthusiasm for the title claim that "LRRK2 regulates synaptic function through BDNF signalling" is tempered by disconnected results across different model systems and inconsistent alterations upon kinase phosphorylation in SHSY5Y cell line and primary neurons. Exciting conclusions are sometimes not consistently supported by the data and/or only conducted in one of the models.

      BDNF increasing pS935 LRRK2 is quite well supported in cell lines, as is BDNF regulation of drebrin-LRRK2 binding. However, there is a lack of connection between this result and subsequent alterations to LRRK2 substrates, e.g., phosphorylation of Rab GTPases, especially in neurons. Interesting omic data sets are provided, but with very little or no validation. For example, only drebrin was assessed from the BDNF-treatment omic data, and the phosphoproteomic analysis of the PD mutant knock-in mouse stands alone with no validation; G2019S is not explored elsewhere in the study.

      The major disconnect this reviewer struggles with is the conclusion that the quite clear data in SHSY5Y cells is the same as that from neurons regarding BDNF / LRRK2 and ERK / Akt. It seems they are not.

      ERK and Akt phosphorylation by BDNF is absent in CRISPR KO SHSY5Y cells.

      This conclusion is at odds with the interpretation of the neuronal data. To explain: in DIV14 neurons, BDNF's transient increase in pLRRK2 is seen and strongly prevented by MLi2. BDNF also increased pAkt and pERK1/2 in WT, but also in LRRK2 KO. Furthermore, this happened in the presence of MLi2 in WT despite no pLRRK2 increase. While the 5 min BDNF-induced increase in pAkt appears reduced in LKO, the same BDNF time point in LKO with MLi2 is as high as WT (in these unquantified examples), and ERK is almost identical. This is described as "significantly reduced", but I see no replicates or quantification, and face-value assessment of the blot argues against this.

      Thus, there is little or no evidence supporting that LRRK2 activity acts upstream of BDNF-stimulated increases in pAkt or pERK in neurons, as neither MLi2 nor KO prevented them.

      Synapse markers increased in WT neurons with BDNF treatment, which did not happen in LKO neurons. So this process requires pLRRK2 but is unrelated to pAkt or pERK (which still go up with BDNF in KO)? Similarly, an increase in synaptic activity in WT hiPSC neurons in response to BDNF seems lost in LRRK2 KO hiPSC neurons, although their activity is already increased, and the effects differed depending on the age of the cells. Both of these experiments lack supporting evidence from other measures, e.g., LRRK2 inhibition effects on BDNF-induced increases in WT and parallel biochemistry of phosphorylated LRRK2, Akt, and ERK in WT and KO.

      LRRK2 activation of Akt1 has been published before (e.g., Ohta 2011, not cited), but Ohta et al. also concluded that LRRK2 gain-of-function mutations (more LRRK2 kinase activity) were associated with a reduced ability of LRRK2 to bind and phosphorylate Akt at the same residue, which appears to contradict the mechanism proposed here. This should be discussed. Here the authors also conclude that Akt is upstream of LRRK2. However, the data here in neurons suggest that pLRRK2 increases in response to BDNF are separate from BDNF signalling to Akt.

      Of note, relative to the β-tubulin control, total Akt levels in LKO appear consistently higher in this single example blot; a large increase in total Akt would skew the ratio down, while absolute levels of pAkt (probably what matters most for an active enzyme; what is the ratio against a total protein stain?) are similar or increased. These are major problems for the conclusions as presented.
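      The ratio argument above can be made explicit with a one-line relation. The symbols and the factor k are introduced here purely for illustration and are not from the manuscript: assuming absolute pAkt is unchanged between genotypes while total Akt is higher in LKO by a factor k > 1, the normalized ratio falls by the same factor even though pAkt itself has not decreased.

```latex
\frac{[\mathrm{pAkt}]_{\mathrm{LKO}}}{[\mathrm{Akt}]_{\mathrm{LKO}}}
  = \frac{[\mathrm{pAkt}]_{\mathrm{WT}}}{k\,[\mathrm{Akt}]_{\mathrm{WT}}}
  = \frac{1}{k}\,\frac{[\mathrm{pAkt}]_{\mathrm{WT}}}{[\mathrm{Akt}]_{\mathrm{WT}}},
  \qquad k > 1,\quad [\mathrm{pAkt}]_{\mathrm{LKO}} = [\mathrm{pAkt}]_{\mathrm{WT}}
```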

      BDNF increased mEPSC frequency in WT hiPSC neurons, but not in LKO neurons, which already had a high frequency. Earlier in the manuscript, BDNF is shown to alter synapse number in WT but not LKO mouse neurons, yet no increase in synapse number was seen following BDNF treatment in either WT or LKO hiPSC neurons.

      If we are to assume that the WT neurons have LRRK2 (not demonstrated), and that LRRK2 KO neurons have similar drebrin (not demonstrated) it is unclear how to interpret this result in the model of BDNF-LRRK2 being upstream of pERK/Akt. There is no evidence that the BDNF increase in WT is blocked by LRRK2 inhibition, nor has it been associated with changes (or not) to pAkt or ERK1, which would be expected in both WT and KO based on Figure 4C.

      There are many reports of acute and longer-term BDNF application increasing event frequency in brain slices and primary neurons. Overexpression of BDNF in NPCs has also been shown to increase synapse function in hiPSC neurons derived from them. Here, BDNF has an effect on frequency in only one of 6 comparisons (3 timepoints, two lines). Is it not concerning that the expected BDNF effects occur at only one time point in WT, and that a lack of effect is generally more common in both WT and LKO? Is this due to slow appearance of TrkB receptors and degeneration at 90 days?

      There are no other data provided to show that BDNF was having a consistent, expected effect in human neurons (pAkt, pLRRK2, etc.), and there is little to link these data to those in previous figures of the study.

      The discussion of some of the weaknesses is mostly fair, aside from the disparities noted above, which are not.

    3. Reviewer #2 (Public review):

      The data show that BDNF regulates the PD-associated kinase LRRK2, they place LRRK2 within well-described BDNF pathways biochemically, and they show that LRRK2 can play a role mediating BDNF-driven synaptic outcomes at excitatory synapses. The chief strength is that the data provide a potential focal point for multiple observations that have been made across many labs. The findings will be of broad interest because LRRK2 has emerged as a protein that is likely to be part of Parkinson's pathology and its normal and pathological actions remain poorly understood.

      A major strength of the study is the multiple approaches that were used (biochemistry, bioinformatics, light and electron microscopy, and electrophysiology) across different experimental models (cells, primary neurons, human neurons, mice) to identify and examine the impact of BDNF on LRRK2 signaling and functions. Also noteworthy is the use of LRRK2 KO preparations to validate outcomes and to place LRRK2 actions upstream or downstream.

      The demonstration that LRRK2 and drebrin interact directly is important and suggests that other interacting proteins identified biochemically and bioinformatically in the paper will be important to pursue.

    4. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This important work begins to understand how BDNF regulates the phosphorylation and activity of LRRK2. The overall strength of evidence has been assessed as compelling, though some claims are only partially supported. The work will be of interest for those that might pursue specific LRRK2 interactions and mutational effects on these pathways as the work continues to develop.

      We thank the editors and reviewers for the constructive feedback. We have revised the manuscript to improve clarity, strengthen statistical analysis and increase the western blot sample size in drebrin KO mice.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity.

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and brain tissue from genetically modified mice. They examine a LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function.

      Strengths:

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health.

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. In this revised manuscript, aspects are well validated e.g., drebrin binding, but there is a disconnect between these findings and alterations to LRRK2 substrates. A convincing phosphoproteomic analysis of PD mutant Knock-in mouse brain is included. Overall the links between LRRK2, LRRK2 activity, and the changes to synaptic molecules, structures, and activity are intriguing.

      We thank this Reviewer for appreciating our work including the new experiments performed during the revisions.

      Weaknesses:

      The data sets remain disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking.

      We understand the Reviewer’s points and agree that it is important to increase the sample size (animals) for Western blots. In particular, we acknowledge that the initial experiments with the Dbn1 KO mice included only 3 mice, which was insufficient to draw any definitive conclusion on the effect, especially regarding pRab8 levels. In response, we collected additional animals and repeated the experiment with N=7 wild-type and N=7 KO mice (2 months old). Despite a high degree of interindividual variability, we confirmed that drebrin KO mouse brains have reduced levels of pLRRK2 (Author response image 1). In the new Figure 2H, we included all replicates (N=7+3 per genotype) for pLRRK2. However, we removed the Western blot for pRab8 because a new batch of pRab8 antibody did not yield specific results, making it impossible to reassess.

      Author response image 1.

      Western blot analysis of N=7 WT and N=7 drebrin KO whole brains.

      Main Conclusions of Abstract:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not pERK pAkt & pRab?

      The response of pERK and pAKT in neurons is shown in Figure 4C. We have repeatedly tried to detect pRab (both pRab8 and pRab10) in primary neurons, without success. Consistent with this difficulty, we are not aware of any published western blot analysis of pRabs in primary neuronal cultures. This is likely due to the high levels of PPM1H in neurons, as discussed in Berndsen et al., eLife, 2019 (PMID: 31663853).

      (2) Omics Proteome remodelling of LRRK2 interactome with BDNF & different in G2019S mouse neurons.

      Supports that the phosphoproteome of G2019S is different. Drebrin interaction with LRRK2 very well supported. Link between drebrin and LRRK2 activity somewhat supported (pS935 site), but the consequence (non-specific pRab8) not supported, as there is no evidence of a change in LRRK2 substrate(s).

      As discussed above, we removed the pRab8 western blot from Figure 2H because we could not confirm the result with the new set of mice and a new batch of pRab8 antibody.

      (3) Golgi 1 month LKO mouse altered dendritic spines, transient at 1m not older.

      Supported but very small transient change in spines, disconnected to other results (e.g., drebrin).

      We agree with the Reviewer that the observed effect is modest; still, we believe it is important to report. As noted in the Discussion, one plausible explanation for the limited magnitude of the effect is functional compensation by LRRK1.

      (4) iPSC-derived neurons BDNF increases mEPSC frequency (transient at 70 not 50 or 90 days) in WT not KO "which appear to bypass this regulation through developmental compensation"

      Weak, not clear what is being bypassed.

      We revised the statistical analysis as described below.

      Main Conclusions Based on Old and New Figure / Data:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not ERK Akt & Rab?

      The response of pERK and pAKT in neurons is shown in Figure 4C. We have repeatedly tried to detect pRab (both pRab8 and pRab10) in primary neurons, without success. Consistent with this difficulty, we are not aware of any published western blot analysis of pRabs in primary neuronal cultures. This is likely due to the high levels of PPM1H in neurons, as discussed in Berndsen et al., eLife, 2019 (PMID: 31663853).

      (2) BDNF promotes LRRK2 interaction with "post-synaptic actin cytoskeleton components"

      Tone down, only one postsynaptic validated - drebrin strong BUT CONTRADICTORY; link between drebrin and LRRK2 activity (pS935 site) supported, consequence (non-specific pRab8) broken, no evidence of change in LRRK2 substrate.

      As suggested, we have toned down the section title and changed it as follows: “BDNF stimulates LRRK2 interaction with drebrin, an actin cytoskeletal-associated protein enriched at the postsynapse”. As mentioned above, the pRab8 data have been removed.

      (3) LRRK2 G2019S striatal phosphoproteome is different from WT.

      It is different. Where is link to BDNF or Drebrin?

      We found that drebrin S339 phosphorylation is 3.7-fold higher in G2019S KI mice than in WT mice, suggesting a potential functional connection between LRRK2 and drebrin. However, differences in phosphorylation do not necessarily translate into physiological effects, so further validation is required. To test whether BDNF can induce drebrin S339 phosphorylation in a LRRK2-dependent manner, we plan an in vivo experiment in which BDNF is acutely administered to WT vs G2019S KI mice +/- MLi-2 to control for LRRK2 dependency. This is an important experiment to establish the mechanistic link, but it will take time because administering BDNF in the mouse brain requires ethical authorization.

      (4) BDNF signaling is impaired in Lrrk2 knockout neurons

      TrkB changes seem higher in SHSY5Y. pAKT impaired, pERK not convincing. Primary neurons Akt slower but it and Erk mostly intact. MLi-2 did not block pAkt or pErk in WT or KO (higher in latter). Whatever is happening in KO, Mli-2 not really blocking effect in WT. If we are to assume that studying the KO was a means to understand LRRK2 function, the authors data should explain why we care if an effect is absent in LKO, if LRRK2 isn't doing the same job in WT?

      To further support the conclusion that this effect is reproducible and dependent on LRRK2 kinase activity acting upstream of AKT and ERK signaling, we probed the membranes shown in Figure 1H for phosphorylated and total AKT and ERK1/2. Consistent with our hypothesis, the inhibition of LRRK2 with MLi-2 significantly reduced BDNF-induced AKT and ERK1/2 phosphorylation (Author response image 2).

      Author response image 2.

      Western blot (same experiments as in Figure 1) was performed using antibodies against phospho-Thr202/185 ERK1/2, total ERK1/2, phospho-Ser473 AKT and total AKT. Retinoic acid-differentiated SH-SY5Y cells were stimulated with 100 ng/mL BDNF for 0, 5, 30 and 60 min. MLi-2 was used at 500 nM for 90 min to inhibit LRRK2 kinase activity.

      BDNF increases synaptic puncta in WT not LKO (which start higher?). Is this BDNF increase blocked by LRRK2 inhibition?

      This is an important experiment that we plan to investigate in a future study.

      (5) Postsynaptic structural changes in Lrrk2 knockout neurons

      Golgi impregnation shows some very small spine changes at 1m. Not sustained over age. mRNA changes are very small (10% not even a fold... very weak and should be written as so). Drebrin levels reduced clearly at 1m, but probably also at 4 & 18. Underpowered, disconnected time course from the spine changes.

      Although the differences are small, they were observed in independent sets of mice by qPCR, histology, WB and TEM, supporting the consistency of the effect. For clarity, we rescaled the qPCR graphs so that the y-axis starts at zero.

      (6) An effect on "spontaneous electrical activity" at Div70

      Weak. What is so special at 70 days that means we should be confident in the differences, or be satisfied that the other time points are legitimately ignored? These are 10-11 cells from 3 cultures assayed at 3 time points but only one is presented (rest in supplement). This should be a 2 (time) or 3 way (+culture RM) ANOVA. As it stands, in WT there is a little - no activity at 50 days, little to no at 70 days, and variable to lots or none at 90. BDNF did nothing at 50 or 90 but may have at 70. In KO low activity stable at 50 & 70, tanks at 90. BDNF would seem to have a similar effect on KO at 90 as WT at 70, but as there are only 7 cells it remains inconclusive. Thus the conclusion that BDNF signalling is broken in LKO is not well supported by the ephys data, nor is the BDNF effect in WT cells (even at the 70 day time point) shown to be susceptible to LRRK2 inhibition.

      We thank the Reviewer for suggesting a more comprehensive analysis of the data. Following this suggestion, we performed separate two-way ANOVAs (DIV × treatment) for WT and LRRK2 KO neurons. This analysis revealed significant main effects of DIV and BDNF treatment in WT neurons, indicating that synaptic activity increases with neuronal maturation and is globally enhanced by BDNF. In contrast, neither DIV nor BDNF treatment reached statistical significance in LRRK2 KO neurons, and no DIV × treatment interaction was observed. These results indicate that BDNF-dependent enhancement of synaptic activity is preserved in WT neurons but is lost in the absence of LRRK2. We have now incorporated this analysis into the main figure and removed the individual DIV50 and DIV90 plots from the supplementary material. We also revised the title of the last section of the Results to reflect the outcome of this analysis and toned down our interpretation (page 12).

      Furthermore, we have added a paragraph to the Discussion section highlighting the limitations of this study. These include the variability observed in protein content and phosphorylation analyses by western blot, as well as the necessity to confirm the electrophysiological findings in larger datasets, including in dopaminergic neurons.

      Reviewer #2 (Public review):

      The data show that BDNF regulates the PD-associated kinase LRRK2, they place LRRK2 within well-described BDNF pathways biochemically, and they show that LRRK2 can play a role mediating BDNF-driven synaptic outcomes at excitatory synapses. The chief strength is that the data provide a potential focal point for multiple observations that have been made across many labs. The findings will be of broad interest because LRRK2 has emerged as a protein that is likely to be part of Parkinson's pathology and its normal and pathological actions remain poorly understood.

      We thank this Reviewer for appreciating our work and acknowledging that our findings will be of broad interest.

      A major strength of the study is the multiple approaches that were used (biochemistry, bioinformatics, light and electron microscopy and electrophysiology) across different experimental models (cells, primary neurons, human neurons, mice) to identify and examine the impact of BDNF on LRRK2 signaling and functions. Noteworthy is also the employment of LRRK2 KO preparations to validate outcomes and to place LRRK2 actions up or downstream.

      We thank the Reviewer.

      The demonstration that LRRK2 and drebrin interact directly is important and suggests that other interacting proteins identified biochemically and bioinformatically in the paper will be important to pursue.

      We agree with this statement.

      Some data from different models do not fit well with one another (like mouse and human neurons). This is likely due to inherent differences in the preparations. Since different experiments were carried out on the different preps, however, it is not possible to cross compare. The lack of this information is viewed more as an open question than a cause for concern.

      We thank the Reviewer for raising this point. In response, we have added a new section to the Discussion explicitly addressing the limitations of the study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      MLi2 pretreatment experiment is nice. They state in legends BDNF treatment prior to MLi2, they mean prior MLI2 treatment. Or MLi2 pretreatment, prior to BDNF. However, this experiment is hard to interpret as it has no control (non BDNF treated) time course following MLi2, could this be (at least in part) a rebound effect produced by relief of inhibition? This should be discussed if not addressed directly by experiments.

      The non-BDNF-treated group represents the 0 time point; we have specified this in the figure legend. We can exclude rephosphorylation kinetics following relief of MLi-2 inhibition as the explanation, because pRab levels increase significantly at 5 minutes, far exceeding control levels. This observation gives us confidence that the effect is BDNF-dependent.

      (1) "As suggested, we performed qPCR and observed that 1 month-old KO midbrain and cortex express lower levels of Dbn1 as compared to WT brains (Figure 5G). This result is in agreement with the western blot data (Figure 5H)."

      There is no Fig 5H? 5F? In 5F effect sizes are exaggerated with axes not crossing zero. There is a 10% reduction in mRNA (normally >1 or 2 fold changes would be considered biologically important?). This isn't much change, and should be presented as such. 1 month old WB in G are much more convincing of a reduction of drebrin levels, but what brain region is this from?

      We apologize for the error in the rebuttal, where we incorrectly referred to Figure 5G (the correct panel is 5F) and to Figure 5H (the correct panel is 5G). We checked the labeling in the manuscript text and it is correct.

      Following the Reviewer’s important suggestion, we rescaled all plots to start at zero. Although some differences appear relatively modest, they are statistically significant. Importantly, all brains used for qPCR analyses (N = 6 per genotype) were obtained from independent mice. In addition, independent cohorts of mice were used for spine morphology analyses (N = 3 per genotype), TEM analyses (N = 4), and western blot experiments (N = 3). Thus, the overall sample size across approaches is substantial.

      The western blots are from whole brain; this is now indicated in the figure legend.

      All blots are underpowered, especially given what appears to be an age dependent loss of drebrin in both genotypes beset by high variability

      (i) Western blots looking at pSer935 and pRab8 (pan Rab) in Dbn1 WT and knockout brains.

      "As reported and quantified in Figure 2I, we observed a significant decrease in pSer935 and a trend decrease in pRab8 in Dbn1 KO brains. This finding supports the notion that Drebrin forms a complex with LRRK2 that is important for its activity, e.g. upon BDNF stimulation."

      Non-sig data in Fig2I/H and especially Fig5G are important data but hard to interpret because the experiment is underpowered. I am surprised the authors designed studies on an n=3 western blot.

      For fig 2 this is a problem if they wish to correlate LRRK2 activity with drebrin. The KO have a clear 50% decrease in LRRK2 pS935 but no change to pRab8(pan).

      As discussed above, we increased the sample size by 7 additional mice per genotype (10 brains per genotype in total).

      Why not look at Rab10, and certainly redo with a higher n than three. Of special confusion is the observation that the WT with the highest drebrin levels, is the animal with the lowest pS935 & pRab

      As discussed above neither pRab8 nor pRab10 returned convincing results in the new round of western blots. We acknowledge that future experiments should explore the phosphorylation levels of Rab12 which is emerging as a more reliable readout of LRRK2 kinase activity in the brain.

      (ii) "Reverse co-immunoprecipitation of YFP-drebrin full-length, N-terminal domain (1-256 aa) and C-terminal domain (256-649 aa) (plasmids kindly received from Professor Phillip R. Gordon-Weeks, Worth et al., J Cell Biol, 2013) with Flag-LRRK2 co-expressed in HEK293T cells. As shown in supplementary Fig. S2C, we confirm that YFP-drebrin binds LRRK2, with the N terminal region of drebrin appearing to be the major contributor to this interaction"

      CoIP with drebrin (and fragments) is very convincing.

      We thank the Reviewer for this feedback.

      Ephys data, presentation, and response to review is weak.

      We reanalyzed the data as suggested by the Reviewer and reviewed the text and interpretation.

      Reviewer #2 (Recommendations for the authors):

      p. 12, last paragraph. "sealing" should be "ceiling"

      We have corrected the misspelled word.

    1. eLife Assessment

      This important study provides solid evidence that early childhood malaria exposure affects the development of antibody responses to unrelated pathogens and vaccine-derived antigens in Kenyan children. The findings are of major public health importance and limitations of the observational study design are properly acknowledged.

    2. Reviewer #2 (Public review):

      Summary:

      The authors investigated whether early-life malaria exposure has long-term effects on immune responses to unrelated antigens. They leveraged a natural experiment in coastal Kenya where two adjacent communities (Junju and Ngerenya) experienced divergent malaria transmission patterns after 2004. Using 15 years of longitudinal data from 123 children with weekly malaria surveillance and annual serological sampling, they measured antibody responses to multiple pathogens using a protein microarray technology and ELISA.

      Strengths:

      (1) Extensive longitudinal data collection with weekly malaria surveillance, enabling precise exposure classification.

      (2) Use of a natural experiment design that allows for causal inference about malaria's immunological effects.

      (3) Broad panel of antigens tested, demonstrating generalized rather than antigen-specific effects.

      (4) Within-cohort analysis in Ngerenya controls for geographic and environmental factors.

      (5) Validation of key findings using both serologic microarray and ELISA.

      (6) Important public health implications for vaccine strategies in malaria-endemic regions.

      Weaknesses:

      (1) Due to its nature, the study lacks the ability to determine the direction of the associations found between malaria exposure and other IgG levels to unrelated pathogens.

      (2) No evaluation of the clinical implications of the reduced IgG levels observed in the area with high malaria exposure.

      Assessment of Claims:

      The data appear to support the authors' primary claims. The strength of the evidence is limited by the observational nature of the study and the results should be interpreted in that light. Together with the currently available evidence of P. falciparum's impact on the host's immune function, this natural experiment design provides further evidence for a relationship between early malaria exposure and reduced antibody responses to other pathogens and vaccine-derived antigens. The within-Ngerenya analysis controls for geographic factors and thus enhances the quality of the evidence; however, there is limited physical, nutritional, and socio-economic information on factors that may have driven the observed changes.

      Impact and Utility:

      This work has fundamental implications for understanding vaccine effectiveness in malaria-endemic regions and may contribute to inform vaccination strategies. The findings, if confirmed, would suggest that children in areas of high malaria transmission may require modified immunization approaches. The dataset provides a valuable resource for future studies of malaria's immunological legacy.

      Context:

      This study builds on prior work showing acute immunosuppressive effects of malaria but uniquely attempts to demonstrate the durability of these effects years after exposure. The natural experiment design addresses limitations of previous observational studies by providing a more controlled comparison.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study sought to investigate the role that early childhood malaria exposure plays in the development of antibody responses to unrelated pathogens and vaccine-derived antigens in Kenyan children. In this natural experiment, the authors compare antibody levels among children who have been exposed to different levels of malaria transmission by using protein microarray technology. Although the findings are of importance, the evidence remains incomplete, and the analysis would benefit from a more in-depth evaluation of potential confounders. With the appropriate analysis, the findings will be of great interest for global health, immunology, and vaccine development.

      We thank the editors for highlighting the need for a more comprehensive evaluation of potential confounding. We agree that this is a critical aspect of the study and have now undertaken additional analyses to address this directly.

      The original longitudinal cohort was designed to investigate the acquisition of naturally acquired immunity to malaria and did not include systematic collection of anthropometric/nutritional, environmental or socioeconomic data, precluding direct adjustment for these factors within the primary dataset. However, to assess whether there were population-level differences in these factors, we leveraged contemporaneous hospital-based surveillance data from the same geographic regions, which include measurements of anthropometry and nutritional status (mid-upper arm circumference [MUAC], weight-for-age, and height-for-age) and detailed infection diagnostics.

      Using this independent dataset, we fitted mixed-effects regression models adjusting for age, calendar year, and concurrent infections (RSV, parainfluenza, influenza A, human metapneumovirus, OC43). For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya. Adjusted differences were small and centred around zero (MUAC: −0.12, 95% CI −0.38 to 0.15; weight-for-age: −0.05, 95% CI −0.28 to 0.19; height-for-age: 0.08, 95% CI −0.17 to 0.33), with no consistent directional effect.

      As the longitudinal cohort was randomly selected from these underlying populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and that there were no differences in their exposure to the infections included in the analysis. We have incorporated these analyses into the revised manuscript (adding a new figure focussed on this analysis, Fig. 6, and updating the statistical analysis and discussion sections), and believe they substantially strengthen the evidence by addressing a key source of potential confounding.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study shows that childhood malaria can weaken the antibody response to other vaccines and infections. This suggests that early exposure to P. falciparum may have a long-lasting effect on immunity, with implications for vaccine efficacy in endemic areas.

      Strengths:

      This study stands out for its longitudinal design, the use of robust immunological techniques, and the comparison between areas with different levels of malaria exposure. Its findings reveal that early malaria can weaken the response to childhood vaccines, with important implications for public health in endemic regions.

      We thank the reviewer for this comment.

      Weaknesses:

      One of the study's main limitations is the lack of functional data confirming the clinical impact of the low antibody levels. Furthermore, although multiple immune responses were measured, other important components, such as cellular immunity, were not assessed. Furthermore, the results may not be generalizable to other regions.

      We thank the reviewer for this important comment and agree that the absence of functional immunological assays is a limitation of the current study. Our analysis was designed to determine whether early-life malaria exposure is associated with durable alterations in antibody responses to unrelated pathogens and vaccine antigens, rather than to establish the downstream functional consequences of these differences. As such, the study is able to demonstrate a broad and persistent attenuation of humoral responses but cannot directly determine whether the lower antibody levels observed translate into reduced neutralising capacity or diminished protection at the individual level.

      We have revised the manuscript to make this distinction more explicit. In the revised discussion, we now state that although reduced antibody titres to vaccine-preventable pathogens may have implications for long-term protection, the clinical significance of these differences remains to be established in future studies incorporating functional assays and clinical outcome data.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether early-life malaria exposure has long-term effects on immune responses to unrelated antigens. They leveraged a natural experiment in coastal Kenya where two adjacent communities (Junju and Ngerenya) experienced divergent malaria transmission patterns after 2004. Using 15 years of longitudinal data from 123 children with weekly malaria surveillance and annual serological sampling, they measured antibody responses to multiple pathogens using a protein microarray technology and ELISA.

      Strengths:

      (1) Extensive longitudinal data collection with weekly malaria surveillance, enabling precise exposure classification.

      (2) Use of a natural experiment design that allows for causal inference about malaria's immunological effects.

      (3) Broad panel of antigens tested, demonstrating generalized rather than antigen-specific effects.

      (4) Within-cohort analysis in Ngerenya controls for geographic and environmental factors.

      (5) Validation of key findings using both serologic microarray and ELISA.

      (6) Important public health implications for vaccine strategies in malaria-endemic regions.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) Lack of participants' characteristics (socio-economic, nutritional, physical).

      We thank the reviewer for this important comment. We have now included a detailed summary of participant characteristics in Table 1 to provide context for the study population. This includes key demographic and longitudinal variables stratified by cohort (Junju and Ngerenya), including sex distribution, age at study entry and exit, duration of follow-up, number of visits per participant, and total serum samples analysed. Detailed data on socio-economic status, nutritional status, and other environmental or physical characteristics were not consistently available across all participants and time points, and therefore could not be included. This has now been explicitly stated as a limitation in the discussion.

      (2) Somewhat limited sample size (longitudinal analysis of 123 children total), with further subdivision reducing statistical power for some analyses.

      We thank the reviewer for this important observation. The study is based on an intensively followed cohort with weekly malaria surveillance and repeated serological measurements throughout childhood, allowing detailed characterisation of early-life exposure and subsequent immune trajectories. This depth of longitudinal sampling provides resolution that is not achievable in larger cross-sectional studies. We acknowledge that subdivision of the cohort reduces statistical power for some analyses. Nevertheless, the key findings were consistent across several independent comparisons, including a reduction in antibody levels for a broad panel of antigens in the malaria-endemic setting, within-cohort analyses in Ngerenya that replicated this observation, and confirmation on the ELISA platform of results generated on the protein microarray. The consistency of these findings across analytical approaches and measurement platforms reduces the likelihood that the observed effects are driven by small-sample variability. We have clarified this point in the revised discussion to emphasise that the strength of the study lies in the depth and longitudinal resolution of the data rather than the absolute sample size.

      (3) Potential confounding by unmeasured socioeconomic, nutritional, or environmental factors between communities.

      We thank the reviewer for this important point and agree that residual confounding between communities must be considered. As outlined in response to the editorial assessment, we have undertaken additional analyses using contemporaneous population-level data from the same regions and found no evidence of systematic differences in anthropometric indices between children from Junju and Ngerenya after accounting for age, calendar year, and concurrent infections, with effect estimates small and their confidence intervals crossing zero. In addition, the within-Ngerenya analysis provides an internal comparison within a shared geographic and environmental setting, reducing the likelihood that unmeasured socioeconomic or environmental differences between communities account for the observed associations. The new analyses suggest that major population-level differences in nutritional status or infection burden are unlikely to explain the observed patterns. We have clarified this point in the revised discussion and explicitly acknowledge the possibility of residual confounding.

      (4) Lack of ability to determine the direction of the associations found between malaria exposure and other IgG levels to unrelated pathogens.

      We agree that, as an observational study, our analysis cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. However, several features of the study design strengthen the inference that early-life malaria exposure contributes to the observed differences. First, malaria exposure was characterised prospectively through intensive weekly surveillance, allowing precise temporal definition of exposure in early childhood. Second, within the Ngerenya cohort, where children were exposed to different levels of malaria due to a rapid decline in transmission, those with even limited early-life exposure exhibited lower antibody responses at 10 years of age than malaria-naïve peers, despite residing in the same geographic and environmental context. In addition, we now show that these differences are not confined to a single timepoint but are evident across the full longitudinal follow-up after adjustment for age and repeated measurements. While we cannot exclude the possibility of residual confounding or bidirectional relationships, the convergence of evidence from the natural experiment design, within-cohort contrasts, and age-adjusted longitudinal analyses supports early-life malaria exposure as a key contributor to long-term differences in antibody responses. We have clarified this in the discussion.

      (5) Despite good longitudinal data, the main analysis was conducted as a cross-sectional analysis at age 10 for many comparisons, which limits the understanding of temporal dynamics.

      We thank the reviewer for highlighting this point. While age 10 was initially used as a standardised reference point for cross-sectional comparisons, the underlying dataset is longitudinal, with repeated antibody measurements across childhood. To address this more directly, we have now complemented these analyses with antigen-specific mixed-effects regression models incorporating all available longitudinal data, with adjustment for age and a random intercept for repeated measurements within individuals. These models demonstrate that the differences between cohorts are not confined to the age-10 cross-section but are evident in an age-adjusted longitudinal framework for multiple antigens. We have retained the age-10 comparisons for reference, but the primary inference is now based on the longitudinal mixed-effects analyses. These changes are reflected in the revised results and statistical analysis sections. We thank the reviewer for this astute point, which we think has substantially improved the manuscript.

      (6) Statistical analysis is limited to univariable comparisons without consideration for confounders or adjusting for multiple comparisons.

      We agree that the original analyses relied primarily on univariable comparisons. In the revised manuscript, we have extended the analytical framework to include mixed-effects regression models that account for age effects and repeated measurements within individuals. These models estimate the average age-adjusted difference in antibody responses between cohorts across the full follow-up period. We have also applied false discovery rate (FDR) correction to account for multiple antigen testing. For multiple antigens, the direction and magnitude of cohort differences remain consistent under this approach, strengthening the robustness of the findings beyond the original univariable comparisons. These analyses have been incorporated into the revised results and statistical analysis sections.

      (7) No mechanistic understanding of how early malaria exposure creates lasting immunosuppression.

      We agree that this study does not directly resolve the mechanistic basis underlying the observed long-term differences in antibody responses. The primary aim of this work was to identify and characterise durable alterations in humoral immune profiles associated with early-life malaria exposure, rather than to define the cellular or molecular pathways involved. However, our findings are consistent with a growing body of experimental and clinical literature suggesting that malaria infection can induce sustained perturbations in B cell and T cell compartments, including the expansion of atypical memory B cells, altered germinal centre responses, and increased regulatory immune activity. These mechanisms have been proposed to impair the generation and maintenance of effective humoral immunity. In the revised discussion, we have clarified that the mechanistic basis of this phenomenon remains to be fully defined and have expanded the discussion of plausible pathways in light of existing literature. We now explicitly position our findings as providing population-level evidence of a durable immunological phenotype that warrants further mechanistic investigation.

      (8) No understanding of the clinical implications of the reduced IgG levels observed in the area with high malaria exposure.

      We agree that this study does not directly establish the clinical consequences of the reduced antibody levels observed in malaria-exposed children. The primary objective of this study was to characterise long-term differences in humoral immune profiles associated with early-life malaria exposure, rather than to assess downstream clinical outcomes or functional antibody activity. We have clarified this limitation in the revised discussion. Nevertheless, the breadth and consistency of the observed differences for multiple vaccine-preventable and infectious antigens raise the possibility that early-life malaria exposure may have implications for long-term immune protection. We now emphasise in the revised discussion that future studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.

      Assessment of Claims:

      The data appear to support the authors' primary claims, but the strength of the evidence is limited, and the results should be interpreted with caution. Together with the currently available evidence of P. falciparum's impact on the host's immune function, this natural experiment design provides further evidence for a relationship between early malaria exposure and reduced antibody responses. The within-Ngerenya analysis controls for geographic factors and thus enhances the quality of the evidence; however, it still fails to account for the physical, nutritional, and socio-economic factors that may have driven the observed changes. Additionally, the mechanism underlying this effect remains unclear, and the clinical significance of reduced antibody levels is not established.

      We thank the reviewer for this assessment and for recognising the strengths of the natural experiment design and within-cohort analyses. We agree that, as an observational study, our findings should be interpreted with appropriate caution. In the revised manuscript, we have undertaken additional analyses and clarifications to strengthen the evidential basis of our conclusions and to address the points raised. Because nutritional and socioeconomic variables were not consistently collected during longitudinal follow-up, we addressed potential confounding by nutritional and related factors by analysing contemporaneous hospital-based surveillance data from the same geographic populations. For three anthropometric indices of nutritional status (mid-upper arm circumference, weight-for-age, and height-for-age), we found no evidence of systematic differences between children from Junju and Ngerenya after adjustment for age, calendar year, and concurrent infections. As the longitudinal cohort participants were randomly drawn from these populations, these findings suggest that the two groups were broadly comparable with respect to early-life growth and nutritional status.

      We agree that the mechanistic basis of the observed differences is not resolved in this observational study. We have clarified this point in the revised discussion and expanded our consideration of plausible biological pathways based on existing literature, including perturbations in B cell and T cell compartments. Similarly, we now explicitly state that the clinical implications of reduced antibody levels remain to be determined and will require studies incorporating functional assays and clinical outcomes. We believe these revisions strengthen the manuscript by providing a more comprehensive interpretation of the data.

      Impact and Utility:

      This work has fundamental implications for understanding vaccine effectiveness in malaria-endemic regions and may contribute to informing vaccination strategies. The findings, if strengthened, would suggest that children in areas of high malaria transmission may require modified immunization approaches. The dataset provides a valuable resource for future studies of malaria's immunological legacy.

      We thank the reviewer for this comment.

      Context:

      This study builds on prior work showing acute immunosuppressive effects of malaria but uniquely attempts to demonstrate the durability of these effects years after exposure. The natural experiment design addresses limitations of previous observational studies by providing a more controlled comparison.

      We thank the reviewer for this comment.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We suggest that further analyses of potential confounders such as anthropometric indices, socioeconomic status, and comorbidities would render the evidence more robust.

      We thank the Reviewing Editor for this important suggestion. We agree that careful consideration of potential confounding factors is critical to the interpretation of these findings, and have undertaken additional analyses to address this.

      Because anthropometric and related socioeconomic measurements were not collected systematically within the original longitudinal malaria cohort, we assessed potential population-level differences using hospital-based surveillance data from the same geographic regions. This dataset includes measurements of anthropometry (mid-upper arm circumference, weight-for-age, and height-for-age) as well as detailed infection diagnostics in childhood. Using these data, we fitted regression models adjusting for age, calendar year, and concurrent, clinically diagnosed infections. For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya, with effect estimates small and confidence intervals crossing zero (fig. 6). As the longitudinal cohorts were randomly selected from these populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and infection exposure. With respect to socioeconomic status and comorbidities, detailed individual-level data were not available within the longitudinal cohort. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic and environmental setting, provides a complementary control for these factors. We have incorporated these additional analyses and clarifications into the statistical analysis and discussion sections of the revised manuscript, and we believe they strengthen the robustness of the findings by addressing key sources of potential confounding.

      Reviewer #1 (Recommendations for the authors):

      The manuscript is well written, with clear and informative figures that effectively support the findings.

      We thank the reviewer for this comment.

      Suggestions:

      (1) Although the study well controlled for malaria exposure, other environmental or infectious factors that influence immunity could be considered:

      Nutritional status in childhood (malnutrition impacts immune response), co-infections (helminths, respiratory viruses), socioeconomic differences, or differences in access to health services, even minimal, between Junju and Ngerenya.

      We thank the reviewer for highlighting the potential influence of environmental, infectious, and socioeconomic factors on immune responses. We agree that these are important considerations in the interpretation of observational data. To address nutritional status and concurrent infectious exposures, we analysed contemporaneous hospital-based surveillance data from the same geographic populations. This dataset includes measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed clinical diagnostics for common childhood infections. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). These findings suggest that the populations from which the longitudinal cohorts were randomly selected were comparable with regard to early-life growth and nutritional status. We agree that we cannot fully exclude the influence of unmeasured factors such as helminth infections, socioeconomic variation, or subtle differences in healthcare access. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic, environmental, and healthcare setting, provides an internal control for many of these factors. The persistence of similar patterns within this setting supports malaria exposure as a key contributor to the observed differences. We have clarified these considerations in the revised discussion and believe that the additional analyses and within-cohort comparisons strengthen the robustness of our conclusions while acknowledging the limitations inherent to observational studies.

      (2) Measurement of other immunological markers:

      In addition to IgG, include: B cell subpopulations (naive, memory, atypical), cytokine levels (IL-10, IFN-γ) to characterize the immunological microenvironment.

      You could include these recommendations in the text for future studies.

      We thank the reviewer for this thoughtful suggestion. We agree that detailed immunophenotyping, including characterisation of B cell subpopulations and cytokine profiles, would provide important insight into the mechanisms underlying the observed differences in antibody responses. In the revised manuscript, we have expanded the discussion to highlight these important avenues for future work, including the potential roles of altered B cell subsets and of regulatory or inflammatory cytokine environments in shaping long-term humoral responses.

      Reviewer #2 (Recommendations for the authors):

      The manuscript is well-written.

      We thank the reviewer for this comment.

      (1) Methodological Clarifications:

      Do the authors have any information regarding the characteristics of these children that could be of use in understanding their immune responses better? (e.g., weight, height, BMI, socioeconomic status, HB level, access to health care, etc.).

      We thank the reviewer for highlighting the importance of participant characteristics in interpreting immune responses. Anthropometric and related clinical measures were not collected systematically within the original longitudinal malaria cohort, as the study was designed to investigate the acquisition of naturally acquired immunity to malaria.

      To address this, we analysed contemporaneous hospital-based surveillance data from the same geographic populations, which include measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed infection diagnostics. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). Detailed individual-level data on socioeconomic status, haemoglobin levels, and healthcare access were not available within the longitudinal cohort, which precluded direct adjustment for these factors. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic and healthcare setting, provides an internal control for many of these factors. These considerations are now clarified in the revised discussion.

      Could the authors provide more detailed statistical analysis, including power calculations and multiple comparison corrections?

      In the revised manuscript, we have extended the statistical analysis and now include antigen-specific mixed-effects regression models incorporating all available longitudinal measurements, which are described in detail in the statistical analysis section. We have also applied false discovery rate (FDR) correction to account for multiple testing across antigens, and we report both unadjusted and FDR-adjusted significance in the revised results. With respect to power, the sample size was determined by the number of children in the long-term surveillance cohorts who met the inclusion criterion of having a sufficient number of longitudinal samples available. We have clarified this in the revised manuscript.
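      The response does not name the FDR procedure. Assuming the common Benjamini-Hochberg step-up method, the adjustment across antigens can be sketched as follows; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing monotonicity (an adjusted value never exceeds the next one up)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-antigen p-values from the mixed-effects models
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

      An antigen would then be reported as significant after correction when its adjusted value falls below the chosen FDR level, for example 0.05.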

      Clarify the criteria for selecting the 123-child subset from the larger surveillance cohorts.

      We thank the reviewer for this comment. The 123 children included in this analysis were selected from the larger surveillance cohorts based on the availability of sufficiently dense longitudinal serum sampling as described above. Specifically, children were required to have at least eight longitudinal samples available in the archive, enabling robust assessment of within-individual antibody trends over time. This criterion was applied to ensure adequate temporal resolution to examine the long-term stability of malaria-associated effects on antibody responses. Children with fewer available samples were therefore excluded, as limited sampling would not allow reliable characterisation of longitudinal patterns. We have clarified these inclusion criteria in the revised manuscript.

      (2) Additional Analyses and Data Presentation:

      The authors could consider dose-response analyses relating malaria episode frequency/timing to degree of immunosuppression or even AMA-1 IgG levels and degree of immunosuppression. How do they associate over time?

      We thank the reviewer for this suggestion. To address this, we examined the relationship between malaria exposure (using cumulative febrile malaria episode count derived from longitudinal surveillance data) and the magnitude of heterologous antibody responses. In mixed-effects models adjusting for age and repeated antibody measurements, higher malaria episode burden was associated with lower antibody responses against multiple antigens (fig. 7).

      Analyze whether the effects vary by specific age at malaria exposure.

      We agree that age at exposure is an important consideration. We have now assessed how the relationship between malaria burden and antibody responses varies with age by including age as a non-linear term and modelling interactions between malaria exposure and age as described above. These analyses did not suggest substantial heterogeneity in the association over age, and therefore we have retained the simpler presentation for clarity.

      Provide correlation analyses between different antibody responses to assess whether suppression is generalized.

      We have addressed this by modelling responses jointly across a panel of heterologous antigens and by examining antigen-specific associations. The direction of effect was consistent for the majority of antigens, with no evidence of opposing trends, supporting a broad rather than antigen-specific effect.

      The authors could consider moving Figures 2a and b to the supplementary material.

      We thank the reviewer for this suggestion. We carefully considered whether panels 2a and 2b could be moved to the supplementary material. However, we have retained them in the main text because they provide a simple, intuitive illustration of how AMA1 antibody responses track with malaria exposure at the individual level, complementing the population-level analysis shown in fig. 2c. We feel that this helps establish the biological validity of the microarray platform in a way that is immediately interpretable to the reader, and therefore supports the interpretation of subsequent analyses.

      The authors could consider replacing Figures 3a and b with IgG levels from ALL vaccinated children and ALL non-vaccinated children.

      We thank the reviewer for this suggestion. We would like to retain these figures for the same reasons that have been articulated above for figures 2a and b.

      (3) Discussion Enhancements:

      The authors should consider expanding the discussion to address the limitations of the data more thoroughly, particularly regarding the potential differences between cohorts that could have contributed to the results.

      We have expanded the discussion to more explicitly address potential differences between cohorts that could contribute to the observed findings, including nutritional, socioeconomic, and environmental factors.

      The discussion needs to acknowledge the lack of directionality for the associations observed. As stated above, although I agree in general terms with the observations that the authors have made, it is not possible to distinguish between a suppressive effect of malaria on immune responses to infection-derived pathogens and a protective effect of malaria that leads to less exposure to infection-derived pathogens (and consequently lower IgG levels). The mechanisms behind these could include, for example, differences in health-seeking behaviours or social interactions among children who have malaria versus those who do not.

      We agree that, as an observational study, we cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. We have now clarified this limitation explicitly in the discussion. We acknowledge the alternative interpretations raised by the reviewer, including the possibility that differences in exposure to other pathogens, potentially driven by behavioural, environmental or healthcare-related factors, could contribute to the observed patterns. At the same time, we note that the natural experiment design, prospective malaria exposure classification, and within-Ngerenya comparisons support early-life malaria exposure as a key contributing factor. We have revised the discussion to reflect this balance.

      Extend the discussion of potential biological mechanisms underlying durable immunosuppression.

      We thank the reviewer for this suggestion. We have expanded the discussion to more fully consider potential biological mechanisms that could underlie the observed long-term differences in antibody responses. Specifically, we now discuss evidence from prior studies indicating that malaria infection can induce sustained alterations in B cell and T cell compartments, including expansion of atypical memory B cells, disruption of germinal centre responses, and increased regulatory immune activity. We position our findings as providing population-level evidence of a durable immunological phenotype, while noting that targeted mechanistic studies will be required to define the underlying pathways.

      Extend the discussion around the clinical implications of the observed antibody level differences.

      In the revised discussion, we highlight that studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.

      (4) Technical Issues:

      Could the authors please:

      (1) Clarify microarray data processing and quality control procedures.

      We thank the reviewer for this request. We have expanded the methods section to provide additional detail on microarray data processing and quality control procedures.

      (2) Provide information on inter-assay variability and batch effects.

      We have expanded the methods section to clarify how these were evaluated and addressed. Inter-assay variability was monitored using pooled adult serum included on every slide as a consistent positive control. This allowed us to assess slide-to-slide consistency in signal detection across the full antigen panel. In addition, fluorophore-conjugated IgG and IgA controls were printed directly onto each miniarray to confirm scanner performance independently of antigen–antibody interactions. At the sample level, each specimen was assayed on two independent miniarrays per slide, generating four spatially separated replicate measurements per antigen. Technical variability was quantified using the coefficient of variation (CV), and measurements with CV >20% were excluded from downstream analyses.
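      The CV-based exclusion rule can be sketched as below. The data layout (a dict keyed by sample and antigen holding the four replicate spot intensities), the use of the sample standard deviation, and summarising retained replicates by their mean are assumptions for illustration, not details taken from the methods; the antigen labels are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD = 0.20  # measurements with CV > 20% are excluded


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of replicate intensities."""
    m = mean(replicates)
    if m == 0:
        return float("inf")  # undefined CV: treat as failing the filter
    return stdev(replicates) / m


def filter_by_cv(measurements, threshold=CV_THRESHOLD):
    """Keep (sample, antigen) pairs whose replicate CV is within the threshold.

    measurements: dict mapping (sample_id, antigen) -> list of replicate
    intensities (assumed layout). Returns the mean intensity of each kept pair.
    """
    kept = {}
    for key, reps in measurements.items():
        if coefficient_of_variation(reps) <= threshold:
            kept[key] = mean(reps)
    return kept


example = {
    ("child1", "antigen_A"): [1000, 1050, 980, 1020],  # consistent replicates
    ("child1", "antigen_B"): [200, 600, 210, 190],     # one outlying spot
}
print(filter_by_cv(example))
```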

      (3) Include details on how missing data were handled in longitudinal analyses.

      We thank the reviewer for highlighting this point. We have added clarification in the statistical analysis section describing how missing data were handled. Specifically, mixed-effects models were used, which accommodate unbalanced longitudinal data without requiring imputation, allowing all available observations to contribute to the analysis.

      (4) Include details of the parameters of the LOWESS analysis shown in Figure 1.

      We have expanded the figure 1 legend to include the parameters used for the loess smoothing shown, including the smoothing span.
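      To illustrate what the smoothing span controls, here is a minimal local-linear LOWESS sketch with tricube weights and no robustness iterations. It is not the authors' code: R's `loess` defaults to local quadratic fits, and the 0.75 default span below is a placeholder (it matches the default span of R's `loess`), not the value used for Figure 1.

```python
def lowess(x, y, span=0.75):
    """Local linear regression with tricube weights (no robustness iterations).

    span: fraction of the points used in each local fit.
    Returns the fitted value at each x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    k = min(n, max(2, int(round(span * n))))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth = distance from x0 to its k-th nearest neighbour
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = []
        for xi in x:
            u = abs(xi - x0) / h
            w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube weight
        sw = sum(w)  # >= 1, because x0 itself has weight 1
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))  # weighted line evaluated at x0
    return fitted
```

      A smaller span uses fewer neighbours in each local fit, so the curve follows the data more closely; a larger span gives a smoother curve.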

      (5) Include details of the samples used for Figure 3d (Negative and Pooled Adult Serum).

      We have clarified in the methods the nature and purpose of the samples used in Figure 3d. The negative control consisted of phosphate-buffered saline applied to a full miniarray in place of serum, allowing assessment of background and non-specific signal in the absence of antibody binding. The pooled adult serum comprised a composite of sera from multiple healthy adults from the same setting and was included as a positive reference sample, expected to contain a broad repertoire of antigen-specific antibodies. These controls were included on each slide to enable interpretation of assay performance, with the negative control defining baseline signal and the pooled adult serum providing a consistent reference for antigen recognition across the microarray.

    1. eLife Assessment

      This valuable study identifies a brown adipose tissue-specific heat shock factor 1 (HSF1)-alcohol dehydrogenase 5 (ADH5) axis that regulates oxidative stress and cellular senescence during aging. The authors show that ADH5 deficiency drives BAT dysfunction and contributes to organismal health decline in aged mice. The evidence is convincing, and the work will be of broad interest to the adipose tissue and aging research communities.

    2. Reviewer #1 (Public review):

      Sebag et al. addressed the role of ADH5 in BAT in the development of aging and metabolic disarrangements associated with it. This is a follow-up study after the authors' demonstration of the role of BAT ADH5 in glucose homeostasis, obesity, and cold tolerance. By ablating ADH5 specifically in brown adipocytes or pharmacologically modulating ADH5 through activation of its transcription factor, the authors conclude that preservation of BAT function is crucial for healthy aging and ADH5 is causally involved in this process. The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques and addresses several physiological and molecular manifestations of aging. Therefore, the findings contribute to the growing body of literature pointing to the biological role of BAT activity in aging.

      Comments on revised version:

      I have no further comments other than to congratulate the authors on the nice piece of work.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of the enzyme Alcohol Dehydrogenase 5 (ADH5) in brown adipose tissue (BAT) during aging. BAT is crucial for thermogenesis and energy balance, but its function and mass diminish with age, contributing to metabolic dysfunction and age-related diseases. ADH5, also known as S-nitrosoglutathione reductase, regulates nitric oxide (NO) signaling by removing damaging S-nitrosylation modifications from proteins. The authors show that aging in mice leads to increased protein S-nitrosylation associated with a combination of increased Nos2 expression and reduced ADH5 expression in BAT, resulting in impaired metabolic and cognitive functions. Deletion of ADH5 in BAT accelerates tissue senescence and systemic metabolic decline. Mechanistically, aging suppresses ADH5 via downregulation of heat shock factor 1 (HSF1), a master regulator of protein homeostasis. Importantly, pharmacologically boosting HSF1 improves BAT function and mitigates both metabolic and cognitive declines in aged mice. The findings highlight a critical HSF1-ADH5 pathway in BAT that protects against aging-related dysfunction, suggesting that targeting this pathway may offer new therapeutic strategies for improving metabolic health and cognition during aging.

      Strengths:

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By showing that age regulates genes that control SNO status in BAT, and by further developing a therapy that targets ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function.

      Weaknesses:

      None identified.

      Comments on revised version:

      Congratulations to the authors for this interesting manuscript. I don't want to pat myself on the back, but I found the increased Nos2 expression in Figure 1C of the revised manuscript very satisfying, as it reinforces the shift in the regulation of SNO status that happens in BAT with aging. I appreciate the authors addressing this suggestion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewers for their thoughtful and constructive comments. We fully agree that when two independent variables (genotype and age) are being evaluated, the statistical analysis must appropriately account for both factors and their potential interaction. We appreciate the reviewers’ guidance in strengthening the statistical rigor of our study.

      In response to this concern, we have carefully reanalyzed the relevant datasets using two-way ANOVA to properly assess the effects of genotype, age, and their interaction. The manuscript, figures, and figure legends have been revised accordingly. Specifically:

      Figure 1:

      The quantification of p16 expression in Fig. 1F has been reanalyzed using two-way ANOVA. The figure has been replotted, and the corresponding legend has been updated to reflect the revised statistical approach.

      Figure 2:

      The quantification of AUC in Fig. 2F has been reanalyzed using two-way ANOVA. The figure and legend have been updated accordingly.

      Figure 3:

      The quantification of F4/80 in Fig. 3C and 3D has been reanalyzed using two-way ANOVA. The figures and corresponding legends have been revised to reflect this updated analysis.
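      For readers less familiar with the design, the partition of variance in a two-way ANOVA with interaction (here genotype × age) can be sketched as follows. This assumes a balanced design with equal replicates per cell; with unequal group sizes, a statistics package using type II or III sums of squares would be needed, and p-values would come from the F distribution (for example `scipy.stats.f.sf(F, df_effect, df_residual)`). The example values are hypothetical.

```python
from itertools import product
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data maps (level_a, level_b) -> list of observations; every cell must
    exist and hold the same number (>= 2) of replicates.
    Returns {effect: (sum_of_squares, df, F)}; F is None for the residual.
    """
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    a, b = len(levels_a), len(levels_b)
    n = len(next(iter(data.values())))
    if len(data) != a * b or n < 2 or any(len(v) != n for v in data.values()):
        raise ValueError("requires a complete, balanced design with >= 2 replicates per cell")

    grand = mean(x for obs in data.values() for x in obs)
    cell = {key: mean(obs) for key, obs in data.items()}
    # Marginal means (equal to the means of cell means in a balanced design)
    mean_a = {i: mean(cell[(i, j)] for j in levels_b) for i in levels_a}
    mean_b = {j: mean(cell[(i, j)] for i in levels_a) for j in levels_b}

    ss_a = b * n * sum((mean_a[i] - grand) ** 2 for i in levels_a)
    ss_b = a * n * sum((mean_b[j] - grand) ** 2 for j in levels_b)
    ss_ab = n * sum((cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i, j in product(levels_a, levels_b))
    ss_res = sum((x - cell[key]) ** 2 for key, obs in data.items() for x in obs)

    df_a, df_b = a - 1, b - 1
    df_ab, df_res = df_a * df_b, a * b * (n - 1)
    ms_res = ss_res / df_res
    return {
        "A": (ss_a, df_a, (ss_a / df_a) / ms_res),
        "B": (ss_b, df_b, (ss_b / df_b) / ms_res),
        "AxB": (ss_ab, df_ab, (ss_ab / df_ab) / ms_res),
        "residual": (ss_res, df_res, None),
    }


# Hypothetical values: factor A = genotype, factor B = age
example = {
    ("BKO", "old"): [8, 10], ("BKO", "young"): [4, 6],
    ("fl", "old"): [5, 7], ("fl", "young"): [3, 5],
}
print(two_way_anova(example))
```

      The interaction term (AxB) is what tests whether the effect of genotype differs between young and old animals, which a set of separate pairwise comparisons cannot assess.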

      Public Reviews:

      Reviewer #1 (Public review):

      Sebag et al. addressed the role of ADH5 in BAT in the development of aging and metabolic disarrangements associated with it. This is a follow-up study after the authors' demonstration of the role of BAT ADH5 in glucose homeostasis, obesity, and cold tolerance. By ablating ADH5 specifically in brown adipocytes or pharmacologically modulating ADH5 through activation of its transcription factor, the authors conclude that preservation of BAT function is crucial for healthy aging and ADH5 is causally involved in this process. The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging. However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution.

      We sincerely thank the reviewer for their thoughtful and constructive comments. We fully agree that when two independent variables (genotype and age) are being evaluated, the statistical analysis must appropriately account for both factors and their potential interaction. We appreciate the reviewers’ guidance in strengthening the statistical rigor of our study.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. The only mention of sex I could find is that the authors reported the general protein SNO status in BAT is increased with age in male C57BL/6J mice. Is this also true in female mice?

      We thank the reviewer for this insightful comment. In response, we examined whether aging affects Hsf1 and Adh5 transcript levels in wild-type female mice (3 months vs. 19 months). Our analysis did not reveal significant age-associated changes in the expression of either gene. These results have now been incorporated into the revised manuscript and are presented in Figure 4A.

      (2) It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B).

      We thank the reviewer for this suggestion. Indeed, we have previously measured ADH5 expression in both brown adipose tissue (BAT) and inguinal white adipose tissue (iWAT). These data were published in Cell Reports (PMID: 3478865).

      (3) For Figure 4D, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo?

      We thank the reviewer for their thoughtful comment. We have now provided additional methodological details in the revised manuscript. In Figure 4D (current Figure 4E), BAT was collected from wild-type mice and cultured ex vivo as explants. The BAT explants were treated for 24 hours with HSF1A (an HSF1 activator; 20 µM). Following treatment, mRNA levels of the indicated genes were measured by RT-qPCR.

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured.

      We apologize for not making these critical points clearer in the initial submission. Figure 5A shows the release profiles of HSF1A from collagen gels with nanoclay (Collagen–NC–HSF1A) and without nanoclay (Collagen–HSF1A), determined using an established standard curve method (Hu et al., PMID: 33225042).

      The concentration of HSF1A was quantified by UV–Vis spectroscopy. Briefly, a standard curve for HSF1A was generated by measuring the UV–Vis spectra of HSF1A at known concentrations (1.25, 2.5, 5, 10, and 20 µM) prepared in phosphate-buffered saline (PBS). Collagen gels with or without nanoclay were then fabricated to evaluate the release profile. At predetermined time points (1, 5, 9, 14, and 21 days), the PBS supernatant from each sample was collected and analyzed by UV–Vis spectroscopy. The amount of released HSF1A was calculated using the previously established standard curves. A brief description has now been included in the figure legend.
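      The standard-curve quantification can be sketched as a least-squares line through the calibration standards, inverted to convert a sample's absorbance into concentration. This assumes absorbance is linear in concentration over the 1.25 to 20 µM range (Beer-Lambert behaviour); the absorbance values below are hypothetical, and the wavelength used is not stated here.

```python
def fit_standard_curve(concs, absorbances):
    """Ordinary least-squares line: absorbance = slope * conc + intercept."""
    n = len(concs)
    if n < 2 or n != len(absorbances):
        raise ValueError("need >= 2 paired standards")
    cx = sum(concs) / n
    cy = sum(absorbances) / n
    sxx = sum((c - cx) ** 2 for c in concs)
    sxy = sum((c - cx) * (a - cy) for c, a in zip(concs, absorbances))
    slope = sxy / sxx
    intercept = cy - slope * cx
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate concentration (µM)."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (absorbance - intercept) / slope


standards_um = [1.25, 2.5, 5.0, 10.0, 20.0]               # from the response
absorbance = [0.05 * c + 0.01 for c in standards_um]      # hypothetical readings
slope, intercept = fit_standard_curve(standards_um, absorbance)
print(concentration_from_absorbance(0.26, slope, intercept))  # supernatant reading
```

      Converting these concentrations into a cumulative release profile would additionally require the supernatant volume and the amount of HSF1A loaded into each gel, which are not given in this response.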

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice?

      We regret that we did not describe our results clearly in the first submission and have included detailed information in the revised manuscript.

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D, 5G, and 5I. Figure 4B looks great; maybe this could be used as an example?

      We regret that we did not present results clearly in the first submission and have provided detailed information in these figures in the revised manuscript.

      (8) Figure 3B looks a bit odd. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels?

      We have provided information in the revised manuscript.

      (9) What are the levels of nitric oxide synthase in the BAT of the aging model? Since protein S-nitrosylation is regulated by the balance between nitric oxide synthase and ADH5, the attribution of greater protein S-nitrosylation to ADH5 is incomplete without determining nitric oxide synthase levels.

      We thank the reviewer for this thoughtful comment. In response, we have now included the analysis of iNOS transcript expression levels in the revised manuscript. These data are presented in Figure 1C.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (2) Presentation of metabolomics is not appropriate. The authors described, using color coding, the metabolites up- or downregulated in the experimental design. However, the current approach does not allow the reader to detect sample size, magnitude of changes, variability of the data, p-values, etc. This approach does not follow the standard practices of scientific rigor and should be modified. Metabolomic data could be uploaded as supplementary data, or a table with all necessary information to allow a full interpretation of the data should be provided.

      We have now provided the metabolomic data in table format as Figure 3I.

      (6) What are the levels of nitric oxide synthase in the BAT of the aging model? Since protein S-nitrosylation is regulated by the balance between nitric oxide synthase and ADH5, the attribution of greater protein S-nitrosylation to ADH5 is incomplete without determining nitric oxide synthase levels.

      We thank the reviewer for their thoughtful comment. We have now included iNOS transcript levels in the revised manuscript (Figure 1C).

      Minor Comments:

      (1) The conclusion of the abstract is somewhat vague. I suggest the authors rewrite it to better recapitulate what was found in the study.

      We thank the reviewers for this helpful suggestion. In response, we have revised the Abstract to improve the specificity and clarity of our conclusions.

      (2) In the introduction, the authors mention that an increased level of mitochondrial ROS activates UCP1. Given that the evidence for this statement is circumstantial and not supported by the current state-of-the-art (PMID: 28710335), where it is accepted that UCP1 activation diminishes ROS production, I suggest that the authors tone down this statement or at least acknowledge conflicting findings and interpretations.

      We thank the reviewer for this insight and have now included this important point in the Introduction.

      (3) Figure 2H - It is unclear what this figure (and statistical analysis) represents. Please, improve the description of the experiment and how the data were plotted to reach such a conclusion.

      We regret that we did not present these results clearly in the first submission. The trend lines show the relationship between body weight and time on the rotarod. The P value compares the slopes of the lines for Adh5 BKO mice and Adh5 fl/fl mice. The data indicate that the heavier the BKO mouse, the less time it spent on the rotarod.
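      The response does not name the test used to compare the two slopes. One common option is a t test on the difference between two independently fitted ordinary least-squares slopes, with n1 + n2 - 4 degrees of freedom. A linear model with a weight × genotype interaction term is a frequently used alternative. The sketch below implements the first option. The data and function names are hypothetical. To stay within the standard library, the sketch stops at the t statistic. A two-sided p-value would be twice the upper tail of a t distribution with df degrees of freedom, for example via `scipy.stats.t.sf`.

```python
import math


def ols_fit(x, y):
    """Return (slope, standard error of the slope, n) for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, se_b, n


def compare_slopes(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H0: slope_1 == slope_2."""
    b1, se1, n1 = ols_fit(x1, y1)
    b2, se2, n2 = ols_fit(x2, y2)
    t = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return t, n1 + n2 - 4


# Hypothetical body weights (g) and rotarod latencies (s), not study data.
weight_fl, time_fl = [30, 32, 34, 36, 38], [120, 118, 121, 117, 119]
weight_bko, time_bko = [30, 33, 36, 39, 42], [110, 95, 85, 70, 58]
t_stat, df = compare_slopes(weight_bko, time_bko, weight_fl, time_fl)
```

      A negative t here means the BKO line falls more steeply with body weight than the fl/fl line.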

      (4) Figure 2M - The unit of LV thickness is missing. Please, provide it. In addition, I am missing the other cardiac parameters obtained from the echocardiogram.

      We have included this information in Figure 2M in the revised manuscript.

      (5) Figure 2G - I believe force is not the right unit for the grip strength test. Please, revise accordingly.

      We regret that we did not describe our results clearly in the first submission. We have corrected this unit in the revised figure.

      (6) Figure 3H - What is the unit when reporting mitochondrial area?

      We regret that we did not describe our results clearly in the first submission. We have added this information in the revised figure.

      (7) Is HSF1 also downregulated in iWAT?

      We thank the reviewer for this thoughtful comment. In response, we measured Hsf1 expression in iWAT from young and aged wild-type male mice. Our analysis did not reveal any significant age-associated changes in Hsf1 expression in iWAT. These results have now been included in the revised manuscript (Figure 4C).

      (8) Can the authors explain how HSF1 expression increases upon HSF1 activation? I understand ADH5 is controlled by HSF1, but what would control HSF1 itself? Off-target effects?

      We thank the reviewer for this insightful comment. At present, we do not have direct mechanistic evidence to definitively support this notion, and we cannot exclude the possibility of off-target effects of HSF1A. However, previous studies have reported that the HSF1 promoter contains heat shock elements (HSEs) in humans and HSE-like domains in mice. Based on this, we speculate that activated HSF1 may enhance its own transcription through an autoregulatory or positive feedback mechanism.

    1. eLife Assessment

      The study by Reed et al. provides fundamental findings defining the topological changes that occur during tumorigenesis. These compelling findings enhance the understanding of stable long-range connections among genes that reprogram cancer-related functions.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Metz Reed and colleagues present an exceptionally thorough analysis of three-dimensional genome reorganization during breast cancer progression using the well-characterized MCF10 model system. The integration of high-resolution Micro-C contact maps with multi-omics profiling provides compelling insights into stage-specific dynamics of chromatin compartments, TAD boundaries, and looping events. The discovery that stable chromatin loops enable epigenetic reprogramming of cancer genes while structural changes selectively drive metastasis-associated pathways represents a significant conceptual advance. This work substantially deepens our understanding of genome topology in malignancy.

      Strengths:

      This work sets a benchmark for integrative 3D genomics in oncology. Its methodological sophistication and conceptual advances establish a new paradigm for studying nuclear architecture in disease.

      Comments on revised version:

      The authors made a significant effort to improve the manuscript. My comments were sufficiently addressed.

    3. Reviewer #2 (Public review):

      Using the MCF10 breast cancer progression sequence, the authors combined high-resolution Micro-C chromatin conformation capture with RNA-seq and ChIP-seq to depict the sequential reorganization of compartments, topologically associated domains (TADs), and long-range loops in benign, pre-tumor, and metastatic states, and coupled these three-dimensional changes with gene expression and enhancer activity. Four main findings were: (i) chromatin structure was largely stable, with limited effects on differential gene output, and upregulated sites were the most significantly affected; (ii) enhancer-promoter contact strength covaried with transcriptional amplitude; (iii) 127 genes gained expression with increasing chromatin contact; and (iv) progression-related genes acquired altered histone marks at distal enhancers, which remained connected by stable loops. These conclusions are widely accepted and provide strong justification for the publication of this paper.

    4. Reviewer #3 (Public review):

      Summary:

      The authors tackle an important problem, namely defining the topological changes that occur during tumorigenesis. To study this, they use an established stepwise cell model of breast cancer. A strength of their study is a careful, robust differential analysis of topological features across each cell state that is presented clearly and rigorously. They define changes in compartmentalization, TAD structure and chromatin looping. Intriguingly, when the authors integrate differential gene expression with chromatin looping, they see that most differentially regulated genes are not involved in loop changes, suggesting that changes in promoter or enhancer chromatin marks may play a bigger role in regulating transcription than differential loops. The differential topology analysis and its integration with transcription is very well done, one of the best versions of this I have read in the 3D genome field! However, the paper is framed largely as a cancer biology study and it teaches us much less about this. I am worried that some of the trends for each topologic feature are not going to be consistent across the pre-malignant-malignant-metastatic spectrum and would like the authors to soften some of their claims a bit regarding how this clarifies our understanding of cancer evolution.

      Updated comments on revision:

      There are still some issues with this paper. First, it reads descriptively: it is a series of comparisons with limited biological insight, because changes are always seen in genomics and, in this case, they are often not tied back to transcription or gene regulation in cancer. Cell lines do not represent cancer faithfully and in this case should not be argued to represent malignant transformation broadly. The authors did not soften their language as much as I think is required. I would caution the authors to further qualify their results in the context of a single, clonal cell line that has undergone stepwise transformation. This is not a patient cohort analysis or frank progression. This matters because there is likely to be much more noise, not pertinent to transformation, in a cell line model. It doesn't negate the validity of the study, but this language should be adjusted appropriately. It was nice to see the authors compare gene expression data from their model to the primary tumor data; however, the limited overlap raises the concern that, at the least, patterns of transcriptional regulation in their model are not faithful to primary tumors. If this is the case, it raises concern that the topological changes are also not generalizable to cancer.

      The authors declined a number of functional assays to validate their observations (which are purely correlative). And while I see that the burden of extra experiments may be beyond the scope of this study, they must soften their language to justify the observed relationships.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This work sets a benchmark for integrative 3D genomics in oncology. Its methodological sophistication and conceptual advances establish a new paradigm for studying nuclear architecture in disease.

      We appreciate the very kind words.

      Weaknesses:

      Major Issues

      (1) Functional tests would strengthen the observed links between structure and gene changes. For example, the COL12A1 gene loop formation correlates with its increased expression. Disrupting this loop using CRISPR-dCas9 at chr6 position 75280 kb could prove whether the loop causes COL12A1 activation. Such experiments would turn strong correlations into clear mechanisms.

      We agree that targeted disruption of specific loops such as COL12A1 will be important for functional validation of the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than to explore specific loop interactions. The current findings are a foundation for more targeted functional follow-up studies.

      (2) The H3K27ac looping idea needs deeper validation. Data suggests H3K27ac loss weakens loops without affecting CTCF. Testing how cohesin proteins interact with H3K27ac-modified sites would clarify this process. Degron systems could rapidly remove H3K27ac to observe real-time effects. Also, the AP-1 motifs found at dynamic loop sites deserve functional tests. Knocking down AP-1 factors might show if they control loop formation.

      We agree that modulating histone modifications or transcription factors would provide insights into the underlying mechanisms driving the changes we observed. However, such studies utilizing degrons or small molecule inhibitors that globally knock down either H3K27ac or specific transcription factors are often difficult to interpret. For example, assessing the role of AP-1 factors, as suggested, would be complicated by the variety of AP-1 proteins. In addition, H3K27ac reduction could inhibit loop strength either directly (e.g., by reducing cohesin recruitment) or indirectly (e.g., by reducing gene expression, which could in turn affect loop strength). Parsing out the exact relationships between these features will require extensive follow-up work and falls outside of the scope of the current study.

      (3) Connecting findings to patient data would boost clinical relevance. The MCF10 model is excellent for controlled studies. Checking if TAD boundary weakening occurs in actual patient metastases would show real-world importance. Comparing primary and metastatic tumor samples from the same patients could reveal new structural biomarkers. If tissue is scarce, testing cancer cells with added stroma cells might mimic tumor environment effects.

      We have leveraged publicly available datasets to link the observations from the progression model to clinical samples. Specifically, we have compared our datasets to chromatin organization data in non-cancerous mammary epithelial cells (HMEC), five cell lines representing distinct cancer subtypes ranging from less (luminal) to more aggressive (triple negative, TNBC), as well as tissue samples from TNBC patients with contralateral normal controls. We explored the conservation of both loops and TADs identified in the MCF10 progression system in each of these maps, paying particular attention to how features that are differential between MCF10 cells differ across other cancer cell types. We observe a high degree of conservation of static loops and TAD boundaries among the cancer samples, as well as some degree of cell-specific changes among loops and boundaries that change during MCF10 progression. These findings are included in Supplemental Figures 3 and 4 and are discussed on page 7.

      Minor Issues

      (1) Adding a clear definition for static loops would help readers. For example, state that static loops show less than 10 percent contact change across replicates.

      Differential loops are defined as loops with a fold-change of 1.5 or more between any two MCF10 cell lines and an adjusted p-value of less than 0.025, considering change across biological and technical replicates; loops that do not meet these criteria are considered static. This definition is stated on page 6.
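      The sketch below shows one way to apply these cutoffs. Under our reading, which is an assumption, a loop that passes both the fold-change cutoff (1.5 or more, in either direction) and the adjusted p-value cutoff (below 0.025) is treated as changing, and every other loop is treated as static. The per-loop p-values would come from whatever differential test was used, which is not shown here. Benjamini-Hochberg is assumed as the adjustment method. `fold_changes` is a hypothetical input holding, for each loop, its largest pairwise fold-change between cell lines.

```python
import math


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_loops(fold_changes, pvals, fc_cutoff=1.5, padj_cutoff=0.025):
    """Label each loop 'differential' or 'static' (hypothetical helper).

    fold_changes: largest pairwise fold-change per loop (must be > 0); values
    below 1 represent decreases and are compared on the log2 scale.
    """
    padj = benjamini_hochberg(pvals)
    log_cut = math.log2(fc_cutoff)
    labels = []
    for fc, q in zip(fold_changes, padj):
        changed = abs(math.log2(fc)) >= log_cut and q < padj_cutoff
        labels.append("differential" if changed else "static")
    return labels
```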

      (2) In the ABC model analysis, removing promoter regions from the enhancer list would focus results on true long-range interactions.

      The ABC model already excludes the promoter of each gene. Only self-promoters are excluded, whereas the model allows promoters of other genes to act as potential long-range enhancers of the target gene. We have added text to make this clear (see page 11).
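      The distinction drawn above is that only the target gene's own promoter is dropped, while other genes' promoters remain candidate enhancers. It can be illustrated with the ABC model's published scoring rule. For each element, the score is activity × contact with the target promoter, divided by the sum of activity × contact over all candidate elements for that gene. The response does not say whether the reference implementation also removes the self-promoter from the denominator. This sketch simply drops it before normalizing, which is an assumption. The `Element` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    activity: float        # e.g. combined H3K27ac and accessibility signal
    contact: float         # normalized contact frequency with the target promoter
    is_self_promoter: bool = False


def abc_scores(elements):
    """ABC score = activity * contact / sum(activity * contact) over candidates.

    The target gene's own promoter is excluded; promoters of other genes are
    kept as potential long-range enhancers.
    """
    candidates = [e for e in elements if not e.is_self_promoter]
    total = sum(e.activity * e.contact for e in candidates)
    return {e.name: e.activity * e.contact / total for e in candidates}
```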

      (3) Briefly noting why this study sees TAD weakening while other cancer types show different patterns would provide useful context.

      Neither the biological mechanism underlying TAD boundary weakening in the MCF10 model nor the reason for the apparently different behavior among cancers is known. We expanded the discussion of this point slightly, but we refrain from making any definitive claims. We do note that differences in the types of cancer studied or in the methods used for detecting changes in TADs (e.g., different sensitivities and thresholds for detecting change) could be responsible (see page 15). We also mention that the loss of insulation at many TAD boundaries detected in our study reflects subtle changes in intensity that could be missed by methods tailored to find more drastic changes in TAD architecture.
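      Boundary strength is often read from a sliding-window ("diamond") insulation score. In this measure, a boundary appears as a dip in the score, and a weakened boundary appears as a shallower dip. That is why a subtle loss of insulation can fall below the detection thresholds of tools tuned for large changes. The sketch below implements one common variant on a plain list-of-lists contact matrix. Published implementations differ in whether bin i itself is included and in how scores are log2-normalized. This is an illustration under those assumptions, not the pipeline used in the study.

```python
def insulation_score(matrix, window):
    """Diamond insulation score along the diagonal of a square contact matrix.

    For bin i, averages contacts between the `window` bins upstream of i and
    the `window` bins downstream of i. Bins too close to an edge get None.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window < 0 or i + window >= n:
            scores.append(None)
            continue
        block = [matrix[r][c]
                 for r in range(i - window, i)
                 for c in range(i + 1, i + window + 1)]
        scores.append(sum(block) / len(block))
    return scores


# Toy matrix: two 4-bin domains with high internal and low cross-domain contact.
two_tads = [[10 if (r < 4) == (c < 4) else 1 for c in range(8)] for r in range(8)]
scores = insulation_score(two_tads, window=2)
```

      In the toy matrix, the score dips at bins 3 and 4, where the two domains meet. Raising the cross-domain contacts would make that dip shallower, which corresponds to a weakened boundary.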

      Reviewer #2 (Public review):

      While the conclusions are broadly supported, methodological and analytical refinements are required.

      We appreciate these comments.

      (1) Model representativeness. The long-term culture-adapted MCF10 genome harbours extensive aneuploidies and translocations. Validation of key COL12A1/WNT5A loop dynamics in an independent breast-cancer line (e.g., MDA-MB-231, T47D) or in patient-derived organoids/PDX models would strengthen generalizability.

      Although the generation of Micro-C datasets in additional cell lines is outside of the scope of this study, we used publicly available Hi-C data from triple negative breast cancer (TNBC) progression and patient samples (Kim, Han & Chun et al. 2022) to assess generalizability of the MCF10 model findings. While these maps are lower resolution than the Micro-C maps used in our study, they are of sufficient depth to detect loops at a similar resolution (10 kb). We report these findings in Supplemental Figures 3 and 4 and discuss them on page 7.

      We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in TNBC. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.

      It is worth noting that direct comparison at individual loci is complicated by variations in gene expression profiles between the MCF10 model and the TNBC progression model; for example, COL12A1 is not significantly upregulated between normal and TNBC tissues in this study (unlike in the TCGA-BRCA data) and is downregulated between HMEC and TNBC cell lines. Regardless, our analysis provides some indication of conserved and divergent features in the various model systems.

      (2) The study remains purely correlative; no perturbation experiments are conducted to demonstrate causal roles of chromatin loops on gene expression. CRISPR interference (CRISPR-Cas9-KRAB/HDAC) or enhancer deletion/inversion should be applied to 3-5 pivotal loops (e.g., COL12A1, WNT5A) to test their impact on target-gene expression and cellular phenotypes (e.g., proliferation, migration).

      We agree that targeted disruption of specific loops such as COL12A1 will be important for understanding the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than exploring specific loop interactions. The current findings are a foundation for more targeted follow-up functional studies.

      (3) The manuscript lacks integration with clinical datasets. Integrate TCGA-BRCA data to assess whether elevated COL12A1/WNT5A expression associates with overall survival (OS) or distant metastasis-free survival (DMFS).

      To assess clinical significance of specific loci, we have queried expression of all differentially expressed genes in the MCF10 progression system among TCGA-BRCA expression data. We summarize our findings in Supp. Fig. 5E and discuss them on page 8.

      We found that roughly 25% of genes that change in our model also change significantly in breast cancer, but only roughly half of those genes change in the same direction (i.e., up-regulated in MCF10CA1a vs MCF10A, and up-regulated in tumor vs normal samples). Interestingly, there was a higher degree of directional agreement among late-changing genes (i.e., genes that change in MCF10CA1a compared to MCF10A and MCF10AT1) than among early-changing genes (i.e., genes that change in MCF10AT1 and MCF10CA1a compared to MCF10A).
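      The two proportions above, overlap with TCGA-BRCA and directional agreement within that overlap, can be computed as in the sketch below. It assumes that significance in the tumor vs normal comparison is available as a per-gene flag and that direction is the sign of each log2 fold-change. The gene names and values are hypothetical.

```python
def directional_agreement(model_lfc, tcga_lfc, tcga_significant):
    """Return (fraction of model DEGs also significant in TCGA,
    fraction of those shared genes changing in the same direction)."""
    shared = [g for g in model_lfc if tcga_significant.get(g, False)]
    overlap_frac = len(shared) / len(model_lfc)
    same = [g for g in shared if (model_lfc[g] > 0) == (tcga_lfc[g] > 0)]
    agree_frac = len(same) / len(shared) if shared else float("nan")
    return overlap_frac, agree_frac


# Hypothetical log2 fold-changes (MCF10CA1a vs MCF10A) for eight model DEGs.
model_lfc = {"g1": 1.0, "g2": -0.5, "g3": 2.0, "g4": 0.3,
             "g5": -1.2, "g6": 0.8, "g7": -0.9, "g8": 1.5}
tcga_significant = {"g1": True, "g2": True}
tcga_lfc = {"g1": 0.7, "g2": 0.4}
result = directional_agreement(model_lfc, tcga_lfc, tcga_significant)
```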

      We have also explored the impact of select highlighted genes on overall survival (OS). We present these data in Supp. Fig. 6 and discuss them on page 8. While not all genes showcased in this study have a significant impact on overall survival, most trend in the direction their differential expression would suggest (i.e., genes more highly expressed in tumor vs normal samples also have a hazard ratio above 1).

      Reviewer #3 (Public review):

      The differential topology analysis and its integration with transcription is very well done- one of the best versions of this I have read in the 3D genome field!

      We appreciate the reviewer’s endorsement.

      However, the paper is framed largely as a cancer biology study, and it teaches us much less about this. I am worried that some of the trends for each topologic feature are not going to be consistent across the pre-malignant-malignant-metastatic spectrum and would like the authors to soften some of their claims a bit regarding how this clarifies our understanding of cancer evolution.

      We agree that the strength of the study lies in its deep mapping of chromatin architecture and the landscape of enhancers and differentially expressed genes, which we hope to use to better understand the relationship between chromatin structure and gene expression, regardless of their cancer relevance. To better relate the findings in the progression system to cancer, we have added new data from direct comparisons of the MCF10 progression system with multiple patient-derived cancer cell lines and cancer tissues. These data are shown in Supp. Fig. 3 and 4 and discussed on p. 7. Regardless, we have softened the claims regarding cancer progression throughout the manuscript.

      Weaknesses:

      Major Concerns:

      (1) The integration of gene expression and chromatin loops is intriguing. The authors' differential analysis, however, omits consideration of genes that are on and simply further upregulated versus genes that transition on/off or off/on. It would be nice to see the authors break out looping patterns for these two different patterns of regulation, as it may be instructive regarding the rules for how EP loops govern transcription.

      To address different types of gene expression patterns, we analyzed 108 genes that went from an unexpressed or “off” state (2 or fewer read counts) in one cell line to an expressed “on” state (100 or more read counts) in another, and 111 genes that went from “on” to “high” (1000 or more read counts). We present these data in Supp. Fig. 8 and discuss the findings on page 9. While neither of these gene sets was enriched for differential loops, a large number of their genes overlap loop anchors. We found a relationship between loop strength and gene expression levels; genes that are more strongly expressed are more likely to overlap with the anchor of a chromatin loop. All gene sets show similar strong trends at distal regulatory regions.
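      The gene sets described above can be reproduced by applying the read-count cutoffs across cell lines, as in the sketch below. Several points are assumptions not stated in the response. "On" in the on-to-high set is taken to mean 100 to 999 counts. Counts are taken as already comparable between lines, with any normalization done upstream. Comparing the minimum and maximum across lines captures "one cell line ... another" without regard to direction. The helper name is hypothetical.

```python
OFF_MAX, ON_MIN, HIGH_MIN = 2, 100, 1000  # read-count cutoffs from the response


def classify_switch(counts):
    """counts: mapping of cell line -> read count for one gene (hypothetical).

    Returns 'off_to_on', 'on_to_high', or None. Direction (gain vs loss)
    is not distinguished because only the lowest and highest lines are compared.
    """
    lo, hi = min(counts.values()), max(counts.values())
    if lo <= OFF_MAX and hi >= ON_MIN:
        return "off_to_on"
    if ON_MIN <= lo < HIGH_MIN and hi >= HIGH_MIN:
        return "on_to_high"
    return None
```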

      (2) Given the paucity of differential loops at the majority of genes whose expression changes, the authors should examine chromatin subcompartments, as these may associate more with differential transcription.

      We present the subcompartment analysis in Supp. Fig. 9 and discuss the findings on pages 10-11. Because our CALDER compartment calls are qualitative rather than quantitative, we examined how subcompartments change genome-wide and at specific promoters. Between any two cell types, a majority of changes occur between closely related subcompartments, e.g., from A.2.2 to A.2.1 (one step more A-like) or to B.1.1 (one step more B-like). The promoters of differentially expressed genes show minimal subcompartment changes, but genes that shift from on to off show larger changes. Differentially expressed genes whose promoters shift by multiple subcompartments show significant effects on fold-change, whereas smaller shifts have minimal impact on gene expression. In summary, small changes in subcompartments are very common and have little impact on gene expression, while larger changes are infrequent and correlate more strongly with changes in gene expression.
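      The "step" language above can be made explicit by ranking CALDER's eight subcompartment labels from most A-like to most B-like and taking differences in rank. The ordering below is an assumption, but it is consistent with the example in the text, where A.2.2 sits between A.2.1 and B.1.1. The helper names are hypothetical.

```python
from collections import Counter

# Assumed ordering of CALDER subcompartments, most A-like first.
CALDER_ORDER = ["A.1.1", "A.1.2", "A.2.1", "A.2.2",
                "B.1.1", "B.1.2", "B.2.1", "B.2.2"]
RANK = {label: i for i, label in enumerate(CALDER_ORDER)}


def subcompartment_shift(before, after):
    """Signed number of steps; positive = more B-like, negative = more A-like."""
    return RANK[after] - RANK[before]


def shift_histogram(pairs):
    """Count how many (before, after) pairs moved by each absolute number of steps."""
    return Counter(abs(subcompartment_shift(a, b)) for a, b in pairs)
```

      Tallying absolute shifts per bin or per promoter gives the distribution of small versus multi-step changes that the analysis above compares against expression fold-changes.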

      (3) The authors could push their TAD analysis further by integrating it with transcription. Can they look at genes and their enhancers that span these altered boundaries to see if these shifts impact transcription?

      We provide this analysis in Supp. Fig. 9 and discuss it on page 10. We started, as suggested, by looking at genes with distal enhancers (as determined by the ABC model) that span a single TAD boundary. However, the number of genes that fit this definition was relatively small, so we expanded the analysis to any gene with a promoter within 50 kb of a differential insulation score boundary, for which we saw the same trends with a more robust signal. We found that genes near weakened boundaries are not enriched for differentially expressed genes, while those near strengthened boundaries are. Comparing the fold-change of genes near strengthened, weakened, and static boundaries showed a significant inverse correlation between boundary strength and gene expression, although effect sizes were small. These results show that changes in TAD boundary insulation have small but noticeable impacts on gene expression.
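      The proximity step above, linking promoters to boundaries within 50 kb and then grouping genes by boundary class, can be sketched as follows. The sketch assumes that each gene takes the class of its nearest boundary within the window and that positions are single base-pair coordinates (TSS and boundary midpoint). The input layouts and helper names are hypothetical.

```python
from bisect import bisect_left


def assign_boundary_class(promoters, boundaries, max_dist=50_000):
    """Label each gene by the class of its nearest boundary within max_dist.

    promoters:  {gene: (chrom, tss_position)}
    boundaries: {chrom: [(position, label), ...]} sorted by position, with
                labels such as "strengthened", "weakened" or "static".
    Genes with no boundary within max_dist are left out.
    """
    result = {}
    for gene, (chrom, tss) in promoters.items():
        sites = boundaries.get(chrom, [])
        positions = [p for p, _ in sites]
        i = bisect_left(positions, tss)
        best = None
        for j in (i - 1, i):  # nearest boundaries on either side of the TSS
            if 0 <= j < len(sites):
                dist = abs(sites[j][0] - tss)
                if dist <= max_dist and (best is None or dist < best[0]):
                    best = (dist, sites[j][1])
        if best is not None:
            result[gene] = best[1]
    return result


def mean_log2fc_by_class(assignments, log2fc):
    """Average log2 fold-change of the genes assigned to each boundary class."""
    groups = {}
    for gene, label in assignments.items():
        groups.setdefault(label, []).append(log2fc[gene])
    return {label: sum(v) / len(v) for label, v in groups.items()}
```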

      (4) The progression of cancer critically goes from a benign -> pre-malignant -> malignant -> metastatic series of steps. The AT1 line is described as 'premalignant' and thus the authors' series omits a malignant line. While I think adding such a sample is an unreasonable request at this point (as it would have had to have been studied in 'batch' with these other samples), the authors should acknowledge that they omit this step and spend some time discussing the genetic, morphologic, and phenotypic features for their 3 conditions. The images in Figure 1S aren't particularly useful- they don't tell the reader that these cells are malignant/benign. The karyotypic data are intriguing but not fully analyzed, so it is hard to know what true phenotype these cells represent. For example, malignant means DCIS/invasive carcinoma - so then what does this pre-malignant cell model represent? The described alteration in the AT1 line is a Ras oncogene, so in some sense, the transition to this line really is just +/- Ras. The authors could spend some time thinking about the effects of Ras specifically on the 3D genome.

      We have expanded our discussion of the relevance of the MCF10 model on page 4, and the limitations of the model on page 17. The MCF10 progression model has been extensively used by many laboratories, and its properties have been discussed in detail (e.g., Polizzotti et al. 2012). Critically, the MCF10AT1 cell line is not only the product of Ras oncogene expression: it was derived from a 100-day-old precancerous lesion that formed a squamous carcinoma in a mouse, and over this time it accumulated additional changes. The MCF10AT1 line is considered pre-malignant as it has accrued critical changes that prepare it for the metastatic transition, but it does not immediately form tumors when injected back into mice. Unlike the MCF10DCIS cell line, which is malignant but not metastatic, the more aggressive MCF10CA1a is classified as both malignant and highly metastatic, forming tumors that quickly metastasize to the lungs in mouse xenografts. While both MCF10AT1 and MCF10CA1a are tumorigenic, we acknowledge the lack of a nonmetastatic malignant cell line in the discussion on page 17. We have also provided updated karyotype characterization of the cell lines used in this study in Supp. Fig. 1B and now include full composite karyotypes in the Methods section (page 18).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The reviewer’s recommendations are the same as their public review comments. See our response to the review comments above.

      Reviewer #2 (Recommendations for the authors):

      (1) If conditions permit, inclusion of primary human mammary epithelial cells (HMECs) is recommended to distinguish immortalisation-specific from malignancy-specific 3D changes.

      Micro-C data of equal resolution is not available for HMECs. We have, however, incorporated analysis of publicly available deeply sequenced Hi-C data of HMECs into several figures that explore the conservation of loops and TADs in these cells (Supp. Fig. 3 and 4).

      We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in the TNBC system. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.

      (2) The relationship between loop alterations and copy-number variations (CNVs) is not explored. If conditions permit, it is recommended that differential loops be overlaid with SNP/indel/CNV data to exclude spurious differences arising from structural alterations.

      While we have not conducted an in-depth SNP analysis, we have clarified our discussion of the karyotype analysis on pages 21 and 23 and how we mitigated these effects when identifying differential loops between cell lines.

      (3) The horizontal and vertical coordinates of the diagram are difficult to view; it is recommended that the size of the text on the picture be adjusted to ensure that it is clear to read. Some of the text coordinates of the figure are labeled in gray; it is recommended that they be in black.

      The clarity of the figures has been improved.

      Reviewer #3 (Recommendations for the authors):

      I really like this paper. I think if the cancer focus can be down-emphasized (because I'm not fully clear what we've really learned about cancer), then it represents a nice dataset and a thoughtful, comprehensive analysis.

      We greatly appreciate the kind words and helpful feedback. The cancer focus has been toned down throughout the manuscript, as suggested.

      Minor Concerns:

      (1) The authors present a nice summary of the topological changes across samples. However, summary statistics can mask noise/bias and also don't fully convey the effect size of the reported changes. Highlighting individual loci and visualizing these would strengthen the paper and help maintain a high standard for our genomic studies of topology, in which we summarize but also provide representative examples. I would appreciate seeing more example plots at distinct loci (even if in the supplemental information).

      We have included several more example regions in Supp. Fig. 7 and 12, including four looped genes that change similarly between the MCF10 series and TCGA-BRCA data (2 stably looped genes and 2 differentially looped genes, 2 up-regulated and 2 downregulated), and six differentially looped and differentially expressed genes (3 which change in the same direction as the loops, and 3 which change in the opposite direction).

      (2) "To identify loops that changed significantly during cancer progression, we assessed changes in contact frequency among every loop in each cell type, correcting for karyotypic differences that result in differences in coverage between cell lines (see Methods)." The Methods section does not explain this adequately. Also, could you go a bit deeper to determine whether these large-scale changes specifically shift the 3D genome? This is hard, but there may be some low-hanging fruit given the otherwise fairly isogenic features of your model.

      We have added more detail to the Methods section on pages 21 and 23 on how karyotypic abnormalities were incorporated into our analysis and differential loop calling. A deeper analysis of how large-scale karyotypic changes affect chromatin organization (e.g., through the formation of neoloops and TADs by translocations) is indeed an attractive subject, but its complexity calls for a separate, dedicated study.

      (3) "Approximately half of chromatin loops featured some combination of active gene promoters and enhancers within 10kb of loop anchors". The authors have high-resolution topology data and should be more stringent: these features should be required to overlap loop anchors, or at least lie within a distance smaller than 10kb. A 10kb window in some sense forfeits the advantages of high-resolution topology data.

      The threshold of 10kb was chosen for several specific reasons. First, the loops detected here are large enough that this relatively large region still represents a small fraction of the loop span, so these regions can reasonably be considered anchor-proximal. Second, the loops we detect are non-punctate, both in aggregate analysis (Figure 1H, bottom) and at individual loci (see example regions), showing increased contact frequency across several 5kb or 10kb bins. Therefore, adding 10kb to either side (2 pixels on 5kb maps and 1 pixel on 10kb maps) ensures that the full region of increased contact frequency is included. Finally, ultra-high-resolution Hi-C data have also shown that loops remain diffuse even in 1kb resolution maps, albeit with a smaller footprint than the 30kb used here (Harris & Gu 2023). We have added a brief justification of this overlap size to the text on page 24.

      (4) "These results show that not only changes in either contact frequency and enhancer activity correlate with increased gene expression, but they also correlate with each other, suggesting a potentially linked functional role during enhancer-promoter communication." The authors could use this opportunity to disentangle the contributions of loops and chromatin modifications a bit more. The exceptions are of interest: for example, cases where the loop is stable but gene expression changes, or where the loop changes but gene expression does not. Highlighting exemplar cases for these exceptions (rather than just a genomics summary) would be nice to see.

      The additional example regions we have included in Supp. Fig. 7 and 12 now showcase a wider variety of scenarios; in addition to more examples of static loops with gene expression changes (Fig. 2, Supp. Fig. 7E-F) and differential loops with matching gene expression changes (Fig. 4, Supp. Fig. 7C-D, Supp. Fig. 12A-C), we now also feature examples of differential loops where gene expression changes in the opposite direction (i.e. a strengthened loop at a down-regulated gene, Supp. Fig. 12D-F).
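      To make the anchor-proximity rule from the response to comment (3) concrete, here is a minimal sketch. It assumes point features (for example promoter or enhancer midpoints) on the same chromosome as the loop, and anchors given as half-open [start, end) bin intervals in bp. The function names are hypothetical; this illustrates the 10kb-flank rule as described, not the authors' actual pipeline.

```python
from typing import List, Tuple

# Half-open [start, end) interval in bp; same chromosome assumed throughout.
Interval = Tuple[int, int]

WINDOW_BP = 10_000  # 10kb flank added on each side of an anchor bin


def is_anchor_proximal(pos: int, anchor: Interval, window: int = WINDOW_BP) -> bool:
    """True if a point feature lies within `window` bp of the anchor bin."""
    start, end = anchor
    return start - window <= pos < end + window


def classify_loop(anchors: Tuple[Interval, Interval], features: List[int],
                  window: int = WINDOW_BP) -> Tuple[bool, bool]:
    """For each of the two anchors, report whether any feature is anchor-proximal."""
    return tuple(
        any(is_anchor_proximal(p, a, window) for p in features) for a in anchors
    )


# Example: 5kb anchor bins at 100-105kb and 2,000-2,005kb.
loop = ((100_000, 105_000), (2_000_000, 2_005_000))
print(classify_loop(loop, [112_000, 1_500_000]))  # first anchor only
```

      With 5kb bins the flank spans two bins on each side of an anchor, and with 10kb bins one bin, matching the pixel counts given in the response.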

    1. eLife Assessment

      This study reports a novel function for syntaxin 11, a specialized SNARE protein critical for the immune system whose mutations cause familial hemophagocytic lymphohistiocytosis type 4. The data convincingly show that depletion of STX11 impairs store-operated calcium entry in Jurkat T cells and that this defect is recapitulated in primary cells from a patient suffering from the disease; the authors further show that the syntaxin interacts with the pore subunit of the ORAI1 channel and propose that it primes the channel by promoting the assembly of multimers before activation by its endogenous ligand, the ER Ca2+ sensing protein STIM1. This is a conceptually important claim that challenges the prevailing view that all structural transitions in ORAI1 are STIM-driven. The data are high-quality and broadly consistent with the interpretation, but alternative mechanisms for the defects are not considered; additional work should rule out vesicular trafficking, discuss other mechanisms, and address methodological issues.

    2. Reviewer #1 (Public review):

      Summary:

      Patients with STX11 mutations develop familial hemophagocytic lymphohistiocytosis Type 4, a fatal immune disorder marked by defective T and NK cell cytotoxicity and cytokine storm. The conventional explanation attributes this to impaired cytotoxic granule release, but this has never fully accounted for the broader disease picture. This study proposes an alternative mechanism. The authors show that STX11 is required for store-operated calcium entry through ORAI1 channels, which are essential for both cytotoxic killing and NFAT-driven gene expression in T cells. In STX11-deficient cells, ORAI1 currents drop, NFAT nuclear translocation fails, IL-2 expression is suppressed, and degranulation is impaired. These defects are largely rescued by ionomycin or a constitutively active ORAI1 mutant, placing the primary lesion at calcium signaling rather than the fusion machinery. Mechanistically, STX11 binds the C-terminal tail of ORAI1 via its Habc domain and maintains ORAI1 in a state competent for productive assembly prior to STIM1-dependent gating, a step the authors call "priming."

      Strengths:

      The paper identifies a novel and disease-relevant role for STX11 in calcium channel regulation and raises the possibility of using channel agonists as a therapeutic strategy in the disease. The biochemical and functional data are of high quality and generally consistent with the interpretation. The proposal that a non-conventional syntaxin directly interacts with ion channels to prime its activation is novel and interesting.

      Weaknesses:

      For readers to appreciate the value of patient experiments derived from a single individual, the authors should quote prior studies showing that STX11 protein levels are abolished in all known human STX11 mutations. The priming model, while functionally well-supported, rests on indirect structural evidence, and the precise conformational transition involved remains to be defined. These are acknowledged limitations, but alternate mechanisms have not been explored and formally excluded. More direct evidence should be provided to exclude the possibility that STX11 could act as a conventional SNARE and sustain calcium fluxes by promoting the delivery of additional ORAI1 channels from vesicles.

    3. Reviewer #2 (Public review):

      Summary:

      Vig's lab delineates a critical role for STX11 in CRAC channel function, particularly in the context of the fatal immune disorder familial hemophagocytic lymphohistiocytosis type 4 (FHL4). They demonstrate that Syntaxin 11 directly binds and regulates Orai1, and that STX11 depletion abolishes CRAC currents and downstream signaling. Loss of STX11 reduces IL2 gene expression and impairs degranulation, both of which are rescued by the constitutively active Orai1 mutant H134S, whereas a gain‑of‑function mutant targeting the C‑terminus fails to restore these defects. The authors conclude that STX11 primes Orai1 for optimal local assembly that is independent of STIM1 yet required for CRAC channel gating.

      Strengths:

      This study is firmly grounded in disease biology and demonstrates that STX11 downregulation leads to profound functional defects. Using a comprehensive suite of methods and analyses, the authors interrogate the co-regulation of STX11 and Orai1 and present a near-complete view of STX11's modulatory role in CRAC channel function and downstream signaling pathways. The figures are clear, and the statistical analyses are rigorous and convincing.

      Weaknesses:

      The authors conclude that Syntaxin 11 directly binds Orai1. This conclusion is well supported by a multifaceted approach, including co-immunoprecipitation (co-IP), molecular dynamics simulations, co-localization/FRET assays, and targeted mutational analysis, all of which are thoroughly executed. While the interaction appears reasonably strong in co-IP experiments, the STX11-Orai1 interaction is comparatively weaker in pull-down assays, which the authors attribute to instability of the purified His-STX11 protein. A remaining gap is direct evidence of interaction in live cells; this is understandably challenging given that fluorescent tagging of STX11 is not feasible. Fully resolving this question lies beyond the scope of the present study and will require more advanced approaches to capture STX11 binding dynamics.

    4. Author response:

      eLife Assessment

      This study reports a novel function for syntaxin 11, a specialized SNARE protein critical for the immune system whose mutations cause familial hemophagocytic lymphohistiocytosis type 4. The data convincingly show that depletion of STX11 impairs store-operated calcium entry in Jurkat T cells and that this defect is recapitulated in primary cells from a patient suffering from the disease; the authors further show that the syntaxin interacts with the pore subunit of the ORAI1 channel and propose that it primes the channel by promoting the assembly of multimers before activation by its endogenous ligand, the ER Ca2+ sensing protein STIM1. This is a conceptually important claim that challenges the prevailing view that all structural transitions in ORAI1 are STIM-driven. The data are high-quality and broadly consistent with the interpretation, but alternative mechanisms for the defects are not considered; additional work should rule out vesicular trafficking, discuss other mechanisms, and address methodological issues.

      We thank the editor and reviewers for assessing our work. Although a significant amount of data in this paper already rules out any potential defects in the vesicular trafficking of Orai1 in cells lacking STX11, we will still include the additional suggested experiments. In the revised version, we will include the various experiments that we had already performed to measure vesicular trafficking and ER-PM junctions in STX11-depleted cells. We will discuss any remaining alternative explanations, include missing methods, quantifications and calibrations where applicable, and provide a response to each of the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      For readers to appreciate the value of patient experiments derived from a single individual, the authors should quote prior studies showing that STX11 protein levels are abolished in all known human STX11 mutations. The priming model, while functionally well-supported, rests on indirect structural evidence, and the precise conformational transition involved remains to be defined. These are acknowledged limitations, but alternate mechanisms have not been explored and formally excluded. More direct evidence should be provided to exclude the possibility that STX11 could act as a conventional SNARE and sustain calcium fluxes by promoting the delivery of additional ORAI1 channels from vesicles.

      In the revised version, we will include references for the human STX11 mutations that have been biochemically characterized to date (Bryceson, Rudd et al. 2007); (Muller, Chiang et al. 2014); (Macartney, Weitzman et al. 2011); (Marsh, Satake et al. 2010). As the reviewer has correctly pointed out, STX11 protein levels were almost completely abolished in these studies. Therefore, with respect to the mechanisms underlying the phenotypic defects reported here and earlier, the previously described mutations are essentially comparable to the frameshift mutation characterized in this study. From a mechanistic point of view, we believe that our data from even a single FHL4 patient, in whom STX11 levels were severely depleted, together with additional knockdown studies across three different cell lines, are representative of all STX11 patients reported thus far.

      Regarding the reviewer's concern that the absence of STX11, acting as a conventional SNARE, could impair the delivery of Orai1 channels from intracellular vesicles, we would like to point out the following:

      (1) In Miao et al. 2013 (Miao, Miner et al. 2013), Figure 3C-D, we conclusively showed that expression of a dominant negative mutant of NSF, a non-redundant protein in vesicle trafficking, impaired vesicle trafficking but did not impair SOCE. This experiment had essentially ruled out a role for vesicle trafficking in SOCE. In the same paper, we had also shown that Orai1 levels in the PM do not increase post-store depletion (Figure 3, figure supplement 2).

      (2) In this manuscript (Supplementary Figure 3B), we have shown that U2OS cells stably expressing Orai1-BBS-YFP have identical levels of Orai1 in the PM with and without STX11 depletion. This shows that the biosynthesis and delivery of Orai1 to the PM are not affected by depletion of STX11, which is broadly classified as a member of the vesicle-trafficking machinery. The levels were also assessed in store-depleted U2OS cells but were not included here because in Miao et al. 2013 we had already shown that PM levels of Orai1 are essentially equal in resting and store-depleted cells. In our revised submission, we will include the data from store-depleted U2OS cells and also repeat this experiment in the other cell types used in this paper. In addition, we will include three different vesicle-trafficking assays performed in STX11-depleted cells.

      (3) Most importantly, in Figure 7I-J of this manuscript, we showed that calcium influx through a constitutively active Orai1 mutant (Orai1 H134S) is identical between STX11-depleted and scramble control cells. If wild-type Orai1 were indeed trapped in vesicles in STX11-depleted cells, it is unclear how H134S Orai1 could rescue the SOCE defect. In fact, the Orai1 mutant calcium flux assays were performed with a 20X water objective to visualize and confirm that mutant and WT Orai1 were expressed at comparable levels in the PM. We will include the quantification of PM levels of the Orai1 mutants relative to WT Orai1 in the revised paper.

      (4) We have generated, and have been using, HEK293, U2OS and Jurkat cell lines that stably express fluorescently tagged Orai1 for most of our experiments (Miao, Miner et al. 2013); (Li, Miao et al. 2016); (Ramanagoudr-Bhojappa, Miao et al. 2021). In none of these lines have we observed Orai1 in intracellular vesicles, with or without store depletion. In all cases, it is constitutively and stably expressed in the PM.

      In summary, a significant amount of data in this paper already rules out any potential reduction in the PM levels of Orai1 in cells lacking STX11. We will nevertheless perform the additional experiments suggested by Reviewer 1.

      Regarding the precise STX11-induced conformational transition, we are trying to set up collaborations with scientists who might be able to visualize it in vivo.

      Readers should note that purification of isolated pore subunits of ion channels, followed by crystallization or expression in membranes for cryo-EM, is currently considered the gold standard in the analysis of ion channel pore subunits. However, we have shown that ion channels are dynamic macromolecular complexes in vivo, where synaptic proteins bind dynamically to induce conformational changes and affect channel stoichiometry (Li, Miao et al. 2016). Please also see (Chorev, Baker et al. 2018) and (Dorwart, Wray et al. 2010). More advanced in vivo approaches therefore need to be developed to visualize the dynamics of ion channel macromolecular complexes in their native environment. In the absence of such approaches, the structural insights obtained from detergent-purified subunits will remain incomplete and biased.

      Reviewer #2 (Public review):

      Weaknesses:

      The authors conclude that Syntaxin 11 directly binds Orai1. This conclusion is well supported by a multifaceted approach, including co-immunoprecipitation (co-IP), molecular dynamics simulations, co-localization/FRET assays, and targeted mutational analysis, all of which are thoroughly executed. While the interaction appears reasonably strong in co-IP experiments, the STX11-Orai1 interaction is comparatively weaker in pull-down assays, which the authors attribute to instability of the purified His-STX11 protein. A remaining gap is direct evidence of interaction in live cells; this is understandably challenging given that fluorescent tagging of STX11 is not feasible. Fully resolving this question lies beyond the scope of the present study and will require more advanced approaches to capture STX11 binding dynamics.

      We thank the reviewer for acknowledging that the above studies will require standardization of advanced techniques which are beyond the scope of the present study. We plan to continue developing methods that will allow us to visualize the binding and unbinding of STX11 to Orai1 in vivo.

      References:

      Bryceson, Y. T., E. Rudd, C. Zheng, J. Edner, D. Ma, S. M. Wood, A. G. Bechensteen, J. J. Boelens, T. Celkan, R. A. Farah, K. Hultenby, J. Winiarski, P. A. Roche, M. Nordenskjold, J. I. Henter, E. O. Long and H. G. Ljunggren (2007). "Defective cytotoxic lymphocyte degranulation in syntaxin-11 deficient familial hemophagocytic lymphohistiocytosis 4 (FHL4) patients." Blood 110(6): 1906-1915.

      Chorev, D. S., L. A. Baker, D. Wu, V. Beilsten-Edmands, S. L. Rouse, T. Zeev-Ben-Mordehai, C. Jiko, F. Samsudin, C. Gerle, S. Khalid, A. G. Stewart, S. J. Matthews, K. Grunewald and C. V. Robinson (2018). "Protein assemblies ejected directly from native membranes yield complexes for mass spectrometry." Science 362(6416): 829-834.

      Dorwart, M. R., R. Wray, C. A. Brautigam, Y. Jiang and P. Blount (2010). "S. aureus MscL is a pentamer in vivo but of variable stoichiometries in vitro: implications for detergent-solubilized membrane proteins." PLoS Biol 8(12): e1000555.

      Li, P., Y. Miao, A. Dani and M. Vig (2016). "alpha-SNAP regulates dynamic, on-site assembly and calcium selectivity of Orai1 channels." Mol Biol Cell 27(16): 2542-2553.

      Macartney, C. A., S. Weitzman, S. M. Wood, D. Bansal, M. Steele, M. Meeths, M. Abdelhaleem and Y. T. Bryceson (2011). "Unusual functional manifestations of a novel STX11 frameshift mutation in two infants with familial hemophagocytic lymphohistiocytosis type 4 (FHL4)." Pediatr Blood Cancer 56(4): 654-657.

      Marsh, R. A., N. Satake, J. Biroschak, T. Jacobs, J. Johnson, M. B. Jordan, J. J. Bleesing, A. H. Filipovich and K. Zhang (2010). "STX11 mutations and clinical phenotypes of familial hemophagocytic lymphohistiocytosis in North America." Pediatr Blood Cancer 55(1): 134-140.

      Miao, Y., C. Miner, L. Zhang, P. I. Hanson, A. Dani and M. Vig (2013). "An essential and NSF independent role for alpha-SNAP in store-operated calcium entry." Elife 2: e00802.

      Muller, M. L., S. C. Chiang, M. Meeths, B. Tesi, M. Entesarian, D. Nilsson, S. M. Wood, M. Nordenskjold, J. I. Henter, A. Naqvi and Y. T. Bryceson (2014). "An N-Terminal Missense Mutation in STX11 Causative of FHL4 Abrogates Syntaxin-11 Binding to Munc18-2." Front Immunol 4: 515.

      Ramanagoudr-Bhojappa, R., Y. Miao and M. Vig (2021). "High affinity associations with alpha-SNAP enable calcium entry via Orai1 channels." PLoS One 16(10): e0258670.

    1. eLife Assessment

      This valuable study demonstrates that a multi-step differentiation programme in bacteria combining a bistable switch with two quorum-sensing systems is capable of generating autonomous and self-organized spatial patterns. The evidence for the core engineering system supporting patterning across several conditions is convincing, albeit incomplete for the stronger differentiation/maturation claims because the irreversibility of the proposed states is not consistently established, and some modelling and conceptual interpretation details require further clarification.

    2. Reviewer #1 (Public review):

      Summary:

      This paper by Boni and colleagues presents the engineering of a multi-step differentiation program in Escherichia coli based on synthetic gene circuits. The motivation behind the study was to engineer a system capable of undergoing differentiation in a step-wise manner without the presence of external spatial cues and without inducers added during the differentiation process. To achieve this, the authors created several synthetic gene circuits, one being a toggle switch, and the others being quorum-sensing-mediated gene expression modules. The outputs of the differentiation process are fluorescent proteins, which allowed the authors to quantify the behavior of the system using fluorescence intensity measurements. The authors additionally built a multi-component mathematical model which is able to reproduce the experimental data.

      The data presented are convincing and support the claims; the work is well executed.

      Strengths:

      (1) The differentiation process proceeds autonomously after the initial step in liquid culture in the presence of external inducers.

      (2) It is indeed a step-wise process.

      (3) The mathematical model predicts the outcome (% of green, blue and red FP-expressing cells in the population) when changing the initial ratio of green:blue FP-expressing cells.

      Weaknesses:

      (1) No spatial pattern emerges. There are some isolated colonies that turn on the downstream FPs, but I do not see a pattern, really. Nonetheless, some colonies do differentiate (i.e. they turn on additional FPs).

      (2) The mathematical model appears somewhat superfluous. While it clearly reproduces the data, it is not used to make interesting predictions, for example by varying parameters (rather than only initial conditions), that could guide further experimental implementations.

      Future directions

      The utility of this differentiation process (e.g. in metabolic engineering or for the study of biofilm formation and antibiotic resistance) will become clearer once the FPs are substituted with functional proteins that exert an effect on the cells.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors implement a three-step genetic programme in E. coli that converts an initially homogeneous population into spatially structured sender, receiver, and "matured" receiver colonies on agar without externally supplied positional information. They combine a TetR/LacI toggle switch for symmetry breaking, LuxI/LuxR quorum sensing for a paracrine signalling step, and CinI/CinR for an autocrine signalling-like maturation step, and complement the experiments with a mathematical model that qualitatively reproduces pattern formation over a range of initial conditions.

      While the article has many strengths such as a clear conceptual framing using Waddington landscapes, a modular and carefully optimised circuit design, thorough experimental characterisation of the toggle and quorum-sensing modules, integration of spatial modelling with experiments, and generally clear writing and figures, I think it will benefit the article to clarify the definition and stability of "differentiated" states, clarify several quantitative and modelling aspects, better explain how fitted curves and promoter engineering were done, and improve some figure design and wording to avoid ambiguity.

      Detailed comments below:

      (1) P5-8 / and more generally: A major concern is that producing a reporter output is not, by itself, differentiation. For a state to be credibly called "differentiated", it should be stable (self-maintained) over relevant timescales, ideally in the absence of the inducing context. As written, the manuscript sometimes seems to equate cell type with reporter expression. I strongly suggest adding a short subsection explicitly defining state versus output, and for each claimed state, stating whether it is stable/bistable or unstable/reversible, with evidence. Concretely, the authors should enumerate:
      a) Toggle-derived sender versus receiver: stable? Under what conditions (inducer ranges, hysteresis window)?
      b) Paracrine-induced "red" receivers: is this a stable differentiated state, or a context-dependent induction requiring proximity to senders?
      c) "Mature" (yellow) state: does it persist after removal from the spatial signal field? If not, it should be described as an induced output programme rather than a mature lineage state.

      At present, later sections (and the "maturation" language) risk over-stating what is demonstrated.

      (2) Figure 2d: It is unclear whether this panel is intended to be qualitative (schematic/illustrative) or generated from quantitative data. The legend should explicitly state the origin (e.g., representative image, averaged data, simulation output, schematic) and, if quantitative, what was measured, how many replicates, and how the visualisation was constructed.

      (3) Figure 2e: The cross-sectional line is described as meant to be comparable, yet the leftmost plot appears to have a different slope from the others. The authors should explain whether this reflects a different scaling/normalisation, a different underlying dataset/condition, or simply a plotting artefact. If these are fitted trends, report the fit function (see also the comment on fitted lines below).

      (4) Around P7-8 (saddle/separatrix description): When describing the saddle or separatrix between the two valleys, it would be helpful to connect this more directly to a quantitative dynamical-systems perspective, for instance the intersection of the nullclines and how nullcline geometry changes under IPTG/aTc induction. This will make the landscape picture more complete for readers familiar with the original genetic toggle switch work (Gardner et al., 2000).

      (5) P9, lines 157-159: The current phrasing ("in absence of noise, the system would be fully deterministic... in living cells, however, stochastic bursts... change the trajectory") risks conflating predicting population-level percentages with predicting colony-level trajectories. It would help to clearly separate (i) the ability to predict the overall fraction of ON/OFF (green/blue) colonies from inducer conditions (which is largely deterministic at the population level) from (ii) the intrinsically stochastic choice of state made by any given founder cell and its colony.

      (6) P11, lines 193-195 (promoter engineering): The main text currently only refers to screening variants and choosing pLux76; I suggest briefly stating in the main text (not only in the supplement) what was changed (for example, promoter box variants, core promoter strength modifications) and what design criteria were used (reduced leakiness, increased dynamic range).

      (7) Use of fitted lines (Figures 2, 4, 5, 7): Wherever fitted curves are overlaid on data, the authors should state in the figure legend the functional form of the fit together with the fit equation and parameters. As a reader, it is difficult to tell what is empirical smoothing and what is a mechanistic functional form.

      (8) P13, lines 232-235: The comparison between induction directly with C6-HSL and induction from sender colonies is qualitative ("significantly smaller range"). The authors should provide distances (for example, in mm) for the induction range in each case and, if possible, approximate total HSL amounts or concentrations, so that the reader can appreciate the magnitude of the difference.

      (9) P13, lines 259-262: The authors model the transition to the stationary phase via a monotonically decreasing sigmoid in time for biosynthetic capacity. What is the rationale or literature basis for this approach to model entry into the stationary phase? The authors should cite prior work and clarify why this form is appropriate here, versus alternatives (nutrient diffusion limitation, logistic growth with resource depletion, etc.).

      (10) Figure 6c: Are the areas of the plate shown in each column the same field of view across conditions/time, or are these simply representative regions selected per condition (possibly from different plates)? The caption/legend should clarify whether these are matched locations and how images were chosen.

      (11) Figure 7a: The combination of solid, dashed, and dash-dot arrows/lines is visually hard to read. I suggest replacing the dash-dot line with a fully dotted line or using different colours (if consistent with journal style) to improve readability.

      (12) Figure 7e and similar analyses: The authors should explain in the Methods and/or captions how "distance from sender colonies" is computed when multiple senders exist. Is the distance always measured to the nearest sender, and how are cases handled where a receiver is in the overlapping influence of several senders? This clarification is important for interpreting the fitted curves.
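      Several of the quantitative points in comments (4), (9) and (12) can be illustrated with a short, self-contained sketch. It assumes (i) the original two-repressor toggle model of Gardner et al. (2000), du/dt = α1/(1 + v^β) - u and dv/dt = α2/(1 + u^γ) - v, whose fixed points lie where the two nullclines intersect; (ii) a decreasing sigmoid in time as a hypothetical form for biosynthetic capacity, compared with the spare-capacity fraction 1 - N/K of logistic growth; and (iii) Euclidean distance to the nearest sender as one possible convention when multiple senders are present. All function names and parameter values are illustrative and are not taken from the manuscript; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


# (4) Toggle-switch fixed points as nullcline intersections (Gardner et al., 2000 model)
def toggle_fixed_points(alpha1: float, alpha2: float, beta: float, gamma: float,
                        n_grid: int = 2200) -> List[Point]:
    def v_nullcline(u: float) -> float:
        return alpha2 / (1.0 + u ** gamma)

    def g(u: float) -> float:
        # Zero exactly where the u-nullcline meets the v-nullcline
        return alpha1 / (1.0 + v_nullcline(u) ** beta) - u

    u_max = alpha1 + 1.0  # at steady state u <= alpha1
    grid = [u_max * i / n_grid for i in range(n_grid + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if g(a) == 0.0:
            roots.append(a)
        elif g(a) * g(b) < 0.0:
            lo, hi = a, b
            for _ in range(60):  # bisection on the bracketing interval
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return [(u, v_nullcline(u)) for u in roots]


# (9) Decreasing sigmoid in time vs. logistic resource depletion
def sigmoid_capacity(t: float, t_half: float, width: float) -> float:
    return 1.0 / (1.0 + math.exp((t - t_half) / width))


def logistic_spare_capacity(t: float, r: float, n0: float, k: float) -> float:
    # 1 - N(t)/K for logistic growth N(t) = K / (1 + ((K - N0)/N0) * exp(-r t))
    n = k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))
    return 1.0 - n / k


# (12) Distance from a receiver to the nearest sender, and senders within a radius
def nearest_sender_distance(receiver: Point, senders: List[Point]) -> float:
    return min(math.dist(receiver, s) for s in senders)


def senders_within(receiver: Point, senders: List[Point], radius: float) -> int:
    return sum(1 for s in senders if math.dist(receiver, s) <= radius)


print(toggle_fixed_points(10.0, 10.0, 2.0, 2.0))
```

      With α1 = α2 = 10 and β = γ = 2 the sketch finds three fixed points, including the symmetric one at u = v = 2 (the real root of u^3 + u - 10 = 0); in this bistable regime the middle point is the saddle whose stable manifold is the separatrix between the two valleys. For the capacity functions, the algebra shows that the logistic spare capacity is itself a decreasing sigmoid with midpoint ln((K - N0)/N0)/r and width 1/r, so the two choices can coincide mathematically; whether this is the appropriate biological picture is the question raised in comment (9). Counting senders within a radius is one way to flag receivers that sit in overlapping influence zones.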

    4. Reviewer #3 (Public review):

      This manuscript presents an engineered 3-step circuit in E. coli that combines toggle-switch-based symmetry breaking with quorum-sensing interactions to generate colony-scale spatial patterns. The work is interesting as a synthetic circuit integration study and as a demonstration of self-organized patterning across physically separated colonies. The authors provided a compelling demonstration of the characterization/tuning of parts to guide the overall system engineering. A notable strength is the demonstration that a single circuit can generate a range of self-organized spatial patterns across separate colonies.

      However, I think the paper needs to tone down the extent to which the system demonstrates multi-step differentiation or morphogenesis, which is not critical for making the paper valuable. Only the first step of their circuit design (Figure 1), the toggle switch, generates stable alternative states. The latter steps are mainly signal-dependent reporter activation states layered on top of the blue receiver state, rather than true fate transitions. The authors explicitly state that red expression is added without replacing the blue identity, and they also acknowledge that red cells lose their identity upon restreaking unless they remain near sender cells. That substantially weakens the differentiation analogy and makes the Waddington framing too strong.

      A related concern is that the 3rd step does not introduce a new spatial organizing rule. The authors show that the second signal remains confined to cells already receiving the first signal, and explicitly conclude that it functions only as an autocrine cue rather than a second paracrine layer. As a result, the 3-step system seems more like an added local readout or maturation layer. Overall, the main 2-step outcome is sparse green sender colonies surrounded by red-expressing blue receivers, with distant receivers remaining blue. That is a valid engineered pattern, but it is still a local, threshold-response circuit architecture.

      The autonomy claim should be toned down and stated more precisely. The plate patterning occurs without externally imposed spatial gradients, which is a strength. However, by design, the overall system behavior depends strongly on the pre-culture inducer conditions that set the sender:receiver ratio, and this externally imposed history is central to the final pattern. This property follows from the circuit design: steps 2 and 3 largely respond to the symmetry breaking introduced in step 1, which depends on both history and initialization on the plate. In particular, the pattern formation process is currently quite variable (e.g., Figure 5), depending on how different colonies flip the toggle switch and, consequently, how many become senders and how many become receivers. It would have been fascinating if the authors could also demonstrate differentiation within individual colonies, leading to intra-colony patterns. This aspect should at least be discussed.

      The mathematical model is useful in guiding the characterization of parts, modules, and the overall system. However, the claims about its quantitative predictive power should also be narrowed. The simulations are built from multiple fitted and partly hand-tuned components, including toggle-switch response curves, colony-growth rules, diffusion, reporter-response functions, and activity decline. This supports a calibrated qualitative reconstruction of the observed patterns, but not a strong predictive or mechanistic validation.

      Other specific points:

      (1) Given the topic of the work, the authors should cite closely relevant studies in programming pattern formation, including:
      Cao et al., Cell 2016. Collective space-sensing coordinates pattern scaling in engineered bacteria.
      Rajasekaran et al., Cell 2024. A programmable reaction-diffusion system for spatiotemporal cell signaling circuit design.
      Lu et al., bioRxiv 2024. Discovery of interpretable patterning rules by integrating mechanistic modeling and deep learning.

      (2) The model assumes identical diffusion coefficients for C6-HSL and C14-HSL despite their substantially different molecular sizes and hydrophobicities. This assumption could confound kinetic lag with differential diffusion as explanations for the autocrine confinement of the third step. Its impact should at least be explored in the simulations.

      (3) The mCherry response parameters change significantly between the 2-step and 3-step systems. The authors acknowledged this change but did not provide a clear explanation.

      (4) The 3-step system is evaluated at only a single condition with no simulation comparison, in contrast to the systematic 11-condition validation of the 2-step system.

    1. eLife Assessment

      This important study provides detailed insights into the metabolic states of hemocyte populations across developmental stages and in both physiological and pathological contexts, including during immune challenge. The study provides convincing evidence by comparing the relative utilization of glycolysis and oxidative phosphorylation in Drosophila larval immune cells, and can have implications for metabolic programs that shape immune function in health and disease.

    2. Reviewer #1 (Public review):

      Summary:

      The metabolic profiles of immune cells under steady-state or immune-activated conditions remain poorly characterized. The authors find that embryonically derived hemocytes in Drosophila larvae predominantly utilize mitochondrial respiration to generate energy and exhibit minimal glycolytic rates under unchallenged conditions. Hemocytes increase their ATP production rates over the course of development, and mitochondrial respiration drives metabolic activation in larval hemocytes. More specifically, lamellocytes exhibit unique metabolic activities, including enhanced trehalose catabolism and mitochondrial remodeling, which are required for their encapsulation response.

      Strengths:

      The study identifies the metabolic programs most likely to operate in different Drosophila immune cells during development and during infection, and relates them to mitochondrial organization and to the proliferation and/or differentiation state of the cells.

      Weaknesses:

      Even though mitochondrial activity is rigorously analyzed with the Seahorse analyzer, the diverse mitochondrial activities of the different immune cell types across development and during infection could also be analyzed by microscopy. In vivo measurements of ROS, mitochondrial membrane potential, and NADH/NAD+ and FADH2/FAD levels are likely to give a more specific readout of changes in cellular activity. Mitochondrial fusion and fission activities need to be tested together to understand their roles in development and in infection. The relevance of the change in mitochondrial activity for development or immunity remains to be tested.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents an analysis of the metabolism of Drosophila larval immune cells during development and activation. The authors compared the utilization of glycolysis and oxidative phosphorylation for energy metabolism. Although this topic has been widely discussed and well-studied in immune cell research, particularly in mammals, it has received little attention in insects. The authors demonstrated that quiescent and activated larval Drosophila immune cells predominantly use mitochondrial oxidative phosphorylation to produce energy. This finding is significant for the emerging field of insect immunometabolism research and is interesting in comparison to mammalian immunity, where immune cell activation is often associated with a shift toward greater reliance on glycolysis.

      Strengths:

      Using the Agilent Seahorse system, the authors developed and fine-tuned a method to measure the energy metabolism of Drosophila immune cells, obtaining high-quality, robust data. Through genetic manipulations targeting immune cells specifically, they analyzed metabolic changes in cells with different activations, going beyond developmental changes. They convincingly demonstrated that immune cells produce ATP primarily in their mitochondria at various developmental stages and in various activated states. The results presented mostly support the conclusions drawn. This methodology and its results are valuable for further studies of insect immunometabolism. In a broader context, they are also valuable for comparing the metabolism of immune cells across different animal groups.

      Weaknesses:

      The genetic manipulations used were suitable for obtaining immune cells of various types and activation states, such as proliferation, differentiation, and immune activation. However, this method has limitations: a mixture of different cell types was always analyzed, and the cell type of interest was often a minority population. Had the other cells remained in their initial control state, the observed change in metabolism could have been attributed primarily to the desired cell type. However, the remaining cells that did not transform into the desired type were usually also influenced or activated in some way, making it difficult to determine to which group the observed change should be attributed. For example, consider the induction of lamellocyte differentiation using Hml>Hop[tum]. There are approximately 1,000 lamellocytes per larva, but according to Supplementary Figure 4, there are still about 5,000 Hml+ cells, and even these cells have activated Jak/Stat signaling. Therefore, it can be assumed that they are also activated. After a real infection, the proportion of lamellocytes is greater, but the remaining plasmatocytes are also activated. The authors should state these limitations more clearly. However, as the authors correctly note, solving this problem will require single-cell approaches, which are still limited by current technologies. I see this as a problem when interpreting the proliferation effect. The crucial question is what percentage of the analyzed cells induced by Hml>Ras[V12] were actually dividing. Not all hemocytes are Hml+, so not all are induced. Of those that are induced, how many are dividing at the time of analysis? Meanwhile, those that were not dividing at that moment also had activated Ras, which triggers many processes besides division. Information on the percentage of analyzed cells that were dividing is missing.

      This information is important because the finding that dividing Drosophila immune cells primarily use mitochondria and oxidative phosphorylation to produce ATP contrasts with the debated significance of the Warburg effect in dividing mammalian cells. This finding would be significant, but unfortunately, it is not robustly supported by the presented data.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the metabolic profiles of hemocytes across multiple stages and conditions and suggests that hemocytes act as regulators of metabolism rather than merely receivers of metabolic cues. The authors show that hemocytes rely primarily on mitochondrial respiration, which is further enhanced during proliferation in development or upon genetic manipulation of plasmatocytes, but not of crystal cells.

      Mitochondrial respiration is also activated in lamellocytes, and this activation correlates with changes in mitochondrial morphology. The authors further attempt to identify mechanisms underlying this activation, proposing that mitochondrial fission may contribute to the ability of lamellocytes to encapsulate wasp eggs.

      Strengths:

      This work provides detailed and valuable insights into the metabolic phenotypes of hemocyte populations at different developmental stages and under both physiological and pathological conditions. The authors perform a longitudinal assessment of hemocyte metabolism and compare metabolic states across contexts.

      Importantly, they provide evidence that hemocytes regulate metabolism to perform essential immunological functions, such as wasp egg encapsulation. This reinforces the view that hemocytes are key regulators and communicators that adapt their metabolic programs according to developmental and environmental demands.

      Weaknesses:

      The results presented are insightful, although several controls and validations could strengthen the conclusions. It would be preferable to also include responder transgenes alone as a control for leakiness, and the scRNA-seq findings would benefit from in vivo validation.

      Some conclusions appear inconsistent or insufficiently supported. For instance, although mitochondrial respiration in plasmatocytes peaks at 96 h AEL, this increase is not accompanied by detectable mitochondrial rearrangement, which remains constant between 96 h AEL and 120 h AEL.

      In general, the authors should temper some statements or provide further data.

    1. eLife Assessment

      This manuscript makes a valuable contribution to the concept of fragility of meta-analyses via the so-called 'ellipse of insignificance for meta-analyses' (EOIMETA). The strength of evidence is convincing, supported primarily by an example of the fragility of meta-analyses in the association between Vitamin D supplementation and cancer mortality, but the approach could be applied in other meta-analytic contexts. The significance of the work could be enhanced with a more thorough assessment of the impact of between-study heterogeneity, additional case studies, and improved contextualization of the proposed approach in relation to other methods.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This manuscript addresses an important methodological issue, the fragility of meta-analytic findings, by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

    3. Reviewer #3 (Public review):

      Summary and strengths:

      In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting, as metrics for evaluating the fragility and robustness of meta-analyses. The author applies these metrics to three meta-analyses of vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think this extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue, the fragility of meta-analytic findings, by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Reviewer #3 (Public review):

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      This is a point I was remiss not to elucidate better. With regard to generalisation, the text has been modified to state explicitly that generalisability in this context means no dependence on specific studies, just the net number of subjects required to flip a result. The text reads:

      “Atal's method is highly useful, but one possible objection is that it has the downside of non-generalisability, as it finds very specific combinations of trials and patients that would have to be re-coded (events classified as non-events and vice-versa) for results to become insignificant. For example, an Atal meta-analytic fragility of 4 pertains to a specific and often unique circumstance when 4 patients could be recoded from a specific study or combinations thereof to change outputs, but this does not generalise to any 4 patients in that meta-analysis. This makes this definition of meta-analytic fragility useful but not general, and perhaps less intuitive to interpret than a typical RCT fragility metric. In this work, we establish a generalizable meta-analytic fragility metric, based upon Ellipse of Insignificance (EOI) analysis for dichotomous outcome trials. This method creates a pool of events and non-events in both arms, adjusted for weighting, and answers the general question of how many patients would have to be effectively recoded in a meta-analysis for results to flip, without requiring specific study identification.”

      (2) The author mentions that the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how between-study heterogeneity would influence the proposed measures? Moreover, between-study heterogeneity is high in Zhang et al.'s 2022 study. This would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      This is a very fair observation, and I need to explain myself better here! There are effectively two measures of heterogeneity considered in this work: the typical heterogeneity value from a meta-analysis, and the divergence between the crude and the inverse-variance-weighted adjusted estimates. When these differ by small amounts, one could conceivably use either measure. I’ve changed the text to better reflect this, including:

      “This modification is akin to pooling in a meta-analysis, and adjusts for study-level heterogeneity. After this modification, a standard EOI analysis can then be applied to the vector . In addition, we can also employ ROAR analysis on the same vector, yielding the raw number of patients in either or both arms who could be added in a given direction to change the result, and the exact combination of control and experimental group redactions required to change the result from a significant finding to a null one. Caveats for implementation and interpretation are outlined in the discussion section.”

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would help prevent misinterpretation by future users. I am concerned that the audience may confuse these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.

      This is an excellent suggestion. I’ve tried to address it with percentages, as in Table 2, but these are minute in the case of the vitamin D trials, partly, I suspect, because the effects are extraordinarily weak. Cohen’s h for these meta-analyses yields tiny values, which I think might be tied to the virtually negligible percentages we obtain for the number needed to flip. With stronger data, it might be worth expanding this into a useful heuristic measure of robustness, though I don’t think vitamin D data such as those in this work are going to help us much.

      In light of the reviewer’s excellent comment, I added lines 230-240 to the revised manuscript.

      (4) Comments on revisions:

      I am unable to find the author's responses to my previous round comments (Reviewer #3) in the revision package, though replies to the other reviewers are present. I will provide my updated feedback once these responses are available for review.

      My sincere apologies; I neglected these specific comments in error. This document should now address them. Thank you again for giving this your time and consideration!

    1. eLife Assessment

      This is a valuable study presenting convincing data indicating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions. The study elegantly bridges the gap between the non-physiological aspects of the previous two-step reconstitution method and the extract-dependent iSAT system to enable assembly of highly functional ribosomes under translation-compatible conditions. The reported findings represent substantial progress towards achieving a bottom-up reconstruction of the translation machinery from synthetic parts.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have provided new data and text that addresses all of the reviewers' comments on the previous versions in a wholly satisfactory way.]

      Summary:

      This study presents evidence that adding the two GTPases EngA and ObgE to reactions composed of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg2+ ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This represents a significant development in the long-term effort to produce synthetic cells.

    3. Reviewer #2 (Public review):

      This study has developed a single-step method to assemble active bacterial ribosomes under near-physiological conditions by using the GTPase factors EngA and ObgE. These factors eliminate the need for the traditional, harsh manipulations of temperature and magnesium levels. This integration is an important step toward the bottom-up construction of synthetic cells.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents evidence that adding the two GTPases EngA and ObgE to reactions composed of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg2+ ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This represents a significant development in the long-term effort to produce synthetic cells.

      Weaknesses:

      The authors carried out additional experiments indicating that ~60% of the reconstituted ribosomes are functional, that a significant proportion are capable of synthesizing GFP from the correct initiation codon to the correct stop codon, and that they can produce an enzymatically active protein at appreciable levels. Their SDS-PAGE and MS analyses of N-terminally tagged GFP are also quite useful but did not assess the frequency of initiation at the wrong start codon, termination at the incorrect stop codon, or frameshifting during elongation. This would require additional reporters: ones designed to test dependence on a Shine-Dalgarno sequence or the impact of an in-frame stop codon, to assess the fidelity of initiation and termination, respectively, and one with a programmed frameshift site to assess the elongation fidelity of the reconstituted ribosomes.

      In response to the reviewer’s comment, we expanded the MS analysis and performed additional searches against amino acid sequences corresponding to all three reading frames (updated Supplementary Data 2). Only a single peptide fragment, likely derived from the +1 frame, was detected, and its intensity was approximately 1/1000 of that of peptide fragments detected from the normal frame. No other out-of-frame peptides were detected, and no evidence of stop-codon readthrough was found. We consider that these results suggest that this kind of deterioration in ribosome function does not occur in the reconstituted ribosomes. Because this analysis cannot completely rule out abnormal translation events such as initiation from internal start codons or termination at internal stop codons, we also added a statement acknowledging that further analyses will be required to examine all aspects of the translation reaction.

      Past reconstitution studies have succeeded using all-recombinant, individually purified RPs. Had this approach succeeded here, it would have eliminated the possibility that one or more unknown ribosome assembly factors co-purifying with native ribosomes were added to the reconstitution reactions.

      We had already addressed the issue raised by the reviewer at the end of the Discussion in the previous revision. We fully agree with the reviewer’s point, and we are currently continuing research in our laboratory aimed at achieving a more fundamental understanding of ribosome assembly.

      Reviewer #2 (Public review):

      This study has developed a single-step method to assemble active bacterial ribosomes under near-physiological conditions by using the GTPase factors EngA and ObgE. These factors eliminate the need for the traditional, harsh manipulations of temperature and magnesium levels. This integration is an important step toward the bottom-up construction of synthetic cells.

      Comments on revisions:

      The authors have addressed my concerns in the previous round of review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are urged to acknowledge that more sophisticated reporter assays would be required to compare the frequencies of errors occurring at each step of translation using their reconstituted versus native ribosomes.

      As described in our response to Reviewer #1, we performed additional MS analyses, updated Supplementary Data 2, and added a statement acknowledging the reviewer’s comment.

    1. eLife Assessment

      This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. The modeling is technically sophisticated, and the analyses provide convincing support for the mechanistic conclusions.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      Comments on previous round of revisions:

      I found that the authors addressed my concerns satisfactorily. The other reviewer raised a number of important points regarding the nuances of the model and the interpretation of the simulations, which the authors rebutted. I think the paper in its current form now is a worthwhile addition to the literature.

    3. Reviewer #3 (Public review):

      This is a technically sophisticated study that integrates coarse-grained modeling with live-cell imaging to address an important and timely question regarding HIV-1 capsid inhibition by lenacapavir.

      In summary, in my view, the manuscript represents a solid contribution to the field.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement.

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 μs of CG time. But surely the simulations cover at least this length of time?

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

      Comments on revisions:

      I found that the authors addressed my concerns satisfactorily. The other reviewer raised a number of important points regarding the nuances of the model and the interpretation of the simulations, which the authors rebutted. I think the paper in its current form now is a worthwhile addition to the literature.

      Reviewer #3 (Public review):

      I have carefully reviewed the manuscript, the two referee reports, and the authors' detailed responses. I appreciate the substantial effort the authors have invested in addressing the reviewers' comments, and I also recognize the strength and ambition of the work. This is a technically sophisticated study that integrates coarse-grained modeling with live-cell imaging to address an important and timely question regarding HIV-1 capsid inhibition by lenacapavir.

      Embedded within Reviewer #2's report are several substantive points that warrant careful consideration, particularly with respect to framing, terminology, and engagement with the broader literature. I view my role here as distinguishing those issues from claims that I do not find to be supported.

      We thank Reviewer 3 for the positive assessment of our work.

      First, I do not agree with Reviewer #2's central assertion that the manuscript lacks novelty or fails to present meaningful new findings. While individual elements of the system studied here (capsid docking at the NPC, lenacapavir-induced capsid hyperstabilization, capsid rupture, and competition with FG-nucleoporins) have been observed previously, this work provides a coherent, mechanistic account of how these elements are coupled. In particular, the proposed sequence linking LEN-induced lattice hyperstabilization, preferential pentamer loss at the narrow end, NPC-induced mechanical stress, and failure of nuclear import represents a nontrivial integration that goes beyond prior phenomenological observations. I therefore do not view this work as redundant with existing literature.

      We thank Reviewer 3 for the positive assessment of our work.

      That said, Reviewer #2 is correct to note that the manuscript would benefit from broader and more explicit engagement with recent independent studies, including computational and hybrid modeling efforts that address capsid mechanics, nuclear entry, and LEN effects using different frameworks. While the authors' bottom-up coarse-grained approach is clearly distinct and, in many respects, more systematically derived, eLife readers would benefit from a clearer discussion of how the present results relate to, complement, or differ from these other approaches. I strongly encourage the authors to add a short discussion paragraph situating their work within this broader context, without disparaging alternative models.

      We have now added several sentences describing papers that use two other CG models that are of some relevance to our work at the beginning of the fourth paragraph of the Introduction, and we have also highlighted the distinguishing features of our work at the end of that paragraph.

      Second, I find that some mechanistic claims in the manuscript would benefit from more careful language distinguishing model-conditioned interpretation from de novo prediction. This applies in particular to discussions of LEN binding heterogeneity and stoichiometry, as well as to conclusions drawn from biased enhanced-sampling simulations. While I agree with the authors that parameterization does not invalidate mechanistic insight, it is important to be precise about what aspects of the behavior emerge from the simulations versus what is constrained by prior experimental knowledge. Modest tightening/revising of language (e.g., "suggests," "is consistent with," "within the model") would address this concern without weakening the scientific conclusions.

      We have revised and softened the language in several places as suggested. However, we maintain that our overall CG modeling approach is rigorous. The use of limited “top down” information on LEN binding is not problematic and is in fact warranted for this problem.

      Third, Reviewer #2 raises a legitimate semantic issue regarding the use of the term "elasticity." The manuscript infers changes in capsid mechanical response using heterogeneous elastic network models, which quantify effective stiffness and deformability rather than elasticity in the macroscopic materials sense. I recommend that the authors clarify this definition explicitly in the text to avoid confusion and unnecessary debate.

      We have now added a clarification at the end of the third paragraph of the subsection entitled “LEN binding to the capsid results in hyperstabilized lattice domains”. We have also added text in the second paragraph of the Discussion. In our view, this perspective is more useful for this problem than a “macroscopic” one, because the capsid is a mesoscopic object rather than a macroscopic one.

      Finally, I note that several of Reviewer #2's objections, particularly those asserting circular reasoning, misuse of enhanced sampling methods, or invalidity of coarse-grained predictions, reflect a misunderstanding of contemporary bottom-up coarse-grained modeling rather than genuine methodological flaws. I do not believe these points require further rebuttal or revision beyond what the authors have already provided.

      We agree.

      In summary, in my view, the manuscript represents a solid contribution to the field, provided that the authors undertake a limited set of targeted revisions aimed at improving framing, clarity, and engagement with the broader literature. Addressing these points will strengthen the manuscript and ensure that its contributions are clearly and fairly communicated to the community.

      We have done so as suggested by the reviewer.

    1. eLife Assessment

      This valuable study examines the cleavage of motor neuron nucleoporins by proteases of enterovirus D68, a pathogen associated with acute flaccid myelitis. The evidence supporting the effects of EV-D68 proteases on nuclear import and export is generally solid, as is the independent examination of EV-D68 protease on spinal cord neuron toxicity. The specific conclusions related to RNA export were considered overstated relative to the data presented.

    2. Reviewer #1 (Public review):

      Summary:

      Zinn and colleagues investigated the role of proteases 2A and 3C of enterovirus D68 (EVD68), an emerging pathogen associated with outbreaks of acute flaccid myelitis (AFM), a polio-like disease, in nucleocytoplasmic trafficking in different systems, including human neurons derived from pluripotent cells. They found that 2A specifically cleaved Nup98 and POM121. Using reporter proteins and RNA synthesis and trafficking assays in cells expressing viral proteases, they showed that 2A induces broad loss of the nuclear pore barrier function but, surprisingly, that RNA export appears to be minimally affected. Since nucleocytoplasmic trafficking defects are known to be associated with neuropathologies, they propose the hypothesis that 2A-dependent cleavage of nucleoporins in motoneurons underlies the development of EVD68-induced AFM. They further show that a 2A-specific inhibitor increases the survival of human neurons differentiated from stem cells upon EVD68 infection.

      Strengths:

      Use of multiple methods to investigate the effect of 2A and 3C expression on nucleoporin cleavage and nucleocytoplasmic trafficking.

      Comments on revisions:

      The following issues remain unresolved:

      First, the authors still do not show representative images confirming specific nucleoporin degradation (Fig. 1), which is the main focus of the work.

      Second, the conclusion that 2A-mediated degradation of the nucleo-cytoplasmic barrier does not affect export of RNA from the nucleus is not supported by the presented data. The representative images shown in Fig. 3C do not include the GFP signal (as in Fig. 2), so it is impossible to see whether those cells indeed express EVD68 proteases.

      Moreover, to show RNA export, not only the decrease of nuclear EU signal should be quantified, but also the increase of the cytoplasmic signal. The diminishing of the nuclear staining may not necessarily reflect RNA export, but may well be explained by nuclease activity, all the more relevant in cells expressing 2A, where the nuclear-cytoplasmic barrier is disrupted and cytoplasmic nucleases may enter the nucleus.

      The same applies to images in Fig. 3D. There are no markers of infection; moreover, the experiment description indicates that EU labeling began at 24 h post-infection with an MOI of 5, i.e., essentially all cells should have been infected. This is difficult to believe as the replication cycle of most EVD68 strains in HeLa cells is no longer than 12 h, yet the images do not show any signs of CPE, and demonstrate a strong EU signal, inconsistent with the expected inhibition of nuclear transcription, a known attribute of enterovirus infections.
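      The reviewer's point that essentially all cells should be infected at MOI 5 can be made quantitative. The sketch below assumes the standard model in which infectious particles are Poisson-distributed among cells (an assumption not stated in the review), so the expected fraction of cells receiving at least one particle is 1 − e^(−MOI). The function name is hypothetical and for illustration only.

```python
import math


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells receiving >= 1 infectious particle.

    Assumes Poisson-distributed infection, so P(0 particles) = exp(-moi).
    This is an illustrative model, not a method from the manuscript.
    """
    if moi < 0:
        raise ValueError("MOI must be non-negative")
    return 1.0 - math.exp(-moi)


# Compare the high MOI used for EU labeling with a low MOI such as 0.5.
for moi in (0.5, 5.0):
    print(f"MOI {moi}: {fraction_infected(moi):.3f}")
```

      Under this model, about 99% of cells are expected to be infected in the first round at MOI 5, versus about 39% at MOI 0.5.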

      The claim that nuclear transcription and RNA export remain unaffected in conditions of 2A-mediated disruption of the nucleo-cytoplasmic barrier is very strong and requires equally strong evidence.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of EV-D68 proteases 2A and 3C in nuclear pore complex (NPC) dysfunction and their contribution to motor neuron toxicity. The authors demonstrate that both proteases cleave only a limited number of nucleoporins, with 2A^pro showing the strongest impact by inhibiting nuclear import and export of proteins and disrupting NPC permeability without affecting RNA export. Importantly, treatment with the 2A^pro inhibitor telaprevir reduced neuronal cell death in a dose-dependent manner, achieving neuroprotection at concentrations below those required to inhibit viral replication. The study addresses a relevant mechanism underlying EV-D68-induced neuropathology and explores a potential therapeutic intervention.

    4. Reviewer #3 (Public review):

      Summary:

      The authors showed that the EV-D68 viral proteases 2Apro and 3Cpro, when expressed, cleaved specific components of the nuclear pore complex (Nup98 and POM121 by 2Apro), and that 2A but not 3C expression altered nuclear import and export. Similar nucleocytoplasmic transport deficits are observed in EV-D68-infected RD cells and iPSC-derived motor neurons (diMNs). The 2A inhibitor telaprevir partially rescued the nucleocytoplasmic transport deficits and suppressed neuronal cell death after infection. While it's clear that 2A can cleave NPC proteins and affect nuclear transport, the link to neurotoxicity after EV-D68 infection is less convincing.

      This study opens up a very intriguing hypothesis: that EV-D68 2Apro could be directly responsible for motor neuron cell death, mediated by POM121 and possibly Nup98 cleavage, that ultimately results in paralysis known as acute flaccid myelitis. This hypothesis notably does run counter to other published data showing that human neuronal organoids derived from iPSCs can support productive EV-D68 infection for weeks without cell death and that EV-D68-infected mice can have paralysis prevented by depletion of CD8 T cells, still with EV-D68 infection of the spinal cord. However, even if 2Apro is not ultimately responsible for motor neurons dying in human infections, that does not exclude the possibility that cleavage of nups could still disrupt motor neuron function. Notably, most children with AFM have some amount of motor function return after their acute period of paralysis, but most still have some residual paralysis for years or for life. It is possible that 2Apro could mediate the acute onset of weakness, while T cells killing neurons could determine the amount of long-term, residual paralysis.

      Strengths:

      The characterization of nuclear pore complex components that appear to be targets of both poliovirus and EV-D68 proteases is quite thorough and expansive, so this data set alone will be useful for reference to the field. And the process by which the authors narrowed their focus to EV-D68 2Apro reducing Nup98 and POM121 as consequential to both import and export of nuclear cargo but not RNA was technically impressive, thorough, and convincing. As will be detailed below, when the authors move from studying over-expressed proteases in transformed cell lines to studying actual virus infection in both transformed cell lines and iPSC-derived neurons, some of the data only indirectly support their conclusions; however, the quality of the experiments performed is still high. So even if the claim that 2Apro causes neurotoxicity is circumstantial, the data are certainly intriguing and justify further study of the effects of EV-D68 2Apro on the NPC and how this impacts pathogenesis. This is a convincing start to an intriguing line of inquiry.

      Comments on revisions:

      The authors have returned a stronger revised manuscript, being responsive to most of the combined reviewers' comments. It was especially important to add the clarity and specificity that the data in this manuscript did not establish a direct link for 2Apro causing AFM. The authors have clarified this language adequately, such that it is appropriate to remove the "incomplete" portion of the short assessment as they have requested. Adding in experiments with EV-D68 virus infection to complement their work with recombinant proteases also strengthened their conclusions.

      There are still some areas where discrepancies remain, although these are minor and can mostly be acknowledged as limitations of their approach rather than needing more experiments, unless the authors choose to do the additional experiments. To try to make this understandable, I have copied from the rebuttal letter (*) original comment, (**) author's rebuttal, and (***) a reply to the rebuttal:

      (*)(2) Telaprevir was able to rescue nucleocytoplasmic transport in RD cells at low concentrations (Figure 4A). It is not shown whether this correlates with its antiviral effect in RD cells or with inhibition of 2A cleavage of Nup98 or POM121, which is never measured.

      (**) In the aforementioned new experiment in Figure 4A, we have also included a dose-response curve for telaprevir showing its inhibition of POM121 and Nup98 cleavage.

      (***) Fig. 4A is in diMNs, not RD cells. The EC50 of telaprevir could be very different in RD cells vs diMNs. This question remains unanswered.

      (*) (3) Building off of the prior point, the authors' claim that the neuroprotective effect of telaprevir is independent of its antiviral effect is not well-founded. Figure 4E (neuroprotection) was done with MOI 5, and Figure 4G (virus growth) was MOI 0.5. Telaprevir neuroprotection is not shown at MOI 0.5, nor is the neuroprotective effect correlated with inhibition of 2A cleavage of Nup98 or POM121.

      (**) The selection of MOIs for these two experiments was limited by technical considerations. If the viral growth curve were to be performed at MOI 5, it would be confounded by cell death. Further, a low MOI is required in order to allow multiple rounds of infection, and is therefore more sensitive for assaying the effect of telaprevir on viral replication. On the other hand, at MOI 0.5 diMN death is very gradual, and in the neuroprotection assay we would have lacked the statistical power to determine whether a rescue of this small magnitude of toxicity is significant. The EC50 of telaprevir is not expected to vary at different MOIs.

      (***) This should be discussed in the Discussion as a limitation of the experiment.

      (**) We have also now correlated the inhibition of 2Apro cleavage of Nup98 and POM121 with the neuroprotective effect at comparable concentrations of telaprevir, as described above.

      (***) Unless you quantify this, my eye disagrees with you. In Fig. 4A, cleavage of Nup98 is rescued by 3 µM telaprevir, but that does not seem to be the case for POM121.

      Additionally, in Fig. 4D, why is only NLS, but not NES, impaired in diMNs? This should be discussed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the cleavage of motor neuron nucleoporins by proteases 2A and 3C of enterovirus D68, a pathogen associated with acute flaccid myelitis. The evidence supporting the effects of EV-D68 proteases on nuclear import and export is solid and confirms previous results on the specific targeting of nucleoporins by proteases from other enteroviruses. However, the claim that cleavage of nucleoporins by EV-D68 2A is neurotoxic, though intriguing, is incomplete, as the evidence is largely indirect.

      We appreciate that the reviewers highlighted multiple strengths of our manuscript, including its detailed mechanistic dissection of the disrupted composition and function of the nuclear pore complex during EV-D68 infection, the finding that the viral 2A protease is toxic to motor neurons, and the several novel hypotheses on the pathogenesis of acute flaccid myelitis raised by our work.

      It appears that two independent eLife Assessments were made regarding the strength of evidence in our manuscript. The evidence supporting the impact of EV-D68 proteases on the NPC was felt to be solid.

      A second assessment was made as to whether our data support that “the cleavage of nucleoporins by EV-D68 2A is neurotoxic”. We would like to clarify that we did not intend to make this second claim in our manuscript and thought that we had been careful not to do so. In response to reviewer and editorial feedback, we have edited the text to improve the clarity on this issue. Although our data show that 2A<sup>pro</sup> is toxic to motor neurons, it cannot yet be determined whether this toxicity is mediated via 2A<sup>pro</sup>’s effects on the NPC. That is a logical hypothesis that arises from our manuscript, which we are testing through ongoing work that will require a significant volume of experiments that are outside the scope of the present study. We view this manuscript as an important first step towards a comprehensive understanding of the role of the 2A protease in the pathogenesis of AFM. Please see the response to point # 3 of Reviewer 2 below for a more detailed discussion of this issue and the changes we have made to the text in response. We respectfully request that a judgement on the role of nucleoporin cleavage as the mechanism of neurotoxicity not be included in the eLife Assessment.

      Also in response to reviewer feedback that our data were too reliant on the expression of recombinant viral proteins in isolation, we have added experiments extending our results into the context of live virus infection of cell lines and motor neurons. We feel that our revised manuscript has been improved as a result of the reviewers’ and editor’s input, and provides strong support for the following claims: (1) NPC composition and function are disrupted during EV-D68 infection, (2) 2A<sup>pro</sup> is primarily responsible for functional disruption, and (3) 2A<sup>pro</sup> is neurotoxic.

      We appreciate your review of this revised manuscript. Detailed responses to each of the reviewers’ comments are provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zinn and colleagues investigated the role of proteases 2A and 3C of enterovirus D68 (EVD68), an emerging pathogen associated with outbreaks of acute flaccid myelitis (AFM), a polio-like disease, in nucleocytoplasmic trafficking in different systems, including human neurons derived from pluripotent cells. They found that 2A specifically cleaved Nup98 and POM121. Using reporter proteins and RNA synthesis and trafficking assays in cells expressing viral proteases, they showed that 2A induces broad loss of the nuclear pore barrier function but, surprisingly, that RNA export appears to be minimally affected. Since nucleocytoplasmic trafficking defects are known to be associated with neuropathologies, they propose the hypothesis that 2A-dependent cleavage of nucleoporins in motoneurons underlies the development of EVD68-induced AFM. They further show that a 2A-specific inhibitor increases the survival of human neurons differentiated from stem cells upon EVD68 infection.

      Strengths:

      Use of multiple methods to investigate the effect of 2A and 3C expression on nucleoporin cleavage and nucleocytoplasmic trafficking.

      We thank the reviewer for their detailed and accurate review of our manuscript and recognition of these strengths.

      Weaknesses:

      Overall, the paper follows multiple others that extensively investigated the cleavage of nucleoporins by enterovirus 2As, so the results are of limited novelty. The hypothesis that infection of motoneurons is the cause of EVD68-induced neurological complications so far is supported by only one autopsy report. Other data suggest that infection of other cell types, such as astrocytes, and/or inflammatory cell infiltration in the CNS, are likely to be responsible for the symptoms. In any case, the claim that EVD68 is specifically neurotoxic because of the 2A-dependent cleavage of nucleoporins in neurons is unfounded, as the virus will be just as "toxic" for other infected cell types.

      While we agree that other papers have investigated this pathway in other enteroviruses, we note that our work is the first to do so in Enterovirus D68 and the most comprehensive study in terms of the number of nucleoporins studied. As we reviewed in paragraph 5 of the Introduction, the activities of enterovirus proteases against specific nucleoporins vary from strain to strain, and it is important to understand any strain-specific effects before determining whether this pathway is relevant to toxicity in AFM.

      The infection of motor neurons is strongly supported not only by the aforementioned autopsy data [1], but also by mouse model data demonstrating replication of EV-D68 within motor neurons in the anterior horn of the spinal cord.[2] There are also numerous reports of electromyography and nerve conduction studies from human AFM patients demonstrating that the site of pathology is the spinal motor neuron.[3-10]

      By contrast, infection of astrocytes has been demonstrated only in primary murine astrocyte cultures in which no neurons were present [11]. Therefore, while the available data suggest that EV-D68 infection of astrocytes is possible, in the in vivo context of human and mouse spinal cord, tropism to motor neurons appears to be preferential. The relative contributions to toxicity of neuron-autonomous processes and non-autonomous processes, such as glial dysfunction and inflammatory cell infiltration, remain to be elucidated; these mechanisms are not mutually exclusive.

      The paper also requires a more convincing presentation of the data.

      We are uncertain what other specific changes the reviewer would like to see based on this comment, but feel that the revisions have improved the presentation of the data.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of EV-D68 proteases 2A and 3C in nuclear pore complex (NPC) dysfunction and their contribution to motor neuron toxicity. The authors demonstrate that both proteases cleave only a limited number of nucleoporins, with 2A^pro showing the strongest impact by inhibiting nuclear import and export of proteins and disrupting NPC permeability without affecting RNA export. Importantly, treatment with the 2A^pro inhibitor telaprevir reduced neuronal cell death in a dose-dependent manner, achieving neuroprotection at concentrations below those required to inhibit viral replication. The study addresses a relevant mechanism underlying EV-D68-induced neuropathology and explores a potential therapeutic intervention.

      Strengths:

      (1) Provides significant mechanistic insight into how EV-D68 proteases alter NPC function and contribute to neuronal toxicity.

      (2) The use of recombinant 2A and 3C proteins allows clear dissection of the specific contribution of each protease.

      (3) Demonstrates a therapeutic effect of telaprevir, with neuroprotection independent of viral replication inhibition, adding translational value to the findings.

      (4) The topic is highly relevant given the association of EV-D68 with acute flaccid myelitis.

      We thank the reviewer for their insightful comments and recognition of these strengths in our study.

      Weaknesses:

      (1) Most experiments were performed with recombinant proteases, lacking validation in the context of viral infection, where both proteases act simultaneously.

      In response to this concern, we have added additional experiments in the context of viral infection. We show that POM121 and Nup98 are also cleaved in motor neurons infected with EV-D68 and that their cleavage is inhibited by telaprevir (Fig 4A). We also repeated the EU pulse-chase RNA export assay in EV-D68-infected RD cells and again found no effect on RNA export (Fig 3D-E).

      (2) The conclusion that RNA export is unaffected requires confirmation during actual infection.

      As above, we have repeated this experiment in EV-D68-infected RD cells, showing no effect of EV-D68 infection on RNA export.

      (3) The reduction of neurotoxicity by telaprevir does not fully demonstrate that the protective effect is solely mediated through NPC preservation; additional analyses of eIF4G cleavage, nucleoporin integrity, and stress granules are needed.

      We agree that while the evidence in our manuscript raises the hypothesis that telaprevir-mediated neuroprotection is mediated via NPC preservation, it does not fully demonstrate this to be the case. As discussed above, we have been careful to state only the following conclusions: (1) NPC composition and function are disrupted during EV-D68 infection, (2) 2A<sup>pro</sup> is primarily responsible for functional disruption, and (3) 2A<sup>pro</sup> is neurotoxic.

      Future work will determine the extent to which NPC dysfunction contributes to 2A<sup>pro</sup>-mediated motor neuron toxicity versus other potential targets of 2A<sup>pro</sup>, as suggested by the reviewer. This work is already underway in our lab, and it is clear to us that the additional experiments required will be extensive, likely amounting to one or two additional manuscripts. These experiments are therefore beyond the scope of the present study, which represents a key first step in this line of inquiry.

      We specifically acknowledged in the Discussion that “A significant limitation of our study, however, is that we cannot exclude potentially toxic effects of 2A<sup>pro</sup> on aspects of host neuronal biology aside from the NPC.” We have also made the following adjustments to the text to make it more clear that this remains an open question:

      Changed the title to more clearly separate the effects of 2Apro on NPC function and motor neuron toxicity as independent events: “Enterovirus D68 2A protease causes nuclear pore complex dysfunction and independently contributes to motor neuron toxicity”

      In the abstract, shortened the following sentence: “We therefore sought to determine the impact of EV-D68 proteases on NPC composition and function” to avoid any implicit connection that a mechanistic link has been established between these two concepts. Neurotoxicity is now introduced later in the abstract by saying “Independently, we show…” instead of “We further show…”

      Removed language in the last paragraph of the Results section that may have been construed to suggest a mechanistic linkage: “Because similar deficits have been reported to contribute to neurotoxicity in neurodegenerative disease…” and simply stated “We next sought to determine the extent to which 2Apro activity independently contributes to motor neuron injury during EV-D68 infection.”

      Edited the opening sentence of the discussion, where it was ambiguous whether the word “their” was referring to the enterovirus protease (which was our intent) or to NPC disruption as the cause of motor neuron toxicity. We removed the discussion of toxicity from this paragraph entirely to remove such confusion.

      Edited the final paragraph of the discussion to include “We have also demonstrated that 2A<sup>pro</sup> activity contributes to nucleocytoplasmic transport dysfunction and separately to cell death in motor neurons infected with EV-D68”. We then go on to discuss the hypothesis that this toxicity might be mediated partially or entirely through NPC dysfunction, and propose that this be a focus of further study.

      (4) The study would be strengthened by including another 2A inhibitor (e.g., boceprevir) to confirm the specificity of telaprevir's protective effects.

      While we would like to be able to include multiple pharmacologic inhibitors of 2A<sup>pro</sup>, unfortunately telaprevir is the only known inhibitor of EV-D68 2A<sup>pro</sup>. The same study that identified telaprevir as an EV-D68 2A<sup>pro</sup> inhibitor also evaluated boceprevir and determined that its inhibitory activity against 2A<sup>pro</sup> is minimal [12].

      Reviewer #3 (Public review):

      Summary:

      The authors showed that the EV-D68 viral proteases 2Apro and 3Cpro, when expressed, cleaved specific components of the nuclear pore complex (Nup98 and POM121 by 2Apro), and that 2A but not 3C expression altered nuclear import and export. Similar nucleocytoplasmic transport deficits are observed in EV-D68-infected RD cells and iPSC-derived motor neurons (diMNs). The 2A inhibitor telaprevir partially rescued the nucleocytoplasmic transport deficits and suppressed neuronal cell death after infection. While it's clear that 2A can cleave NPC proteins and affect nuclear transport, the link to neurotoxicity after EV-D68 infection is less convincing.

      This study opens up a very intriguing hypothesis: that EV-D68 2Apro could be directly responsible for motor neuron cell death, mediated by POM121 and possibly Nup98 cleavage, that ultimately results in paralysis known as acute flaccid myelitis. This hypothesis notably does run counter to other published data showing that human neuronal organoids derived from iPSCs can support productive EV-D68 infection for weeks without cell death and that EV-D68-infected mice can have paralysis prevented by depletion of CD8 T cells, still with EV-D68 infection of the spinal cord. However, even if 2Apro is not ultimately responsible for motor neurons dying in human infections, that does not exclude the possibility that cleavage of nups could still disrupt motor neuron function. Notably, most children with AFM have some amount of motor function return after their acute period of paralysis, but most still have some residual paralysis for years or for life. It is possible that 2Apro could mediate the acute onset of weakness, while T cells killing neurons could determine the amount of long-term, residual paralysis.

      We thank the reviewer for their thoughtful comments. As discussed above, we agree that the present data demonstrate that 2A<sup>pro</sup> causes NPC dysfunction and is toxic in motor neurons, but do not prove that the mechanism of neurotoxicity is via NPC dysfunction.

      We appreciate the commentary on novel hypotheses opened by our work. Our recent thinking on this topic has been similar and we look forward to addressing these ideas further in future studies. Motor neuron dysfunction and motor neuron death may ultimately prove to have separate causes. The infection of motor neurons is likely the initiating event, with multiple downstream consequences which may be neuron-autonomous, or mediated by glial and inflammatory responses, or a mixture thereof.

      Strengths:

      The characterization of nuclear pore complex components that appear to be targets of both poliovirus and EV-D68 proteases is quite thorough and expansive, so this data set alone will be useful for reference to the field. And the process by which the authors narrowed their focus to EV-D68 2Apro reducing Nup98 and POM121 as consequential to both import and export of nuclear cargo but not RNA was technically impressive, thorough, and convincing. As will be detailed below, when the authors move from studying over-expressed proteases in transformed cell lines to studying actual virus infection in both transformed cell lines and iPSC-derived neurons, some of the data only indirectly support their conclusions; however, the quality of the experiments performed is still high. So even if the claim that 2Apro causes neurotoxicity is circumstantial, the data are certainly intriguing and justify further study of the effects of EV-D68 2Apro on the NPC and how this impacts pathogenesis. This is a convincing start to an intriguing line of inquiry.

      We appreciate the reviewer’s recognition of our comprehensive evaluation of NPC disruption and our approach to arriving at a mechanistic understanding of this process. We agree with the reviewer’s viewpoint that the present study represents a beginning, rather than a conclusive end, to this line of inquiry. For technical reasons, we were able to obtain more rigorous and mechanistic data in cell lines expressing recombinant proteins than in neurons infected with live virus. In response to the reviewers’ comments, as described above, we have added experiments in this revision in which we further evaluate nucleoporin cleavage and RNA export during live virus infection, and we performed these experiments in iPSC-derived neurons whenever it was technically feasible to do so.

      Weaknesses:

      This study falls a bit shy of actually showing that 2Apro effects are causing motor neuron toxicity because the evidence of this is fairly indirect. At points, the authors do admit these limitations, but at other times, they claim to have shown the link directly. The following are reasons why these claims are only indirectly supported:

      We agree that we have shown direct toxicity of 2A<sup>pro</sup> in motor neurons, but have not shown that the mechanism is via NPC dysfunction. We felt that we were careful to frame our conclusions as such. However, we have revised the text to improve the clarity on this point as described above.

      (1) Cleavage of Nup98 and POM121 after EV-D68 infection in RD cells and diMNs is never demonstrated.

      We have added data showing the cleavage of POM121 and Nup98 in EV-D68 infected diMNs (Figure 4A).

      (2) Telaprevir was able to rescue nucleocytoplasmic transport in RD cells at low concentrations (Figure 4A). It is not shown whether this correlates with its antiviral effect in RD cells or with inhibition of 2A cleavage of Nup98 or POM121, which is never measured.

      In the aforementioned new experiment in Figure 4A, we have also included a dose-response curve for telaprevir showing its inhibition of POM121 and Nup98 cleavage.

      (3) Building off of the prior point, the authors' claim that the neuroprotective effect of telaprevir is independent of its antiviral effect is not well-founded. Figure 4E (neuroprotection) was done with MOI 5, and Figure 4G (virus growth) was MOI 0.5. Telaprevir neuroprotection is not shown at MOI 0.5, nor is the neuroprotective effect correlated with inhibition of 2A cleavage of Nup98 or POM121.

      The selection of MOIs for these two experiments was limited by technical considerations. If the viral growth curve were to be performed at MOI 5, it would be confounded by cell death. Further, a low MOI is required in order to allow multiple rounds of infection, and is therefore more sensitive for assaying the effect of telaprevir on viral replication. On the other hand, at MOI 0.5 diMN death is very gradual, and in the neuroprotection assay we would have lacked the statistical power to determine whether a rescue of this small magnitude of toxicity is significant. The EC<sub>50</sub> of telaprevir is not expected to vary at different MOIs.

      We have also now correlated the inhibition of 2A<sup>pro</sup> cleavage of Nup98 and POM121 with the neuroprotective effect at comparable concentrations of telaprevir, as described above.

      (4) The use of mixed virus isolates only in the diMNs is problematic because different EV-D68 isolates are known to have drastically different effects on pathogenesis in mice. Since all initial data were generated with the MO isolate, adding the additional MD isolate to the diMN experiments actually adds uncertainty to the conclusions. It is not clear if the authors infected different cultures with the different isolates and combined the data or infected all cultures with a mixture of the two isolates. If the former, then the data should be reported separately to see the effect of each individual strain, which would be interesting to EV-D68 virologists. If the latter, then there is no way to know from these data whether one of the two isolates had increased fitness over the other and exerted a dominant effect. If the MD isolate overtook the MO isolate, from which all other data in this manuscript are derived, then we have much less of an idea how much the data from the first three figures supports the final figure.

      We apologize for the lack of clarity in describing this experiment. The MO/2014 and MD/2018 isolates were not mixed. These were performed in separate experiments, each with four biologically independent replicates. The original figure showed the mean and SEM for these eight replicates together. To improve clarity, we separated each viral strain into its own panel of the figure. We have also increased the rigor of the statistical analysis in this experiment by using Cox proportional hazards regression instead of ANOVA.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Please consider both public reviews above and recommendations for the authors below. The general consensus among reviewers is that more evidence is needed to support the claim that 2A causes motor neuron toxicity during infection.

      Reviewer #1 (Recommendations for the authors):

      Most of the conclusions are made upon analysis of images, yet the images themselves are seldom shown. It is difficult to evaluate the validity of conclusions without seeing the material that was analyzed.

      (1) Figure 1. Representative Western blots should be shown.

      We considered including representative Western blots in this already large figure; however, the figure size and complexity became unmanageable because the figure summarizes the quantification of 246 Western blots. In the original submission, we uploaded a supporting data file that included complete uncropped Western blots for all experiments, including ladders, loading controls, and clear labeling of the samples. We believe these data allow the reader to assess the quality and reliability of our Western blot experiments while maintaining the approachability of the figures and data presentation. We have also included these supporting data again in the revised manuscript.

      (2) Figure 3. Representative images should be shown. This is especially important for the ethynyl-uridine labeling experiment. It would be highly surprising if RNA transcription and processing proceeded normally in 2A-expressing cells against the background of a major redistribution of nuclear proteins. One possible explanation is that the cells that can be analyzed express a relatively small amount of 2A, which is known to be toxic, and thus may not fully represent the cellular changes upon infection. The results from bona fide infected cells would be much more convincing.

      Representative images have been added for the ethynyl-uridine pulse-chase experiment, and this experiment has been repeated in RD cells infected with EV-D68. Protease transfection and viral infection used the same protocols and timeframes under which nucleoporin cleavage and disruption of protein transport were observed. The timepoint for all of these experiments was selected to precede the onset of toxicity, and the representative images demonstrate normal cellular morphology. We also analyzed only GFP+ cells with normal morphology, ensuring that only viable 2A<sup>pro</sup>-GFP-expressing cells were included. The new experiments again showed no effect on RNA export. We were as surprised as the reviewers by this outcome. However, as we note in the text, disruption of RNA export has not been uniformly observed across all enteroviruses previously studied.

      (3) Figure 4 A-D. Similarly, representative images should be shown.

      We have added representative images for these experiments, which are now Fig 4B-E.

      (4) Figure 4G. The demonstration that the "neuroprotective" effect of 2A inhibitor is not related to the inhibition of viral replication requires a control showing that a similar inhibition of viral replication by an inhibitor with another target would not similarly diminish cell toxicity.

      Neuronal survival experiments showed inhibition of toxicity at telaprevir concentrations as low as 0.3 µM, at which there was no significant effect on viral replication. Telaprevir had only a marginal inhibitory effect on viral replication at 10 µM (reaching statistical significance in only one of two strains), and no consistent effect on replication at lower concentrations. Therefore, the suggested control experiment would not be possible, because the neuroprotective concentration of telaprevir does not inhibit viral replication.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Most of the experiments were performed with recombinant 2A and 3C proteins. While these experiments are highly informative for dissecting the role of each protease in NPC dysfunction, it would be important to also perform experiments in the context of infection. How are import and export processes affected when both proteases are present during infection? How is passive transport modified under these conditions?

      Thank you for this important comment. Please see the above discussion of additional experiments that we added utilizing live virus infection to complement the experiments that used recombinant proteins.

      (2) The results regarding RNA export in the presence of recombinant 2A and 3C proteases suggest that RNA export is not altered. It would be important to confirm this finding during infection.

      We agree that this is an important experiment, and have done so as described above.

      (3) While the background information suggests that NPC dysfunction contributes to neurotoxicity, the observed reduction of neurotoxicity by telaprevir does not demonstrate that this effect is solely due to the action of 2A on the NPC. It would be important to evaluate the integrity of eIF4G, nucleoporins, and stress granules during treatment.

      We agree that additional experiments would be required to determine the extent to which the toxicity of 2A<sup>pro</sup> is mediated through its effects on the NPC versus other potential targets. Please see above discussion for more details.

      (4) Including another 2A inhibitor (e.g., boceprevir) would strengthen the conclusions by confirming the results obtained with telaprevir.

      Please see the above discussion of boceprevir.

      Reviewer #3 (Recommendations for the authors):

      (1) Preferred ICTV nomenclature abbreviates rhinovirus as RV instead of HRV, so the authors should change their abbreviations appropriately. See Simmonds et al., Archives of Virology (2020) 165:793-797, https://doi.org/10.1007/s00705-019-04520-6

      We have updated these abbreviations accordingly.

      (2) There is no mention of Figures 1C and 1D in the text.

      These have been added in the appropriate locations.

      (3) In the section "2A protease alters nucleocytoplasmic trafficking of protein substrates" it would be very helpful to just directly state what each construct is meant to demonstrate. Along the lines of "NLS-tdTomato should be located in the nucleus, so seeing more signal in the cytoplasm would indicate a defect in nuclear import." And something equivalent for the other two constructs.

      Thank you for the suggestion. We have added descriptions of the use and interpretation for each construct.

      (4) The following sentence would be more accurate with the addition of "partially" because the effect is not returned to normal levels: "The mislocalization of NLS-tdTomato was partially rescued by 3μM telaprevir."

      We have edited this as recommended.

      (5) SNAP29 is probably a typo and meant to be CREB in the legend of Figure 1B.

      Thank you for catching this. We have corrected this to CREB.

      (6) "Panel A" should likely be "Panel E" in the Figure 4F legend.

      We have corrected this to refer to the appropriate panel, which has also been re-lettered due to the addition of new panels to this figure.

      (7) The authors should at least show representative Western blot data used to determine the data for Figure 1 in a supplemental figure.

      As discussed above, these Western blots were included as supplemental data in the original submission, and have also been included in the revised version.

      (8) As suggested in the public comments, if the diMNs were infected separately with the MO and MD strains of EV-D68, those data should be separated from each other and reported individually. In any case, whatever was done (combined virus inoculum or separate inocula) needs to be clarified.

      These data are now reported separately. Please see above discussion for details.

      References:

      (1) Vogt MR, Wright PF, Hickey WF, De Buysscher T, Boyd KL, Crowe JE, Jr. Enterovirus D68 in the Anterior Horn Cells of a Child with Acute Flaccid Myelitis. N Engl J Med. May 26 2022;386(21):2059-2060. doi:10.1056/NEJMc2118155

      (2) Hixon AM, Yu G, Leser JS, et al. A mouse model of paralytic myelitis caused by enterovirus D68. PLoS Pathog. Feb 2017;13(2):e1006199. doi:10.1371/journal.ppat.1006199

      (3) Andersen EW, Kornberg AJ, Freeman JL, Leventer RJ, Ryan MM. Acute flaccid myelitis in childhood: a retrospective cohort study. Eur J Neurol. Aug 2017;24(8):1077-1083. doi:10.1111/ene.13345

      (4) Elrick MJ, Gordon-Lipkin E, Crawford TO, et al. Clinical Subpopulations in a Sample of North American Children Diagnosed With Acute Flaccid Myelitis, 2012-2016. JAMA Pediatr. Feb 1 2018;173(2):134-139. doi:10.1001/jamapediatrics.2018.4890

      (5) Hovden IA, Pfeiffer HC. Electrodiagnostic findings in acute flaccid myelitis related to enterovirus D68. Muscle Nerve. Nov 2015;52(5):909-10. doi:10.1002/mus.24738

      (6) Knoester M, Helfferich J, Poelman R, et al. Twenty-Nine Cases of Enterovirus-D68 Associated Acute Flaccid Myelitis in Europe 2016; A Case Series and Epidemiologic Overview. Pediatr Infect Dis J. Jan 2018;38(1):16-21. doi:10.1097/INF.0000000000002188

      (7) Martin JA, Messacar K, Yang ML, et al. Outcomes of Colorado children with acute flaccid myelitis at 1 year. Neurology. Jul 11 2017;89(2):129-137. doi:10.1212/WNL.0000000000004081

      (8) Saltzman EB, Rancy SK, Sneag DB, Feinberg Md JH, Lange DJ, Wolfe SW. Nerve Transfers for Enterovirus D68-Associated Acute Flaccid Myelitis: A Case Series. Pediatr Neurol. Nov 2018;88:25-30. doi:10.1016/j.pediatrneurol.2018.07.018

      (9) Van Haren K, Ayscue P, Waubant E, et al. Acute Flaccid Myelitis of Unknown Etiology in California, 2012-2015. JAMA. Dec 22-29 2015;314(24):2663-71. doi:10.1001/jama.2015.17275

      (10) Natera-de Benito D, Berciano J, Garcia A, E MdL, Ortez C, Nascimento A. Acute Flaccid Myelitis With Early, Severe Compound Muscle Action Potential Amplitude Reduction: A 3-Year Follow-up of a Child Patient. J Clin Neuromuscul Dis. Dec 2018;20(2):100-101. doi:10.1097/CND.0000000000000217

      (11) Rosenfeld AB, Warren AL, Racaniello VR. Neurotropism of Enterovirus D68 Isolates Is Independent of Sialic Acid and Is Not a Recently Acquired Phenotype. Mbio. 2019;doi:10.1128/mBio

      (12) Musharrafieh R, Ma C, Zhang J, et al. Validating Enterovirus D68-2A(pro) as an Antiviral Drug Target and the Discovery of Telaprevir as a Potent D68-2A(pro) Inhibitor. J Virol. Jan 23 2019;doi:10.1128/JVI.02221-18

    1. eLife Assessment

      This study offers an important advance by extending an intuitive visualization tool that enables assessment of how dendritic and synaptic currents potentially shape neuronal output. The evidence supporting the tool's capabilities is convincing, with well-documented code, algorithmic innovation, and application to hippocampal pyramidal neurons. The work will be of interest to computational and systems neuroscientists seeking accessible methods to examine dendritic computations.

    2. Reviewer #1 (Public review):

      Summary

      Fogel & Ujfalussy report an extension of a visualization tool that was originally designed to enable an understanding of detailed biophysical neuron models. Named "extended currentscape", this new iteration enables visual assessment of individual currents across a neuron's spatially extended dendritic arbor with simultaneous readout of somatic currents and voltage. The overall aim was to permit a visually intuitive understanding for how a model neuron's inputs determine its output. This goal was worthwhile and the authors achieved it. Demonstrating the utility of extended currentscape, the authors leverage their models to generate interesting and detailed biophysical insights into widely studied neurophysiological phenomena with clear behavioral relevance. Overall, this study provides a valuable and well-characterized biophysical modeling resource to the neuroscience community.

      Strengths

      The authors significantly extended a previously published open-source biophysical modeling tool. Beyond providing important new capabilities, the potential impact of extended currentscape is boosted by its integration with preexisting resources in the field.

      In keeping with the authors' goal to provide an approachable platform with intuitive visualizations of how current flows through neurons, the manuscript is approachable to non-computationalists. In particular, a dedicated glossary and elegant illustrations in Figure 2 boost accessibility for biologists.

      Extended currentscape produces intriguing and detailed predictions spanning neurophysiological phenomena such as local dendritic spikes, complex spike generation, and feature selectivity (hippocampal place fields). By triggering analysis of modeled synaptic inputs on these events, the authors trace their origins from dendritic integration to synaptic input patterns.

      The authors cleverly apply a graph theoretical approach to efficiently model bidirectional current flow throughout a neuron's dendritic arbor. As a result, extended currentscape can run on a standard personal computer.

      The code is well-documented and freely available via GitHub.

      Weaknesses

      While extended currentscape meets its objective of modeling and illustrating the propagation of axial currents throughout a model neuron in great detail, it requires simulation and measurement of synaptic input currents. For this reason, there is currently a very high technical barrier to conclusively testing its intriguing predictions, which would require simultaneous readout of synaptic inputs throughout a neuron's dendritic arbor. Mitigating this weakness, the authors propose a more feasible alternative approach in the Discussion: simultaneous voltage imaging of dendrites and their soma while estimating synaptic inputs from the distributions of voltage dynamics along individual dendritic branches.

    3. Reviewer #2 (Public review):

      The electrical activity of neurons and neuronal circuits is dictated by the concerted activity of multiple ionic currents. Because directly investigating these currents experimentally is not possible with current methods, researchers rely on biophysical models to develop hypotheses and intuitions about their dynamics. Models of neural activity produce large amounts of data that are hard to visualize and interpret. The currentscape technique helps visualize the contributions of currents to membrane potential activity, but it is limited to model neurons without spatial properties. The extended currentscape technique overcomes this limitation by tracking the contributions of the different currents from distant locations. This extension allows tracking not only the types of currents that contribute to the activity in a given location, but also visualizing the spatial region where the currents originate. The procedure is first illustrated in a simple setting that allows testing its validity in an intuitive situation where a cell with an apical trunk and two dendritic branches responds to synaptic inputs. The procedure is then applied to study the initiation of complex spike bursts in a model hippocampal place cell.

      The extended currentscape method represents a significant improvement over the original technique, which is already utilized by several research groups. By enabling the analysis of current contributions in spatially extended models, this technique provides a new lens for investigating neuronal and circuit dynamics and will be of use to the modeling community.

      Comments on revisions:

      The changes in Figure 2 greatly improved the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fogel & Ujfalussy report an extension of a visualization tool that was originally designed to enable an understanding of detailed biophysical neuron models. Named "extended currentscape", this new iteration enables visual assessment of individual currents across a neuron's spatially extended dendritic arbor with simultaneous readout of somatic currents and voltage. The overall aim was to permit a visually intuitive understanding for how a model neuron's inputs determine its output. This goal was worthwhile and the authors achieved it. Their manuscript makes two additional contributions of note: (1) a clever algorithmic approach to model the axial propagation of ionic currents (recursively traversing acyclic graph subsections) and (2) interesting, albeit not easily testable, insights into important neurophysiological phenomena such as complex spike generation and place field dynamics. Overall, this study provides a valuable and well-characterized biophysical modeling resource to the neuroscience community.

      Strengths:

      The authors significantly extended a previously published open-source biophysical modeling tool. Beyond providing important new capabilities, the potential impact of "extended currentscape" is boosted by its integration with preexisting resources in the field.

      The code is well-documented and freely available via GitHub.

      The authors' clever partitioning algorithm relating dendritic/synaptic currents to somatic output yielded multiple intriguing observations regarding when and why CA1 pyramidal neurons fire complex spikes versus single action potentials. This topic carries major implications for how the hippocampus represents and stores information about an animal's environment.

      Weaknesses:

      While extended currentscape is clearly a valuable contribution to the neuroscience community, this reviewer would argue that it is framed in a way that oversells its capabilities. The Abstract, Introduction, Results, and Methods all contain phrases implying that extended currentscape infers dendritic/synaptic currents contributing to somatic output, i.e., backwards inference of unknown inputs from a known output. This is not the case; inputs are simulated and then propagated through the model neuron using a clever partitioning algorithm that essentially traverses a biologically undirected graph structure by treating it like a time series of tiny directed graphs. This is an impressive solution, but it does not infer a neuron's input structure.

      We apologize if our text could be read as implying that we infer unobserved inputs from the known outputs. This was not intentional, and we were unaware that it could be interpreted this way.

      In fact, at the beginning of the Results, we started the description of the extended currentscape method by explicitly stating that we need to measure the input currents: “Our method … requires measuring the membrane and axial currents throughout the dendritic tree of a neuron (in every node of the circuit)”.

      To further clarify that our method starts with measuring the input currents, we made this information explicit already in the abstract (“Our approach relies on the iterative decomposition of the axial current flowing between neighbouring compartments in proportion to the underlying membrane currents measured in the model.”), and in the Introduction (“Even if the membrane currents are known, studying the impact of particular ion channels on the neuronal response in such a dynamical system under in vivo conditions is hindered by two major obstacles”). We also rewrote several parts of the text to remove any phrases that could imply the inference of the inputs (line 568). We believe that after clarifying this at the beginning of the paper, the readers will not misinterpret our descriptions later in the text.

      Because a directed acyclic graph architecture is shown in Figure 2, it is unintuitive that the authors can infer bidirectional current flow, e.g., Figure 3 showing current flowing from basal dendrites and axon to soma, and further towards the apical dendrites. This is explained in Methods, but difficult to parse from Results amidst lots of rather abstract jargon (target, reference, collision, compartment). Figure 2 would have presented an opportunity to clearly illustrate the authors' partitioning algorithm by (1) rooting it in the exact morphology of one of their multicompartmental model neurons and (2) illustrating that "target" and "reference" have arbitrary morphological meanings; they describe the direction of current flow, which is reevaluated at each time step.

      We thank the Reviewer for this comment. We agree that the concepts introduced here to explain our method are rather abstract and could be difficult to understand. To help the reader, we followed the Reviewer's suggestions and redesigned Fig. 2 to provide a step-by-step explanation of the extended currentscape method. In particular:

      We used a simpler model where the structure of the graph can be directly related to the morphology of the model.

      We show that the target node can connect multiple subtrees with axial currents flowing in different directions. We explain that in this case the inward and the outward subtrees are pruned and partitioned separately.

      We provide a glossary in Table 1 to ensure that the readers can follow our description and do not get lost amidst lots of rather abstract jargon.

      We also clarified that although the target compartment is chosen arbitrarily by the user, it remains the same for all time points throughout the analysis.
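      To make the proportional partitioning idea discussed above more concrete, the following is a minimal, single-time-step sketch in Python. It is not the authors' implementation, and all function names (children_of, axial_toward_target, attribute, partition_inflow) are hypothetical. It assumes a tree rooted at the target compartment, charge conservation per compartment with the capacitive current folded into the membrane currents, and a sign convention where positive means current entering the cytoplasm (the opposite of the usual outward-positive convention in simulators). The axial current reaching the target from each neighbouring subtree is split recursively among the inward membrane currents and upstream axial currents that feed it, in proportion to their size. For simplicity, subtrees with net flow away from the target are skipped here rather than partitioned separately as described in the response.

```python
from collections import defaultdict


def children_of(parent):
    """Invert a {node: parent} map (the target has parent None) into {node: [children]}."""
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)
    return children


def axial_toward_target(root, children, membrane):
    """Net axial current flowing from each node toward its parent, at one time step.

    Assumes charge conservation with capacitive current included in `membrane`,
    so the flow out of a subtree equals the sum of its membrane currents.
    """
    axial = {}

    def visit(node):
        total = sum(membrane[node].values())
        for child in children[node]:
            total += visit(child)
        axial[node] = total
        return total

    visit(root)
    return axial


def attribute(node, amount, children, membrane, axial, out):
    """Split `amount` (current leaving `node` toward the target) among its sources
    in proportion to their size. Sources are inward membrane currents at `node`
    and axial currents arriving from child subtrees."""
    sources = [(("mem", name), i) for name, i in membrane[node].items() if i > 0]
    sources += [(("child", c), axial[c]) for c in children[node] if axial[c] > 0]
    total = sum(i for _, i in sources)
    if total <= 0:
        return
    scale = amount / total  # never exceeds 1, since outflow <= total inflow
    for (kind, key), i in sources:
        if kind == "mem":
            out[(node, key)] += i * scale
        else:
            attribute(key, i * scale, children, membrane, axial, out)


def partition_inflow(target, parent, membrane):
    """Attribute the axial current entering `target` to (compartment, current type)."""
    children = children_of(parent)
    axial = axial_toward_target(target, children, membrane)
    out = defaultdict(float)
    for child in children[target]:
        if axial[child] > 0:  # subtrees draining current from the target are skipped
            attribute(child, axial[child], children, membrane, axial, out)
    return dict(out)


# Hypothetical toy cell: soma (target) <- trunk <- {branch1, branch2}
parent = {"soma": None, "trunk": "soma", "branch1": "trunk", "branch2": "trunk"}
membrane = {
    "soma": {"leak": -0.5},
    "trunk": {"leak": -1.0, "Ca": 1.0},
    "branch1": {"syn": 3.0},
    "branch2": {"syn": 1.0, "K": -1.0},  # local synaptic current exits via local K
}
print(partition_inflow("soma", parent, membrane))
# {('trunk', 'Ca'): 0.75, ('branch1', 'syn'): 2.25}
```

      In this toy case the 3.0 units of axial current reaching the soma are attributed to trunk Ca (0.75) and branch1 synapses (2.25), while branch2 contributes nothing because its synaptic current leaves through its own K current.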

      Analyses in Figure 7, C and D, are insightfully devised and illuminating. However, they could use some reconciliation with Figure 5 regarding initiation of individual APs versus CSBs within place fields.

      We thank the reviewer for the positive comments and also for pointing out this potential source of misunderstanding. We slightly revised the text describing Fig 5 to emphasize that it shows a single example trial, and we added the following sentence to the paragraph describing Fig 7CD: “Consequently, the somatic current dynamics before the iAP and the CSB presented in Fig 5Cc-Dd can be regarded as illustrative samples from a broad distribution, but the differences observed between them are not representative.”

      The intriguing observations generated by extended currentscape also point to its main weakness, which the authors openly acknowledge: as of now, no experimental methods exist to conclusively test its predictions.

      We agree with the Reviewer that not being able to apply our extended currentscape method to reveal the current types driving real neurons recorded in vivo is currently a weakness of our approach. However, we would like to emphasize that it may be feasible to use it to estimate the spatial distribution of the membrane currents driving the cell based on in vivo voltage imaging data, as we briefly outline in the discussion.

      Reviewer #2 (Public review):

      Summary

      The electrical activity of neurons and neuronal circuits is dictated by the concerted activity of multiple ionic currents. Because directly investigating these currents experimentally isn't possible with current methods, researchers rely on biophysical models to develop hypotheses and intuitions about their dynamics. Models of neural activity produce large amounts of data that are hard to visualize and interpret. The currentscape technique helps visualize the contributions of currents to membrane potential activity, but it's limited to model neurons without spatial properties. The extended currentscape technique overcomes this limitation by tracking the contributions of the different currents from distant locations. This extension allows tracking not only the types of currents that contribute to the activity in a given location, but also visualizing the spatial region where the currents originate. The method is applied to study the initiation of complex spike bursts in a model hippocampal place cell.

      Strengths:

      The visualization method introduced in this work represents a significant improvement over the original currentscape technique. The extended currentscape method enables investigation of the contributions of currents in spatially extended models of neurons and circuits. 

      Weaknesses:

      The case study is interesting and highlights the usefulness of the visualization method. A simpler case study may have been sufficient to exemplify the method, while also allowing readers to compare the visualizations against their own intuitions of how currents should flow in a simpler setting. 

      We thank the reviewer for this comment. In fact, we had also considered including a simpler case study to illustrate the extended currentscape method in the original submission. In accordance with the comments from Reviewer 1, we now use a simple model to introduce the concepts in Figure 2 and provide a few examples where readers can compare the results with their own intuition in simpler cases.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Model complexity vs. intuition/validation. The case study relies on a very complex CA1 model, making it difficult to build intuition about current flow and to validate the visualization. Inclusion of a simpler benchmark (e.g., soma plus a dendrite with two branches, fewer compartments) is recommended to demonstrate how the extended currentscape behaves in a more tractable setting.

      Inspired by the suggestions of the Reviewers, we modified Figure 2 and now first use a simple model with a soma and a dendrite with two branches to introduce the concepts of our analysis. We start with a few examples where the reader can compare the results with their own intuition in simpler cases.

      (2) Rationale and citations for input structure. The in vivo-like input design (untuned inhibition; 12 co-tuned excitatory clusters with large conductances; the goal of generating place fields) would benefit from a more explicit rationale and substantially more literature support. Alternative plausible scenarios (e.g., distributed co-tuned inputs and homosynaptic plasticity) should be articulated, and choices situated within the experimental literature on CA1 excitation/inhibition, including tuning and anti-tuning results.

      We extended the paragraph in the Results describing the input structure and added the most important references there. We added further references to the Methods section, where we argue that “Reliable place cell tuning can be achieved by functional synaptic clustering without increased excitatory drive in the place field (Ujfalussy and Makara 2020) or via strong excitatory drive without input clustering (Grienberger et al., 2017, Ujfalussy and Makara, 2020). However, experimental data indicate that both of these mechanisms are present and contribute to the activity of place cells (Adoff et al., 2021, Tasciotti et al., 2025)” and “although interneurons can display spatial tuning, they typically have a broad tuning with low selectivity (Ego-Stengel et al., 2007, Dupret et al., 2013, Geiller et al., 2020). A weak disinhibition within the place field can also contribute to the selective firing of place cells (Geiller et al., 2022, Valero et al., 2022), but this was not necessary for place cell activity in novel environments (Geiller et al., 2022), and the overall inhibitory input to place cells is largely untuned (Grienberger et al., 2017).”

      (3) Scope of PCA-based claims. The interpretations derived from the PCA analysis appear broader than warranted, given subcellular heterogeneity and the dominance of somatic action potential variance. These claims should be tempered with more explicit statements about what PCA can and cannot resolve in this context.

      We thank the Reviewer for the opportunity and encouragement to clarify this part of the text. We agree with the Editor and the Reviewers that the results of the PCA analysis cannot be used to support claims regarding the presence or absence of independent dendritic events. In fact, we aimed to use it as an illustration that global activity tends to dominate PCA even when the “neuron is mainly driven by strong, functionally clustered synaptic inputs to a few dendritic branches”. We acknowledge that we did not formulate this point clearly in the original submission. Therefore, we substantially rewrote this part of the Results and performed additional analysis to clarify that there is a substantial amount of soma-independent dendritic activity in our model that remains invisible to a PCA-based analysis.

      Reviewer #1 (Recommendations for the authors):

      Major concerns:

      (1) Depolarization-inactivated K+ may be an important consideration to model burst-firing.

      Our current model includes two kinds of transient K+ channels that show inactivation after depolarization, a proximal and a distal type, as in the original model of Jarsky et al., 2005. We have now made this explicit in the main text (line 178).

      (2) Description of the in vivo-like model's excitatory and inhibitory input structure needs many more citations of biological studies to communicate the rationale for the authors' decisions, e.g., untuned inhibitory neurons, organization of a subset of excitatory inputs into 12 functional synaptic clusters with co-tuned presynaptic neurons and outsized synaptic conductances. The goal is clearly to create CA1 pyramidal neurons with place fields, which would be helpful to state upfront. But additionally, (a) place fields could arise from homosynaptic potentiation of distributed co-tuned excitatory inputs (e.g., the Bittner et al. 2017 study describing BTSP made no such assumptions) and (b) CA1 inhibitory interneurons can be spatially tuned (Ego-Stengel & Wilson, 2006; Wilent & Nitz, 2007; Geiller et al. 2020) and even anti-tuned (Geiller et al. 2021).

      We thank the Reviewer for pointing out the lack of appropriate references in this section. We made the following changes in the manuscript:

      (1) Stated explicitly that the goal was to create place cell activity.

      (2) Added references to the main text to justify our choices of the inputs (lines 234-241).

      (3) We included a longer rationale for the choice of synaptic clusters and the lack of inhibitory (anti-)tuning in the Methods section describing the neuron model. In brief, Adoff et al., 2021 reported more clustering of excitatory inputs within the place field. In our model, the degree of clustering is somewhat larger than that reported experimentally. Although inhibitory neurons can be tuned, their tuning is much weaker than that of place cells and seems to play only a minor role in the generation of place fields (Grienberger et al., 2017). The presence of inhibitory anti-tuning is controversial: although Geiller et al., 2021 reported weak (~10%) anti-tuning, they did not find it in novel environments, indicating that it is not needed for spatially selective activity (lines 628-646).

      (3) Interpretation of principal component-based analyses shown in Figure 4 could be toned down. As written in section "CSBs in the CA1 pyramidal neuron", it sounds like CA1 pyramidal neuron dendrites display minimal autonomous activity. However, PCA does not seem well-suited to address the heterogeneity of subcellular voltage dynamics over physiologically relevant timescales. Somatic action potentials, and their backpropagation/modulation of dendritic voltage, would of course explain a very large fraction of variance. However, if local dendritic events summate over fine timescales to initiate somatic firing, it is hard to imagine this important nuance being detected. On the other hand, it is hard to imagine single dendritic branches driving robust somatic firing except in the relatively extreme situation in which large numbers of synapses synchronously drive the same branch to initiate a local Ca2+ spike (Figure 3, A-C).

      We agree with the reviewer that PCA cannot reveal the potential dendritic origin of somatic APs, and thus is not suitable to assess the role of local dendritic spikes in shaping the output of the cell. We wanted to highlight here that even in cells with excitable dendrites driven by strong, local input clusters, exhibiting frequent local dendritic spikes, the dendritic membrane potential dynamics will be dominated by global fluctuations with surprisingly little sign of local dynamics in the PCA components. As the reviewer also pointed out, this may not be surprising, as local events either remain spatially restricted and thus contribute little to the overall variability of the dendritic Vm, or they initiate somatic APs and will thus be counted as global events.

      To demonstrate the prevalence of local dendritic events, we analysed local Vm peaks in dendritic branches and found that ~7.6% of the peaks were not coupled to somatic APs.
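      A minimal sketch of how such a coupling fraction could be computed, assuming that a dendritic peak counts as coupled when a somatic AP occurs within a fixed window around it. The function name and the ±5 ms window are hypothetical; the exact coupling criterion used in the revised analysis is not stated here.

```python
def fraction_uncoupled(dend_peak_times, soma_ap_times, window_ms=5.0):
    """Fraction of dendritic Vm peaks with no somatic AP within +/- window_ms.

    Hypothetical criterion: a peak is 'coupled' if any somatic AP time lies
    within window_ms of it (all times in ms).
    """
    if not dend_peak_times:
        return 0.0
    uncoupled = [
        t for t in dend_peak_times
        if not any(abs(t - s) <= window_ms for s in soma_ap_times)
    ]
    return len(uncoupled) / len(dend_peak_times)


# Toy example: the peaks at 50 and 200 ms have no nearby somatic AP.
print(fraction_uncoupled([10.0, 50.0, 100.0, 200.0], [12.0, 101.0]))  # 0.5
```

      The window width directly trades missed couplings against spurious ones, so any reported fraction should be read together with the window used.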

      Although this number may seem low, we emphasize that most of the 92.4% of the dendritic peaks coupled to APs potentially reflect the backpropagation of the same somatic events to multiple dendritic sites. To confirm this, we performed an additional analysis measuring the spatial extent (number of branches involved) of the individual dendritic events. We found that 90% of the events remained local, restricted to a few dendritic branches, while 10% of the events were global, associated with bAPs and involving the majority of the dendritic tree. Interestingly, these global events dominate the PCA analysis and are responsible for >90% of the dendritic Vm peaks. These results are included in a new panel, Figure 4H.
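      The local/global split can be sketched by counting how many branches each event involves. The names below are hypothetical, and the 50% threshold follows the wording "the majority of the dendritic tree" above rather than a value stated in the revised Methods.

```python
def classify_by_extent(event_branch_sets, n_branches, majority=0.5):
    """Label each event 'global' if it involves more than `majority` of all
    dendritic branches, otherwise 'local' (threshold is an assumption)."""
    return [
        "global" if len(set(branches)) > majority * n_branches else "local"
        for branches in event_branch_sets
    ]


def local_fraction(labels):
    """Fraction of events labelled 'local'."""
    return labels.count("local") / len(labels) if labels else 0.0


events = [{3, 4}, {7}, set(range(150))]  # toy events on a 200-branch tree
labels = classify_by_extent(events, n_branches=200)
print(labels)  # ['local', 'local', 'global']
```

      Counting events and counting peaks give different answers: the single global event above contributes 150 branch peaks, the two local events only 3.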

      We conclude that, “this way, although only 10% of the dendritic Vm events were associated with bAPs, they were ~60-times larger than local events and they dominated the PCA analysis even in the presence of local regenerative dendritic events driven by strong, functionally clustered synaptic inputs.” We believe that this model and analysis could serve as an important benchmark for future experimental studies investigating the structure of membrane potential correlations in in vivo voltage imaging data (Lee et al., 2026).

      (4) One suggestion would be to display more data as shown in Figure 4F, with a longer X axis to clarify the temporal relationship between local dendritic spikes and the first somatic action potential.

      We added a few more examples, including the CSBs presented in Fig. 8G-I, as a new supplementary Figure S4. We also slightly extended the x-axis in this supplementary figure, as the reviewer requested.

      If the models indicate that passively filtered EPSPs drive most somatic action potentials, as seems to be the case in Figure 5, then this would also be helpful to show as in Figure 4F.

      In Fig. 5, we showed two examples of isolated APs. The first AP was indeed driven by passively filtered EPSPs. The second one was preceded and possibly caused by a dendritic spike, as highlighted by the black arrowhead labelled c in Fig. 5Cc. We further analysed the currents driving iAPs in Fig. 7B and C, and found that there is considerable heterogeneity in the magnitude of the dendritic Na currents driving the soma before action potentials. Figure 8 and Figure S3 (now Fig. S5) show further examples of iAPs driven either by passively filtered EPSPs or by dendritic spikes. We also included these examples in the new supplementary Figure S4.

      (5) Another suggestion would be to use one-hot vectors containing onset times of different event types, since this would divorce the amplitude/duration of events from their influence over total variance.
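      The reviewer's suggestion can be sketched as a PCA on binary onset vectors instead of raw Vm traces. This is an illustration under stated assumptions (time-binned onsets, recording sites as features), not an analysis performed in the manuscript; the function and variable names are hypothetical.

```python
import numpy as np


def onset_pca_variance(onsets, n_sites, n_bins):
    """Explained-variance ratios of PCA on a binary event-onset matrix.

    onsets: iterable of (site_index, time_bin) pairs marking event onsets.
    Rows are time bins (samples) and columns are recording sites (features),
    so every event carries the same unit weight regardless of its amplitude
    or duration.
    """
    X = np.zeros((n_bins, n_sites))
    for site, t in onsets:
        X[t, site] = 1.0
    Xc = X - X.mean(axis=0)                  # center each site
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()


# Toy data: one event shared by all three sites, plus one isolated event.
ratios = onset_pca_variance([(0, 1), (1, 1), (2, 1), (0, 5)], n_sites=3, n_bins=10)
print(ratios.round(3))
```

      Because onsets are binary, a large backpropagating event and a small local one enter the matrix with equal weight; only their co-occurrence across sites shapes the components.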

      In this paper, our goal was to illustrate the ability of the extended currentscape method to reveal the origin of the axial currents driving neuronal activity. In Fig. 4, our primary intention was to characterize the membrane potential response of the model in a way that is easily comparable with experimental data. To further quantify the frequency of local events, we added a new panel showing the spatial extent of dendritic events (Fig. 4H). To make our model more comparable with recent publications, we also calculated two additional metrics used to evaluate the relationship between somatic and dendritic activity (Fig. 4I-J). We hope that these additional analyses help the reader to characterize the prevalence and impact of local dendritic events on somatic activity.

      (6) From section "Input conditions for complex spike burst generation", paragraph 2: "Note that synapse density, the ion channel mechanisms and the input statistics are identical for tuft and oblique branches,...". The authors should justify this parameterization given the numerous known differences between tuft and oblique branches in both of these regards and acknowledge accompanying interpretational caveats.

      We agree with the reviewer that experimental data have demonstrated several significant differences between the tuft and oblique branches regarding both the inputs they receive and the way they process them. However, in the present paper we chose not to include these differences, for several reasons:

      Here we aimed to focus on the abilities of the dendritic currentscape methods and use CSBs as a case study to illustrate how dendritic currentscape can reveal the membrane currents underlying complex neuronal responses.

      Currently, no CA1PN model can reproduce all data regarding tuft and oblique integration while also firing calcium spikes. We only wanted to make minimal modifications to the existing CA1PN model to make it capable of generating Ca-spikes and CSBs. We are currently working towards developing and extensively testing a new model, examining the role of these regional differences in CSB generation.

      Although there is information regarding input statistics and dendritic physiology in the literature, many of the relevant parameters are underconstrained. We wanted to avoid overfitting by keeping the model simple.

      By maintaining identical inputs and ion channel distribution we can distinctly highlight the special role of tuft morphology in CSB generation. Altering the inputs or the ion channel density for the tuft would make the interpretation more ambiguous, and elucidating the specific role of the different factors in CSB generation is the subject of future investigations.

      In sum, although we acknowledge that our model does not reflect the full complexity of CA1 PNs and their inputs, we regard this simplicity as a useful feature of the model. We added a section to the Discussion describing potential future extensions of the model and highlighting interpretational caveats (lines 482-490).

      (7) Given the debate in the field regarding the level of functional autonomy present in dendrites, the authors' finding that dendritic voltage largely tracks that of the soma (though see concern above re: PCA), and their access to specific currents, the authors have an important opportunity to investigate the divergence between Ca2+ and voltage sensors as reporters of dendritic activity.

      For instance, why have some studies reported relatively common isolated dendritic Ca2+ transients in CA1 pyramidal neurons while other studies, including voltage imaging studies, have reported the opposite?

      We thank the Reviewer for the opportunity to highlight a few important points regarding functional autonomy of dendrites based on the analysis of our model. We would like to first note that only parallel calcium and voltage imaging studies will be able to ultimately resolve this debate. Nevertheless, below we briefly summarize our take on this issue.

      (1) In general, most Ca2+ imaging studies found that soma-independent dendritic events are rare. "Isolated dendritic transients (no coincident somatic event; see fig. S6, C and D, for example) were overall rare. Isolated apical dendritic Ca2+ transients, which have not previously been reported in CA1PNs, were larger and more frequent than those observed in basal dendrites." (O’Hare et al., 2022). "Activity in the ... basal dendrites ... along the track but outside of the place field was rarely observed” (Sheffield and Dombeck, 2014) and “overall, isolated dendritic transients were similar in size but occurred far less frequently than coincident dendrite-soma transients”, or “data indicate that spatially reliable dendritic firing was almost exclusively yoked to somatic tuning, likely reflecting strong backpropagation of burst firing during traversals of the somatic PF” (Rolotti et al., 2022). Consistent with this observation, a dendritic Vm peak chosen randomly from any branch has a ~93% probability of being related to a bAP in our model. However, it is also true that ~90% of events in the model are local events, simply because isolated events involve ~60-times fewer branches (1.8 on average) than events associated with bAPs (114 branches). If typical local events in real neurons are as spatially restricted as in the model, then even rare dendritic events may reveal substantial dendritic independence. We added a section quantifying the functional autonomy of dendrites in the model to the main text, around Fig. 4H.
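      The apparent tension between these two figures follows from weighting events by the number of branches they involve. As a back-of-the-envelope check, assume that each event produces one Vm peak in every branch it involves, and let f_g, f_l be the fractions of global (bAP-associated) and local events and n_g, n_l the mean number of branches each engages (symbols introduced here for illustration, not taken from the manuscript). The probability that a randomly chosen peak belongs to a global event is then

```latex
P_{\mathrm{bAP}} = \frac{f_g\, n_g}{f_g\, n_g + f_l\, n_l},
\qquad
\frac{n_g}{n_l} = \frac{114}{1.8} \approx 63 .
```

      With the rounded values quoted above (f_g = 0.1, f_l = 0.9, n_g = 114, n_l = 1.8) this gives 11.4 / (11.4 + 1.62) ≈ 0.88, the same order as the ~93% obtained from the simulations; the exact figure depends on the unrounded event fractions.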

      (2) Ca2+ indicators are slow and nonlinear and are thus somewhat unreliable reporters of dendritic voltage events, especially in distal dendrites (Wu et al., 2026; Gonzalez et al., 2026). To illustrate this, we calculated three metrics in our model that were also reported in recent dendritic Ca2+ imaging studies (Rolotti et al., 2022; Sheffield et al., 2014, 2017). First, we calculated the fraction of bAPs detected in a branch (called dendrite-soma coupling in Rolotti et al., 2022; see their Fig. 2C) as a function of the distance of the branch from the soma (our new Fig. 4I). In the Ca2+ imaging data, this fraction was essentially constant at ~30% between 5 and 100 µm from the soma. In contrast, the fraction of bAPs detected in the model was 100% in this range, as bAP propagation failures did not occur before 100 µm. This is also consistent with a recent voltage imaging study showing that even low-transmission bAPs reliably propagate to the proximal dendrites (Lee et al., 2026, Fig. 3G). The low and distance-independent dendrite-soma coupling reported by Rolotti et al. can only be reconciled with the known biophysics of neurons if the recorded calcium signal is an unreliable reporter of the underlying voltage. Indeed, it has been reported that Ca signals associated with bAPs can be absent in some dendritic branches (Landau et al., 2022), that local, nonlinear Ca signals can appear in the absence of a local regenerative voltage response (Weber et al., 2016; Tran-Van-Minh et al., 2016), and that Ca signals are highly variable across cells (Eltes et al., 2019).
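      The distance dependence of dendrite-soma coupling can be sketched as a binned detection fraction. The names, the 50 µm bin width and the pooling of counts within a bin are assumptions for illustration, not the analysis code behind Fig. 4I.

```python
from collections import defaultdict


def coupling_by_distance(branch_distance_um, branch_counts, bin_um=50.0):
    """Fraction of somatic bAPs detected in dendritic branches, pooled per
    distance bin.

    branch_distance_um: dict branch_id -> path distance from soma (µm)
    branch_counts: dict branch_id -> (n_bAPs_detected, n_somatic_APs)
    Returns dict bin_start_um -> detected / total.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for branch, dist in branch_distance_um.items():
        bin_start = (dist // bin_um) * bin_um
        n_detected, n_aps = branch_counts[branch]
        hits[bin_start] += n_detected
        total[bin_start] += n_aps
    return {b: hits[b] / total[b] for b in sorted(total) if total[b] > 0}


# Toy example: full coupling proximally, partial beyond 100 µm.
dists = {"a": 20.0, "b": 40.0, "c": 120.0}
counts = {"a": (10, 10), "b": (10, 10), "c": (5, 10)}
print(coupling_by_distance(dists, counts))  # {0.0: 1.0, 100.0: 0.5}
```

      In a biophysical model without propagation failures the proximal bins sit at 1.0, which is the contrast with the flat ~30% seen in the Ca2+ imaging data.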

      Second, we calculated the fraction of local events as a function of the distance from the soma (our Fig 4J; see also Fig. 2F in Rolotti et al.). When averaged across all branches, this was somewhat lower in the model (18%) than in the data (38%) which, again, could be explained by the low reliability of detecting global voltage events in all compartments based on the calcium signal.

      Third, the range of branch-spike-prevalence (BSP) values in our model (0.5-0.9; Fig. 4H) at first seems consistent with the reported range (0.4-0.8; Fig. 4C of Sheffield et al., 2014; Fig. 2 of Sheffield et al., 2017). However, there are several important differences: for technical reasons, Sheffield et al. reported BSP for place field traversals rather than for individual events, and they measured Ca2+ dynamics in the basal dendrites. Since bAPs are almost always present in all basal dendrites in the model (basal BSP > 0.9 for all events with somatic spikes) and place field traversals were always accompanied by somatic APs, BSP for basal dendrites would be nearly 1 in the model. Thus, the lower BSP values reported by Sheffield et al. could be explained by the limited reliability of Ca2+ indicators in reporting regenerative voltage events in neuronal processes.
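      The difference between per-event and per-traversal BSP can be illustrated with a short sketch: pooling events over a traversal can only raise BSP relative to any single event within it. The function names and the pooling rule (a branch counts as active if it spikes in any event of the traversal) are assumptions, not the exact definition used by Sheffield et al.

```python
def bsp(active_branches, imaged_branches):
    """Branch spike prevalence: fraction of imaged branches that are active."""
    imaged = set(imaged_branches)
    return len(set(active_branches) & imaged) / len(imaged)


def bsp_per_traversal(traversals, imaged_branches):
    """BSP per place-field traversal; each traversal is a list of per-event
    active-branch sets, pooled with a union (assumed pooling rule)."""
    return [bsp(set().union(*events), imaged_branches) for events in traversals]


imaged = [1, 2, 3, 4]
print(bsp({1}, imaged))                         # 0.25 (single event)
print(bsp_per_traversal([[{1}, {2}]], imaged))  # [0.5] (two events pooled)
```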

      We briefly discussed these differences in the Discussion (lines 474-478).

      (3) Finally, to our knowledge, there are three relevant in vivo voltage imaging studies in CA1 PNs. Liao et al., 2024 found that in induced place cells the tuning of dendritic events (presumably local or back-propagating Na-spikes) was similar to the somatic tuning, which is consistent with our model, in which dendritic activity and tuning are dominated by bAPs. However, they did not acquire simultaneous signals from the dendrites and the soma, so they could not study the independence of the dendritic events. Lee et al. (2026) found that only 10% of the dendritic events are not associated with a somatic spike, which is lower than the number of independent events in the model. However, the events they found were generated in the distal apical trunk (their Fig. 3D), and they could not record from the most distal branches, where most of the isolated events were generated in our model. Gonzalez et al., 2026 measured voltage and calcium at selected locations within the dendritic tree and could not reliably estimate the fraction of isolated events throughout the cell. (Gonzalez et al., 2024 measured voltage only in single spines and the soma, but did not quantify independent dendritic events; Wong-Campos et al., 2023 measured dendritic integration and bAPs in L23 branches; Wu et al. 2026 recorded in CA2 neurons.)

      We added a paragraph in the discussion comparing the level of functional autonomy present in the model dendrites to recent Ca- and voltage-imaging studies (lines 467-474).

      Minor concerns:

      (1) Abstract:

      There is a need to explain what currentscape is - even at the cost of not invoking its name. To a reader not familiar with currentscape, the abstract is extremely difficult to understand.

      We reworded the title and the abstract to make them more accessible to readers not familiar with the term currentscape.

      (2) "Currentscape analysis of place field dynamics" section:

      It would be helpful to emphasize upfront that dendritic determinants of individual somatic APs versus CSBs will be discussed separately. Since somatic action potentials are discussed before CSBs, I found this section initially confusing as I attributed those findings to CSBs until reading the next paragraph.

      We added a sentence to clarify that we analysed subthreshold responses, APs and CSBs separately.

      (3) Bottom of p2 discussing mixed literature on what drives CSBs in CA1 PCs:

      Overall an accurate and useful point, but an important nuance is glossed over, which misportrays the state of the field. The text references ex vivo studies that fail to drive CSBs with somatic current injection and an in vivo study that succeeds in doing so. These aren't really conflicting results. In vivo, current injection co-occurs with spontaneous synaptic input, which is high in CA1 and results in PCs that are significantly depolarized at rest relative to those in acute slices. The Bittner 2017 ex vivo results are consistent with this: CSBs were driven with a Cs+-based internal solution to block K+ channels (partially, using the strategy of purposefully high series resistance). The situation is similar in vivo, given that A-type K+ channels are inactivated by depolarization. The resulting increase in input resistance lowers the input threshold for CSBs. This is clarified in Results, p.5: "Under in vivo-like synaptic input conditions (see below and Methods), dendritic Ca2+-spikes could also be evoked by somatic current injection (Fig. S1E), as in Bittner et al. (2015).", which makes p. 2 feel especially awkward.

      We agree with the Reviewer that these are not necessarily conflicting results. We rephrased this section, emphasizing that the role of the different input pathways in the initiation of CSBs is not clear.

      (4) Abbreviating "pyramidal neuron" with PC is confusing:

      PC often means place cell. The authors could change this, such that PC refers to "pyramidal cell", or else use PN as an abbreviation. It is important to avoid confusion, especially because place cell dynamics feature prominently in the manuscript.

      Thanks for the suggestion. We replaced PC with PN throughout the manuscript.

      (5) Only apical dendritic parameters are described in section 2 of Results, but the full morphology is shown in Figure 3B with basal currents shown in panels C and F. Some clarification is needed - either what currents were considered for basal dendrites and why, or else why basal dendritic current parameters were not considered for this simulation using apical dendritic current injection but nonetheless examining basal dendritic currents.

      We clarified in the text that the original model contained a standard set of Na and K channels (line 178).

      (6) Clarify "i" and "s" in the Figure 3C legend - "intrinsic" and "synaptic" white letterings are small/hard to see in the bottom subpanels.

      We now spell out intrinsic and synaptic in the Figure and increased the contrast of the letterings.

      (7) Regarding the computational benefit of recursively decomposing axial currents along an adaptively truncated acyclic graph, it would be useful to (a) include a supplemental figure benchmarking this approach to standard approaches to quantify the described gain in computational efficiency and (b) describe computing hardware in the Methods.

      We now report the estimated computational benefit of the pruning process (line 758), as well as the computing hardware used and the simulation times, in the Methods (line 776).
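      To illustrate what the pruning trades off, here is a minimal sketch of an iterative decomposition under one explicit assumption: an axial current leaving a compartment is split among that compartment's inflowing sources (inward membrane currents and axial inflows) in proportion to their size, and contributions smaller than `min_current` are not expanded further but lumped as "pruned". This is a hypothetical illustration, not the authors' implementation; the names, sign convention and threshold are assumptions.

```python
from collections import defaultdict


def decompose_inflow(target, membrane, axial, min_current=1e-3):
    """Attribute the total inflow into `target` to membrane currents in the tree.

    membrane: dict comp -> {current_name: value}; positive = inward (a source).
    axial: dict (src, dst) -> current from src into dst. Both directions must
           be stored (axial[(a, b)] == -axial[(b, a)]).
    Returns dict (comp, current_name) -> attributed current; amounts below
    min_current are collected under (comp, "pruned") instead of being expanded.
    """
    result = defaultdict(float)

    def sources(comp):
        mem = [("mem", name, v) for name, v in membrane[comp].items() if v > 0]
        ax = [("ax", src, v) for (src, dst), v in axial.items() if dst == comp and v > 0]
        return mem + ax

    stack = [(target, None)]  # None: expand the full inflow of the target
    while stack:
        comp, amount = stack.pop()
        srcs = sources(comp)
        total_in = sum(v for _, _, v in srcs)
        if total_in <= 0:
            continue
        for kind, key, v in srcs:
            share = v if amount is None else v * amount / total_in
            if kind == "mem":
                result[(comp, key)] += share
            elif share < min_current:
                result[(key, "pruned")] += share  # truncate this part of the graph
            else:
                stack.append((key, share))  # expand the upstream compartment
    return dict(result)


# Two compartments: a dendrite (1) drives the soma (0) through axial current.
membrane = {0: {"leak": 0.5, "cap": -2.5}, 1: {"na": 3.0, "k": -1.0}}
axial = {(1, 0): 2.0, (0, 1): -2.0}
print(decompose_inflow(0, membrane, axial))  # {(0, 'leak'): 0.5, (1, 'na'): 2.0}
```

      Following only positive axial inflows means the traversal never returns to a compartment it came from, so the explored graph is acyclic; raising `min_current` shortens the traversal at the cost of a larger "pruned" share.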

      Reviewer #2 (Recommendations for the authors):

      The manuscript is in great shape, it is well organized, and the figures are gorgeous. I believe that the extended currentscape is a great extension of the original currentscape method. In particular, the possibility of partitioning currents by the spatial location of their sources is a great addition. 

      Recommendations:

      (1) The method is applied in the context of an interesting case study that highlights its usefulness. However, the model in the study is so complex that it is difficult to develop an intuition of how currents should be flowing, and this makes it hard to intuitively validate the visualization method. I think that applying the extended currentscape in a simpler model - maybe a soma with a dendrite with two branches, fewer compartments - would be instrumental in developing this intuition. 

      We now first use a simple model with a soma and a dendrite with two branches to introduce the concepts in Figure 2 and provide a few examples where the reader can compare the results with their own intuition in simpler cases. We also added the currentscape analysis of a standard, two-compartmental model from Pinsky and Rinzel, 1994 as Supplementary Figure 1.

      (2) I found a number of typos and minor stylistic details you may want to fix in a revised version of the manuscript.

      (a) Abstract, line 12. I believe the word "recursive" is a bit technical at this point. Its meaning in this context becomes clear only after one goes through the details of the algorithm (Figure 2).

      We replaced the word “recursive” with “iterative”. We hope that this will make the abstract clearer for readers. In fact, we realized that “iterative” is a better description of the algorithm, so we replaced “recursive” with “iterative” consistently throughout the manuscript.

      (b) Figure 1, caption. "Since we included the capacitive current, the magnitude of the inward and the outward currents is identical (Kirchhoff's law)." This sentence can be confusing. If the inward and outward currents are the same, the membrane potential doesn't change. I believe that you are including the capacitive current in the inward (or outward) currents.

      Indeed, we included the capacitive current in the inward or outward currents. We changed the text to clarify this.
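      The balance behind the corrected caption can be written out as follows, using an outward-positive sign convention chosen here for illustration (the manuscript's own notation may differ). For compartment i with capacitance C_i, ionic currents I_{k,i} and axial currents I_{ax, i→j} flowing to neighbours j, with T_i denoting the set of all terms on the left-hand side:

```latex
C_i \frac{dV_i}{dt} + \sum_k I_{k,i} + \sum_{j \in \mathcal{N}(i)} I_{\mathrm{ax},\, i \to j} = 0
\quad\Longrightarrow\quad
\sum_{x \in \mathcal{T}_i,\; x < 0} |x| \;=\; \sum_{x \in \mathcal{T}_i,\; x > 0} x .
```

      Because the capacitive term C_i dV_i/dt is itself one of the terms, the summed magnitudes of inward (negative) and outward (positive) currents are equal at every instant even while V_i changes; the change in V_i is carried by the capacitive current. For a single-compartment model the axial sum is empty.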

      (c) Lines 92-93. I do not fully understand this sentence. Are you making an assumption? What does 'continuous flow of axial current' mean?

      By ‘continuous flow of axial current’ we meant a spatially continuous stream of axial currents flowing from the reference to the target. To clarify this, we added the explanatory sentence: “i.e., if the axial current is not blocked or reversed between the reference and the target.”
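      The condition added to the text can be expressed as a simple check along the path from the reference to the target compartment. This is a sketch with hypothetical names; the `min_current` threshold below which a link is treated as blocked is an assumption.

```python
def has_continuous_flow(path, axial, min_current=0.0):
    """True if axial current flows from path[0] toward path[-1] across every
    link, i.e. it is neither reversed nor (near-)blocked anywhere on the path.

    path: compartments ordered from reference to target.
    axial: dict (a, b) -> current flowing from a into b.
    """
    return all(axial[(a, b)] > min_current for a, b in zip(path, path[1:]))


print(has_continuous_flow([0, 1, 2], {(0, 1): 1.0, (1, 2): 0.5}))   # True
print(has_continuous_flow([0, 1, 2], {(0, 1): 1.0, (1, 2): -0.2}))  # False (reversed)
```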

      (d) Equation (1.) Why summing axial currents over j? Is this for the case of a branching point?

      A compartment can be (1) part of a continuous dendritic segment, where axial currents can flow from the distal and the proximal direction (sum over two terms); (2) a branch point with three axial currents; or (3) a leaf compartment with only one axial current, in which case the summation reduces to a single term. We clarified this in the text.
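      The three cases can be made concrete with a small sketch that sums axial currents over a compartment's neighbours. It assumes the usual discretised-cable form, the sum over j of g_ij (V_j - V_i), which may differ in sign convention from the manuscript's Equation (1); the names and values are hypothetical.

```python
def axial_inflow(i, voltages, conductances):
    """Net axial current into compartment i and its number of neighbours.

    Assumes the discretised-cable form sum_j g_ij * (V_j - V_i);
    conductances: dict frozenset({i, j}) -> coupling conductance g_ij.
    """
    total, n_neighbours = 0.0, 0
    for pair, g in conductances.items():
        if i in pair:
            (j,) = pair - {i}
            total += g * (voltages[j] - voltages[i])
            n_neighbours += 1
    return total, n_neighbours


# Toy tree: soma 0 - trunk 1, which branches into leaves 2 and 3.
g = {frozenset({0, 1}): 1.0, frozenset({1, 2}): 1.0, frozenset({1, 3}): 1.0}
v = {0: -65.0, 1: -60.0, 2: -50.0, 3: -55.0}
print(axial_inflow(1, v, g))  # (10.0, 3): branch point, three terms
print(axial_inflow(2, v, g))  # (-10.0, 1): leaf, single term
```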

      (e) Figure 2, caption. Typo. "When the axial currents flows…" Should it be 'current'? - Figure 3, caption. Typo in (C) "Extended currentscape" 

      Corrected.

      (f) Figure 4. I cannot see the grey lines or the dotted lines mentioned in the caption. 

      We added an arrow highlighting the gray and the dotted lines in the figure.

      (g) Figure 5, caption. "Red boxes highlight regions analyzed in panels B-D." Because this is a spatially extended model, the word "region" may be confused with spatial location, but you are highlighting a temporal interval.

      We rephrased the caption referring to temporal intervals now.

      (h) Line 341. This is a numerical experiment, correct? 

      We clarified in the text that this was indeed a simulation experiment.

      (i) Line 349. Should it be 'distributions'? 

      Corrected.

      (j) Line 422. Typo. Missing space 'in vivousing'

      Corrected.

      (k) Line 537. "Preprocessing membrane…" I found this entire subsection a bit confusing and hard to read.

      We rephrased this subsection to clarify it and facilitate reading.

    1. eLife Assessment

      This important study reports characterisation of hepatocyte molecular pathways affected by a glycyrrhizin derivative in both in vivo and in vitro mouse models of alcohol-associated liver disease. The authors show convincing evidence indicating that IPP delta isomerase 1 (Idi1) is an intermediate in these pharmacological effects, via the binding of the glycyrrhizin derivative to an upstream regulator of Idi1, HSD11B1, although significant questions remain about some of the experiments and analyses provided. The findings would be of interest to immunologists and pharmacologists interested in liver inflammation and its amelioration.

    2. Reviewer #1 (Public review):

      Summary:

      In this article by Xiao et al. the authors aimed to identify the precise targets by which magnesium isoglycyrrhizinate (MgIG) functions to improve liver injury in response to ethanol treatment. The authors found through a series of in-vivo and molecular approaches that MgIG treatment attenuates alcohol-induced liver injury through a potential SREBP2-IDI1 axis. The revised manuscript adds to a previous set of literature showing MgIG improves liver function across a variety of etiologies, and also provides mechanistic insight into its mechanism of action. All major weaknesses were addressed in the revised submission.

      Strengths:

      (1) The authors use a combination of approaches, ranging from in-vivo mouse models to in-vitro approaches with AML12 hepatocytes, to support the notion that MgIG does improve liver function in response to ethanol treatment.

      (2) The authors use both knockdown and overexpression approaches, in-vivo and in-vitro, to support most of the claims provided.

      (3) Identification of HSD11B1 as the protein target of MgIG, as well as confirmation of direct protein-protein interactions between HSD11B1/SREBP2/IDI1 is novel.

      Weaknesses:

      The authors addressed all my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated magnesium isoglycyrrhizinate (MgIG)'s hepatoprotective actions in chronic-binge alcohol-associated liver disease (ALD) mouse models and ethanol/palmitic acid-challenged AML-12 hepatocytes. They found that MgIG markedly attenuated alcohol-induced liver injury, evidenced by ameliorated histological damage, reduced hepatic steatosis, and normalized liver-to-body weight ratios. RNA sequencing identified isopentenyl diphosphate delta isomerase 1 (IDI1) as a key downstream effector. Hepatocyte-specific genetic manipulations confirmed that MgIG modulates the SREBP2-IDI1 axis. The mechanistic studies suggested that MgIG could directly target HSD11B1 and modulate the HSD11B1-SREBP2-IDI1 axis to attenuate ALD. This manuscript is of interest to the research field of ALD.

      Strengths:

      The authors have performed both in vivo and in vitro studies to demonstrate the action of magnesium isoglycyrrhizinate on hepatocytes and an animal model of alcohol-associated liver disease.

      Original comment (1):

      In Supplemental Figure 1A, all the treatment arms (A-control, MgIG-25 mg/kg, MgIG-50 mg/kg) showed body weight loss compared to the untreated controls. However, Figure 1E showed body weight gain in the treatment arms (A-control and MgIG-25 mg/kg). Why? In Supplemental Figure 1A, the mice with MgIG (25 mg/kg) showed the lowest body weight, compared to either A-control or MgIG (50 mg/kg) treatment. Can the authors explain why MgIG (25 mg/kg) causes more body weight loss than MgIG (50 mg/kg)? What about the other parameters (ALT, AST, NAS, etc.) for the mice with MgIG (50 mg/kg)?

      Author's response:

      We agree that this observation does not strictly follow a dose-dependent pattern. In vivo responses to pharmacological interventions, particularly in metabolic and liver disease models, are not always linear. The relatively greater body weight reduction observed in the 25 mg/kg group may be influenced by inter-individual variability, differences in metabolic adaptation, or sample size-related variation. Importantly, these differences in body weight were not statistically significant. Therefore, we selected the 50 mg/kg dose for subsequent animal experiments, as it demonstrated more consistent and stable improvements across multiple parameters, including body weight, ALT, AST, TG, and TC.

      New comment:

      My first question: All the treatment arms (A-control, MgIG-25 mg/kg, MgIG-50 mg/kg) showed significant body weight loss compared to the untreated controls (Supplemental Figure 1A), but the body weight significantly increased in the treatment arms (A-control and MgIG-50 mg/kg) compared to the untreated controls (Figure 1E). Why?

      My second question: Mice treated with MgIG (25 mg/kg) showed the lowest body weight, compared to either A-control or MgIG (50 mg/kg) treatment. According to the authors' explanation, the body weight loss caused by MgIG (25 mg/kg) is attributable to inter-individual variability, differences in metabolic adaptation, or sample size-related variation. Did these differences occur in the MgIG (25 mg/kg) group only, or in all other groups as well? The mouse group assignment should be randomized; however, a large variation in body weight was seen in the MgIG (25 mg/kg) group. The authors' choice of the MgIG (50 mg/kg) group for subsequent animal experiments is not convincing if it rests on the large variation in the MgIG (25 mg/kg) group and on the more consistent and stable improvements of the MgIG (50 mg/kg) group across multiple parameters. The authors should reanalyze and compare all the raw data between the MgIG (50 mg/kg) and MgIG (25 mg/kg) groups, address the issues pointed out, and justify the rationale for the animal group assignment.

      Original comment (2):

      IL-6 is a key pro-inflammatory cytokine significantly involved in ALD, acting as a marker of ALD severity. Can the authors explain why MgIG 1.0 mg/ml shows higher IL-6 gene expression than MgIG (0.1-0.5 mg/ml)? Same question for the mRNA levels of lipid metabolic enzymes Acc1 and Scd1.

      Author's response:

      Thank you for this important comment. We agree that IL-6, as well as lipid metabolism-related genes such as Acc1 and Scd1, are key indicators in ALD. The relatively higher expression observed at 1.0 mg/mL MgIG compared to lower concentrations (0.1-0.5 mg/mL) may be related to experimental constraints associated with the MgIG formulation used in this study. Specifically, to maintain consistency with our in vivo experiments, we used a clinically available liquid formulation of MgIG (5 mg/mL), which is approved for intravenous administration in China. Due to its relatively low stock concentration, achieving higher working concentrations (e.g., 1.0 mg/mL) in vitro required a larger volume of the MgIG solution, thereby proportionally reducing the volume of culture medium. This reduction in effective culture conditions may adversely affect hepatocyte viability and function. Supporting this, our CCK-8 and LDH assays indicated that higher MgIG concentrations were associated with subtle cytotoxicity or impaired cell status.
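      The volume argument is simple C1V1 = C2V2 arithmetic. With the 5 mg/mL stock described above, working concentrations of 0.1, 0.5 and 1.0 mg/mL require the drug solution to make up 2%, 10% and 20% of the final volume, respectively. The function name and the 100 µL well volume below are arbitrary examples.

```python
def dilution(stock_mg_per_ml, final_mg_per_ml, well_volume_ul):
    """Stock volume needed for a target concentration and the fraction of the
    well volume it occupies (i.e. the culture medium it displaces)."""
    fraction = final_mg_per_ml / stock_mg_per_ml
    return fraction * well_volume_ul, fraction


for c in (0.1, 0.5, 1.0):
    stock_ul, frac = dilution(5.0, c, well_volume_ul=100.0)
    print(f"{c} mg/mL: {stock_ul:.0f} µL stock per 100 µL, {frac:.0%} of the volume")
```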

      New comment:

      The authors' response did not answer my question. If the authors believe that experimental constraints associated with the MgIG formulation could be responsible, then the use of this formulation in all other associated experiments is also questionable. At least the experiments that used this MgIG formulation need to be repeated.

      Original comment (3):

      For the qPCR results of Hsd11b1 knockdown (siRNA) and Hsd11b1 overexpression (plasmid) in AML-12 cells (Figure 5B), what is the description for the gene expression level (Y axis)? Fold changes versus GAPDH? Hsd11b1 overexpression showed non-efficiency (20-23, units on Y axis), even lower than the Hsd11b1 knockdown (above 50, units on Y axis). The authors need to explain this. For the plasmid-based Hsd11b1 overexpression, why does the scramble control inhibit Hsd11b1 gene expression (less than 2, units on the Y axis)? Again, this needs to be explained.

      Author's response:

      Thank you for this important comment, and we apologize for the lack of clarity in the Y-axis labeling, which may have led to misunderstanding.

      As shown in Figures 5A and 5B, we have revised the Y-axis description to clearly indicate that gene expression levels are presented as relative expression normalized to GAPDH (fold change relative to the control group).
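      As a point of reference for "relative expression normalized to GAPDH (fold change relative to the control group)", a minimal Python sketch of the widely used 2^-ΔΔCt (Livak) calculation follows. The response does not state that this method was used, so treat it as an assumption; the function names and Ct values are hypothetical. Under this convention each sample's GAPDH-normalized ΔCt is compared with the mean control ΔCt, so the control group averages roughly 1 by construction.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_ref: float) -> float:
    """ΔCt for one sample: target-gene Ct minus reference-gene (e.g. GAPDH) Ct."""
    return ct_target - ct_ref


def relative_expression(samples, controls):
    """Fold change of each sample vs. the control group by the 2^-ΔΔCt method.

    samples, controls: lists of (ct_target, ct_ref) tuples (hypothetical input format).
    """
    # Baseline: mean ΔCt across the control replicates.
    baseline = mean(delta_ct(t, r) for t, r in controls)
    # ΔΔCt = sample ΔCt - baseline; fewer cycles than control means higher expression.
    return [2 ** -(delta_ct(t, r) - baseline) for t, r in samples]


# Hypothetical Ct values for illustration only.
controls = [(25.0, 18.0), (25.5, 18.0), (24.5, 18.0)]
treated = [(22.0, 18.0)]
print(relative_expression(treated, controls))  # [8.0]
```

      Here the treated sample reaches threshold three cycles earlier than the control baseline after GAPDH normalization, giving an 8-fold (2^3) relative expression.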

      New comment:

      The authors explained that relative expression was normalized to GAPDH (fold change), but they did not answer my question, which concerns Figure 5B. In Figure 5B (left, Hsd11b1-KD), the scramble control shows values above 100 units, whereas in Figure 5B (right, Hsd11b1-OE), the scramble control shows only 0.5-1 units. Did the authors use the same scramble control for both the KD and OE experiments? If so, they should provide more details of the KD and OE experiments and explain this discrepancy. If they used a plasmid as the OE control, they also need to clarify this. In addition, qPCR alone is not a convincing assay for demonstrating successful KD or OE; Western blotting should be performed to confirm it.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) A few of the claims made are not supported by the references provided. For instance, line 76 states that MgIG has hepatoprotective properties and improves liver function, but the reference provided is in the context of myocardial fibrosis.

      Thank you for the correction. We have made the revision on page 4, line 74.

      (2) MgIG is clinically used for the treatment of liver inflammatory disease in China and Japan. In the first line of the abstract, the authors noted that MgIG is clinically approved for ALD. In which countries is MgIG approved for clinical utility in this space?

      Thank you for this important comment. MgIG has been recommended for the treatment of alcoholic liver disease (ALD) in Chinese clinical guidelines (2018). We have clarified this point in the manuscript (Page 5, Line 79-80).

      (3) Serum TGs are not an indicator of liver function. Alterations in serum TGs can occur independently of changes in liver function.

      Thank you for this important comment. We fully agree that serum triglycerides (TGs) are not a direct indicator of liver function. ALT and AST are more appropriate markers for hepatocellular injury, whereas TG and TC primarily reflect systemic and hepatic lipid metabolism status. We have made the necessary revisions as suggested on page 12, lines 285-288.

      (4) There are discrepancies between the results section and the figure legends. For example, line 302 states that Idi1 is upregulated in alcohol-fed mice relative to the control group, whereas the figure legend states that Figure 2A compares ALD+MgIG with ALD only.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figures 2A and 2B as suggested.

      (5) The Oil Red O (ORO) staining provided does not appear consistent with the quantification in Figure 1D. ORO staining is nonspecific and can be highly subjective. The representative image in Figure 1C appears to show an ORO-positive area much greater than 30%.

      Thank you for this insightful comment. We acknowledge that Oil Red O (ORO) staining can be influenced by background signal and may appear subjective in representative images. In our quantification, only well-defined lipid droplets with strong positive staining were included, while diffuse background staining (e.g., light reddish hue) was excluded. This may explain the apparent discrepancy between the representative image and the quantified ORO-positive area. To further strengthen the reliability of our findings, we additionally measured hepatic triglyceride (TG) and total cholesterol (TC) levels. These biochemical assays yielded results consistent with the ORO quantification, thereby supporting our conclusion regarding lipid accumulation. Please refer to page 12, lines 285-288. As requested, we have added the required information to Figure 1G.

      (6) The connection between Idi1 expression in response to EtOH/PA treatment in AML12 cells and viability and apoptosis isn't entirely clear. MgIG treatment completely reverses the EtOH/PA-induced increase in Idi1 expression, but only moderate changes, at best, are observed in viability and apoptosis. This suggests the primary mechanism of MgIG treatment may not be via Idi1.

      Thank you very much. We agree that although MgIG almost completely reverses Idi1 expression induced by EtOH/PA, the improvements in cell viability and apoptosis are only moderate, suggesting a potential discrepancy between these observations. This may indicate that Idi1 functions as a permissive factor, rather than the sole mediator, in this pathological process. In other words, while modulation of Idi1 contributes to the protective effects of MgIG, additional pathways are likely involved in mediating its overall impact on hepatocyte viability and apoptosis. We have clarified this point in the revised manuscript (Page 12, Lines 325–335), stating that MgIG exerts its protective effects against ethanol-induced hepatocellular injury, at least in part, through the regulation of Idi1.

      (7) The Nile Red-stained images also do not appear representative of their quantification. Several claims of more or less lipid accumulation across these studies are not supported by clear differences in Nile Red staining.

      Thanks a lot. We acknowledge that Nile Red staining can be influenced by imaging conditions and may appear less distinct in representative images, which could affect visual interpretation. To minimize subjectivity, all images were analyzed using a consistent and standardized thresholding method across groups. We agree that the visual differences in Nile Red staining alone may not be sufficiently pronounced to fully support the quantitative conclusions. Therefore, to strengthen the reliability of our findings, we have included additional biochemical measurements, including serum TG and TC levels, as well as hepatic TG and TC content. These independent assays consistently support the observed changes in lipid accumulation. The corresponding data have been added to the revised manuscript (page 12, lines 285-288).

      (8) The authors note that Hsd11b1 expression is quite low in AML12 cells. Why, then, did the authors choose to knock down Hsd11b1 in this model?

      Thank you for this important comment. Although the basal expression of Hsd11b1 in untreated AML-12 cells is relatively low, we observed that it is inducible upon EtOH/PA stimulation, indicating its functional relevance under stress conditions. Therefore, knockdown experiments were performed to assess its contribution to EtOH/PA-induced hepatocellular injury. We have clarified this point in the revised manuscript (page 15, lines 281-382).

      (9) Line 380: the claim that MgIG weakens the interaction between HSD11B1 and SREBP2 cannot be made solely on the basis of one Western blot.

      Thank you for this important comment. We agree that the conclusion that MgIG weakens the interaction between HSD11B1 and SREBP2 should not be based solely on a single co-IP/Western blot experiment. In the revised manuscript, we have therefore toned down this statement to more appropriately reflect the data. Specifically, we now describe this result as a preliminary observation suggesting a potential modulation of the interaction, rather than a definitive conclusion. Please refer to Page 15, line 391.

      (10) It's not clear what the numbers represent on top of the Western blots. Are these averages over the course of three independent experiments?

      Thank you for this helpful comment. We apologize for the lack of clarity in the original figure presentation. The numbers shown above the Western blot bands represent the densitometric quantification of protein expression normalized to GAPDH, calculated from three independent experiments. However, this information was not clearly specified in the original figure, which may have led to confusion. To address this concern, we have now revised the manuscript by explicitly clarifying the meaning of these values in the figure legends. In addition, we have added bar graphs showing the quantified results from three independent experiments for Figures S3A, S4D, S6B, and S8H to improve transparency and data presentation.
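      Densitometry normalized to GAPDH across three independent experiments is most informative when reported with its dispersion. A minimal Python sketch of one common way to do this follows, assuming each band is first divided by its GAPDH band and then expressed relative to the mean control ratio; the exact normalization the authors used is not specified, and the intensities and function names are hypothetical.

```python
from statistics import mean, stdev


def gapdh_ratios(target, gapdh):
    """Per-replicate target/GAPDH band-intensity ratios."""
    if len(target) != len(gapdh):
        raise ValueError("target and GAPDH lists need one value per replicate")
    return [t / g for t, g in zip(target, gapdh)]


def summarize(target, gapdh, ctrl_target, ctrl_gapdh):
    """Mean and sample SD of GAPDH-normalized densitometry, relative to the control mean."""
    ctrl_mean = mean(gapdh_ratios(ctrl_target, ctrl_gapdh))
    values = [r / ctrl_mean for r in gapdh_ratios(target, gapdh)]
    return mean(values), stdev(values)


# Hypothetical band intensities from three independent experiments.
m, sd = summarize([150, 160, 140], [200, 200, 200], [100, 100, 100], [200, 200, 200])
print(f"{m:.2f} ± {sd:.2f}")  # 1.50 ± 0.10
```

      Reporting the per-replicate values (here 1.5, 1.6, and 1.4) alongside the mean and SD lets readers judge variability directly, rather than inferring it from a single number above each band.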

      (11) The claim in line 382 that knockdown of Hsd11b1 resulted in accumulation of pSREBP2 is not supported by the data provided in Figure 6D.

      Thank you for pointing out this issue. We sincerely apologize for the incorrect description in the original manuscript. This was a wording error. We have made the revision on page 15, lines 394-396.

      (12) None of the images provided in Figure 6E support the claims stated in the results. Activation of SREBP2 leads to nuclear translocation and subsequent induction of genes involved in cholesterol biosynthesis and uptake. Manipulation of Hsd11b1 via OE or KD does not show any SREBP2 co-localization with DAPI-stained nuclei.

      Thank you for this important comment. We agree that the original description was not sufficiently clear, which may have led to misunderstanding of the results. To clarify, Figure 6E includes two experimental contexts. Under basal (physiological) conditions in AML-12 cells, manipulation of Hsd11b1 (overexpression or knockdown) does not significantly affect the subcellular distribution of SREBP2. However, under EtOH/PA-induced stress conditions, Hsd11b1 overexpression increases both nuclear and cytoplasmic SREBP2 levels, whereas Hsd11b1 knockdown reduces SREBP2 expression in both compartments. We have made the revision on page 16, line 399.

      (13) The entire manuscript is focused on this axis of MgIG-Hsd11b1-Srebp2, but no Srebp2 transcriptional targets are ever measured.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 12, lines 285-288 and line 292, by adding the mRNA changes of Lcn2 and Ldlr, which are SREBP2 target genes. As requested, we have added the required information to Figures 1F and 1H.

      (14) Acc1 and Scd1 are Srebp1 targets, not Srebp2.

      Thank you for this important comment. We agree that Acc1 and Scd1 are well-established downstream target genes of SREBP1 rather than SREBP2. To better support our proposed SREBP2-related mechanism, we further examined canonical SREBP2 downstream target genes, including Lcn2 and Ldlr. The results are consistent with activation of SREBP2 signaling in our model. These data have now been included in the revised manuscript (Page 12, Lines 285–288 and 292; Figures 1F and 1H).

      (15) A major weakness of this manuscript is the lack of studies providing quantitative assessments of Srebp2 activation and true liver lipid measurements.

      Thank you for this important comment. We acknowledge the concern regarding the lack of direct quantitative assessment of SREBP2 activation in the original version of the manuscript. To address this limitation, we have strengthened the evidence supporting SREBP2 activation using multiple complementary approaches. Specifically, we assessed the expression of canonical SREBP2 downstream target genes (Page 12, Lines 285–288 and 292; Figures 1F and 1H), together with Western blot analysis (Figure 6D) and immunofluorescence staining (Figure 6F), which collectively support activation of SREBP2 signaling in the EtOH/PA-induced ALD model.

      In addition, to provide a more comprehensive evaluation of hepatic lipid accumulation, we measured serum TG and TC levels, as well as hepatic TG and TC content. These biochemical analyses further confirm the presence of significant lipid accumulation in our model. We have made the necessary revisions as suggested on page 12, lines 285-288 (Figure 1G).

      Reviewer #2 (Public review):

      (1) In Supplemental Figure 1A, all the treatment arms (A-control, MgIG-25 mg/kg, MgIG-50 mg/kg) showed body weight loss compared to the untreated controls. However, Figure 1E showed body weight gain in the treatment arms (A-control and MgIG-25 mg/kg). Why? In Supplemental Figure 1A, the mice given MgIG (25 mg/kg) showed the lowest body weight compared with either the A-control or the MgIG (50 mg/kg) group. Can the authors explain why MgIG (25 mg/kg) causes greater body weight loss than MgIG (50 mg/kg)? What about the other parameters (ALT, AST, NAS, etc.) for the mice given MgIG (50 mg/kg)?

      We agree that this observation does not strictly follow a dose-dependent pattern. In vivo responses to pharmacological interventions, particularly in metabolic and liver disease models, are not always linear. The relatively greater body weight reduction observed in the 25 mg/kg group may be influenced by inter-individual variability, differences in metabolic adaptation, or sample size–related variation. Importantly, these differences in body weight were not statistically significant. Therefore, we selected the 50 mg/kg dose for subsequent animal experiments, as it demonstrated more consistent and stable improvements across multiple parameters, including body weight, ALT, AST, TG, and TC.

      (2) IL-6 is a key pro-inflammatory cytokine significantly involved in ALD, acting as a marker of ALD severity. Can the authors explain why MgIG 1.0 mg/ml shows higher IL-6 gene expression than MgIG (0.1-0.5 mg/ml)? Same question for the mRNA levels of lipid metabolic enzymes Acc1 and Scd1.

      Thank you for this important comment. We agree that IL-6, as well as lipid metabolism–related genes such as Acc1 and Scd1, are key indicators in ALD. The relatively higher expression observed at 1.0 mg/mL MgIG compared to lower concentrations (0.1–0.5 mg/mL) may be related to experimental constraints associated with the MgIG formulation used in this study.

      Specifically, to maintain consistency with our in vivo experiments, we used a clinically available liquid formulation of MgIG (5 mg/mL), which is approved for intravenous administration in China. Due to its relatively low stock concentration, achieving higher working concentrations (e.g., 1.0 mg/mL) in vitro required a larger volume of the MgIG solution, thereby proportionally reducing the volume of culture medium. This reduction in effective culture conditions may adversely affect hepatocyte viability and function.

      Supporting this, our CCK-8 and LDH assays indicated that higher MgIG concentrations were associated with subtle cytotoxicity or impaired cell status.

      (3) For the qPCR results of Hsd11b1 knockdown (siRNA) and Hsd11b1 overexpression (plasmid) in AML-12 cells (Figure 5B), how is the gene expression level (Y axis) defined? Is it the fold change versus GAPDH? Hsd11b1 overexpression appears ineffective (20-23 units on the Y axis), with values even lower than those for the Hsd11b1 knockdown (above 50 units on the Y axis). The authors need to explain this. For the plasmid-based Hsd11b1 overexpression, why does the scramble control inhibit Hsd11b1 gene expression (less than 2 units on the Y axis)? Again, this needs to be explained.

      Thank you for this important comment, and we apologize for the lack of clarity in the Y-axis labeling, which may have led to misunderstanding.

      As shown in Figures 5A and 5B, we have revised the Y-axis description to clearly indicate that gene expression levels are presented as relative expression normalized to GAPDH (fold change relative to the control group).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Use terms that show directionality to help readers comprehend the data. For instance, line 295 states that ‘MgIG treatment also modulated the expression...’. In reality, MgIG treatment reduced the expression of those genes relative to ethanol-fed control mice.

      Thank you very much for this valuable suggestion. We have thoroughly revised this passage to read: ‘In line with the observed histological and physiological improvements, MgIG treatment also reduced the expression of genes involved in lipid synthesis metabolism (Srebp1, Srebp2, Acc1, and Scd1, Lcn2, and Ldlr), inflammation (Tnf-α and Il-6), and pro-apoptosis (Bax) while restored the level of anti-apoptotic gene (Bcl2) in the liver tissue of EtOH mice (Fig. 1G-1H).’ Please refer to page 12, lines 290-294.

      (2) Oil Red O staining is subjective and nonspecific. The authors make a claim that serum TGs are an indicator of liver function; however, measurement of hepatic TGs would be a better measure here and more consistent with the ORO staining.

      We sincerely appreciate this great suggestion. We have revised page 12, lines 285-288, to read: ‘Notably, significant differences were observed between the EtOH group and the MgIG-treated (EtOH+M) group in serum levels of liver enzymes (ALT and AST), serum lipid parameters (TG and TC), as well as liver TG and TC contents, key indicators of liver function and lipid metabolism.’ As requested, we have added the required information to Figure 1G.

      (3) The focus of the paper is on this SREBP2 axis. However, in Figure 1, the authors do not show any SREBP2 target genes. This would be helpful in interpreting SREBP2 activity. Further, hepatic free cholesterol levels would also strengthen these data.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 12, lines 285-288 and line 292, by adding the mRNA changes of Lcn2 and Ldlr, which are SREBP2 target genes. As requested, we have added the required information to Figures 1F and 1H.

      (4) Labels showing directionality on the volcano plots in Figures 2A, B would be of great help here. It's unclear which groups are on the left or right.

      Thank you very much! The authors have revised Figures 2A-C as requested. Please refer to the new version of Figures 2A-C.

      (5) Ensure consistency in what is written in the results and the figure legends. See Figure 2 volcano plots for examples. The volcano plot in Figure 2B has no figure legend.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figures 2B as suggested.

      (6) Ensure consistency in the nomenclature. In some cases, the authors use ALD+MgIG, and in others, they just use MgIG. My recommendation would be to use Ctrl, EtOH, EtOH+M.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 6, lines 111-112; page 11, line 280; and page 12, lines 282, 284, 293, 298, and 301.

      (7) The gene enrichment analysis in Figure 2C should also include some text about directionality, either in the figure or the figure legend. Upregulated DEGs in the MgIG group? It's unclear.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figures 2C as suggested.

      (8) The authors should consider shuffling the order of some of the figures for better transitions from one panel to the next. For instance, Figure 3B, C shows cell viability responses before showing the siRNA and OE are effective in knocking down and overexpressing their protein of interest.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figures 3B and 3C as suggested.

      (9) The authors need to be consistent in the colors that are used in the figures. It's incredibly hard to follow, as presented.

      We appreciate the reviewer's comment regarding color consistency. In response, we have carefully revised all figures to ensure consistent use of colors across the manuscript. The updated versions are shown in Figures 3, 6, and 7.

      (10) For Nile Red staining, multiple images at a lower objective need to be shown and/or cellular triglycerides and cholesterol levels should be quantified.

      We appreciate the reviewer's insightful comment regarding the Nile Red staining. In response, we have quantified triglyceride and total cholesterol levels in the cell supernatant, which are now presented on page 12, lines 285-287 and in Figure 2F. Furthermore, we have included additional Nile Red staining images at a lower objective in Supplementary Figures 2D, 3B, and 4C to better illustrate the lipid droplet distribution.

      (11) Line 362 refers to Figure 4 when it should refer to Figure 5.

      Thank you very much! The authors have revised this on page 14, line 364.

      (12) qPCR should be performed on canonical Srebp2 targets throughout the manuscript to link MgIG treatment with changes in sterol sensing and Srebp2.

      Thank you for your valuable suggestion. The results are now included on page 12, lines 292 and 311, and the corresponding data in Figures 1H and 2G have been enhanced accordingly.

      Reviewer #2 (Recommendations for the authors):

      (1) The statement, figure labeling, and figure legend for Figure 1A-C are confusing. The MgIG dosing on the X-axis of Figure 2D is missing.

      Thank you for the correction. We have revised this problem. Please refer to the new version of Figure 1A-C and Figure 2D.

      (2) Figure 3E is not well described in the main text and figure legend. What are the numbers on top of the blot bands? Presumably they are the means for each group, but where is the SD or SE for each group? It is hard to judge statistical significance without the SD or SE. The same question applies to Figure 5E, Figures 6C-6D, and Figure 7G.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 13, lines 317-322. As suggested, we have added the required information to Figures S3A, S4D, S6B and S8H.

    1. eLife Assessment

      This valuable study advances our understanding of how organisms respond to chronic oxidative stress. Using the nematode C. elegans, the authors identified key neuronal signaling molecules and their receptors that are required for stress signaling and survival. The evidence supporting the conclusions is solid, including rigorous genetics, stress response analysis, and transcriptional profiling. This research will be of broad interest to neuroscientists and researchers working in the field of oxidative stress regulation.

    2. Reviewer #2 (Public review):

      In this paper, Biswas et al. describe the role of acetylcholine (ACh) signaling in protection against chronic oxidative stress in C. elegans. They showed that disruption of ACh signaling in either unc-17 or gar-3 mutants led to sensitivity to toxicity caused by chronic paraquat (PQ) treatment. Using RNA-seq, they found that approximately 70% of the genes induced by chronic PQ exposure in wild-type animals failed to upregulate in these mutants. The overexpression of gar-3 selectively in cholinergic neurons was sufficient to promote protection against chronic PQ exposure in an ACh-dependent manner. The study points to a previously undescribed role for ACh signaling in providing organism-wide protection from chronic oxidative stress, likely through the transcriptional regulation of numerous oxidative stress-response genes. The paper is well-written, and the data are robust. While the study identifies the muscarinic ACh receptor gar-3 as an important regulator of the response to PQ, the specific neurons in which gar-3 functions were not unambiguously identified, and the sources of ACh that regulate GAR-3 signaling and the identities of the tissues targeted by gar-3 remain unknown.

      Comments on revisions:

      No further comments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The researchers aimed to identify which neurotransmitter pathways are required for animals to withstand chronic oxidative stress. This work thus has important implications for disease processes that are caused/linked to oxidative stress. This work identified specific neurotransmitters and receptors that coordinate stress resilience, both prior to and during stress exposure. Further, the authors identified specific transcriptional programs coordinated by neurotransmission that may provide stress resistance.

      Strengths:

      The manuscript is very clearly written with a well-formulated rationale. Standard C. elegans genetic analysis and rescue experiments were performed to identify key regulators of the chronic oxidative stress response. These findings were enhanced by transcriptional profiling that identified differentially expressed genes that likely affect survival when animals are exposed to stress.

      Weaknesses:

      Where the gar-3 promoter drives expression was not discussed in the context of the rescue experiments in Fig 7.

      Comments on revisions:

      This issue has now been appropriately addressed in the revision.

      We thank the reviewer for their time and constructive feedback.

      Reviewer #2 (Public review):

      In this paper, Biswas et al. describe the role of acetylcholine (ACh) signaling in protection against chronic oxidative stress in C. elegans. They showed that disruption of ACh signaling in either unc-17 or gar-3 mutants led to sensitivity to toxicity caused by chronic paraquat (PQ) treatment. Using RNA-seq, they found that approximately 70% of the genes induced by chronic PQ exposure in wild-type animals failed to upregulate in these mutants. The overexpression of gar-3 selectively in cholinergic neurons was sufficient to promote protection against chronic PQ exposure in an ACh-dependent manner. The study points to a previously undescribed role for ACh signaling in providing organism-wide protection from chronic oxidative stress, likely through the transcriptional regulation of numerous oxidative stress-response genes. The paper is well-written, and the data are robust, though some conclusions seem preliminary and are not fully supported by the current data (see below). While the study identifies the muscarinic ACh receptor gar-3 as an important regulator of the response to PQ, the specific neurons in which gar-3 functions were not unambiguously identified, and the sources of ACh that regulate GAR-3 signaling and the identities of the tissues targeted by gar-3 were not addressed.

      Comments on revisions:

      The authors addressed my comments adequately in their revised submission. Please include representative images to accompany the quantification of the new results presented in Fig S4A.

      We thank the reviewer for their time and constructive feedback. We now include representative images as requested.

    1. eLife Assessment

      In this valuable study, the authors conducted an impressive amount of atomistic simulations with a glycosylated HIV-1 envelope glycoprotein (Env) trimer in a realistic asymmetric lipid bilayer. The aim was to probe how Env transmembrane domain, cytoplasmic tail, and membrane environment influence ectodomain orientation and antibody epitope exposure. The simulations convincingly show that ectodomain motion is dominated by tilting relative to the membrane and explicitly demonstrate the role of membrane asymmetry in modulating the protein conformation and orientation. Additional analyses of the authors' deposited MD trajectories could serve as invaluable extensions of this work to probe, for example, for exposure of cryptic epitopes and potential allosteric coupling.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Conformational Variability of HIV-1 Env Trimer and Viral Vulnerability", the authors study the fully glycosylated HIV-1 Env protein using an all-atom force field. The work combines long all-atom simulations of Env in a realistic asymmetric bilayer with careful data analysis. This work clarifies how the CT domain modulates the overall conformation of the Env ectodomain and characterizes different MPER-TMD conformations. The authors also carefully analyze the accessibility of different antibodies to the Env protein.

      Strengths:

      This paper is state-of-the-art given the scale of the system and the sophistication of the methods. The biological question is important, the methodology is rigorous, and the results will interest a broad eLife audience. The authors also establish strong connections to previous literature and acknowledge the limitations of the CT-truncated protein construct, which enhances the manuscript's relevance to the community.

    3. Reviewer #2 (Public review):

      In this work, the authors elucidate how a viral surface protein behaves in a membrane environment and how its large-scale motions influence the exposure of antibody-binding sites. Using long-timescale, all-atom molecular dynamics simulations of a fully glycosylated, full-length protein embedded in a virus-like membrane, the study systematically examines the coupling between ectodomain motion, transmembrane orientation, membrane interactions, and epitope accessibility. Multiple model variants differing in cleavage state, initial transmembrane configuration, and presence of the cytoplasmic tail are compared to identify general features of protein-membrane dynamics relevant to antibody recognition.

      A major strength of this study is the scope and ambition of the simulations. The authors perform multiple microsecond-scale simulations of a highly complex, biologically realistic system that includes the full ectodomain, transmembrane region, cytoplasmic tail, glycans, and a heterogeneous membrane. The finding that the ectodomain explores a wide range of tilt angles while the transmembrane region remains more constrained, with limited correlation between the two, offers useful conceptual insight into how global motions may be accommodated without large rearrangements at the membrane anchor. The explicit consideration of membrane and glycan steric effects on antibody accessibility further strengthens the study.

      The main limitations relate to sampling and model dependence inherent to simulations of this size and complexity. The analysis of antibody accessibility is based on geometric and steric criteria, which do not capture potential conformational adaptations of antibodies or membrane remodeling during binding; the authors have appropriately noted this as a limitation.

      In the revised manuscript, the authors have addressed all previously raised concerns. Time series plots of the tilt angles have been added, figure captions and visual encodings have been clarified, quantitative descriptions of angular distributions have been strengthened, and the distance metric for MPER exposure is now accompanied by temporal data. The overall presentation is substantially improved, and the conclusions are well supported by the data as presented.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses large-scale all-atom molecular dynamics simulations to examine the conformational plasticity of the HIV-1 envelope glycoprotein (Env) in a membrane context, with particular emphasis on how the transmembrane domain (TMD), cytoplasmic tail (CT), protomer cleavage, and membrane environment influence ectodomain orientation and antibody epitope exposure. By comparing Env constructs with and without the CT, explicitly modeling glycosylation, and embedding Env in an asymmetric lipid bilayer, the authors aim to provide an integrated view of how membrane-proximal regions and lipid interactions shape Env antigenicity, including epitopes targeted by MPER-directed antibodies.

      Strengths:

      The authors have made a genuine effort to address the concerns raised in the first round of review, and the revised manuscript is substantively improved. The addition of dynamical cross-correlation maps, expanded citation of prior computational work, clarification of the membrane composition rationale, data deposition to Zenodo, and the new discussion contextualizing the independence of ectodomain and TMD motions are all welcome. Several scientifically interesting aspects of the work merit highlighting before the remaining concerns are addressed.

      A key strength of this work remains the scope, scale, and realism of the simulation systems. The authors construct a very large, nearly complete Env-scale model that includes a glycosylated Env trimer embedded in an asymmetric bilayer, enabling analysis of membrane-protein interactions that are difficult to capture experimentally. The inclusion of specific glycans at reported sites, and the focus on constructs with and without the CT or cleavage, are well motivated by existing biological and structural data.

      The observation that R696 orientation and its interacting partners give rise to asymmetric protomer conformations and distinct TMD tilts is a notable finding. The statement that interactions between R696 and lipid headgroups or CT residues can be strong enough to introduce a kink into the TMD is well-supported by representative snapshots and consistent with prior isolated-TMD simulations. The use of two initialization depths ("high" and "low") to probe R696 leaflet preference is methodologically interesting, and the authors' interpretation - that there is a slight bias toward cytoplasmic leaflet interactions, but that these contacts could be highly dynamic over the course of viral entry - is appropriately cautious. It would be valuable to explicitly frame this as a hypothesis with testable predictions that future experimental or enhanced-sampling work could address. Similarly, the equilibration-driven kinking of the TMD core, consistent with prior isolated-TMD studies, represents a useful validation that extends those earlier observations to the intact trimeric context.

      The simulations reveal substantial tilting motions of the ectodomain relative to the membrane, with angles spanning roughly 0-30° (and up to ~40° in some analyses), while the ectodomain itself remains relatively rigid. This framing, that much of Env's conformational variability arises from rigid-body tilting rather than large internal rearrangements, is an important conceptual contribution. The authors also provide interesting observations regarding asymmetric bilayer deformations, including localized thinning and altered lipid headgroup interactions near the TMD and CT, which suggest a reciprocal coupling between Env and the surrounding membrane.

      The analysis of antibody-relevant epitopes across the prefusion state, including the V1/V2 and V3 loops, the CD4 binding site, and the MPER, is another strength. The study makes effective use of existing experimental knowledge in this context, for example by focusing on specific glycans known to occlude antibody binding, to motivate and interpret the simulations.

      Finally, the revised discussion provides more context that situates the study's findings and discrepancies within the broader literature, strengthening the manuscript's clarity and interpretability.

      Weaknesses:

      The revised work is much improved but still has substantive writing issues, including organizational problems such as run-on paragraphs, as well as citation issues. Addressing these would help readers make the most of this important study.

      The revised Introduction now includes a paragraph summarizing prior MD work, which is an improvement. However, the paragraph remains structured around the limitations and setup of previous studies (e.g., "early studies were constrained by limited computational resources", short trajectory lengths, isolated constructs) rather than their findings. Readers benefit most from understanding what those studies showed - and where the present work confirms, extends, or diverges from those results. The current framing inadvertently positions prior work as deficient scaffolding rather than as independent data points converging on shared conclusions. The Introduction could be revised to briefly summarize the key biological conclusions from prior MD studies alongside their technical context, which could then be revisited in their appropriate place alongside key results.

      The authors have verified that PDB entries are cited at first mention, and this is noted. However, a recurring issue remains: key literature-supported conclusions appear in the Results and Discussion sections without accompanying citations at each point of use. Passages that summarize experimental or computational findings, particularly those used to validate or contextualize the authors' own results, require citation at every point of claim, not only at first introduction of a reference. This is not a minor stylistic preference. Downstream readers, systematic reviewers, and automated tools that map literature to claims (e.g., scite) rely on co-occurrence of claims and citations within the same passage. A citation appearing several paragraphs earlier does not carry attribution forward. As a practical example, the statement that "MPER-targeting antibodies bind effectively only after the gp120-gp41 trimer undergoes major conformational rearrangements toward a fusion-intermediate or post-fusion state (Frey et al., 2008; Alam et al., 2009; Chen et al., 2014; Lee et al., 2016)" is appropriately cited. That same standard of inline attribution should be applied throughout, including in Results and Discussion subsections where prior experimental findings are mentioned without citation.

      Additionally, cited literature should be framed to highlight convergence with the authors' conclusions rather than primarily to point out limitations of previous studies. Where prior studies independently support a finding, this should be stated explicitly. Independent replication across methods and systems is one of the strongest arguments for ground truth; treating it as such would improve the manuscript's scientific standing.

      Finally, the dynamical cross-correlation maps assess ectodomain-TMD coupling, and the authors appropriately acknowledge that microsecond simulations capture only the closed ground state. However, the revised manuscript does not address the question raised in the first review regarding CT-TMD and CT-ectodomain correlations. The Results section states that "very weak correlations between the ectodomain and the TMD" were found, but it is not clear whether the CT was included in this analysis or whether analogous correlation maps for CT-TMD and CT-ectodomain pairs were computed for the full-length systems. Additional analyses of the authors' deposited MD trajectories, such as probing for exposure of cryptic epitopes and potential allosteric coupling, could serve as valuable extensions of this work.
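      For readers less familiar with dynamical cross-correlation maps, the sketch below computes the standard normalized cross-correlation C(i, j) = <Δr_i · Δr_j> / sqrt(<|Δr_i|²> <|Δr_j|²>), where Δr_i is the displacement of residue i from its time-averaged position. It is a minimal illustration, not the authors' analysis code: it assumes the frames have already been superimposed onto a common reference, and the function name dccm is hypothetical. Values near +1 indicate residues moving in phase; values near -1 indicate anti-correlated motion.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dccm(frames: Sequence[Sequence[Vec]]) -> List[List[float]]:
    """Normalized dynamical cross-correlation matrix.

    frames[t][i] is the (x, y, z) position of residue i (for example its
    C-alpha atom) in frame t, assumed already superimposed on a reference.
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    # Time-averaged position of each residue.
    mean = [
        [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        for i in range(n_res)
    ]
    # Displacement of each residue from its mean position, frame by frame.
    disp = [
        [[f[i][k] - mean[i][k] for k in range(3)] for i in range(n_res)]
        for f in frames
    ]

    def avg_dot(i: int, j: int) -> float:
        # <dr_i . dr_j> averaged over all frames.
        return sum(
            sum(d[i][k] * d[j][k] for k in range(3)) for d in disp
        ) / n_frames

    var = [avg_dot(i, i) for i in range(n_res)]
    return [
        [
            # An immobile residue has zero variance; report 0 rather than divide by 0.
            avg_dot(i, j) / math.sqrt(var[i] * var[j])
            if var[i] > 0 and var[j] > 0 else 0.0
            for j in range(n_res)
        ]
        for i in range(n_res)
    ]


# Toy trajectory: residues 0 and 1 move together along x, residue 2 moves opposite.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
]
c = dccm(frames)
print(round(c[0][1], 3), round(c[0][2], 3))  # 1.0 -1.0
```

      Averaging C over the residue blocks that belong to two regions (for example, the CT and the TMD) is one simple way to summarize inter-region coupling of the kind raised above; which residues and atoms to include is left to the analyst.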

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Conformational Variability of HIV-1 Env Trimer and Viral Vulnerability", the authors study the fully glycosylated HIV-1 Env protein using an all-atom force field. The study combines long all-atom simulations of Env in a realistic asymmetric bilayer with careful data analysis. This work clarifies how the CT domain modulates the overall conformation of the Env ectodomain and characterizes different MPER-TMD conformations. The authors also carefully analyze the accessibility of different antibodies to the Env protein.

      Strengths:

      This paper is state-of-the-art, given the scale of the system and the sophistication of the methods. The biological question is important, the methodology is rigorous, and the results will interest a broad audience.

      Weaknesses:

      The manuscript lacks a discussion of previous studies. The authors should consider addressing or comparing their work with the following points:

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work: https://doi.org/10.1101/2025.03.26.645577

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain: https://doi.org/10.1021/jacs.5c15421

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state: https://doi.org/10.1073/pnas.1807259115

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study: https://doi.org/10.1126/science.aaa9804

      Based on the Reviewer’s comments and suggestions, we have added a discussion related to each previous study mentioned above.

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work: https://doi.org/10.1101/2025.03.26.645577

      At the end of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “Shehata et al. also built a model of the full-length gp120–gp41 trimer embedded in a lipid bilayer and performed all-atom simulations, in which a tilting motion of the ectodomain was observed. Based on the analysis of accessible surface area using different probe radii, they reported that antibody epitopes on the ectodomain are largely shielded by glycans, while the MPER epitope is mainly occluded by the membrane with tilt angles above 30° required to achieve greater MPER exposure (Shehata et al., 2025).”

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain: https://doi.org/10.1021/jacs.5c15421

      In the middle of the first paragraph in the Discussion section we added:

      “This is consistent with the all-atom simulations of MPER–TMD–CT and MPER–TMD in an asymmetric membrane conducted by Majumder et al., which likewise show multiple different conformational states of MPER and TMD (Majumder et al., 2025).”

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state: https://doi.org/10.1073/pnas.1807259115

      The paper mentioned by the Reviewer mainly reports the NMR structure of the MPER and TMD. In this study, the authors experimentally examined a series of MPER mutations to assess whether alterations in the MPER affect epitope accessibility in other regions of the Env ectodomain. This study did not investigate whether MPER-directed antibodies recognize the prehairpin intermediate. Instead, it cited prior studies (Frey et al., 2008; Alam et al., 2009; Chen et al., 2014) reporting that MPER-directed antibodies target the prehairpin intermediate conformation. We had already cited two of them (Alam et al., 2009 and Chen et al., 2014) in the original preprint, and we have now added the third one (Frey et al., 2008) in the revised manuscript.

      In the middle of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “This is consistent with experimental studies indicating that MPER-targeting antibodies bind effectively only after the gp120–gp41 trimer undergoes major conformational rearrangements toward a fusion-intermediate or post-fusion state (Frey et al., 2008; Alam et al., 2009; Chen et al., 2014; Lee et al., 2016).”

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study: https://doi.org/10.1126/science.aaa9804

      At the beginning of the second paragraph in the Discussion section we added:

      “Comparison of the full-length and CT-truncated systems shows that the primary difference arises from changes in the lipid bilayer, particularly in the exoplasmic leaflet, whereas differences in protein conformation and dynamics are less evident. Previous experimental studies have reported that mutations of TMD residues and CT truncation can substantially affect the antigenicity of the ectodomain (Edwards et al., 2002; Chen et al., 2015; Dev et al., 2016). However, the ectodomain remains relatively rigid in our simulations for both full-length and CT-truncated systems. It is unclear whether this behavior reflects insufficient conformational sampling or artifacts associated with the model structures. Structural information for the CT is very limited, and the NMR structure (PDB ID: 7LOI) was the only available CT structure at the time the simulation systems were constructed. As a result, the extent to which this structure represents the native CT conformation remains uncertain. Additional experimental structural characterization of the CT will be important for achieving a more complete understanding of its functional role.”

      Reviewer #1 (Recommendations for the authors):

      A minor point: The RMSD values in Figure 3-figure supplement 1, seem a little too small. Please check the units.

      Figure 3-figure supplement 1 shows the RMSD of the ectodomain. Prior to RMSD calculation, the snapshots extracted from each trajectory were aligned to the initial structure using the ectodomain as the reference to avoid falsely high RMSD values arising from different orientations of the ectodomain. The relatively small RMSD values therefore reflect the intrinsic structural stability of the ectodomain, indicating that its internal conformation remains stable even though it undergoes substantial tilting motions.
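      The point about alignment can be illustrated with a minimal sketch of RMSD after optimal rigid-body superposition (the Kabsch algorithm). It assumes NumPy and plain (N, 3) coordinate arrays; the function names are hypothetical and this is not the authors' pipeline. A rigidly tilted and shifted copy of a structure gives a large RMSD without superposition but essentially zero after it, which is why aligning on the ectodomain separates internal deformation from whole-domain tilting.

```python
import numpy as np


def raw_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets without any fitting."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD after optimal superposition of `mobile` onto `reference`."""
    # Remove translation by centering both sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return raw_rmsd(p @ rotation.T, q)


# A rigid 30-degree tilt plus a shift, with no internal deformation.
rng = np.random.default_rng(0)
reference = rng.normal(scale=10.0, size=(50, 3))
t = np.radians(30.0)
tilt = np.array([[1.0, 0.0, 0.0],
                 [0.0, np.cos(t), -np.sin(t)],
                 [0.0, np.sin(t), np.cos(t)]])
tilted = reference @ tilt.T + np.array([5.0, 0.0, 0.0])

print(raw_rmsd(tilted, reference) > 1.0)      # True: orientation inflates RMSD
print(kabsch_rmsd(tilted, reference) < 1e-6)  # True: internal structure unchanged
```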

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors aim to elucidate how a viral surface protein behaves in a membrane environment and how its large-scale motions influence the exposure of antibody-binding sites. Using long-timescale, all-atom molecular dynamics simulations of a fully glycosylated, full-length protein embedded in a virus-like membrane, the study systematically examines the coupling between ectodomain motion, transmembrane orientation, membrane interactions, and epitope accessibility. By comparing multiple model variants that differ in cleavage state, initial transmembrane configuration, and presence of the cytoplasmic tail, the authors aim to identify general features of protein-membrane dynamics relevant to antibody recognition.

      Strengths:

      A major strength of this study is the scope and ambition of the simulations. The authors perform multiple microsecond-scale simulations of a highly complex, biologically realistic system that includes the full ectodomain, transmembrane region, cytoplasmic tail, glycans, and a heterogeneous membrane. Such simulations remain technically challenging, and the work represents a substantial computational and methodological effort.

      The analysis provides a clear and intuitive description of large-scale protein motions relative to the membrane, including ectodomain tilting and transmembrane orientation. The finding that the ectodomain explores a wide range of tilt angles while the transmembrane region remains more constrained, with limited correlation between the two, offers useful conceptual insight into how global motions may be accommodated without large rearrangements at the membrane anchor.

      Another strength is the explicit consideration of membrane and glycan steric effects on antibody accessibility. By evaluating multiple classes of antibodies targeting distinct regions of the protein, the study highlights how membrane proximity and glycan dynamics can differentially influence access to different epitopes. This comparative approach helps place the results in a broader immunological context and may be useful for readers interested in antibody recognition or vaccine design.

      Overall, the results are internally consistent across multiple simulations and model variants, and the conclusions are generally well aligned with the data presented.

      Weaknesses:

      The main limitations of the study relate to sampling and model dependence, which are inherent challenges for simulations of this size and complexity. Although the simulations are long by current standards, individual trajectories explore only portions of the available conformational space, and several conclusions rely on pooling data across a limited number of replicas. This makes it difficult to fully assess the robustness of some quantitative trends, particularly for rare events such as specific epitope accessibility states.

      In addition, several aspects of the model construction, including the treatment of missing regions, loop rebuilding, and initial configuration choices, are necessarily approximate. While these approaches are reasonable and well motivated, the extent to which some conclusions depend on these modeling choices is not always fully clear from the current presentation.

      Finally, the analysis of antibody accessibility is based on geometric and steric criteria, which provide a useful first-order approximation but do not capture potential conformational adaptations of antibodies or membrane remodeling during binding. As a result, the accessibility results should be interpreted primarily as model-based predictions rather than definitive statements about binding competence.

      Despite these limitations, the study provides a valuable and carefully executed contribution, and its datasets and analytical framework are likely to be useful to others interested in protein-membrane interactions and antibody recognition.

      Based on the Reviewer’s comments, we have revised the Discussion section to emphasize the limitations related to model construction and analysis of antibody accessibility.

      In the middle of the second paragraph in the Discussion section we added:

      “Similar limitations apply to other modeled regions where structural information is incomplete, including missing loops in the ectodomain, the cleavage site and heptad repeat 2 where two PDB structures (IDs: 6B0N and 7LOI) were merged. These regions introduce additional uncertainty, and the extent to which they influence the interpretation of our results remains an open question.”

      In the middle of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “In addition, this analysis is based on geometric and steric criteria without accounting for potential conformational adaptations of gp120–gp41, antibodies, or the membrane; therefore, the calculated frequency of antibody accessibility should be interpreted as an approximation rather than a definitive indicator of binding competence.”

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 45-47: The phrase "A major breakthrough was the design of ..." may be confusing. The gp140 trimer refers to a naturally occurring form of the HIV envelope protein rather than a structure designed de novo. If this statement refers to the development of a specific experimental construct or model system, this should be clarified to avoid misunderstanding.

      We have revised the sentence to clarify that the statement refers to soluble gp140 trimer constructs developed to stabilize the prefusion Env ectodomain for structural and immunological studies.

      At the beginning of the second paragraph in the Introduction section, we have modified the following:

      “A major advance was the development of soluble gp140 trimers, comprising gp120 and the ectodomain portion of gp41, designed to stabilize the prefusion Env trimer for structural and immunological characterization.”

      (2) Figure 1A: The figure displays a model structure lacking the cytoplasmic tail. Given that the full-length model is central to the study, the authors may wish to explain why the truncated structure is shown here or consider displaying the full-length model to better reflect the complete system analyzed.

      We have combined Figure 1 and Figure 1—figure supplement 1 to show both full-length and CT-truncated models in one figure. We have also added an explanation of why the CT-truncated model was used as the primary system for analysis.

      In the middle of the third paragraph in the Introduction section we added:

      “However, structural information for the CT remains limited, leading to uncertainty in its conformational organization. To reduce potential bias arising from this uncertainty, we also generated a CT-truncated model and used it as the primary system for analysis (Figure 1, Figure 1—figure supplement 1).”

      We have modified Figure 1.

      We have removed Figure 1—figure supplement 1.

      (3) Line 106: The probability distributions of θEC and θTM are cited in support of the statement that the angles "typically range from ... with occasional tilting." Providing explicit quantitative measures (for example, means, percentiles, or fractions of time spent in different angular regimes) would strengthen this claim.

      We have revised the text to explicitly indicate that only 0.7‰ of the sampled θEC values are greater than 40°.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Across trajectories, θEC typically ranges from 0° to 40°, with only 0.7‰ exceeding 40°”.
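      A fraction such as 0.7‰ is simply the share of sampled angles above the cutoff, expressed per thousand. The sketch below shows that bookkeeping together with one common way to define a tilt angle, as the angle between a domain axis vector and the membrane normal. The axis definition, the choice of z as the normal, and the function names are assumptions for illustration; the manuscript's exact definition of θEC is not restated here.

```python
import math
from typing import Iterable, Sequence


def tilt_angle_deg(axis: Sequence[float],
                   normal: Sequence[float] = (0.0, 0.0, 1.0)) -> float:
    """Angle in degrees between a domain axis vector and the membrane normal.

    Assumes the axis is oriented away from the membrane, so an upright
    domain gives 0 degrees.
    """
    dot = sum(a * n for a, n in zip(axis, normal))
    norms = math.hypot(*axis) * math.hypot(*normal)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_theta))


def per_mille_above(angles: Iterable[float], threshold_deg: float) -> float:
    """Share of sampled angles strictly above the threshold, in parts per thousand."""
    values = list(angles)
    above = sum(1 for a in values if a > threshold_deg)
    return 1000.0 * above / len(values)


samples = [tilt_angle_deg((0.0, 0.2, 1.0))] * 9993 + [45.0] * 7
print(round(per_mille_above(samples, 40.0), 2))  # 0.7
```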

      (4) Figure 2: The meaning of the contour lines is not clearly explained. If these represent probability density estimates of angular values over the trajectory, this should be stated explicitly. In addition, because the angles may evolve over time, it would be helpful to clarify how temporal drift is accounted for in the contour representation.

      We have clarified in both the main text and the figure caption that the contour lines in Figure 2B represent the joint probability density of the ectodomain and TMD tilt angles. We have also added Figure 2—figure supplements 5–8 showing the temporal evolution of the ectodomain and TMD tilt angles.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “The temporal evolution of θEC and θTM is additionally shown in Figure 2—figure supplements 5–8. For the CT-truncated systems, the joint probability densities of θEC and θTM calculated from the final 0.5 µs of each trajectory are shown in Figure 2B, while those for the full-length systems are shown in Figure 2—figure supplement 9.”

      In the caption of Figure 2 we have modified the following:

      “(B) Probability densities of ectodomain and TMD tilt angles, calculated from CT-truncated systems with various initial configurations.”

      We have added Figure 2—figure supplements 5–8.

      We have modified the following:

      “The original Figure 2—figure supplement 5 has been renumbered as Figure 2—figure supplement 9.”
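      The joint probability density behind contour plots such as Figure 2B can be sketched as a normalized two-dimensional histogram restricted to the final 0.5 µs of a trajectory. The bin width, the use of a plain histogram rather than kernel density estimation, and the function name are illustrative assumptions, not details taken from the manuscript.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def joint_density(theta_ec: Sequence[float], theta_tm: Sequence[float],
                  times_us: Sequence[float], t_start_us: float = 0.5,
                  bin_deg: float = 2.0) -> Dict[Tuple[int, int], float]:
    """Histogram estimate of p(θEC, θTM) in 1/deg², using frames with t >= t_start_us.

    Keys are (EC bin index, TM bin index); bin k covers [k*bin_deg, (k+1)*bin_deg).
    """
    pairs = [(a, b) for a, b, t in zip(theta_ec, theta_tm, times_us)
             if t >= t_start_us]
    counts = Counter((math.floor(a / bin_deg), math.floor(b / bin_deg))
                     for a, b in pairs)
    # Dividing by (sample count x bin area) makes the density integrate to 1.
    scale = len(pairs) * bin_deg * bin_deg
    return {key: n / scale for key, n in counts.items()}


density = joint_density(theta_ec=[1.0, 3.0, 5.0, 1.5],
                        theta_tm=[1.0, 1.0, 1.0, 1.2],
                        times_us=[0.1, 0.6, 0.7, 0.9])
print(sorted(density))  # [(0, 0), (1, 0), (2, 0)]
```

      Contour lines are then drawn at chosen levels of this density; a finer bin width or a smoothed estimate changes the shape of the contours but not the underlying samples.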

      (5) Figure 2 (supplements): Some datasets are shown using scatter plots, while others are presented as contour plots. Using a consistent visualization style across panels or clearly explaining the rationale for the different representations would improve clarity.

      The contour plots in Figure 2B and Figure 2—figure supplement 9 show the joint distribution of the ectodomain and TMD tilt angles during the final 0.5 µs of each trajectory, whereas the scatter plots in Figure 2—figure supplements 1–4 illustrate the variations of the tilt angles across different time intervals. Each 1-µs trajectory was divided into four 0.25-µs intervals, indicated by light gray, dark gray, black, and red, respectively, as shown in the legends of Figure 2—figure supplements 1–4. We have clarified in the main text that the multi-colored scatter plots are intended to demonstrate that large conformational changes predominantly occurred during the first 0.5 µs of each trajectory.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Each 1-µs trajectory is divided into four consecutive 0.25-µs intervals, and data points from each interval are distinguished by four different colors (Figure 2—figure supplements 1–4). The variations of θEC and θTM over time show that large conformational changes predominantly occurred during the first 0.5 µs, followed by convergence of the θEC and θTM distributions during the second 0.5 µs in most trajectories.”
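      The interval assignment described above can be sketched as follows: each frame time is mapped to one of four consecutive 0.25-µs windows, and values are grouped by window so that, for example, their spread can be compared between the first and second halves of the run. The color names and function names are hypothetical, and a 1-µs trajectory with times in µs is assumed.

```python
from statistics import pstdev
from typing import List, Sequence

# Hypothetical color names, one per interval, following the legend order.
INTERVAL_COLORS = ("lightgray", "darkgray", "black", "red")


def interval_labels(times_us: Sequence[float], total_us: float = 1.0,
                    n_intervals: int = 4) -> List[int]:
    """Index (0..n_intervals-1) of the consecutive window containing each time."""
    width = total_us / n_intervals
    # min() keeps a frame at exactly t == total_us in the last window.
    return [min(int(t // width), n_intervals - 1) for t in times_us]


def group_by_interval(values: Sequence[float], labels: Sequence[int],
                      n_intervals: int = 4) -> List[List[float]]:
    """Collect the values that fall in each window, in window order."""
    groups: List[List[float]] = [[] for _ in range(n_intervals)]
    for value, label in zip(values, labels):
        groups[label].append(value)
    return groups


times = [0.0, 0.1, 0.3, 0.4, 0.6, 0.7, 0.8, 0.95]
theta = [2.0, 15.0, 25.0, 12.0, 20.0, 21.0, 19.0, 20.5]
labels = interval_labels(times)
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3]
for color, group in zip(INTERVAL_COLORS, group_by_interval(theta, labels)):
    print(color, round(pstdev(group), 2))
```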

      (6) As noted in Line 97, θEC and θTM tilt independently. In this context, presenting time series plots of θEC and θTM separately would be highly informative. Such plots would help readers distinguish between equilibration behavior, drift from initial conditions, and equilibrium fluctuations.

      We have added Figure 2—figure supplements 5–8 showing the temporal evolution of the ectodomain and TMD tilt angles, as noted in our response to comment (4).

      (7) Figure 3A: It is not immediately clear which panels correspond to top views and which correspond to side views. Explicitly labeling these views in the figure or caption would reduce ambiguity.

      We have added labels in Figure 3A to clearly denote the top-view and side-view panels.

      (8) Figure 3B: The description "...by solid and transparent colors..." is ambiguous, as it is unclear whether this refers to color intensity or transparency. The caption would benefit from explicitly stating the visual encoding used (for example, darker/lighter colors or left/right bars).

      We have revised the figure caption to clarify which boxes correspond to cleaved systems and which correspond to uncleaved systems.

      In the caption of Figure 3 we have modified the following:

      “For each residue, the distribution from cleaved systems is shown in dark color (left), and that from uncleaved systems is shown in light color (right).”

      (9) Figure 4H: The definition of "frequency" expressed as a percentage is unclear. If this represents the fraction of snapshots in which two atoms fall within a specified distance range, this should be stated explicitly. The authors should also clarify whether the reported quantity is a probability or a rate, and ensure that the units and terminology are consistent.

      We have revised the figure caption to clarify that the frequency represents the fraction of snapshots in which the heavy atoms of a TMD residue and the interacting component are within 5 Å.

      In the caption of Figure 4 we have modified the following:

      “For each TMD residue–interacting component pair, the frequency represents the fraction of snapshots in which the heavy atoms of the TMD residue and the corresponding component are within 5 Å. Bar shading reflects this fraction, with fully filled bars indicating 100% and empty bars indicating 0%.”
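      The frequency defined in the revised caption can be sketched as the fraction of snapshots in which at least one heavy-atom pair between a TMD residue and its partner lies within the cutoff. The sketch assumes heavy-atom coordinates have already been extracted per snapshot and treats a distance of exactly 5 Å as a contact; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def in_contact(residue_atoms: Sequence[Vec], partner_atoms: Sequence[Vec],
               cutoff: float = 5.0) -> bool:
    """True if any heavy-atom pair is within `cutoff` (Å)."""
    return any(math.dist(a, b) <= cutoff
               for a in residue_atoms for b in partner_atoms)


def contact_frequency(snapshots: Sequence[Tuple[Sequence[Vec], Sequence[Vec]]],
                      cutoff: float = 5.0) -> float:
    """Fraction of snapshots (0 to 1) in which the residue touches the partner."""
    hits = sum(1 for residue, partner in snapshots
               if in_contact(residue, partner, cutoff))
    return hits / len(snapshots)


# Toy data: one residue atom at the origin, partner atom at varying distance.
origin = [(0.0, 0.0, 0.0)]
snapshots = [(origin, [(d, 0.0, 0.0)]) for d in (3.0, 6.0, 4.9, 10.0)]
print(f"{contact_frequency(snapshots):.0%}")  # 50%
```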

      (10) Line 170: The manuscript describes a "rapid rearrangement" of the transmembrane domain at early simulation times. It would be helpful to clarify whether this regime is considered equilibration and whether it is excluded from subsequent analyses. Plotting time series of the relevant tilting angles and transmembrane rearrangement metrics could help address this point.

      We have clarified that the TMD underwent conformational changes early in the equilibration stage to enable R696 to interact with lipid headgroups, ions, or CT residues, and these interactions were largely maintained throughout the production stage. The time series of TMD tilting angles are now shown in Figure 2—figure supplements 5–8. Notably, the TMD exhibits heterogeneous conformational changes, including tilting, bending, and partial loss of helical structure. Therefore, no single metric or limited set of metrics can comprehensively capture the full extent of TMD conformational variability.

      In the middle of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “Early in the equilibration stage, the TMD rapidly rearranged to allow R696 residues to interact with more favorable partners, including negatively charged lipid headgroups from either leaflet, ions and water molecules diffusing into the bilayer center, as well as polar and positively charged groups in the CT when present. Once the interactions between R696 residues and their binding partners (lipid headgroups, ions, or CT residues) were established, they remained stable with minimal changes throughout the production stage.”

      (11) Line 213: As with earlier sections, time series plots of θEC and θTM, similar to those shown in Figure 3-figure supplement 1, would greatly aid interpretation by showing whether these angles drift or fluctuate around stable values.

      The time series of θEC and θTM are now shown in Figure 2—figure supplements 5–8. Line 213 refers to the conformational variability of the MPER. For the reason discussed in our response to comment (10), the MPER exhibits even greater conformational heterogeneity than the TMD and therefore cannot be adequately described by a single geometric metric or a small set of them, such as tilt or bending angles.

      (12) Lines 216-222: The term "trajectories" may be misleading in this context. It is unclear whether the differences discussed arise from different trajectories of the same system or from different systems altogether. Clarifying this distinction would improve interpretability.

      In this paragraph, we describe MPER conformational variations observed across all trajectories from all systems. A preceding sentence has been modified to emphasize that all trajectories from all systems are included. In addition, we have clarified which specific trajectory is referred to when discussing each example.

      At the beginning of the first paragraph in the subsection “MPER adopts diverse conformations, and its exposure depends on both MPER and TMD conformations” we have modified the following:

      “…, and a wide variety of conformations were sampled across all trajectories from all systems.”

      “Such conformation and orientation were maintained in some trajectories such as CLΔCT3 (the third trajectory of the cleaved, CT-truncated system with the low TMD position, Figure 4—figure supplement 2C). In other trajectories, such as CLCT1, the helix-turn-helix MPER in one protomer shifted into a horizontal orientation parallel to the membrane surface (Figure 4—figure supplement 6A). In ULΔCT1, the entire MPER adopted a more vertical arrangement, with both MPER-N and MPER-C tilted outward (Figure 4E, Figure 4—figure supplement 4A). We also observed in UHΔCT3 and ULΔCT3 that the HR2 helix in the ectodomain, MPER, and TMD merged into a continuous long helix (Figure 4C, F, Figure 4—figure supplement 3C, 4C). In addition, loss of helical structure within the MPER was common, particularly in the MPER-C region, which often transitioned to a random coil.”

      (13) Lines 280 and 287: Similar concerns apply to the use of the term "trajectories." If observations differ primarily between systems rather than between trajectories within a system, revising the wording accordingly would avoid confusion.

      We have revised the text to clarify that all trajectories from all systems are considered collectively.

      In the middle of the second paragraph in the subsection “Ectodomain epitopes are conditionally accessible, whereas MPER epitopes are virtually inaccessible in the closed prefusion state” we have modified the following:

      “When considering all trajectories from all systems collectively, approximately half of them exhibited at least one protomer with >35% accessibility (Supplementary file 1–Supplementary Table 2).”

      (14) Figure 5B: Providing a time series of the distance dF673, at least in the Supporting Information, would help assess sampling and equilibration. Such plots would complement the probability distributions and increase confidence in the reported trends.

      We have added Figure 5—figure supplement 1 showing the time series of the distance d<sub>F673</sub> to complement the probability distribution in Figure 5B.

      In the middle of the second paragraph in the subsection “MPER adopts diverse conformations, and its exposure depends on both MPER and TMD conformations”, we have modified the following:

      “In the initial ‘low’ and ‘high’ TMD configurations, d<sub>F673</sub> was 6.1 Å and 9.1 Å, respectively, but across simulations it spanned a wide range from −15 Å to 20 Å (Figure 5A, B, Figure 5—figure supplement 1).”

      We have added Figure 5—figure supplement 1.

      Reviewer #3 (Public review):

      Summary:

      This study uses large-scale all-atom molecular dynamics simulations to examine the conformational plasticity of the HIV-1 envelope glycoprotein (Env) in a membrane context, with particular emphasis on how the transmembrane domain (TMD), cytoplasmic tail (CT), and membrane environment influence ectodomain orientation and antibody epitope exposure. By comparing Env constructs with and without the CT, explicitly modeling glycosylation, and embedding Env in an asymmetric lipid bilayer, the authors aim to provide an integrated view of how membrane-proximal regions and lipid interactions shape Env antigenicity, including epitopes targeted by MPER-directed antibodies.

      Strengths:

      A key strength of this work is the scope and realism of the simulation systems. The authors construct a very large, nearly complete Env-scale model that includes a glycosylated Env trimer embedded in an asymmetric bilayer, enabling analysis of membrane-protein interactions that are difficult to capture experimentally. The inclusion of specific glycans at reported sites, and the focus on constructs with and without the CT, are well motivated by existing biological and structural data.

      The simulations reveal substantial tilting motions of the ectodomain relative to the membrane, with angles spanning roughly 0-30° (and up to ~50° in some analyses), while the ectodomain itself remains relatively rigid. This framing, that much of Env's conformational variability arises from rigid-body tilting rather than large internal rearrangements, is an important conceptual contribution. The authors also provide interesting observations regarding asymmetric bilayer deformations, including localized thinning and altered lipid headgroup interactions near the TMD and CT, which suggest a reciprocal coupling between Env and the surrounding membrane.
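      As an illustration of how a tilt angle of this kind is commonly measured, the sketch below computes the angle between a domain axis and the membrane normal. It is not the authors' analysis code. Two assumptions are built in: the axis is taken as the vector between the centroids of two hypothetical atom selections, and the membrane normal is assumed to lie along z, as is usual for bilayers built with CHARMM-GUI Membrane Builder. Taking the absolute value of the cosine folds the result into the range 0 to 90°.

```python
import numpy as np

def tilt_angle_deg(top_xyz, bottom_xyz, normal=(0.0, 0.0, 1.0)) -> float:
    """Angle in degrees (0 to 90) between a domain axis and the membrane normal.

    The axis runs from the centroid of bottom_xyz to the centroid of top_xyz,
    each an (N, 3) array of atom coordinates (hypothetical selections)."""
    axis = np.asarray(top_xyz, float).mean(axis=0) - np.asarray(bottom_xyz, float).mean(axis=0)
    n = np.asarray(normal, float)
    # |cos| maps "pointing up" and "pointing down" onto the same tilt value.
    cos_t = abs(float(axis @ n)) / (np.linalg.norm(axis) * np.linalg.norm(n))
    return float(np.degrees(np.arccos(min(cos_t, 1.0))))

# Toy example: an axis leaning equally in x and z is tilted about 45 degrees.
print(tilt_angle_deg([[10.0, 0.0, 10.0]], [[0.0, 0.0, 0.0]]))
```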

      The analysis of antibody-relevant epitopes across the prefusion state, including the V1/V2 and V3 loops, the CD4 binding site, and the MPER, is another strength. The study makes effective use of existing experimental knowledge in this context, for example, by focusing on specific glycans known to occlude antibody binding, to motivate and interpret the simulations.

      Weaknesses:

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      With stronger integration of prior experimental and computational literature, this work has the potential to serve as a valuable reference for how Env behaves in a realistic, glycosylated, membrane-embedded context. The simulation framework itself is well-suited for future studies incorporating mutations, strain variation, antibodies, inhibitors, or receptor and co-receptor engagement. In its current form, the primary contribution of the study is to consolidate and extend existing observations within a single, large-scale model, providing a useful platform for future mechanistic investigations.

      Following the Reviewer’s comments and suggestions, we have revised the manuscript accordingly.

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      We have added a summary of the prior computational studies in the Introduction section.

      At the beginning of the third paragraph in the Introduction section we added:

      “Molecular dynamics (MD) simulations have been employed since the late 2000s to investigate the stability and conformational properties of monomeric and trimeric helical TMDs in both aqueous and lipid bilayer environments (Kim et al., 2009; Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014; Hollingsworth et al., 2018). Early studies were constrained by limited computational resources and were therefore restricted to relatively short simulation times. Subsequent work employed metadynamics to probe rare events (Gangupomu et al., 2010; Baker et al., 2014), and simulations performed on Anton supercomputers extended sampling to multi-microsecond timescales (Baker et al., 2014). Piai and coworkers determined the NMR structure of a construct comprising the MPER, TMD, and CT, and carried out MD simulations to assess the structural stability of the trimeric MPER–TMD–CT complex (Piai et al., 2021). Majumder et al. subsequently simulated the same MPER–TMD–CT complex and applied a machine learning-based approach to classify its conformational ensemble (Majumder et al., 2025). Maillie et al. combined conventional MD, steered MD, and coarse-grained simulations to examine interactions between MPER-targeting antibodies and membrane lipids (Maillie et al., 2025). In addition, MD simulations have been extensively applied to the well-studied ectodomain. Despite these advances, investigating the gp120–gp41 trimer as an intact entity remains challenging because of its structural complexity.”

      We have also added a discussion of previous MD simulation studies to the Results section regarding interactions of the TMD residue R696 with ions and lipid headgroups.

      At the end of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we added:

      “Previously, Kim et al. reported that the inter-chain interactions between protonated R696 gradually diminished over a short simulation time (23 ns), leading to increased crossing angles and reduced bundle length (Kim et al., 2009). Gangupomu et al. and Baker et al. observed that R696 snorkeled toward either the exoplasmic or the endoplasmic headgroups in simulations of the TMD monomer, resulting in TMD tilting and membrane thinning due to water penetration and lipid headgroups interacting with R696 (Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014). These observations are consistent with our findings. Hollingsworth et al. also reported membrane thinning; however, they attributed this effect to interfacial interactions of R683 and R707 with both leaflets and proposed that R696 interacted only with water and ions permeating into the center of the TMD trimer (Hollingsworth et al., 2018).”

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      We have added a discussion of the coupling between the TMD, CT, and Env antigenicity, and of the independent motion of the ectodomain and TMD in our simulations.

      In the middle of the second paragraph in the Discussion section we added:

      “Our analysis of the ectodomain and TMD coupling indicates that the motions of these two domains are largely independent. This observation does not contradict experimental studies demonstrating functional coupling between the TMD, CT, and the antigenic profiles of Env (Chen et al., 2015; Dev et al., 2016). Munro et al. proposed that unliganded Env is intrinsically dynamic, transitioning among three distinct prefusion conformations: a closed ground state (predominant), a transient state, and a CD4-/co-receptor-stabilized state. Both laboratory-adapted and clinically isolated strains can spontaneously transition among these three states, although their relative occupancies differ (Munro et al., 2014). It is therefore possible that TMD mutations or CT truncation also alter the equilibrium distribution among the three states, thereby affecting epitope exposure, particularly for epitopes that are occluded in the closed ground state but exposed in the CD4-/co-receptor-stabilized state. However, transitions among the three states occur on millisecond-to-second timescales. Our simulations on microsecond timescales therefore primarily capture conformational variations within the closed ground state and suggest that the MPER acts as a hinge, providing substantial flexibility that enables the ectodomain and TMD to move independently while Env remains in the closed ground state.”
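      The timescale argument in this paragraph can be checked with a back-of-the-envelope estimate. Assuming, purely for illustration, single-exponential kinetics with mean dwell time τ, the probability that a trajectory of length t leaves its starting state is P = 1 − exp(−t/τ), which is approximately t/τ when t ≪ τ. With a hypothetical t = 1 μs and τ = 1 ms, P is about 10^-3 per trajectory, and even 30 independent trajectories (a hypothetical count, not the number in this study) give only about a 3% chance of observing any transition.

```python
import math

def p_any_transition(sim_time_s: float, mean_dwell_s: float, n_traj: int = 1) -> float:
    """Probability that at least one of n_traj independent trajectories of length
    sim_time_s leaves the starting state, assuming exponentially distributed dwell
    times with mean mean_dwell_s (an illustrative simplification)."""
    p_single = 1.0 - math.exp(-sim_time_s / mean_dwell_s)
    return 1.0 - (1.0 - p_single) ** n_traj

# Hypothetical 1 us trajectories against millisecond-to-second dwell times.
for tau in (1e-3, 1e-1, 1.0):
    print(f"tau = {tau:g} s: one trajectory {p_any_transition(1e-6, tau):.1e}, "
          f"30 trajectories {p_any_transition(1e-6, tau, 30):.1e}")
```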

      We have also calculated the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD.

      At the end of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we added:

      “We also calculated the dynamical cross-correlation maps (Ichiye et al., 1991) of Cα atoms for all systems using CPPTRAJ (Roe et al., 2013). The results indicate only very weak correlations between the ectodomain and the TMD (Figure 2—figure supplements 10–13).”
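      For readers unfamiliar with dynamical cross-correlation maps, each element is C_ij = ⟨Δr_i · Δr_j⟩ / sqrt(⟨|Δr_i|²⟩⟨|Δr_j|²⟩), where Δr_i(t) = r_i(t) − ⟨r_i⟩ is the displacement of atom i from its mean position. Values near +1 indicate atoms moving in the same direction, values near −1 opposite directions, and values near 0 no linear correlation. The NumPy sketch below is a minimal illustration of that definition, not CPPTRAJ's implementation; it assumes the frames have already been superposed onto a common reference and that no selected atom is perfectly static (which would give a zero denominator).

```python
import numpy as np

def dccm(traj) -> np.ndarray:
    """Dynamical cross-correlation matrix C_ij for an (n_frames, n_atoms, 3) array."""
    traj = np.asarray(traj, dtype=float)
    disp = traj - traj.mean(axis=0)                               # Δr_i(t) = r_i(t) - <r_i>
    cov = np.einsum("tik,tjk->ij", disp, disp) / traj.shape[0]    # <Δr_i · Δr_j>
    sd = np.sqrt(np.diag(cov))                                    # sqrt(<|Δr_i|^2>)
    return cov / np.outer(sd, sd)

# Toy trajectory: atoms 0 and 1 move together along x, atom 2 moves in the opposite direction.
toy = np.array([[[t, 0, 0], [t + 5, 0, 0], [-t, 0, 0]] for t in range(5)], dtype=float)
print(np.round(dccm(toy), 2))
```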

      We have added Figure 2—figure supplements 10–13.

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      We have added references to primary experimental studies on plasma membrane composition (van Meer et al., 2008; Sampaio et al., 2011), as well as the prior simulation study proposing the lipid and cholesterol distributions (Ingolfsson et al., 2014).

      At the beginning of the Membrane subsection in the Materials and methods section we have modified the following:

      “The full-length and CT-truncated gp120–gp41 models were embedded into an asymmetric lipid bilayer with the lipid composition corresponding to a mammalian plasma membrane (van Meer et al., 2008; Sampaio et al., 2011; Ingolfsson et al., 2014; Pogozheva et al., 2022),”

      We have also clarified the limitations associated with the choice of lipid composition and emphasized the need to investigate its influence in future studies.

      At the end of the second paragraph in the Discussion section we added:

      “In addition to the limitations inherent to protein structure modeling, the choice of lipid composition remains an open question. In this work, we selected an asymmetric mammalian plasma membrane because it is one of the 18 complex biomembrane systems we previously studied (Pogozheva et al., 2022), and among them, it provides the closest available approximation to the HIV membrane. Nevertheless, experimental studies have reported differences in lipid composition between HIV virions and the host plasma membrane (Aloia et al., 1993; Brugger et al., 2006; Huarte et al., 2016; Mucksch et al., 2019; Tomishige et al., 2023). Although we do not anticipate that our main conclusions regarding Env domain motions and MPER flexibility would change substantially, evaluating the influence of lipid composition represents an important direction for future work.”

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      We have corrected the error in which PDB structure 7LOI was described as a cryo-EM structure; it is in fact an NMR structure. We have also verified that all PDB structures are properly cited at their first occurrence in the manuscript.

      We have clarified that the modeling of palmitoylation sites, glycans, and lipid bilayers is performed in an automated fashion by different modules of CHARMM-GUI, and we have added Supplementary file 1–Supplementary Table 8 showing the simulation settings for the equilibration and production stages.

      At the end of the subsection “Modeling of full-length gp120–gp41 trimer” we have modified the following:

      “Two mutations (S764C and S837C) were introduced in the CT to restore the palmitoylation sites, and lipid tails oriented towards the hydrophobic core of the bilayer were then attached to the palmitoylation sites using the PDB Manipulation module in CHARMM-GUI (Jo et al., 2008; Jo et al., 2014; Park et al., 2023) (Figure 1D).”

      At the end of the subsection “Glycosylation” we added:

      “The selected glycan sequences were represented in the Glycan Reader Sequence format (Jo et al., 2011; Park et al., 2017) and added to the corresponding glycosylation sites using the Glycan Reader & Modeler graphical interface.”

      In the middle of the subsection “Membrane” we added:

      “Membrane systems were constructed using CHARMM-GUI Membrane Builder, which provides a user-friendly graphical interface for selecting lipid types and defining their numbers in each leaflet (Jo et al., 2007; Jo et al., 2009; Wu et al., 2014; Lee et al., 2016; Lee et al., 2019).”

      In the middle of the subsection “Simulation details” we have modified the following:

      “Positional and dihedral restraints were applied to proteins, glycans, and lipids, with force constants progressively reduced over successive intervals (Supplementary file 1–Supplementary Table 8).”
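      The phrase “force constants progressively reduced over successive intervals” describes a staged release of restraints during equilibration. A minimal sketch of the idea follows, assuming a plain harmonic positional restraint of the form U = k·|r − r0|² (some MD engines include a factor of 1/2). The force constants are hypothetical placeholders, not the values listed in Supplementary file 1–Supplementary Table 8, and the dihedral restraints mentioned in the sentence are not modeled.

```python
import numpy as np

# Hypothetical force-constant schedule (kcal/mol/Å^2), one value per equilibration
# interval; placeholders only, not the settings reported in Supplementary Table 8.
SCHEDULE = (10.0, 5.0, 2.5, 1.0, 0.5, 0.1)

def restraint_energy(xyz, ref_xyz, k: float) -> float:
    """Harmonic positional restraint energy: sum over atoms of k * |r_i - r_i0|^2."""
    d = np.asarray(xyz, float) - np.asarray(ref_xyz, float)
    return float(k * np.sum(d * d))

ref = np.zeros((2, 3))
xyz = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])   # each atom displaced by 0.5 Å
for stage, k in enumerate(SCHEDULE, start=1):
    # The same displacement costs less energy at each stage, so atoms can relax further.
    print(f"stage {stage}: k = {k:5.1f} -> U = {restraint_energy(xyz, ref, k):.3f} kcal/mol")
```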

      We added Supplementary file 1–Supplementary Table 8.

      Reviewer #3 (Recommendations for the authors):

      Major concerns:

      (1) Strengthen analysis of conformational coupling: Consider analyses that more directly assess coupling between the TMD/CT and ectodomain, such as residue-residue correlation networks, comparisons to smFRET-defined conformational states, or data-driven (e.g., machine learning-based) trajectory analyses. Machine-learning analysis would be particularly helpful in understanding otherwise elusive allosteric networks that could govern large-scale behavior. Discuss how, due to the apparent local minima that occur after ~0.5 μs, enhanced sampling methods might be employed to better cover the Env conformational landscape.

      We have calculated the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD.

      At the end of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD”

      “We also calculated the dynamical cross-correlation maps (Ichiye et al., 1991) of Cα atoms for all systems using CPPTRAJ (Roe et al., 2013). The results indicate only very weak correlations between the ectodomain and the TMD (Figure 2—figure supplements 10–13).”

      We added Figure 2—figure supplements 10–13.

      We have also noted in the Discussion section that enhanced sampling methods could be employed to better explore the conformational landscape of Env trimer, including fluctuations within the closed state as well as transitions among the closed ground, transient and CD4/co-receptor-stabilized states proposed in the previous experimental study (Munro et al., 2014).

      In the middle of the second paragraph in the Discussion section we added:

      “Enhanced sampling methods could be applied to more thoroughly explore the conformational landscape, including not only variations within the closed ground state but also transitions among the closed ground, transient and CD4-/co-receptor-stabilized states.”

      (2) Qualify strong independence claims: Rephrase or further support statements asserting independence of ectodomain and TMD motions, particularly in light of known experimental evidence for coupling (PMC4701381, PMC4304640, PMC5085267).

      In addition to adding the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD, we have added a discussion of the coupling between TMD, CT, and Env antigenicity, and the independent motion of ectodomain and TMD in our simulation.

      In the middle of the second paragraph in the Discussion section we added:

      “Our analysis of the ectodomain and TMD coupling indicates that the motions of these two domains are largely independent. This observation does not contradict experimental studies demonstrating functional coupling between the TMD, CT, and the antigenic profiles of Env (Chen et al., 2015; Dev et al., 2016). Munro et al. proposed that unliganded Env is intrinsically dynamic, transitioning among three distinct prefusion conformations: a closed ground state (predominant), a transient state, and a CD4-/co-receptor-stabilized state. Both laboratory-adapted and clinically isolated strains can spontaneously transition among these three states, although their relative occupancies differ (Munro et al., 2014). It is therefore possible that TMD mutations or CT truncation also alter the equilibrium distribution among the three states, thereby affecting epitope exposure, particularly for epitopes that are occluded in the closed ground state but exposed in the CD4-/co-receptor-stabilized state. However, transitions among the three states occur on millisecond-to-second timescales. Our simulations on microsecond timescales therefore primarily capture conformational variations within the closed ground state and suggest that the MPER acts as a hinge, providing substantial flexibility that enables the ectodomain and TMD to move independently while Env remains in the closed ground state.”

      (3) Clarify membrane composition assumptions: Provide a clearer rationale for the chosen lipid composition, and explicitly discuss how differences between host plasma membranes and HIV virions (e.g., PS, sphingomyelin, and cholesterol enrichment) may affect the conclusions.

      We have clarified the limitations associated with the choice of lipid composition and emphasized the need to investigate its influence in future studies.

      At the end of the second paragraph in the Discussion section we added:

      “In addition to the limitations inherent to protein structure modeling, the choice of lipid composition remains an open question. In this work, we selected an asymmetric mammalian plasma membrane because it is one of the 18 complex biomembrane systems we previously studied (Pogozheva et al., 2022), and among them, it provides the closest available approximation to the HIV membrane. Nevertheless, experimental studies have reported differences in lipid composition between HIV virions and the host plasma membrane (Aloia et al., 1993; Brugger et al., 2006; Huarte et al., 2016; Mucksch et al., 2019; Tomishige et al., 2023). Although we do not anticipate that our main conclusions regarding Env domain motions and MPER flexibility would change substantially, evaluating the influence of lipid composition represents an important direction for future work.”

      (4) Address citation and reference issues: Replace PDB-only references with proper citations, correct mischaracterizations of structure determination methods, and ensure all supplementary citations are fully referenced.

      We have corrected the error in which PDB structure 7LOI was described as a cryo-EM structure; it is in fact an NMR structure. We have also verified that all PDB structures are properly cited at their first occurrence in the manuscript.

      (5) Expand the Methods section: Provide additional detail on system construction, glycan modeling, lipid asymmetry, equilibration, sampling, and limitations, including a discussion of potential benefits of enhanced-sampling approaches.

      We have clarified that the modeling of palmitoylation sites, glycans, and lipid bilayers is performed in an automated fashion by different modules of CHARMM-GUI, and we have added Supplementary file 1–Supplementary Table 8 showing the simulation settings for the equilibration and production stages.

      At the end of the subsection “Modeling of full-length gp120–gp41 trimer” we have modified the following:

      “Two mutations (S764C and S837C) were introduced in the CT to restore the palmitoylation sites, and lipid tails oriented towards the hydrophobic core of the bilayer were then attached to the palmitoylation sites using the PDB Manipulation module in CHARMM-GUI (Jo et al., 2008; Jo et al., 2014; Park et al., 2023) (Figure 1D).”

      At the end of the subsection “Glycosylation” we added:

      “The selected glycan sequences were represented in the Glycan Reader Sequence format (Jo et al., 2011; Park et al., 2017) and added to the corresponding glycosylation sites using the Glycan Reader & Modeler graphical interface.”

      In the middle of the subsection “Membrane” we added:

      “Membrane systems were constructed using CHARMM-GUI Membrane Builder, which provides a user-friendly graphical interface for selecting lipid types and defining their numbers in each leaflet (Jo et al., 2007; Jo et al., 2009; Wu et al., 2014; Lee et al., 2016; Lee et al., 2019).”

      In the middle of the subsection “Simulation details” we have modified the following:

      “Positional and dihedral restraints were applied to proteins, glycans, and lipids, with force constants progressively reduced over successive intervals (Supplementary file 1–Supplementary Table 8).”

      We added Supplementary file 1–Supplementary Table 8.

      The discussion of potential benefits of enhanced-sampling approaches is included in our response to major concern (1).

      (6) Data availability: In addition to code, deposit all MD trajectories for re-analysis. The scale of this simulation was likely costly (GPU time), and so data availability is imperative.

      We have deposited the MD simulation trajectories in Zenodo.

      At the end of the section “Data availability” we added:

      “The simulation trajectories can be found at https://doi.org/10.5281/zenodo.18853902, https://doi.org/10.5281/zenodo.18854615, and https://doi.org/10.5281/zenodo.18854639.”

      Minor:

      (1) Stylistic: Suggested to revise Figure 1 to provide a clearer overview of all constructs with consistent nomenclature (e.g., "full-length" versus "ΔCT") and explicit domain boundaries. With a better overview figure, the current figures could comprise the Figure 1 associated with Figures 1 and 2.

      We have combined Figure 1 and Figure 1—figure supplement 1 to show both full-length and CT-truncated models in one figure.

      We have modified Figure 1.

      We have removed Figure 1—figure supplement 1.

      (2) Explicitly cross-validate against prior studies: Integrate comparisons to existing MD simulations and experimental studies (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC4701381, PMC5085267) directly into the Results and Discussion.

      We have added a discussion of previous MD simulation studies to the Results section regarding interactions of the TMD residue R696 with ions and lipid headgroups.

      At the end of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “Previously, Kim et al. reported that the inter-chain interactions between protonated R696 gradually diminished over a short simulation time (23 ns), leading to increased crossing angles and reduced bundle length (Kim et al., 2009). Gangupomu et al. and Baker et al. observed that R696 snorkeled toward either the exoplasmic or the endoplasmic headgroups in simulations of the TMD monomer, resulting in TMD tilting and membrane thinning due to water penetration and lipid headgroups interacting with R696 (Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014). These observations are consistent with our findings. Hollingsworth et al. also reported membrane thinning; however, they attributed this effect to interfacial interactions of R683 and R707 with both leaflets and proposed that R696 interacted only with water and ions permeating into the center of the TMD trimer (Hollingsworth et al., 2018).”

      The discussion of PMC4701381 and PMC5085267 is included in our response to major concern (2).

      (3) "In the cryo-EM structure (PDB ID: 7LOI)": This is an NMR model and lacks citation.

      We have corrected this error and added the citation at the first occurrence of PDB ID: 7LOI in the Results section.

      In the middle of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “In the NMR structure (PDB ID: 7LOI) (Piai et al., 2021),”

      (4) "Higher RMSF values were observed in the residues missing from the cryo-EM structure": This is lacking citation, as there are multiple cryo-EM structures and several dynamics studies using NMR.

      The missing residues here specifically refer to those absent in the cryo-EM structure (PDB ID: 6B0N) used for model building, rather than all cryo-EM structures in the PDB. We have revised the text to clarify this distinction.

      In the middle of the second paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Higher RMSF values were observed in the residues missing from the cryo-EM structure (PDB ID: 6B0N) (Sarkar et al., 2018), which was used for the ectodomain in model building (these missing residues are highlighted in red in Figure 1A, B),”

    1. eLife Assessment

      This study provides fundamental insights by demonstrating that the Nanog mRNA coding sequence (CDS) and 3′UTR domains are spatially segregated and functionally distinct in pluripotent stem cells and blastocysts, with 3′UTR-enriched border cells primarily influencing morphogenesis and CDS-enriched inner cells largely regulating transcription and epigenetic programs. The work opens a novel conceptual avenue for understanding how separable mRNA domains can differentially control cell behavior and differentiation. However, the evidence is incomplete, as key aspects of the molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS RNA species, as well as causal links between their perturbation and the observed phenotypes (e.g., via rescue and deeper characterization of 3′UTR elements), remain to be fully established.

    2. Reviewer #1 (Public review):

      Summary:

      There is evidence that some genes encode mRNAs from which separate processed transcripts may arise, separating the coding sequence (CDS) from the 3'-UTR, and with both mRNA elements remaining stable in the cell. However, the functional consequences of these mRNA fragments have not been firmly established. In the manuscript by Yang et al., the authors probe the mRNA domain architecture of Nanog in the context of embryonic stem cell colonies and blastocysts. The authors detect spatial separation of Nanog CDS-containing mRNA from abundant Nanog 3'-UTR RNAs depending on the cell position in 2D embryonic stem cell colonies or in blastocysts.

      Strengths:

      The phenotypic analyses of the Nanog mRNA hold promise for revealing distinct roles for the Nanog encoded protein and a separate RNA encompassing the Nanog 3'-UTR.

      Weaknesses:

      There are a number of questions about the molecular nature of the mRNA species that the authors should address in order for the results to be firmly established, as noted below.

      (1) It is not clear how the authors verified that their probes are specific for Nanog CDS or 3'-UTR regions. Especially for the 3'-UTR probe, it is confusing why colonies show green only regions, suggesting only the CDS is present. I would expect the CDS and 3'-UTR probes to colocalize in the interior cells. Is it possible that the 3'-UTR probe is targeting another RNA?

      (2) It would help for the authors to include a graphic similar to Figure 3, Figure Supplement 1A, that diagrams the location of the CDS and 3'-UTR probes (this should also be done for Oct4 and Sox2). This graphic could also show all potential polyadenylation signals.

      (3) I think, based on the fluorescence patterns, there is evidence that the signal for the Nanog 3'-UTR probe is nuclear (images with DAPI staining), but this is not commented on that I could find. This should be discussed, as nuclear retention has implications for the noncoding function of the 3'-UTR fragment.

      (4) Figure 2, Figure Supplement 1A needs a better explanation. It's not clear how the reads map to the different regions of the Nanog mature mRNA. The authors should show examples at different ratios of CDS to 3'-UTR. Do the reads have a sharp boundary at the junction of where the isolated 3'-UTR is thought to occur?

      (5) I looked in the Zenbu browser at human NANOG CAGE mapping in the FANTOM5 dataset. I could not see evidence for substantial capping of a 3'-UTR fragment when filtering for embryonic cell types. Given the strong signal for the 3'-UTR in border cells, I would expect to see evidence for capping if the RNA were indeed capped. This suggests that if it exists, it is likely uncapped and (as noted in point 3) is likely nuclear retained.

      (6) Are there predicted polyadenylation signals near the end of the CDS that would generate a short 3'-UTR, and are these signals conserved across mammals?

      (7) It would help to see a zoomed-in view of the region targeted by one of the guide RNAs in the 3'-UTR, and where that site is relative to the polyadenylation signal. Is the polyadenylation signal upstream, i.e., CDS proximal?

      (8) A final note, the use of green and red together will be challenging for those who are colorblind. Providing a different false color palette would be helpful.

      I am refraining from comments on the cell biology and morphological insights, as they are remote from my core expertise.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript shows that the coding sequence (CDS) and 3' untranslated region (3'UTR) of mRNA transcripts from the Nanog gene have distinct expression patterns and functions. In both human and mouse embryonic stem cell colonies and blastocysts, these domains are spatially segregated, with 3'UTR-enriched cells occupying the borders and CDS-enriched cells residing in the interior. CDS mRNA expression is correlated with the expected regulation of transcription and epigenetics associated with the Nanog protein. Interestingly, expression of the 3'UTR appears to play an independent role in cell behavior and colony morphogenesis. Indeed, deletion of the 3'UTR causes specific defects in cell spreading and protrusive activity, with alteration in the localization of adhesion and cytoskeleton-associated proteins. Remarkably, a large proportion of those defects are rescued upon ROCK inhibition. Deletion of either Nanog CDS or 3'UTR leads to distinct modifications in the differentiation competence.

      Strengths:

      The independent role of 3'UTR mRNA domains, although identified in neurosciences a couple of years ago, is a novel and exciting field relatively unexplored in early development.

      The manuscript offers a multilayer series of experiments, in ES cell colonies, blastocysts, and embryoid bodies, including imaging, -omics, genetic and pharmacological challenges, and differentiation experiments, thereby unveiling very convincingly the role of Nanog 3'UTR in morphogenesis.

      Weaknesses:

      The pathways leading to the generation of those distinct transcript domains are unknown. Although the functional differential roles are well demonstrated, whether the expression patterns are a cause or a consequence of the cells' localisation in the embryo remains to be explored.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yang et al reported distinct functions of the protein-coding sequence (CDS) and the 3' untranslated region (UTR) in the Nanog mRNA in pluripotent stem cells. They first observed different localization patterns for the CDS and 3' UTR in embryonic stem cells and in blastocyst embryos, and this pattern correlates with cell populations in different pluripotent states based on single-cell sequencing data. To characterize the potentially distinct functions of these regions, the authors generated knockout (KO) cell lines in which either the CDS or the 3' UTR was genetically ablated. These deletions led to different phenotypes in multiple assays. These results provided evidence that the CDS and 3' UTR of an mRNA could have distinct functions. Although these results are potentially interesting, several questions need to be addressed before the validity of their conclusion can be confirmed.

      Strengths:

      This study provides evidence for distinct functions of the protein-coding sequence and 3' untranslated region of an mRNA in pluripotent stem cells. The concept could be more broadly applied.

      Weaknesses:

      The initial observation (distinct localization of CDS and 3' UTRs) and the causal relationship between the KO and phenotype need further validation.

      Major points:

      (1) The authors showed distinct localization patterns of the CDS and 3' UTRs in human and mouse ESCs and blastocysts, and the overlap between their signals was minimal (Figure 1). Does this mean that the CDS and 3' UTR RNAs exist separately? For example, in cells that only showed signals for 3' UTRs, do these RNAs only contain 3' UTRs and lack CDS? Was this confirmed by RNA-seq experiments? If so, how are they generated (i.e., by transcription from a novel promoter or partial degradation of the full-length mRNAs)? This is a key question. Without a clear characterization of these RNAs, the rest of the study cannot be substantiated.

      (2) To confirm that the phenotypes of CDS or 3' UTR KO cells were caused by the deleted regions instead of other artifacts, rescue experiments should be performed.

      (3) As over-expression of the 3' UTR showed a phenotype, important regions within it should be identified, and also the possibility that the 3' UTR contains open reading frame(s) and is translated should be tested.

    5. Author response:

      eLife Assessment

      This study provides fundamental insights by demonstrating that the Nanog mRNA coding sequence (CDS) and 3′UTR domains are spatially segregated and functionally distinct in pluripotent stem cells and blastocysts, with 3′UTR-enriched border cells primarily influencing morphogenesis and CDS-enriched inner cells largely regulating transcription and epigenetic programs. The work opens a novel conceptual avenue for understanding how separable mRNA domains can differentially control cell behavior and differentiation. However, the evidence is incomplete, as key aspects of the molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS RNA species, as well as causal links between their perturbation and the observed phenotypes (e.g., via rescue and deeper characterization of 3′UTR elements), remain to be fully established.

      We thank the editors and the three reviewers for their careful and constructive engagement with our manuscript. We greatly appreciate the reviewers’ recognition of the conceptual significance of the study and their thoughtful suggestions for strengthening the mechanistic and molecular characterization of the work. We have carefully considered all points raised and outline below the revisions planned for the revised manuscript.

      The phenomenon of differential CDS and 3’UTR expression is not unique to Nanog. Independent 3’UTR and CDS expression and differential CDS/3’UTR usage have been observed across multiple genes, tissues, and developmental contexts, including in genome-wide (Mercer et al., 2011) and transcriptome-scale studies (Kocabas et al., 2025; Ji et al., 2021). Prior studies have proposed that isolated 3’UTRs may arise through regulated RNA processing pathways coupled to exonucleolytic degradation and, in some cases, recapping mechanisms (Malka et al., 2017; Haberman et al., 2024). While the precise molecular mechanisms underlying the generation of isolated Nanog CDS and 3’UTR species remain unresolved, the observations presented here support regulated RNA processing models. Our original submission included a brief discussion of this topic; however, the revised manuscript will include substantially expanded analyses and discussion of how isolated Nanog CDS and 3’UTR species are generated.

      The revised manuscript will address the major concerns regarding:

      (1) The molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS mRNA species

      (2) The causal relationship between perturbation of these RNA species and the observed phenotypes, including additional rescue experiments and deeper computational characterization of putative, functional 3′UTR elements.

      Specifically:

      (A) New supplementary analyses and schematics designed to further clarify the conceptual and mechanistic framework of the study, including:

      (i) Computational examination of the Nanog 3’UTR across all reading frames for open reading frames (ORFs).

      (ii) As suggested by Reviewers 1 and 3, single-cell traces of Nanog mRNA expression from the full-length mESC dataset used in this study, illustrating distinct transcript isoforms and CDS/3’UTR expression patterns across individual cells, complementing the color-coded tSNE analyses currently presented in Fig. 2.

      (iii) Expanded schematic model and analyses addressing possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR enriched RNA species, including transcript architecture, predicted RNA structural barriers, and exonucleolytic processing models.

      (iv) Expanded discussion of the predominantly nuclear localization of the Nanog 3’UTR signal and its implications for transcript biogenesis, processing, and potential noncoding functions.

      (B) Correction of all minor labeling errors.

      (C) Additional experimental analyses, including:

      - Expansion of Nanog 3’UTR overexpression and rescue experiments to include cell spreading assays.

      - Expanded analysis of the effects of ROCK pathway inhibitors on colony morphology and cytoskeletal organization.

      - Examination of the ability of ROCK inhibition to restore normal embryoid body formation.

      Collectively, these planned revisions are intended to strengthen the mechanistic framing, molecular characterization, and broader significance of the study while clarifying the interpretation and scope of the conclusions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      There is evidence that some genes encode mRNAs from which separate processed transcripts may arise, separating the coding sequence (CDS) from the 3'-UTR, and with both mRNA elements remaining stable in the cell. However, the functional consequences of these mRNA fragments have not been firmly established. In the manuscript by Yang et al., the authors probe the mRNA domain architecture of Nanog in the context of embryonic stem cell colonies and blastocysts. The authors detect spatial separation of Nanog CDS-containing mRNA from abundant Nanog 3'-UTR RNAs depending on the cell position in 2D embryonic stem cell colonies or in blastocysts.

      Strengths:

      The phenotypic analyses of the Nanog mRNA hold promise for revealing distinct roles for the Nanog encoded protein and a separate RNA encompassing the Nanog 3'-UTR.

      Weaknesses:

      There are a number of questions about the molecular nature of the mRNA species that the authors should address in order for the results to be firmly established, as noted below.

      (1) It is not clear how the authors verified that their probes are specific for Nanog CDS or 3'-UTR regions. Especially for the 3'-UTR probe, it is confusing why colonies show green only regions, suggesting only the CDS is present. I would expect the CDS and 3'-UTR probes to colocalize in the interior cells. Is it possible that the 3'-UTR probe is targeting another RNA?

      We thank the reviewer for raising the important question of probe specificity. We understand that the observation underlying this concern is the absence of colocalization between the CDS and 3’UTR probes in colony border cells.

      The absence of CDS/3’UTR colocalization in colony border cells is not due to probe failure but instead reflects the principal observation underlying the study. If Nanog CDS and 3’UTR sequences were present exclusively as intact full-length transcripts in a strict stoichiometric ratio, Nanog-positive cells would be expected to be positive for both probes (appearing yellow). Instead, border cells exhibit strong 3’UTR signal with minimal or absent CDS signal, while adjacent interior cells show the opposite pattern.

      The fact that both probes robustly detect signal within the same sample, but in spatially distinct cell populations, argues that both probes are functional and that the observed differential localization reflects genuine biological differences in the levels of transcript components.

      The CDS probe targets ~300 bp within the coding region, while the 3’UTR probe targets ~300 bp within the proximal region of the Nanog 3’UTR. Hybridization specificity was validated as described in the Methods and in our previous studies (Kocabas et al., 2015; Ji et al., 2021), including with negative controls. We now additionally provide a supplemental figure (New Figure 1-figure supplement 2A) highlighting that the Nanog 3’UTR and CDS probes label cell populations distinct from each other, further indicating their specificity.

      In addition, full-length scRNA-seq datasets from both mouse and human ESCs demonstrate differential CDS/3’UTR expression patterns for Nanog and many other genes. To further clarify this point, the revised manuscript will include single-cell transcript traces from mESCs illustrating the distinct Nanog isoforms detected across individual cells (New Figure 2-figure supplement 1A).

      (2) It would help for the authors to include a graphic similar to Figure 3, Figure Supplement 1A, that diagrams the location of the CDS and 3'-UTR probes (this should also be done for Oct4 and Sox2). This graphic could also show all potential polyadenylation signals.

      We agree that additional schematic clarification would improve readability. The revised manuscript will include schematics showing the locations of the CDS and 3’UTR probes for Nanog, Sox2 and Oct4 (New Figure 1-figure supplement 1A).

      (3) I think, based on the fluorescence patterns, there is evidence that the signal for the Nanog 3'-UTR probe is nuclear (images with DAPI staining), but this is not commented on that I could find. This should be discussed, as nuclear retention has implications for the noncoding function of the 3'-UTR fragment.

      The reviewer is correct that the Nanog 3’UTR signal is mostly nuclear. While this was noted in the original Figure 1-figure supplement 2A, we agree that its mechanistic and functional implications were not sufficiently discussed in the original manuscript. The revised manuscript will include expanded discussion of the relationship between nuclear localization, transcript processing, and potential noncoding functions of isolated Nanog 3’UTR species.

      (4) Figure 2, Figure Supplement 1A needs a better explanation. It's not clear how the reads map to the different regions of the Nanog mature mRNA. The authors should show examples at different ratios of CDS to 3'-UTR. Do the reads have a sharp boundary at the junction of where the isolated 3'-UTR is thought to occur?

      We thank the reviewer for this suggestion. The revised manuscript will include new single-cell read maps across the Nanog locus from full-length mESC scRNA-seq datasets (New Figure 2-figure supplement 1A), illustrating distinct CDS-enriched and 3’UTR-enriched transcript isoforms across individual cells.

      These analyses indicate that some CDS-dominant transcripts contain 3’UTR sequence, while many appear to contain little or no detectable 3’UTR sequence. Conversely, many 3’UTR-enriched transcripts contain only minimal or truncated CDS sequence. Importantly, full CDS and 3’UTR mRNA components are frequently not present in a strict 1:1 ratio, either within individual cells or across cell populations.

      The revised manuscript will also include expanded supplementary analyses integrating transcript architecture, predicted RNA structural barriers, polyadenylation analysis, and single-cell coverage patterns to further examine possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR species (New Figure 2-figure supplement 1B,C).
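      As an illustration of how the per-cell CDS:3’UTR imbalance described above can be quantified, the sketch below computes a length-normalized log2 ratio of CDS to 3’UTR read density for one cell. It is a minimal sketch under stated assumptions, not the pipeline behind the new supplementary figure: the interval coordinates CDS and UTR3, the function names, and the pseudocount are hypothetical, and reads are assumed to have already been aligned and reduced to 0-based transcript positions.

```python
import math
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end) on the transcript

# Hypothetical coordinates for illustration only; NOT the annotated Nanog
# CDS and 3'UTR boundaries.
CDS: Interval = (0, 1000)
UTR3: Interval = (1000, 2000)


def count_in(positions: Iterable[int], interval: Interval) -> int:
    """Count read positions falling inside a half-open interval."""
    start, end = interval
    return sum(1 for p in positions if start <= p < end)


def cds_utr_log2_ratio(positions: Iterable[int],
                       cds: Interval = CDS,
                       utr: Interval = UTR3,
                       pseudocount: float = 1.0) -> float:
    """log2 of (CDS reads per kb + pc) / (3'UTR reads per kb + pc) for one cell.

    Dividing by region length in kb keeps the longer region from scoring
    higher merely because it offers more sequence to sample.
    """
    positions = list(positions)
    cds_density = count_in(positions, cds) / ((cds[1] - cds[0]) / 1000)
    utr_density = count_in(positions, utr) / ((utr[1] - utr[0]) / 1000)
    return math.log2((cds_density + pseudocount) / (utr_density + pseudocount))
```

      Positive values indicate CDS-dominant cells and negative values 3’UTR-dominant cells; a cell carrying only intact full-length transcripts at uniform coverage would sit near zero.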

      (5) I looked in the Zenbu browser at human NANOG CAGE mapping in the FANTOM5 dataset. I could not see evidence for substantial capping of a 3'-UTR fragment when filtering for embryonic cell types. Given the strong signal for the 3'-UTR in border cells, I would expect to see evidence for capping if the RNA were indeed capped. This suggests that if it exists, it is likely uncapped and (as noted in point 3) is likely nuclear retained.

      Prior studies have reported isolated uncapped and recapped 3’UTR species in multiple systems (Malka et al., 2017; Haberman et al., 2024). We agree that the predominantly nuclear localization and lack of a strong CAGE signal for Nanog are important observations and will expand discussion of these points in the revised manuscript.

      (6) Are there predicted polyadenylation signals near the end of the CDS that would generate a short 3'-UTR, and are these signals conserved across mammals?

      Computational analysis of the mouse Nanog 3’UTR identifies a single canonical PAS (AATAAA) at position 1074, located at the 3’ end of the annotated 3’UTR, and this terminal PAS is conserved across mammals. These analyses will be included as a supplementary figure and discussed further in the section of the revised manuscript addressing Nanog transcript biogenesis.
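      The hexamer pattern matching referred to here (and in the Methods addition) can be sketched as a simple sliding scan. The canonical AATAAA is taken from the response; the set of non-canonical variants below is an assumed, commonly cited subset rather than the list used in the study, and the Nanog 3’UTR sequence itself must be supplied by the reader. Positions are reported 1-based, assuming that “position 1074” refers to the first nucleotide of the hexamer.

```python
from typing import List, Tuple

CANONICAL_PAS = "AATAAA"
# Assumed subset of frequently reported non-canonical PAS hexamers;
# not necessarily the set used by the authors.
VARIANT_PAS = frozenset({
    "ATTAAA", "AGTAAA", "TATAAA", "CATAAA",
    "GATAAA", "AATATA", "AATACA", "AATAGA",
})


def find_pas(seq: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, hexamer, class) for every PAS hexamer hit.

    Accepts DNA or RNA; U is converted to T. Overlapping hits are reported.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer == CANONICAL_PAS:
            hits.append((i + 1, hexamer, "canonical"))
        elif hexamer in VARIANT_PAS:
            hits.append((i + 1, hexamer, "variant"))
    return hits
```

      Conservation of a hit across mammals would then be judged separately, for example against the PhastCons tracks cited in the Methods addition.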

      (7) It would help to see a zoomed-in view of the region targeted by one of the guide RNAs in the 3'-UTR, and where that site is relative to the polyadenylation signal. Is the polyadenylation signal upstream, i.e., CDS proximal?

      This will be provided in the revised manuscript (New Figure 2-figure supplement 1C,i). Two guide RNAs were used to generate the Nanog 3’UTR deletions. The downstream guide lies upstream of the terminal polyadenylation signal at nt 1074, so that polyadenylation of the remaining CDS-containing Nanog transcript is preserved.

      Consistent with this, all Nanog 3’UTR knockout lines retain normal Nanog protein levels. The revised manuscript will include supplementary schematics showing guide RNA positions relative to the CDS and 3’UTR probes and the terminal PAS.

      (8) A final note, the use of green and red together will be challenging for those who are colorblind. Providing a different false color palette would be helpful. 

      We appreciate this attention to accessibility. The red/green color combination was chosen to provide the highest contrast between CDS and 3’UTR signals in the in situ hybridization experiments, which is important for visualizing their differential spatial localization. We will ensure that figure legends clearly indicate channel assignments throughout the manuscript.

      I am refraining from comments on the cell biology and morphological insights, as they are remote from my core expertise.

      Reviewer #2 (Public review):

      Summary:

      This manuscript shows that the coding sequence (CDS) and 3' untranslated region (3'UTR) of mRNA transcripts from the Nanog gene have distinct expression patterns and functions. In both human and mouse embryonic stem cell colonies and blastocysts, these domains are spatially segregated, with 3'UTR-enriched cells occupying the borders and CDS-enriched cells residing in the interior. CDS mRNA expression is correlated with the expected regulation of transcription and epigenetics associated with the Nanog protein. Interestingly, expression of the 3'UTR appears to play an independent role in cell behavior and colony morphogenesis. Indeed, deletion of the 3'UTR causes specific defects in cell spreading and protrusive activity, with alteration in the localization of adhesion and cytoskeleton-associated proteins. Remarkably, a large proportion of those defects are rescued upon ROCK inhibition. Deletion of either Nanog CDS or 3'UTR leads to distinct modifications in the differentiation competence.

      Strengths:

      The independent role of 3'UTR mRNA domains, although identified in neurosciences a couple of years ago, is a novel and exciting field relatively unexplored in early development.

      The manuscript offers a multilayer series of experiments, in ES cell colonies, blastocysts, and embryoid bodies, including imaging, -omics, genetic and pharmacological challenges, and differentiation experiments, thereby unveiling very convincingly the role of Nanog 3'UTR in morphogenesis.

      Weaknesses:

      The pathways leading to the generation of those distinct transcript domains are unknown. Although the functional differential roles are well demonstrated, whether the expression patterns are a cause or a consequence of the cells' localization in the embryo remains to be explored.

      We thank the reviewer for these thoughtful comments and for recognizing the potential significance of independent 3’UTR functions in early developmental systems.

      Regarding the mechanisms underlying generation of distinct CDS and 3’UTR transcript domains, the revised manuscript will include new supplementary analyses and schematic models addressing possible Nanog transcript processing pathways, as outlined above.

      We agree that the relationship between spatial location and Nanog 3’UTR expression is an important question. Specifically, it remains unclear whether cells first acquire high Nanog 3’UTR expression and subsequently localize to the colony border, or whether border position itself promotes high Nanog 3’UTR expression.

      Our current data suggest that both processes may contribute. Deletion of the Nanog 3’UTR does not prevent colonies from establishing a border/interior pattern, indicating that high Nanog 3’UTR expression is not strictly required for the border pattern itself. At the same time, Nanog 3’UTR overexpression and rescue experiments increased the likelihood of border localization, suggesting that elevated Nanog 3’UTR expression promotes behaviors associated with border occupancy.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yang et al reported distinct functions of the protein-coding sequence (CDS) and the 3' untranslated region (UTR) in the Nanog mRNA in pluripotent stem cells. They first observed different localization patterns for the CDS and 3' UTR in embryonic stem cells and in blastocyst embryos, and this pattern correlates with cell populations in different pluripotent states based on single-cell sequencing data. To characterize the potentially distinct functions of these regions, the authors generated knockout (KO) cell lines in which either the CDS or the 3' UTR was genetically ablated. These deletions led to different phenotypes in multiple assays. These results provided evidence that the CDS and 3' UTR of an mRNA could have distinct functions. Although these results are potentially interesting, several questions need to be addressed before the validity of their conclusion can be confirmed.

      Strengths:

      This study provides evidence for distinct functions of the protein-coding sequence and 3' untranslated region of an mRNA in pluripotent stem cells. The concept could be more broadly applied.

      Weaknesses:

      The initial observation (distinct localization of CDS and 3' UTRs) and the causal relationship between the KO and phenotype need further validation.

      Major points:

      (1) The authors showed distinct localization patterns of the CDS and 3' UTRs in human and mouse ESCs and blastocysts, and the overlap between their signals was minimal (Figure 1). Does this mean that the CDS and 3' UTR RNAs exist separately? For example, in cells that only showed signals for 3' UTRs, do these RNAs only contain 3' UTRs and lack CDS? Was this confirmed by RNA-seq experiments? If so, how are they generated (i.e., by transcription from a novel promoter or partial degradation of the full-length mRNAs)? This is a key question. Without a clear characterization of these RNAs, the rest of the study cannot be substantiated.

      We thank the reviewer for raising this important question, which overlaps substantially with several key points raised by Reviewer #1 concerning the molecular nature and characterization of the Nanog CDS and 3’UTR species.

      Colony border cells exhibit strong Nanog 3’UTR signal with minimal detectable CDS signal, while adjacent interior cells show the reciprocal pattern. These observations strongly suggest the existence of distinct Nanog transcript species rather than exclusively full-length transcripts containing stoichiometric amounts of both CDS and 3’UTR sequence.

      This conclusion is independently supported by full-length Smart-seq2 scRNA-seq datasets from both mouse and human ESCs, which provide transcript coverage across both CDS and 3’UTR regions.

      (2) To confirm that the phenotypes of CDS or 3' UTR KO cells were caused by the deleted regions instead of other artifacts, rescue experiments should be performed.

      Rescue experiments were included in the original submission (Fig. 4). The revised manuscript will expand these analyses to include cell spreading. We will also include additional ROCK pathway modulation experiments.

      (3) As over-expression of the 3' UTR showed a phenotype, important regions within it should be identified, and also the possibility that the 3' UTR contains open reading frame(s) and is translated should be tested.

      The revised manuscript will also include supplementary computational analyses of the Nanog 3’UTR, including open reading frame prediction, Kozak scoring, and evolutionary conservation analysis (New Figure 2-figure supplement 1B). These analyses provide no strong support for coding potential within the 3’UTR. Furthermore, isolated Nanog 3’UTR transcripts are largely confined to the nucleus, making active translation unlikely.
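      For readers unfamiliar with Kozak scoring, a common simplified rule grades a start codon by the two positions that matter most in the Kozak (1987) consensus: a purine (A or G) at -3 and a G at +4, with the A of ATG numbered +1. The sketch below applies that three-level rule (strong, adequate, weak); it is an assumed simplification, and the scoring scheme actually used for the new supplementary figure may differ.

```python
def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the start-codon context of the ATG at 0-based atg_index.

    Simplified three-level rule (an assumption, not the authors' scheme):
      strong   = purine at -3 AND G at +4
      adequate = exactly one of the two
      weak     = neither
    Positions outside the sequence count as non-matching.
    """
    seq = seq.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError(f"no ATG at index {atg_index}")
    # -3 is three nucleotides upstream of the A of ATG (+1).
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    # +4 is the nucleotide immediately after the G of ATG (+3).
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[score]
```

      Applied to each candidate ATG in the 3’UTR, a predominance of weak contexts would be one line of evidence, alongside ORF length and conservation, against efficient translation.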

      The revised manuscript will include new supplementary analyses addressing Nanog transcript structure and possible biogenesis mechanisms (New Figure 2-figure supplement 1C).

      References:

      ViennaRNA/RNAfold: Lorenz et al. (2011) Algorithms Mol Biol 6:26. RNA secondary structure (stem-loop) and minimum free energy (MFE) prediction.

      NCBI BLASTP: Altschul et al. (1990) J Mol Biol 215:403. ORF conservation; protein sequence similarity search.

      NCBI Entrez/Biopython: Cock et al. (2009) Bioinformatics 25:1422. Sequence retrieval.

      PhastCons/UCSC multiz alignments: Siepel et al. (2005) Genome Res 15:1034. Evolutionary conservation scoring.

      UCSC Genome Browser: Kent et al. (2002) Genome Res 12:996-1006. Conservation track access.

      Eaton et al. (2020) Mol Cell 78:439. Stall model.

      Brannan et al. (2012) Genes Dev 26:2621. Stall model.

      Addition to Methods.

      ORFs (≥10 amino acids) were identified in all three forward frames according to Kozak (1987). Evolutionary conservation was assessed by BLASTP (Altschul et al., 1990) against RefSeq proteins. Poly(A) signals were identified by pattern matching for canonical and non-canonical hexamers. Conserved sequence blocks were obtained from UCSC PhastCons tracks (Siepel et al., 2005). RNA secondary structures were predicted using ViennaRNA RNAfold (Lorenz et al., 2011) with a sliding 80-nt window. The stall model for isolated transcript generation follows Eaton et al. (2020).
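      The ORF step described above can be sketched as a scan of the three forward frames using the ≥10 amino acid threshold stated in the Methods. The remaining conventions are assumptions, not the authors’ documented settings: ORFs start at ATG only, the first ATG in a frame opens an ORF and internal ATGs are ignored until a stop codon closes it (so only the longest ORF per stop is reported), and ORFs that run off the end of the sequence without a stop are discarded. The BLASTP conservation step is not reproduced.

```python
from typing import List, NamedTuple

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


class ORF(NamedTuple):
    frame: int      # forward frame: 0, 1 or 2
    start: int      # 0-based index of the A of the ATG
    end: int        # 0-based index just past the stop codon
    aa_length: int  # codons from ATG up to, but not including, the stop


def find_orfs(seq: str, min_aa: int = 10) -> List[ORF]:
    """Find ATG-initiated, stop-terminated ORFs in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    orfs: List[ORF] = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                aa_length = (i - start) // 3
                if aa_length >= min_aa:
                    orfs.append(ORF(frame, start, i + 3, aa_length))
                start = None  # look for the next ATG in this frame
        # An ORF still open here has no in-frame stop and is discarded.
    return orfs
```

      Each reported ORF could then be translated and passed to BLASTP, and its start codon graded for Kozak context, to assess whether any candidate shows conserved coding potential.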

    1. eLife Assessment

      There is a need for better and safer dengue virus live attenuated vaccines. This manuscript describes important findings that could lead to the design of a strongly immunogenic, tetravalent live attenuated vaccine for dengue, without the risk of causing antibody-dependent enhancement. However, the experimental evidence presented is incomplete since only constructs of one serotype were tested to prove the principle.

    2. Reviewer #1 (Public review):

      Summary:

      Dalben et al. grafted the fusion loop mature (FLM) modification, based on a previously reported D2-FLM, to another serotype DENV4, and adapted them to replicate in Vero cells for live attenuated vaccine (LAV) manufacturing while retaining favorable antigenic profiles, generating two new strains: D2-vFLM and D4-vFLM. Deep sequencing revealed adapted mutations at the junction of envelope domains I and II (EDI and EDII), and both D2-vFLM and D4-vFLM showed no evidence of ADE in the presence of FL-targeting Abs. Sera from D2-vFLM immunized mice displayed strong homotypic and reduced heterotypic neutralization compared to wild-type viruses, with minimal to no ADE potential in vitro. Moreover, D2-vFLM immunization completely protected AG129 mice from lethal challenge with mouse-adapted D220. They demonstrate that the FLM modification platform is transferable across serotypes and yields strains with favorable immunogenicity and reduced ADE risk. The FLM approach provides a promising path toward the development of a safer tetravalent DENV LAV.

      Strengths:

      The authors carried out a series of experiments to generate and characterize two new strains (D2-vFLM and D4-vFLM) of FLM-modified viruses, and showed their antigenic and immunogenic profiles. The observation that the FLM modification platform is transferable across serotypes and yields strains with favorable immunogenicity and reduced ADE risk is interesting.

      Weaknesses:

      However, one concern is the total number of mutations (including originally introduced and compensatory mutations) in this FLM vaccine platform; in addition, the future directions for the proof-of-concept vaccine in this study are unclear.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, YR Dalben et al describe the generation of DENV2 and DENV4 strains with mutations in the fusion loop (FL) of the E protein and in the pre-membrane (prM) protein, designed to limit potential antibody-dependent enhancement (ADE) resulting from vaccination with live-attenuated vaccines, and the adaptation of these strains for growth in Vero cells. They show that the DENV2 version, D2-vFLM, is immunogenic, generates neutralizing serum against DENV2 and DENV4 after 2 boosts, and is protective against lethal challenge. Serum from D2-vFLM-immunized mice also showed no ADE against DENV4.

      Strengths:

      Overall, the paper is well written and presented, and the data presented support most of the conclusions made. Grafting D2-FLM mutations to DENV4 and adapting both to growth in Vero cells is a good step to show that this method could be used to generate production-level LAV. The growth and stability data are clear and well-conducted.

      Weaknesses:

      However, there are several weaknesses, mostly in regard to the immunogenicity data, that limit the overall impact. The FLM mutations were only grafted to DENV4 but not to the other Dengue serotypes. The authors acknowledge that this is a proof-of-concept, but generating mutants of the other serotypes would strengthen the idea that this could be used to develop a tetravalent LAV. Immunizations in mice were only performed for D2-vFLM but not D4-vFLM. Immunogenicity data for D4-vFLM would strengthen this work if it shows that it can be immunogenic, protective, and limit ADE, as is shown for D2-vFLM. ADE from D2-vFLM was only tested against DENV4; does it also limit ADE from the other serotypes? This would better show that these mutations do limit ADE across serotypes and not just a single one.

      Additionally, some of the immunization data likely need to be repeated:

      The authors should describe why they pooled the sera from the mice and whether they purified total IgG (Figure 5). They should also probably repeat the challenge experiment, since it compared 4 mice (D2) with 5 (D2-vFLM), and it is unclear whether there is a statistically significant difference between the results. This comparison (D2 vs D2-FLM) is not even mentioned in the Results section, so, as the data are currently presented, it is unclear whether using D2-FLM is an improvement.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Dalben et al. grafted the fusion loop mature (FLM) modification, based on a previously reported D2-FLM, to another serotype, DENV4, and adapted them to replicate in Vero cells for live attenuated vaccine (LAV) manufacturing while retaining favorable antigenic profiles, generating two new strains: D2-vFLM and D4-vFLM. Deep sequencing revealed adapted mutations at the junction of envelope domains I and II (EDI and EDII), and both D2-vFLM and D4-vFLM showed no evidence of ADE in the presence of FL-targeting Abs. Sera from D2-vFLM immunized mice displayed strong homotypic and reduced heterotypic neutralization compared to wild-type viruses, with minimal to no ADE potential in vitro. Moreover, D2-vFLM immunization completely protected AG129 mice from lethal challenge with mouse-adapted D220. They demonstrate that the FLM modification platform is transferable across serotypes and yields strains with favorable immunogenicity and reduced ADE risk. The FLM approach provides a promising path toward the development of a safer tetravalent DENV LAV.

      Strengths:

      The authors carried out a series of experiments to generate and characterize two new strains (D2-vFLM and D4-vFLM) of FLM-modified viruses, and showed their antigenic and immunogenic profiles. The observation that the FLM modification platform is transferable across serotypes and yields strains with favorable immunogenicity and reduced ADE risk is interesting.

      We thank reviewer 1 for the encouraging comments for our work.

      Weaknesses:

      However, one concern is the total number of mutations (including originally introduced and compensatory mutations) in this FLM vaccine platform, and the future directions for the proof-of-concept vaccine in this study are unclear.

      Author response table 1.

      We summarize the mutations in the FLM platform below.

      The maturation mutations are located at the furin cleavage site, which is buried within the membrane or virion. As a result, only five mutations are surface exposed, two of which are in the fusion loop region targeted for removal. Therefore, for a proof-of-concept study, the total number of mutations remains well within the genetic diversity observed among DENV genotypes.

      Compensatory mutations may affect overall DENV antigenicity. Notably, one such mutation, K204R, has been reported to alter antigenicity and could contribute to the improved safety profile of the vaccine. However, we have also shown that multiple adaptive pathways can support Vero cell adaptation, and our data indicate that K204R is not absolutely required for this process.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, YR Dalben et al. describe the generation of DENV2 and DENV4 strains with mutations in the fusion loop (FL) of the E protein and pre-membrane (prM) protein to limit potential antibody-dependent enhancement (ADE) resulting from vaccination with live-attenuated vaccines, and the adaptation of these strains for growth in Vero cells. They show that the DENV2 version D2-vFLM is immunogenic, generates neutralizing serum against DENV2 and DENV4 after 2 boosts, and is protective against lethal challenge. Serum from D2-vFLM also showed no ADE against DENV4.

      Strengths:

      Overall, the paper is well written and presented, and the data presented support most of the conclusions made. Grafting D2-FLM mutations to DENV4 and adapting both to growth in Vero cells is a good step to show that this method could be used to generate production-level LAV. The growth and stability data are clear and well-conducted.

      We thank reviewer 2 for the encouraging comments for our work.

      Weaknesses:

      However, there are several weaknesses, mostly in regard to the immunogenicity data, that limit the overall impact. The FLM mutations were only grafted to DENV4 but not to the other Dengue serotypes. The authors acknowledge that this is a proof-of-concept, but generating mutants of the other serotypes would strengthen the idea that this could be used to develop a tetravalent LAV.

      We selected DENV2 and DENV4 because they are the most genetically divergent serotypes. Our current data support that the FLM mutations can be grafted onto both DENV2 and DENV4 and likely extend to their corresponding genotypes. We agree that the FLM mutations should be evaluated in additional serotypes. We also have promising preliminary data for grafting the FLM mutations into DENV1 and are currently applying the same approach to DENV3. We hope to include these results, whether positive or negative, in the revised manuscript.

      Immunizations in mice were only performed for D2-vFLM but not D4-vFLM. Immunogenicity data for D4-vFLM would strengthen this work if it shows that it can be immunogenic, protective, and limit ADE, as is shown for D2-vFLM.

      We are currently immunizing AG129 mice with DV4 and D4-vFLM, followed by heterotypic challenge with D220. Because DENV vaccine-related hospitalization in clinical trials typically occurs 3–4 years after vaccination, we are cautious about whether this experimental design will fully capture the added safety benefit of the FLM mutations. We are also developing a passive immunization model in AG129 mice using diluted DENV4 serum to better mimic long-term waning antibody titers. We will include these findings in the revised manuscript.

      ADE from D2-vFLM was only tested against DENV4; does it also limit ADE from the other serotypes? This would better show that these mutations do limit ADE across serotypes and not just a single one.

      Although we are trying to keep the scope of the paper to DENV2 and DENV4, we will perform ADE and neutralization assays for all four serotypes in the revised manuscript.

      Additionally, some of the immunization data likely need to be repeated:

      The authors should describe why they pooled the sera from the mice and whether they purified total IgG or not (Figure 5).

      We used pooled serum, consisting of equal volumes from each mouse, rather than purified IgG. In Figure 5, our goal was to show the overall increase in serum titer after each immunization using cheek-bleed samples from individual animals. Because the available sample volume was limited, we pooled the sera for this analysis. We also measured end-point serum titers for each individual animal.

      They should also probably repeat the challenge experiment since it was 4 mice (D2) against 5 (D2-vFLM), and it is unclear if there is a statistical difference between the results obtained. It is not even mentioned in the Results section (D2 result vs D2-FLM), and thus unclear if using D2-FLM is an improvement in the way the data is currently presented.

      This experiment was designed to determine whether D2-vFLM protects AG129 mice against homotypic challenge as effectively as DV2-WT. Although the sample size was small, the results support our conclusion. However, we agree with the reviewer that the study should include more animals, and we will increase the group size to at least 8 to 10 animals in the revised experiment.

    1. eLife Assessment

      The authors propose a "simplified" model for intrinsically bursting neurons with explicitly controllable parameterization of oscillatory dynamics. The evidence that the modeling approach is generally appropriate and practical for modeling rhythmic bursting neurons and neural circuits is currently incomplete. Based on what the authors present, this model appears to have limited neurobiological relevance and utility but may be useful as a controller for an artificial system, such as in neuro-robotics applications.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a simplified neural bursting model with explicitly controllable parameterization of oscillator dynamics designed for neural circuit modeling involved in rhythm generation.

      Strengths:

      (1) The purpose of the model and applied abstractions are well articulated and justified (2D model, independent parameter control).

      (2) Explicit control of burst duration, inter-burst interval, amplitude, resetting-behavior/entrainment. This allows modelers to focus on circuit interactions and is especially useful when details of intrinsic currents and bursting mechanisms are unknown. One could even imagine a scenario where this model would help identify predictions on key underlying burst generation mechanisms.

      (3) The model is well described and validated with simulations and comparisons to the base model and one alternative model.

      (4) Circuit-level validation is convincing, as it reproduces non-trivial examples, not only trivial ones.

      (5) The underlying mechanism in phase space is well reasoned and justified, and it extends previous work (e.g., by McKean) by improving usability.

      Weaknesses:

      (1) The paper heavily relies on numerical demonstrations but does not provide a formal analysis of stability, bifurcations, or entrainment. While appropriate for the intended purposes, a more formal footing could strengthen the model.

      (2) Lots of nice demonstrations are shown, but it is less clear how model parameterization was chosen, how behavior depends on parameterization, and in what parameter ranges certain behavior can be expected. A more detailed description of parameterization/exploration of parameter space would greatly benefit anyone using this model in the future.

      (3) Some claims about reproducing a prior locomotor CPG model and producing "more biologically realistic activity" are overstated. The key feature of the cited locomotor CPG models was that they reproduced not only the speed-dependent gait expression of intact mice, but also the changes in gait expression after silencing or removal of specific commissural and long propriospinal interneurons (LPNs) (e.g., selective loss of trot after deletion of V0V neurons; changes in gait expression and step-to-step variability after silencing of descending LPNs or ascending V3 LPNs). While this is likely (at least partially) feasible with the model formulation, correspondence with these silencing/ablation experiments on neuron classes has not been shown. Importantly, it also appears that the authors did not show how the model behaves in general under the influence of noise, which is key to reproducing LPN silencing.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose a reduced model for intrinsically bursting neurons. The model simply consists of exponential decay of an adaptation variable in a phenomenological silent phase, an exponential growth of that variable in an active phase, and imposed thresholds for jumps between these phases, with some add-ons to allow for effects such as input-dependence.
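      To make this description concrete, the following Python sketch simulates a two-phase hybrid oscillator of the kind summarized above. It is a hypothetical illustration, not the authors' published equations: the adaptation variable `a`, the time constants `tau_q` and `tau_a`, and the thresholds `a_lo` and `a_hi` are assumed names, and "exponential growth" in the active phase is read here as exponential relaxation toward 1. Because each phase is linear in `a`, the phase durations also have closed forms, which the sketch compares against a forward-Euler simulation.

```python
import math


def simulate_burster(tau_q, tau_a, a_lo, a_hi, t_end, dt=1e-4):
    """Forward-Euler simulation of a hypothetical two-phase hybrid burster.

    Silent phase: a decays exponentially toward 0 with time constant tau_q.
    Active phase: a relaxes exponentially toward 1 with time constant tau_a.
    Rules: reaching a_lo starts a burst; reaching a_hi ends it.
    Returns lists of burst onset and offset times (s).
    """
    a, active, t = a_hi, False, 0.0  # start at a burst offset, in the silent phase
    onsets, offsets = [], []
    while t < t_end:
        if active:
            a += dt * (1.0 - a) / tau_a  # continuous dynamics, active phase
        else:
            a -= dt * a / tau_q          # continuous dynamics, silent phase
        t += dt
        if active and a >= a_hi:         # threshold rule ends the burst
            active = False
            offsets.append(t)
        elif not active and a <= a_lo:   # threshold rule starts a burst
            active = True
            onsets.append(t)
    return onsets, offsets


def closed_form_durations(tau_q, tau_a, a_lo, a_hi):
    """Exact phase durations obtained by solving each linear phase analytically."""
    t_active = tau_a * math.log((1.0 - a_lo) / (1.0 - a_hi))
    t_quiet = tau_q * math.log(a_hi / a_lo)
    return t_active, t_quiet


if __name__ == "__main__":
    params = dict(tau_q=0.5, tau_a=0.1, a_lo=0.2, a_hi=0.8)
    onsets, offsets = simulate_burster(t_end=5.0, **params)
    sim_active = [off - on for on, off in zip(onsets, offsets)]
    print("simulated active durations:", [round(d, 3) for d in sim_active])
    print("closed form (active, quiet):", closed_form_durations(**params))
```

      In this toy version, the active and silent durations are set separately by `tau_a` and `tau_q`, which is the separate control of state durations the review refers to; the threshold rules are the part that a pure-ODE relaxation oscillator would instead obtain from the shape of its nullclines.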

      Strengths:

      The model could be used as a controller for an artificial system that needs to switch between on and off states with separate control of state durations. It has some flexibility to allow for variable levels of the activity variable during the active phase. The authors show that the model can be tuned to capture phase response properties of neurons and patterns generated by small networks of neurons.

      Weaknesses:

      The proposed approach lacks biological relevance, practicality, and originality.

      (1) Biological relevance:

      Central pattern generators and other bursting neurons use specific physical principles to generate their bursts of activity. These principles place constraints on the tuning of these bursts, including relationships between active and silent phase durations and other properties. By discarding these relationships, the proposed model risks losing key constraints that affect performance in biologically relevant scenarios. The proposed model does not allow for the emergence of interesting dynamical phenomena, which occur naturally in neurons and neuronal networks.

      It is also important to note that spikes within bursts can be important and of interest. Biophysical models allow for easy extension to include spikes via fast sodium and potassium currents. The proposed model does not allow for such extensibility.

      Finally, as shown in the seminal early-2000s work of Izhikevich, building on fast-slow decomposition work by Rinzel and others, there is a wide variety of possible neuronal bursting patterns. At the very least, several of these have been observed in neuronal recordings. The authors' model is specific to square-wave bursting.

      (2) Practicality:

      The model makes use of various cut-off functions and other aspects that are implemented as rules. Combining rules with differential equations makes for an awkward modeling framework that is inconvenient to implement, conceptualize, and analyze (e.g., from a bifurcation perspective). Moreover, the authors add more and more adjustments to their basic framework to capture additional features, but these add-ons simply make the model more, and unnecessarily, complicated and awkward. It's worth noting that the authors argue for their model based on the idea that more biophysical models are difficult to tune, yet they compare their model to a biophysical one that they were able to tune to achieve the various patterns that they study. They do not give any indication of how easy or hard it was to tune their own model, nor do they compare simulation times between the two models. I do note that the biophysical model seems to have 22 parameters, whereas the simplified one has 21 in Table 2, which is essentially the same number. Finally, although the authors give some extensions of the model to match observed data, their model does not seem useful for predicting performance in never-before-tested scenarios.

      (3) Originality:

      As the authors note, the use of low-dimensional, specifically planar, neural models dates back to early authors such as FitzHugh and Nagumo. What the authors fail to acknowledge is that Rinzel, Terman, Kopell, and others did seminal work on neuronal activity, including phenomena such as post-inhibitory rebound and fast threshold modulation, using a relaxation oscillation framework, starting several decades ago. Their work included applications to central pattern generators (e.g., see Terman and collaborators on respiratory CPGs). It is astonishing that the authors don't seem to be aware of this work and do not mention it at all. Moreover, I don't see any advantage of the proposed framework over the earlier relaxation oscillator setting, where many important mechanistic principles have already been analyzed, including extensions to networks. On a related note, even though they propose a piecewise linear model, the authors do not cite the substantial existing work on piecewise linear models (e.g., Hahnloser, Neural Networks, 1998, for an early example; 2024 SIAM Review article by Coombes et al. and references therein for much more), including work specifically on bursting, nor do they cite various other previous efforts to capture bursting with simplified models, including work on piecewise linear maps by Aguirre et al.

    4. Reviewer #3 (Public review):

      This computational modeling study introduces the methodology of replacing bursting neurons in a model circuit with a simplified piecewise-linear model with an "active" and a "quiet" state representing, respectively, the burst of spikes and the inter-burst interval. The shape of the active state loosely represents the intra-burst firing rate. Because (piecewise) linear systems are explicitly solvable, the transitions from quiet to active and vice versa can be calculated explicitly to match exactly what a biophysically realistic model or a biological neuron does in different conditions. The base piecewise-linear model is built to represent a 2D biophysical neuron with a cubic v-nullcline. The simplicity of the model allows for matching the kinetics of more complex models with a tractable simplified set of equations, as exemplified by approximations of burst duration and amplitude, phase-response curves, entrainment, and, finally, mimicking the activities of two CPG circuit models using this simplified representation.

      Major comments

      (1) The use of piecewise linear approximations to explicitly estimate properties of biophysical neurons is a well-known and common technique. This study adds nothing to the technique in terms of novelty.

      (2) Although the model explicitly matches active and inactive durations of a circuit neuron, the dynamics are explicitly "clamped" by the user because the reduced model parameters explicitly depend on the input. There are cases where this is useful, for example, when we are interested in the dynamics of _other_ neurons (B, C, D, ...) within the context of activity, and we "clamp" the dynamics of neuron A. One should note that this is no better than having a look-up table. Effectively, to give a comparison, it is like using a sine wave to represent a pacemaker neuron and explicitly define its frequency at different input levels so that it responds "dynamically". However, the neuron is restricted to what the user puts in, and therefore, calling it a dynamical system is entirely wrong. I am afraid that the use of this crude tool is not described well enough in the manuscript to warn a naïve user not to fall for this trap.

      (3) The phase resetting curves are used incorrectly. PRCs are useful when the perturbation is weak (soft), which would demonstrate the nature of the vector field near the limit cycle and therefore inform us of the nature of its stability or instability. A hard PRC would always reset the cycle to the fixed offset from the perturbation phase and is therefore uninformative in understanding dynamics. (It is, however, useful experimentally in identifying which neurons are part of the CPG.) The authors clearly know that the dynamics of the system away from the limit cycle do not conserve those of a biophysical neuron. So what is the point?
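      The soft-versus-hard distinction can be seen in a minimal sketch. Assume, hypothetically, a silent phase in which an adaptation variable decays as `a(t) = a_hi * exp(-t / tau_q)` until it reaches a lower threshold `a_lo`, which triggers the next burst, and model a brief excitatory input at time `s` as an instantaneous drop of `a` by `delta`. All names and the kick model are assumptions for illustration, not the manuscript's protocol. A small kick gives a phase advance of `-tau_q * ln(1 - delta / a(s))`, roughly `tau_q * delta / a(s)`, which varies smoothly with `s` and reflects the flow near the cycle. A kick large enough to cross `a_lo` starts the burst immediately, so the new onset is pinned to the perturbation time (a zero offset in this toy model) regardless of where in the cycle it arrived.

```python
import math


def silent_phase_advance(s, delta, tau_q=0.5, a_lo=0.2, a_hi=0.8):
    """Advance (s) of the next burst onset after kicking a down by delta at time s.

    Hypothetical model: during the silent phase a(t) = a_hi * exp(-t / tau_q),
    and a burst starts when a reaches a_lo, at t_quiet = tau_q * ln(a_hi / a_lo).
    """
    t_quiet = tau_q * math.log(a_hi / a_lo)
    if not 0.0 <= s < t_quiet:
        raise ValueError("s must lie inside the silent phase")
    a_kicked = a_hi * math.exp(-s / tau_q) - delta
    if a_kicked <= a_lo:
        remaining = 0.0  # hard kick: threshold crossed, burst starts at time s
    else:
        remaining = tau_q * math.log(a_kicked / a_lo)  # time left to reach a_lo
    return (t_quiet - s) - remaining


t_quiet = 0.5 * math.log(4.0)
for s in (0.0, 0.2, 0.4, 0.6):
    soft = silent_phase_advance(s, delta=0.01)
    hard = silent_phase_advance(s, delta=1.0)
    # For the hard kick, the new onset time (t_quiet - hard) equals s itself.
    print(f"s={s:.1f}  soft={soft:.4f}  hard={hard:.4f}  hard onset={t_quiet - hard:.4f}")
```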

      (4) I work on the STG, one of the systems exemplified here. Even in the small and relatively regular CPGs of the STG, the definition of the active and quiet parts of a burst is often less clear than what the authors suggest. Bursting neurons often do multiple bursts in a cycle, and therefore, substituting the burst envelope is a subjective matter. This is even more problematic in bursting neurons in the brain, where there is often no quiet period. This should be discussed.

    5. Author response:

      We thank the editors and reviewers for their time and feedback. We are encouraged by the feedback that the purpose and abstractions of the model are well articulated and justified, that the explicit control of bursting characteristics is useful, and that the circuit-level validations are convincing.

      Before responding to individual reviewer comments, we would like to address the framing in the current assessment that the model "appears to have limited neurobiological relevance and utility but may be useful as a controller for an artificial system, such as in neuro-robotics applications." We respectfully suggest that this framing understates the model's relevance to neuroscience. Specifically, a growing body of literature aims to understand biological motor control by building embodied simulations. Yet, these simulations either use overly simple artificial neural network (ANN) units without dynamics or computationally intensive biophysical ones that are difficult to train. Our model is not intended as a biophysical account of how individual neurons generate bursts at the level of ionic mechanisms or spikes; that goal is already well served by the conductance-based and reduced biophysical models we cite. Rather, its contribution is to make intrinsic bursting dynamics readily incorporable into neural circuit models that can be used in complex settings, with parameters that map directly onto quantities that circuit-level neuroscience most often measures and tunes in models (burst duration, duty cycle, amplitude, shape, input dependence). Indeed, Reviewer #1 notes that: "The purpose of the model and applied abstractions are well articulated and justified [...] This allows modelers to focus on circuit interactions and is especially useful when details of intrinsic currents and bursting mechanisms are unknown. One could even imagine a scenario where this model would help identify predictions on key underlying burst generation mechanisms."

      We see our work as a neuroscience contribution as much as a neuro-robotics one. Bringing tractable, controllable bursting into this regime allows circuit modelers to study how intrinsic bursting interacts with circuit connectivity without committing to specific biophysical mechanisms, and it lets ANN-style models incorporate a class of dynamics that is biologically pervasive but currently underrepresented. We validated the model against two well-studied biological CPGs (the crustacean pyloric circuit and the mammalian locomotor circuit) precisely because the target use case is biological circuit modeling.

      While we remain committed to the belief that bringing bio-inspired neurons with interpretable intrinsic dynamics into ANN-style modeling of biological control systems is a useful contribution as an eLife Methods paper, the reviews have made clear that we have not situated our work clearly enough within the literature. In revision, we will sharpen this positioning in the Introduction and Discussion, and better situate the model relative to both the long tradition of non-spiking relaxation-oscillator and piecewise-linear modeling in neuroscience and also to current trends in simulated control.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Formal analysis

      The paper heavily relies on numerical demonstrations but does not provide a formal analysis of stability, bifurcations, or entrainment. While appropriate for the intended purposes, a more formal footing could strengthen the model.

      We agree that a formal dynamical-systems treatment would deepen the work, and we appreciate the reviewer's acknowledgment that the numerical-only approach may nevertheless be appropriate for the intended purposes. Because the model is hybrid (continuous dynamics combined with discrete switching rules), a full formal analysis is non-trivial, and we view it as a substantial follow-up rather than something to fold into the present manuscript. In revision, we will discuss more explicitly the opportunities such formal analysis presents.

      (2) Parameter tuning and parameter-space characterization

      It is less clear how model parameterization was chosen, how behavior depends on parameterization, and in what parameter ranges certain behavior can be expected.

      We agree that this would substantially improve usability, and we will expand this aspect of the paper. The revision will include: (a) more details describing how parameters map onto observable features of the bursting waveform, (b) recommended parameter ranges and the qualitative behaviors expected at their boundaries, and (c) practical guidance for tuning the model to match observations or embed into circuits.

      (3) Locomotor CPG interneuron ablation and noise

      The correspondence of these silencing/ablation of neuron classes has not been shown by the model. Importantly, though, it appears that authors didn't show how the model in general behaves under the influence of noise.

      The reviewer is right that the cited work establishes the validity of the circuit model in large part through silencing/ablation experiments, and we did not reproduce those experiments. We understand those gait-expression phenomena to arise from the activity of non-bursting interneurons and from a robust solution found for the connection weights between them. The half-center bursting neurons only see a time-varying input signal, and their response is well characterized by the constant, pulse, and periodic analyses we perform. As such, we chose to reproduce a few key experiments to retain a focus on our simplified neuron model. We will rephrase the relevant passages to make this scope explicit and ensure that our reproduction claims are appropriately stated. We will also expand on how the model interfaces with noise, together with the proposed parameter-space characterization.

      Reviewer #2 (Public review):

      (1) Biological relevance

      Central pattern generators and other bursting neurons use specific physical principles to generate their bursts of activity. These principles place constraints on the tuning of these bursts, including relationships between active and silent phase durations and other properties. By discarding these relationships, the proposed model risks losing key constraints that affect performance in biologically relevant scenarios.

      We agree that biophysical models impose constraints that arise from underlying mechanisms. For instance, as input alters the curved shape of nullcline-v in Figure 1, the active/quiet phase durations and duty cycle change in constrained ways. The question seems to be whether our model is too flexible, for instance, making it too easy to achieve desired phase durations, duty cycles, and other input-dependent responses. We see this as a valuable feature of our model, not a bug. Firstly, even if our model may be expressive enough to achieve a variety of response profiles (as in Figure 3—figure supplement 3), the careful modeler will ensure matching to experimental observations. Moreover, in many circuit systems, the relevant biophysical details are often unknown for the specific neurons being modeled, as noted by Reviewer #1, and the modelers' primary goal is to reproduce circuit-level activity. This can be achieved easily with a simplified model, and also with a biophysical model as data become available. Finally, we should note that modelers can and do tune the parameters of biophysical models within determined ranges in order to achieve desired phase durations and duty cycles, relaxing constraints somewhat in order to reproduce appropriate activity.

      It is also important to note that spikes within bursts can be important and of interest. [...] The authors' model is specific to square-wave bursting.

      We agree that spikes are important and interesting in many settings, and we believe that biophysical models would be most appropriate in these cases. In many cases, too, some abstraction and simplification is desirable, and this would not necessarily detract from the model's biological relevance. As we discuss in our high-level comments, we aim to bring intrinsic bursting dynamics into the ANN-style modeling regime that typically neglects intrinsic dynamics altogether. While the simplified model may be limited in some ways, it is nevertheless useful for many common biologically relevant scenarios, as validated by our circuit experiments. Finally, we would note that many of the raised limitations (no intra-burst spike structure, restricted bursting class, abstracted constraints) are shared by the relaxation-oscillator and piecewise-linear traditions that the reviewer cites approvingly, which suggests that our model lies along a familiar abstraction continuum rather than outside it. In revision, we will explicitly acknowledge that the model captures a basic/regular form of bursting within a broader taxonomy, and clarify the conditions under which abstracting the biophysical constraints is appropriate.

      (2) Practicality

      The model makes use of various cut-off functions and other aspects that are implemented as rules. Combining rules with differential equations makes for an awkward modeling framework

      On the modeling framework, we would defend the hybrid formulation (rules + ODE), as our aim is to prioritize usability by modelers, not the simplicity or elegance of equations. A "pure-ODE" FitzHugh-Nagumo-style polynomial, with dv/dt = av^3 + bv^2 + cv + d and parameters a, b, c, d, may seem simple and elegant, but, as the reviewer has pointed out, a lot of complexity can arise from it. Tuning these parameters is far from intuitive, as small changes can produce nonlinear effects and qualitative shifts in behavior. Achieving the right phase durations, input-dependent scaling, waveform amplitude and shape, phase delays, and other characteristics simultaneously to match experimental data is quite cumbersome in the elegant models, not to mention the biophysical models. In contrast, these characteristics are easy to control in our model, because we translate complex dynamical behavior from implicit to explicit and surface a set of interpretable and tunable parameters.

      The authors argue for their model based on the idea that more biophysical models are difficult to tune, yet they compare their model to a biophysical one that they were able to tune to achieve the various patterns that they study. They do not give any indication of how easy or hard it was to tune their own model [...] The biophysical model seems to have 22 parameters, whereas the simplified one has 21 in Table 2, which is essentially the same number.

      To clarify, we did not tune the biophysical model, but rather copied its parameters from the cited work. We will make this more explicit in the relevant Methods section.

      We could not simply specify or tune these parameters because they have complex biological priors that must be derived from experimental data: for example, the membrane capacitance (20 pF), ionic conductances and reversal potentials (4.5 nS, -62.5 mV), and many gating kinetics parameters (slopes, midpoints, time constants for sigmoid/bell curves).

      It is often the case that such parameters must be estimated in specific preparations and then reused and refined over many years. For instance, the biophysical model we compare to borrowed parameters from (Kim et al. 2022), which retuned time constants relative to (Danner et al. 2017), which altered NaP conductance from (Danner et al. 2016), which retuned duty cycles from (Molkov et al. 2015), which adapted from respiratory networks of (Rubin et al. 2008), which used gating kinetics parameters from (Butera et al. 1999). Similarly, the crustacean pyloric circuit model we compare to is from (Alonso and Marder 2020), which augmented the circuit and parameters of (Prinz et al. 2004), which sampled from a database of procedurally generated parameters from (Prinz et al. 2003), which developed parameter priors from the lobster STG experimental work of (Turrigiano et al. 1995). These brief descriptions of the multi-decade lineage of parameter sets omit the substantial parallel and preceding work related to their development, but they suffice to demonstrate the incredible science and effort that goes into building biophysical models for particular circuits. Such data are often unavailable, and such detail is often undesirable for different research goals, in which case our simplified model is a valuable and practical tool.

      The key parameters of our simplified model are observable quantities like active/quiet durations (in seconds), input-dependent duration scaling (as a fraction of intrinsic durations), input strength that induces tonic firing, etc. As such, tuning the bursting neuron parameters for circuit models was easy, with manual tuning from scratch taking less than 1 day. As Table 3 shows, the resulting parameters are often simple, elegant numbers and can be derived directly from observations. For instance, the pyloric PD active and quiet durations (200 ms and 800 ms, respectively) are set using the exact target values that (Alonso and Marder 2020) encode in their objective for a genetic algorithm to tune their model’s biophysical parameters (or rather, a subset of them for tractability).

      Thus, the 22-vs-21 comparison is not very informative, because the parameters are not comparable in kind. However, to make our model easier to tune, we will revise the manuscript to include: (a) more detail on how the parameters map onto observable features of the bursting waveform, (b) recommended parameter ranges and the qualitative behaviors expected at their boundaries, and (c) practical guidance for tuning the model to match observations or to embed it into circuits.

      (3) Originality

      What the authors fail to acknowledge is that Rinzel, Terman, Kopell, and others did seminal work on neuronal activity [...] The authors do not cite the substantial existing work on piecewise linear models [...] I don't see any advantage of the proposed framework over the earlier relaxation oscillator setting, where many important mechanistic principles have already been analyzed, including extensions to networks.

      We thank the reviewer for these pointers and apologize for the gap in our literature coverage. While we had cited McKean, FitzHugh-Nagumo, Izhikevich, et al. as representative examples of different model classes, we agree that the broader relaxation-oscillator and piecewise-linear traditions deserve more comprehensive treatment, including the work of Rinzel, Terman, Kopell, et al. on relaxation oscillators and of Hahnloser, Coombes, Aguirre, et al. on piecewise-linear models. We will expand the related-work discussion and clarify how our contribution is novel and valuable.

      To be clear, we do not claim to be the first to use piecewise-linear models for neurons. Our intended contribution is twofold: the specific construction of a rectangular limit cycle, whose horizontal/vertical decoupling permits a closed-form mapping from interpretable parameters to burst features; and the demonstration that this construction integrates cleanly into firing-rate circuit models of biological CPGs, which we believe will provide realism for more complex models with learned components.

      Moreover, in contrast to many other relaxation-oscillator models, including the elegant FitzHugh-Nagumo-style model we discussed above, our model is not aimed at establishing mechanistic principles or at being simple enough to analyze formally. It is a practical tool that affords precise control of many bursting characteristics, which is important for closer alignment between firing-rate circuit models and biological activity. We will state this contribution more precisely in the revision so that it is not conflated with a broader novelty claim.

      Reviewer #3 (Public review):

      (1) Novelty of piecewise-linear approximation

      The use of piecewise linear approximations to explicitly estimate properties of biophysical neurons is a well-known and common technique. This study adds nothing to the technique in terms of novelty.

      We agree that piecewise-linear approximations of neurons are not themselves novel, and we have not intended to claim otherwise: We cite the McKean model as a direct predecessor and, prompted by Reviewer #2, we will substantially expand citations to the relaxation-oscillator and piecewise-linear traditions (Rinzel, Terman, Kopell, Hahnloser, Coombes, Aguirre, et al.). Our intended contribution is not the use of piecewise-linear pieces per se but the specific construction: a rectangular limit cycle whose horizontal/vertical decoupling permits a closed-form, interpretable mapping from burst features (duration, duty cycle, amplitude, shape, input dependence) to dynamics, and clean integration into firing-rate circuit models of biological CPGs. We will revise the relevant passages so this contribution and the boundaries of our novelty claim are stated precisely.

      (2) Dynamical system mechanism

      This is no better than having a look-up table [...] The neuron is restricted to what the user puts in, and therefore, calling it a dynamical system is entirely wrong.

      We would like to take the opportunity to clarify this point, because the model's behavior is much richer than the lookup-table characterization suggests. The model is closed-loop: trajectories evolve through coupled state variables whose response to time-varying input depends on current state, not on a precomputed table of input-to-output values.

      Specifically:

      (a) The input represents the net time-varying synaptic drive, not a clamped voltage level;

      (b) The adaptation and voltage variables evolve according to coupled differential equations both on and off the limit cycle;

      (c) The duration and scale parameters only constrain active/quiet durations at input endpoints (-1, 0, +1), while the response at intermediate inputs is determined by the dynamics and other parameters such as the adaptation time constant, which can qualitatively reshape the constant-input response curve (Figure 3—supplement figure 3);

      (d) The response to a transient input depends on the current state; for example, excitatory pulses early in the active phase have little effect, as in the biophysical model.

      This is a direct result of the simplified model using a similar limit cycle and nullcline structure as the biophysical model’s dynamical system (Figure 1).

      (3) PRC usage

      The phase resetting curves are used incorrectly. PRCs are useful when the perturbation is weak (soft) [...] A hard PRC would always reset the cycle to the fixed offset from the perturbation phase and is therefore uninformative in understanding dynamics.

      We appreciate this point and would like to clarify what we show and why. We present finite (non-infinitesimal) PRCs across a range of input strengths and signs, spanning both the "soft" (weak-perturbation) and the "hard" (strong-perturbation) regimes, rather than focusing on the "hard" regime alone. Importantly, even in the strong-perturbation regime, we do not observe that pulses "always reset the cycle to the fixed offset from the perturbation phase". In Figure 4, the active phase exhibits a non-resetting region whose size and location depend on parameters. This region governs entrainability and the phase-locking offset, and is thus a key aspect of the neuron's dynamics. Moreover, the strong-perturbation regime is biologically relevant in our circuit examples. For instance, the inhibitory connections within the pyloric CPG are strong enough to cause hard resets, and these resets shape the circuit-level dynamics we reproduce. We will revise the pulse-input section to state these points more explicitly, so that the rationale for showing PRCs across a range of inputs is clear.
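      To illustrate what a finite (non-infinitesimal) PRC measures, here is a minimal sketch that computes one by direct simulation. It assumes a textbook FitzHugh-Nagumo oscillator purely for self-containment; it is not the simplified bursting model or the biophysical model discussed in this response. A pulse of fixed amplitude is added to the voltage variable at each of several phases, and the shift in spike timing several cycles later is compared with an unperturbed run. The parameter values, the threshold-crossing definition of phase zero, and all function names are illustrative assumptions.

```python
# Illustrative sketch only: a finite-amplitude PRC for a textbook
# FitzHugh-Nagumo oscillator, NOT the models discussed in this response.
A, B, EPS, I_EXT = 0.7, 0.8, 0.08, 0.5  # assumed oscillatory parameter set
DT = 0.05       # RK4 time step
THRESH = 1.0    # phase zero := upward crossing of v through THRESH


def deriv(v, w):
    """FitzHugh-Nagumo right-hand side (fast variable v, slow variable w)."""
    return v - v**3 / 3 - w + I_EXT, EPS * (v + A - B * w)


def rk4_step(v, w):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(v, w)
    k2 = deriv(v + 0.5 * DT * k1[0], w + 0.5 * DT * k1[1])
    k3 = deriv(v + 0.5 * DT * k2[0], w + 0.5 * DT * k2[1])
    k4 = deriv(v + DT * k3[0], w + DT * k3[1])
    return (v + DT / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            w + DT / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def simulate(v, w, n_steps, kick_step=None, kick=0.0):
    """Integrate n_steps from (v, w), adding `kick` to v at step kick_step.
    Returns the linearly interpolated times of upward threshold crossings."""
    crossings = []
    for i in range(n_steps):
        if i == kick_step:
            v += kick  # instantaneous pulse to the fast variable
        v_new, w = rk4_step(v, w)
        if v < THRESH <= v_new:
            crossings.append((i + (THRESH - v) / (v_new - v)) * DT)
        v = v_new
    return crossings


def settle():
    """Relax onto the limit cycle, then return the state at phase zero."""
    v, w = -1.0, 1.0
    for _ in range(int(300 / DT)):
        v, w = rk4_step(v, w)
    while True:
        v_new, w_new = rk4_step(v, w)
        if v < THRESH <= v_new:
            return v_new, w_new
        v, w = v_new, w_new


def finite_prc(kick, n_phases=20, n_cycles=6):
    """Return (period, [(phase, shift), ...]); shift > 0 is a phase advance."""
    v0, w0 = settle()
    first = simulate(v0, w0, int(300 / DT))
    period = first[1] - first[0]
    n_steps = int((n_cycles + 0.5) * period / DT)
    t_ref = simulate(v0, w0, n_steps)[-1]  # last unperturbed crossing
    prc = []
    for k in range(n_phases):
        phase = k / n_phases
        t_pert = simulate(v0, w0, n_steps, int(phase * period / DT), kick)[-1]
        shift = (t_ref - t_pert) / period
        prc.append((phase, (shift + 0.5) % 1.0 - 0.5))  # wrap to one cycle around 0
    return period, prc
```

      For example, `finite_prc(0.05)` and `finite_prc(0.8)` return the period and a list of (phase, shift) pairs for a weak and a strong excitatory pulse; negative amplitudes give inhibitory PRCs. Reading out the shift several cycles after the pulse, rather than at the next crossing, lets transients off the limit cycle decay before the phase is measured.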

      (4) Defining active/quiet phases

      The definition of the active and quiet parts of a burst is often less clear than what the authors suggest. Bursting neurons often do multiple bursts in a cycle, and therefore, substituting the burst envelope is a subjective matter. This is even more problematic in bursting neurons in the brain, where there is often no quiet period.

      We agree that defining the waveform envelope can be subjective in some preparations, and we will add this caveat to the Discussion.

      On neurons with no quiet period, we note that this behavior is in fact already supported in our model, as seen in Figure 3: under strong excitatory input, both the biophysical and simplified models enter a regime in which firing rate never reaches zero. As the model can generally be viewed as an abstract limit cycle that maps onto periodic waveforms through the firing function, the quiet phase need not correspond to literal silence.

      On more complex waveforms, we could imagine different firing functions that produce richer burst shapes including multi-peak bursts, but we have not tried this explicitly. Of course, for research questions concerned with irregular bursting or spike-to-burst transitions, a lower-level biophysical model would be more appropriate. In revision, we will expand on how the firing function could produce more complex burst shapes.

    1. eLife Assessment

      This study presents potentially important findings linking peripheral inflammation to the remodeling of perinodal adipose tissue and draining lymph nodes, suggesting a mechanism by which local tissue inflammation can reshape LN structure and metabolism. The idea is solid and supported by observations. However, the evidence remains incomplete in parts, as several conclusions rely on correlative weight and cellularity measurements, and macrophage involvement requires further validation.

    2. Reviewer #1 (Public review):

      The idea is super interesting, and the subsequent work is potentially significant because it links peripheral inflammation to remodelling of perinodal adipose tissue and draining lymph nodes. This suggests an antigen-independent manner by which local tissue inflammation can communicate with and reshape immune organ structure and tissue metabolism. However, the evidence is suggestive. For instance, many conclusions rely on correlational weight/cellularity relationships, models with confounders (spontaneous wounding; potentially systemic IMQ), and macrophage dependence inferred from a single pharmacologic approach without a definitive depletion/lineage or tracer-based causal link.

      Major Comments:

      (1) "Wounding/fighting" evidence is confounding.

      Unless I am mistaken, a large part of the argument for inflammation-driven perinodal fat pad atrophy and LN expansion relies on spontaneous fighting injuries in co-housed CCR2-/- males, including animals "culled...due to excessive wounding." Because wound severity, duration, infection load, stress, and cage dynamics are uncontrolled, isn't it difficult to assign causality to "cutaneous inflammation"?

      (2) The "CCR2-independent macrophage" conclusion.

      The manuscript interprets persistence/accumulation of macrophages despite reduced inflammatory monocytes as CCR2-independent recruitment or local proliferation. However, CCR2 deficiency can alter immune baselines and long-term tissue remodelling. Perhaps consider bone marrow chimeras (WT to CCR2-/-, CCR2-/- to WT ????) or an inducible CCR2 deletion approach to separate developmental/systemic effects from acute inflammation-driven mechanisms. If "in situ proliferation" is proposed, include a direct readout (e.g., Ki67 in ATMs in the fat pad).

      (3) IMQ and systemic effects.

      The work relies on topical Aldara/imiquimod as an "inflammation without antigen" driver of distal LN/fat-pad remodelling. But IMQ is well known (and cited by the authors) to enter circulation and drive systemic responses, which could blur whether effects are truly draining-site specific vs systemic metabolic/inflammatory effects. It would be ideal to provide systemic context: plasma cytokines and/or metabolic readouts (e.g., circulating FFAs) to distinguish local vs systemic drivers.

      (4) Macrophage dependence is inferred from CSF1R inhibitor treatment.

      However, validation of macrophage depletion and specificity is incomplete. The manuscript uses AZD7507 (CSF1R inhibitor) and observes partial rescue of the fat pad/LN phenotype while skin severity (PASI) is unaffected. But, to this reviewer, the data shown do not clearly quantify actual macrophage depletion efficiency in the target fat pad and LN at endpoint, and CSF1R blockade can affect multiple myeloid populations. Therefore, show absolute macrophage counts (and likely other myeloid populations) in fat pad and LN with/without AZD7507 at the analysed timepoints, not only outcome weights. (The methods describe dosing but not endpoint depletion quantification??)

      (5) Fat pad atrophy/LN expansion is a correlation.

      The paper emphasises negative correlations between fat pad and LN weights/cellularity at baseline and with inflammation. But correlation does not establish whether fat pad lipolysis drives LN expansion, whether LN changes drive fat remodelling, or whether both reflect systemic mediators. Add tissue-level evidence distinguishing true adipocyte loss vs other contributors to "weight change" (e.g., oedema/fibrosis).

      (6) Evidence for "fatty acid donation" from fat pad to LN.

      The lipid data are described as "exemplary," and the inference that LN fatty acids originate from the fat pad is based on temporal ordering and relative abundance. This does not rule out plasma spillover, LN-intrinsic metabolism, or altered lymph flow.

    3. Reviewer #2 (Public review):

      The authors aim to demonstrate skin inflammation is associated with fat pad atrophy and lymph node expansion. They further propose that these phenotypes are driven by the recruitment and lipid metabolism of CCR2-independent macrophages.

      The authors took advantage of two skin inflammation models, fight-induced and imiquimod-induced skin inflammation, and analyzed multiple tissues, including skin, fat pads, and lymph nodes. Using a macrophage-depletion method (e.g., CSF-1R inhibitor), the authors further suggest that the inverse correlation between fat pad atrophy and lymph node expansion is macrophage-dependent. While the study identifies this intriguing inverse correlation during skin inflammation, the causal pathway linking fat pad atrophy and lymph node enlargement has not been clearly established.

      To improve the rigor of the manuscript, the authors should address the following concerns:

      (1) CCR2-deficient mice showed reduced inflammatory monocytes and monocyte-derived macrophages (PMID:16462739; 16341265). During tissue inflammation, CCR2+ classical monocytes are typically recruited to the injured peripheral tissues, including skin, where they differentiate into monocyte-derived macrophages (PMID:38474365). While inflammatory monocytes were reduced in the skin (Figure 3d) and fat pads (Figure 4a, S2D) of CCR2-deficient mice, macrophage numbers were significantly increased in these mice. It remains unclear whether CCR2-independent macrophages were newly recruited from alternative sources or whether tissue-resident macrophages underwent local self-proliferation to compensate for the loss of CCR2+ monocyte-derived macrophages.

      (2) In line 258, the authors state that there was "a significant reduction in CD11C- CD206+ anti-inflammatory macrophages (Figure 4b i-iii)". However, the quantification data in Figure 4b iii do not appear to show any reduction in anti-inflammatory macrophages in either males or females. Please reconcile this discrepancy between the text and the figure.

      (3) Although CD11C and CD206 were historically used as markers of inflammatory and anti-inflammatory macrophages, respectively, these markers are no longer considered sufficient to define the macrophage polarization state, particularly in adipose tissue, where they are constitutively expressed by resident macrophages (PMID:34210853). Numerous studies have demonstrated substantial macrophage diversity/heterogeneity across iWAT, eWAT, and brown fat tissues. The authors should discuss adipose macrophage diversity beyond the outdated M1/M2 framework.

    1. eLife Assessment

      This important study provides a comprehensive map of how touch-sensitive neurons in the fly head connect to downstream circuits, revealing parallel pathways that preserve spatial organization and identifying a developmentally defined circuit linking sensory input to grooming behavior. The evidence is convincing, with detailed anatomical reconstruction and quantitative analysis supporting the main claims, while the link to behaviour remains based on prior functional work. The study will be of interest to neuroscientists studying sensory processing and motor control, and provides an invaluable resource for future functional investigations.

    2. Joint Public Review:

      Summary:

      Calle-Schuler et al. reconstruct all the pre- and postsynaptic neurons of the bristle mechanosensory neurons on the adult fly head to understand whether neural circuits support the parallel mechanosensory pathways, which could be instrumental in shaping the sequential motor patterns during fly grooming. They find that most presynaptic neurons, interneurons, and excitatory postsynaptic neurons are also somatotopically organized, such that each neuron is more connected to bristle mechanosensory neurons that are closer on the head and less connected to bristle mechanosensory neurons that are further away. These include the direct BMN-BMN circuits, excitatory interneurons, as well as the inhibitory networks. They also find that the entire hemilineage 23b forms excitatory postsynaptic circuits with BMNs, highlighting how these circuits, and hence their function, could be developmentally determined.

      Strengths:

      This is a complete map of all the neurons that make 5 or more pre- and postsynaptic connections with the fly head BMNs. Using this, the authors have identified various trends, such as ascending neurons providing most of the GABAergic inhibitory input, which could provide the presynaptic inhibition essential for the parallel model of sequential grooming generation. Moreover, they identified that the entire cholinergic hemilineage 23b is postsynaptic to BMNs. Both their excitatory postsynaptic connectivity and inhibitory presynaptic connectivity demonstrate core motifs of the parallel circuits necessary for the hierarchical suppression model of the grooming sequence.

      Weaknesses:

      Somatotopic organization with hierarchical suppression is an elegant mechanism for generating sequential motor patterns during grooming. Yet anatomical connectivity alone, in the absence of functional connectivity, cannot explain the grooming motor sequences. Future work should aim to map functional connectivity onto the behavioral sequence.

      Closing statement:

      The authors have addressed the major concerns regarding clarity, scope, and interpretation. The manuscript is now significantly improved and is clearly framed as an anatomical resource that identifies circuit motifs consistent with existing models of grooming control.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the Reviewers for their careful reading and insightful critiques, which have helped make the manuscript clearer and more impactful.

      In response to the Reviewers, we substantially revised the manuscript to improve clarity, framing, and accessibility for readers outside the Drosophila connectomics community, while keeping the core conclusions unchanged. We clarified the study’s scope (defining parallel circuit architecture rather than testing sufficiency for reconstructing grooming sequence order), restructured the last Introduction paragraph, several Results sections, and the Discussion to foreground the main findings and their relevance to the parallel hierarchical-suppression model. We also added key methodological clarifications for non-specialist readers, including how BMN classes were identified in FAFB by a correlative approach (with type-level, not single-bristle, resolution), how FlyWire/Codex synapse counts are defined (contacts vs T-bars), how sensory BMNs can have postsynaptic sites, and what is meant by ascending vs descending neurons in a brain-only dataset. Across the Results, we improved terminology and definitions (e.g., projection zones, hemilineage 23b, BMN nomenclature such as BM-InOm), clarified what derives from prior work (Eichler et al., 2024) versus new analyses, strengthened interpretation of BMN→motor connections as likely modulatory, and expanded explanation of postsynaptic partner categories. We also revised figures and legends to better highlight overlap/segregation and somatotopy, moved the cosine-similarity matrices into the main figures (new Figure 9), added a new graphical summary figure (new Figure 15), and explicitly acknowledged key limitations, including one-hemisphere analysis and lack of VNC coverage in FAFB.

      In addition, in response to the suggestion of a rank-order test relating BMN→second-order wiring to the grooming hierarchy, we clarified throughout the revised manuscript that this study does not aim to test whether connectivity alone is sufficient to reconstruct grooming sequence order, and we removed wording that could imply such a claim. As detailed in our response to that specific critique below, sequence sufficiency is outside the scope of this study, and a simple linear ordering based on aggregate synapse weights is not straightforward to interpret in this system (e.g., BM-Taste vs. BM-InOm output strength does not track grooming order, BMNs likely contribute to multiple behaviors, and head grooming order is not resolved at sufficient granularity). We therefore respectfully request that the sentence in the eLife Assessment suggesting that the paper is weakened by not including this analysis be removed. As currently written, it frames an out-of-scope analysis as a missing test of the manuscript’s main claims and may mislead readers about the paper’s intended contribution: a synaptic-resolution anatomical definition of parallel BMN circuit architecture and motifs consistent with hierarchical suppression.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Calle-Schuler et al. reconstruct all the pre- and postsynaptic neurons of the bristle mechanosensory neurons on the adult fly head to understand how neural circuits determine the sequential motor patterns during fly grooming. They find that most presynaptic neurons, interneurons, and excitatory postsynaptic neurons are also somatotopically organized, such that each neuron is more connected to bristle mechanosensory neurons that are closer on the head and less connected to bristle mechanosensory neurons that are further away. These include the direct BMN-BMN circuits, excitatory interneurons, as well as the inhibitory networks. They also identify that the entire hemilineage 23b forms excitatory postsynaptic circuits with BMNs, highlighting how these circuits and hence their function could be developmentally determined.

      Strengths:

      This is a complete map of all the neurons that make 5 or more pre- and post-synaptic connections of the fly head BMNs. Using this, the authors have identified various trends, such as ascending neurons providing most of the GABAergic inhibitory input, which could provide the presynaptic inhibition essential for the parallel model for sequential grooming generation. Moreover, they identified that the entire cholinergic hemilineage 23b is postsynaptic to BMNs.

      Weaknesses:

      Although somatotopic organization is an elegant mechanism for generating sequential motor patterns during grooming, none of the analyses in the paper directly demonstrate that this somatotopic connectivity is sufficient to generate hierarchical suppression and reconstruct the grooming sequence. If somatotopic organization is sufficient, then hierarchical clustering should recover the grooming sequence. Their detailed connectome enables the authors to test whether some networks are more crucial for the grooming sequence than others: to what extent can each network individually (ascending neurons-BMN alone) or in combination (BMN-BMN, ascending-BMN, BMN-descending, etc.) recover the sequence observed during grooming? If all the pre- and postsynaptic neurons put together cannot explain the sequence, then the sequence is probably determined by individual synaptic strengths or other key downstream neurons.

      We appreciate the Reviewer’s interest in how BMN connectivity relates to the grooming sequence, and agree that understanding how mechanosensory circuits contribute to hierarchical action selection is an important direction. In this study, however, our goal was not to test whether connectivity alone is sufficient to reconstruct the full grooming sequence. Rather, we focused on defining the parallel circuit architecture underlying individual grooming movements and on identifying anatomical features—most notably extensive presynaptic inhibition—that are consistent with previously proposed models of hierarchical suppression.

      We recognize that aspects of the Introduction and the references cited there to prior work on the grooming sequence may have led some readers to expect a direct sequence-prediction analysis. To address this, we revised the Introduction and Results to clarify the scope of the study and adjusted language to avoid implying that we aimed to derive the grooming order from connectivity. Consistent with this framing, the Abstract mentions the sequence only in the context of presynaptic inhibition, which provides anatomical support for existing models of hierarchical suppression. We therefore do not draw conclusions about the ordering of grooming movements from the connectome itself. Details of the specific manuscript revisions are provided below in the Recommendations for authors section.

      The Reviewer suggests testing whether somatotopic organization is sufficient to recover the grooming sequence by clustering BMN connectivity or by examining whether specific subnetworks (e.g., BMN → ascending, BMN → descending, or BMN→BMN pathways) reproduce the sequence. We carefully considered these possibilities. However, several factors currently limit the interpretability of such analyses.

      First, synaptic weight alone does not align with known features of the grooming sequence. For example, BM-Taste neurons contribute the majority of BMN synaptic output, yet proboscis grooming is not the first head grooming movement, whereas BM-InOm neurons contribute less than 9% of total output despite eye grooming occurring first. As we now clarify in the Results, global synapse number therefore does not predict the order of grooming movements.

      Second, BMNs likely distribute signals across multiple behavioral pathways beyond grooming, including circuits involved in feeding and escape behaviors. Because the connectome aggregates all postsynaptic targets, analyses based solely on connectivity strength cannot isolate the subset of circuits specifically responsible for grooming-related action selection.

      Third, the head grooming sequence itself has not been resolved at the spatial granularity required for such analyses across head regions. While eye grooming is well characterized as the first head movement, the relative ordering among antennae, proboscis, and other head bristle regions remains less clearly defined, making it difficult to evaluate correspondence between connectivity-derived rankings and behavioral order.

      Because of these limitations, we concluded that clustering or network-based analyses aimed at reconstructing the grooming sequence from connectivity alone would be difficult to interpret and therefore chose not to include them. Accordingly, we have deliberately avoided claiming that the connectome is sufficient to generate the grooming sequence. Instead, we interpret the somatotopic architecture and inhibitory circuitry described here as anatomical features consistent with previously proposed models of hierarchical suppression, while leaving the question of sufficiency for future studies that integrate connectomics with functional and behavioral analyses.

      Given that we do not claim sufficiency of the connectome for producing the grooming sequence, we respectfully request that the eLife Assessment avoid framing the manuscript around this expectation, as wording that implies the manuscript should reconstruct the sequence from connectivity could misrepresent the intended scope of the study and potentially mislead readers about its primary contributions.

      Reviewer #2 (Public review):

      Summary:

      Schuler et al. present an extensive analysis of the synaptic connectivity of mechanosensory head bristles in the brain of Drosophila melanogaster. Based on the previously described set of bristle afferent neurons (BMNs) located on the head, the study aims to provide a complete, quantitative assessment of all synaptic partners in the ventral brain. Activation of head bristles induces grooming behavior, which is hierarchically organized and hypothesized to be grounded in a parallel cellular architecture in the central brain. The authors found evidence that, at the synaptic level, neurons downstream of the BMN afferents, namely the postsynaptic LB23 interneurons and recurrent GABAergic neurons (involved in sensory gain control), are organized in parallel, following the somatotopic organization described for the BMN afferents. This study, therefore, represents an important step towards a better understanding of the cellular circuits that govern the hierarchical order of sequentially organized grooming behavior in Drosophila melanogaster.

      The study is well done, and the images are well designed and extensive in number, but the account is challenging to read and digest for readers outside the Drosophila/connectome community. It is amazing what can be done with the connectome nowadays using the up-to-date FAFB dataset and the analytical and visual tools (as in FlyWire), in combination with known anatomy/physiology/behavior in DM. I suggest that the authors provide more detail on hemilineages, their relationship to the FAFB connectome, the predicted neurotransmitter identity, and the statistical CatMAID tools used in some of the Figures.

      A graphical summary at the end of the study would be very useful to highlight the important findings focusing on neuron populations identified in this study and their position in the hypothesized parallel central circuitry of BMNs.

      We thank the Reviewer for the thoughtful and constructive comments. In response, we substantially revised the manuscript to improve clarity and accessibility, particularly for readers outside the Drosophila connectomics community. We rewrote portions of the Introduction, Results, and Discussion to better foreground the main findings, reduce density, and more clearly distinguish prior work from the new analyses presented here. We also added methodological clarification throughout, including how BMN classes were identified in the FAFB dataset using a correlative, type-level approach, how FlyWire/Codex synapse counts are defined, and clarified terminology related to projection zones, pre- versus postsynaptic structure, and partner classes. To address the Reviewer’s request for more developmental context, we added a more explicit definition of hemilineages at first mention in the Abstract and Results. In addition, we revised figures and legends to make the somatotopic and parallel organization of the circuitry easier to interpret, including moving the cosine-similarity matrices into the main figures. Finally, in direct response to the Reviewer’s suggestion for a higher-level synthesis, we added a new graphical summary figure (Figure 15) at the end of the manuscript to highlight the principal neuron populations identified in the study and their proposed positions within the parallel central BMN circuitry. Together, we believe these revisions have made the manuscript clearer, more accessible, and better framed for a broad readership while preserving its core conclusions. Details of these changes are provided in the Recommendations for the authors section.

      Reviewer #3 (Public review):

      Summary:

      The authors set out to extend their previous mapping of Drosophila head mechanosensory neurons (Eichler et al., 2024) by reconstructing their full second-order connectome. Their aim is to reveal how bristle mechanosensory neurons (BMNs) interface with excitatory and inhibitory partners to generate location-specific grooming movements, and to identify the circuit motifs and developmental lineages that support this transformation.

      Strengths:

      The strengths of this work are clear. The authors present a comprehensive synaptic-resolution connectome for BMNs, identifying nearly all of their pre- and postsynaptic partners. This dataset reveals important circuit motifs:

      (1) BMNs provide feedforward excitation to descending neurons, feedforward inhibition to interneurons, and are themselves strongly regulated by GABAergic presynaptic inhibition.

      (2) These motifs together support the idea that BMN activity is locally gated and hierarchically suppressed, fitting well with known behavioural sequences of grooming.

      (3) The study also shows that connectivity preserves somatotopy, such that BMNs from neighbouring bristle populations converge onto shared partners, while distant BMNs remain segregated.

      (4) A developmental analysis reveals both primary and secondary partners, suggesting a layered scaffold plus adult-specific elaborations.

      (5) Finally, the identification of hemilineage 23b (LB23) as a core postsynaptic pathway - incorporating previously described antennal grooming neurons (aBN2) - provides a striking link between developmental lineage, anatomical connectivity, and behavioral output.

      (6) Together, the dataset represents a valuable resource for the neuroscience community and a foundation for future functional studies.

      Weaknesses:

      There are also some weaknesses, most of which limit only clarity.

      (1) The writing is dense, with results often presented in a cryptic fashion and the functional implications deferred to the discussion. As a result, the significance of circuit motifs such as BMN→motor or reciprocal inhibitory loops is sometimes buried, rather than highlighted when first described.

      We thank the Reviewer for this helpful suggestion. In response, we revised several sections of the Results to improve clarity and more clearly highlight the functional significance of key circuit motifs when they are first introduced. Specifically, we streamlined dense passages and added brief explanatory statements linking motifs such as reciprocal inhibitory loops to their potential roles in the proposed parallel circuit architecture. Additional details of these revisions are provided in the Recommendations for the authors section below.

      (2) Some assumptions require more explanation for non-specialist readers - for example, how bristle identity is inferred in EM in the absence of cuticular structures, or what is meant by "ascending" and "descending" in a dataset that does not include the ventral nerve cord. While some of this comes from the earlier paper, it would help readers of this one to explain this.

      In response, we added clarifying text describing how BMN types were identified in the FAFB dataset using a correlative approach based on stereotyped projection morphologies and prior light-level anatomical data, and we explicitly state the limits of this type-level assignment in the absence of cuticular bristles in the EM volume. We also expanded the explanation of partner categories, including what is meant by “ascending” and “descending” neurons in a brain-only dataset. Additional details of these revisions are provided in the Recommendations for the authors section.

      (3) Visualization choices also sometimes obscure key conclusions: network graphs can be visually appealing but do not clearly convey somatotopy or BMN-type differences; heatmaps or region-level matrices would make the parallel, block-like organization of the circuit more evident.

      We incorporated connectivity matrices (cosine-similarity heatmaps) into the main figures to more clearly illustrate the somatotopic and parallel organization of BMN connectivity, complementing the network graph visualizations (new Figure 9). These matrices make the block-like structure of BMN partner relationships more apparent and help highlight differences among BMN types; additional details are provided in the Recommendations for the authors section.

      (4) The data might also speak to roles beyond grooming (e.g., mechanosensory modulation of posture or feeding), and a brief acknowledgement of this would broaden the impact.

      We added text acknowledging that BMNs contribute to additional behaviors beyond grooming, such as feeding and other mechanosensory-guided actions. These roles are supported by prior studies of bristle function and are also consistent with the diverse downstream circuits revealed in the connectome. This clarification broadens the interpretation of the dataset while maintaining the primary focus of the study on grooming-related circuitry.

      (5) The restriction to one hemisphere should be explicitly acknowledged as a limitation when framing this as a 'comprehensive' connectome.

      We thank the Reviewer for this suggestion. We now explicitly acknowledge this limitation in both the Results and Discussion.

      In the Results section entitled “The BMN connectome” we added a sentence at the end of the paragraph that mentions the limitations. This sentence reads: “In addition, because our analysis was restricted to BMNs entering the left hemisphere, the complete right-side BMN connectome is not included, limiting assessment of bilateral symmetry, inter-hemispheric coordination, and variability across sides.”

      The last paragraph of the first Discussion section describes limitations to our ‘comprehensive’ connectome. The text in this paragraph pertaining to left/right variability reads: “Second, the analysis focuses only on BMNs from the left hemisphere. Although contralateral neurons synapsing with left-side BMNs are included, the absence of the right-side BMN connectome limits assessment of bilateral symmetry, interhemispheric coordination, and side-to-side variability.”

      Overall, the authors achieve their main goal: they convincingly show that BMNs connect into parallel, somatotopically organized pathways, with LB23 providing a key lineage-based link from sensory input to grooming output. The dataset is carefully analyzed, and while the presentation could be streamlined, the connectome will be a valuable resource for researchers studying sensory processing, motor control, and the logic of circuit organization.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We enjoyed this work and are enthusiastic about its contribution: the resource is valuable, and the anatomical evidence is solid. Most of our suggestions concern clarity and visualization, detailed below.

      In addition, the editors and reviewers felt one focused analysis would materially strengthen the paper: please use the BMN→second-order synapse weights to produce a similarity-based, one-dimensional order of BMN types and test its agreement with the known grooming sequence (e.g., via a rank correlation). A positive result would support sufficiency of the mapped wiring for the sequence; if not, the claims can be framed as "consistent with" rather than "sufficient for."
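
      The analysis requested above could be sketched roughly as follows. This is a minimal illustration, assuming a matrix of synapse weights with one row per BMN type and one column per second-order partner. The one-dimensional order here is obtained by spectral seriation (sorting BMN types by the Fiedler vector of their cosine-similarity graph), which is one possible choice and not a method specified by the editors or used in the manuscript. The weights and the "grooming rank" below are hypothetical toy values, not data from the study.

```python
import numpy as np


def cosine_similarity(weights: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows (BMN types) of a synapse-weight matrix."""
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.where(norms == 0, 1.0, norms)  # guard against all-zero rows
    return unit @ unit.T


def spectral_order(similarity: np.ndarray) -> np.ndarray:
    """Return row indices sorted by the Fiedler vector of the similarity graph's Laplacian."""
    laplacian = np.diag(similarity.sum(axis=1)) - similarity
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    return np.argsort(fiedler)


def spearman_rho(rank_a, rank_b) -> float:
    """Spearman rank correlation for two tie-free rankings of the same n items."""
    d = np.asarray(rank_a, dtype=float) - np.asarray(rank_b, dtype=float)
    n = d.size
    return 1.0 - 6.0 * float(np.sum(d ** 2)) / (n * (n ** 2 - 1))


# Hypothetical toy data: 4 BMN types (rows) x 5 second-order partners (columns).
weights = np.array([
    [10, 5, 0, 0, 0],
    [5, 10, 5, 0, 0],
    [0, 5, 10, 5, 0],
    [0, 0, 5, 10, 5],
], dtype=float)

order = spectral_order(cosine_similarity(weights))
conn_rank = np.empty(len(order), dtype=int)
conn_rank[order] = np.arange(len(order))  # position of each BMN type in the 1-D order

# The eigenvector sign is arbitrary; fix it by an a-priori convention (row 0 ranked first).
if conn_rank[0] > conn_rank[-1]:
    conn_rank = conn_rank.max() - conn_rank

grooming_rank = np.array([0, 2, 1, 3])  # hypothetical behavioral order of the same 4 types
rho = spearman_rho(conn_rank, grooming_rank)
print(order, conn_rank, rho)
```

      Because the sign of the Fiedler vector is arbitrary, the orientation of the order has to be fixed by a convention chosen independently of the behavioral sequence; otherwise the sign of rho carries no meaning. With only a handful of BMN types, a permutation test would also be needed to judge whether an observed rho exceeds chance.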

      We appreciate the Reviewers’ interest in how BMN connectivity relates to the grooming sequence, and agree that understanding how mechanosensory circuits contribute to hierarchical action selection is an important direction. In this study, however, our goal was not to test whether connectivity alone is sufficient to reconstruct the full grooming sequence. Rather, we focused on defining the parallel circuit architecture underlying individual grooming movements and on identifying anatomical features—most notably extensive presynaptic inhibition—that are consistent with previously proposed models of hierarchical suppression.

      We recognize that references in the Introduction to prior work on the grooming sequence may have led some readers to expect a direct sequence-prediction analysis. To address this, we revised the Introduction and Results to clarify scope and adjusted language to avoid implying that we aimed to derive the grooming order from connectivity. Consistent with this framing, the Abstract mentions the sequence only in the context of presynaptic inhibition, which provides anatomical support for existing models of hierarchical suppression. We do not draw conclusions about the ordering of grooming movements from the connectome itself.

      The Reviewer-suggested analysis—using BMN-to-partner synaptic weights to derive a linear ordering of BMN types—is conceptually reasonable, but its interpretability is limited at present. First, synaptic weight alone does not align with known features of the grooming sequence: BM-Taste neurons contribute the majority of BMN synaptic output, yet proboscis grooming is not the first head movement, whereas BM-InOm neurons contribute less than 9% of output despite eye grooming occurring first. Second, BMNs likely project to multiple pathways supporting distinct behaviors, such as feeding and escape, complicating any attempt to infer a single grooming hierarchy from aggregate connectivity. Third, the head grooming sequence itself has not been resolved at the granularity required for such an analysis, particularly among the antennae, proboscis, and other head bristle regions. Accordingly, we have deliberately refrained from making claims that connectivity is sufficient to generate the grooming order.

      Given that we do not claim sufficiency of the connectome for producing the grooming sequence, we respectfully request that this point be removed from the public eLife Assessment, as its current wording implies an unmet expectation outside the intended scope of the study and could mislead readers about the manuscript’s primary contributions. We appreciate the opportunity to clarify our framing and to ensure that the goals and outcomes of the work are accurately represented.

      Revisions.

      (1) Gave the last paragraph of the Introduction more structure to clearly state the main findings of the study in the context of what we learned about the circuit architecture proposed by the parallel model of hierarchical suppression.

      New paragraph: “Here, we define the synaptic connectivity of head BMNs by mapping nearly all of their pre- and postsynaptic partners—including other BMNs, ascending and descending neurons, interneurons, and motor neurons—within the FAFB dataset. Consistent with a parallel model, we find that both presynaptic and postsynaptic partners are somatotopically organized, preserving the spatial layout of the bristle map and revealing a set of parallel mechanosensory pathways that correspond to distinct head regions. Within the postsynaptic population, we identify the developmentally-related cholinergic hemilineage 23b (LB23), whose members exhibit region-specific BMN connectivity and include neurons previously shown to elicit aimed head grooming movements when activated. This demonstrates how LB23 neurons participate in parallel postsynaptic pathways that may drive discrete components of head grooming. On the input side, BMNs receive substantial presynaptic inhibition from predominantly GABAergic partners, providing strong feedback and feedforward control over mechanosensory signaling. This inhibitory architecture is consistent with hierarchical-suppression models in which inhibition regulates sensory gain and prioritizes competing actions in the grooming sequence. Together, this mechanosensory connectome reveals core organizational principles—parallel somatotopic architecture, region-specific excitatory pathways, and strong inhibitory regulation—that are thought to constitute foundational circuit motifs supporting head grooming.”

      (2) In the Results section entitled “BMN synapses show large quantitative variation across types”, we added text to the third paragraph making clear that raw synapse numbers alone do not predict the sequence, as shown by comparing the first movement (eye grooming) with a later movement in the sequence (proboscis grooming).

      That text reads: “Notably, if grooming order were driven simply by relative sensory drive—i.e., by BMN types with the strongest synaptic output eliciting cleaning of their corresponding locations first—then synapse number should track the grooming sequence. Instead, differences in synapse number do not align with the order of the grooming sequence: BM-Taste neurons account for the majority of BMN output, yet proboscis grooming is not the first head grooming movement performed, whereas BM-InOm neurons contribute only a small fraction of output despite eye grooming occurring first (Figure 1E, Figure 2A,B). This indicates that global synapse number alone is not a reliable predictor of the grooming sequence.”

      (3) In the Results section entitled “BMN postsynaptic partners are excitatory and inhibitory”, we added text to two sentences to better link the results with what we are testing with respect to the parallel model of hierarchical suppression.

      Modified sentence 1: “This excitation is hypothesized in the parallel model to help form BMN feedforward circuits that elicit aimed grooming of specific body locations, while feedforward inhibition could mediate suppression of competing grooming movements (Figure 1 – figure supplement 1A, B).”

      Modified sentence 2: “Taken together, the BMN postsynaptic partners include a diverse set of neurons that mediate both feedforward excitation and inhibition and feedback inhibition, features predicted by the parallel model.”

      (4) In the Results section entitled “BMNs and LB23 neurons form somatotopic pathways that elicit aimed grooming”, we added text to the first sentence that better ties the section to the overall goals of the manuscript.

      That text now reads: “In accordance with the parallel model of grooming, we hypothesize that BMNs connect with somatotopically organized excitatory parallel pathways eliciting aimed grooming of specific head locations (Figure 1 – figure supplement 1A, C).”

      Reviewer #1 (Recommendations for the authors):

      (1) The connectivity matrix (like that in Lesser et al., 2024, Nature, and also in Figure 9, Figure Supplement 1 of this paper) is an easier-to-digest representation of the various connections shown in Figure 2.

      We agree that connectivity matrices provide a clearer and more accessible representation of these data. Based on the context of this and other comments, we understand the Reviewer to be referring to Figure 9 rather than Figure 2. In response, we have moved the cosine-similarity connectivity matrices previously shown in Figure 9 – figure supplement 1 into the main manuscript, where they now appear as Figure 9.

      These matrices depict similarity among BMN postsynaptic partners. At present, we are unable to generate equivalent matrices for presynaptic partners due to recent personnel constraints in the lab. For this reason, we have retained the original network-graph representation (now Figure 10) to display the full pre- and postsynaptic connectome structure.

      We hope this compromise addresses the Reviewer’s request while clearly presenting the available analyses.

      (2) Again, "Cosine-based clustering is essential to demonstrate the somatotopic organization": the data in Figure 9 - Figure Supplement 1 demonstrate this better than the main Figure 9. This supplementary figure would be a great addition to the main manuscript.

      Please see the preceding response for details on the changes that we made to address this reviewer comment.
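
      One simple way to quantify the block-like organization that cosine-similarity matrices are meant to reveal is to compare the mean similarity of BMN pairs from the same head region with that of pairs from different regions. This is a minimal sketch under stated assumptions, not the clustering procedure used for Figure 9: each BMN is represented by its vector of synapse counts onto postsynaptic partners, and the weights and region labels are hypothetical toy values.

```python
import numpy as np


def cosine_similarity(weights: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (individual BMNs) of a synapse-weight matrix."""
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.where(norms == 0, 1.0, norms)  # guard against all-zero rows
    return unit @ unit.T


def within_between(similarity: np.ndarray, labels) -> tuple:
    """Mean similarity of same-region pairs vs. different-region pairs (diagonal excluded)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = float(similarity[same & off_diag].mean())
    between = float(similarity[~same].mean())
    return within, between


# Hypothetical toy data: 4 BMNs (rows) x 4 postsynaptic partners (columns).
weights = np.array([
    [8, 4, 0, 1],
    [6, 5, 1, 0],
    [0, 1, 7, 5],
    [1, 0, 4, 9],
], dtype=float)
regions = ["eye", "eye", "proboscis", "proboscis"]  # hypothetical region labels

sim = cosine_similarity(weights)
within, between = within_between(sim, regions)
print(np.round(sim, 2))
print(f"within-region mean: {within:.2f}, between-region mean: {between:.2f}")
```

      With these toy values, same-region pairs are far more similar than cross-region pairs (roughly 0.9 versus 0.15), which is the block pattern that a heatmap ordered by region makes visible. On real data, comparing the observed difference against random permutations of the region labels would indicate whether it exceeds chance.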

      (3) Figure 9 - Figure Supplement 1A: Can the authors explain why the InOm occur in two clusters (red in top and bottom)? Do InOm neurons show two different kinds of connectivity patterns?

      This is a great question. We offer a possible explanation in the Discussion section entitled “A synaptic resolution connectome of a head somatotopic map”:

      “One notable exception to this pattern is the BM-InOm population, which occupies a central position in network diagrams and exhibits broad connectivity similarity with BMNs from across the head (Figure 9A, Figure 10A-E). This likely reflects the large surface area of the compound eyes, which span dorsal, ventral, and posterior regions and neighbor multiple bristle populations. Consistent with previous work showing morphological diversity among BM-InOm neurons (Eichler et al., 2024), our output connectivity analysis suggests the presence of multiple BM-InOm subtypes defined by distinct partner profiles (Figure 9A). Future work will be needed to determine how this heterogeneity relates to spatial organization within the eye.”

      Reviewer #2 (Recommendations for the authors):

      All further comments for the authors are aimed at a better understanding of the text and for clarity. The manuscript needs revision.

      (1) Ventral brain:

      Please specify this term. Is it the SEG, or the gnathal ganglion? Throughout the paper, 'ventral brain' or 'brain' are the only anatomical terms used. Are all pre- and postsynaptic partners of BMNs located in this region? I understand that you provide a statistical analysis at the network level here, but as far as I know, the neuropil regions in Drosophila are described in more detail at the macroscopic level (see, e.g., Ito et al., 2014).

      Based on our understanding of Ito et al. (2014), the term SEG was “retired” in that manuscript in favor of gnathal ganglia. We considered using the term subesophageal zone (SEZ) in the manuscript, but ultimately chose not to adopt it. In the Drosophila brain nomenclature (Ito et al., 2014), the SEZ is defined as a region below the esophagus that encompasses multiple neuropils, such as the gnathal ganglia (GNG) and saddle (SAD), rather than a single anatomically discrete structure.

      In our dataset, the GNG are the ventral-most neuropil containing the BMN projections and the highest density of BMN-related synapses, and we therefore refer to this structure explicitly where appropriate. However, BMN pre- and postsynaptic partners are not confined to the GNG or to the SEZ as a whole; some partner neurites extend dorsally into additional neuropils. As a result, the term SEZ does not accurately capture the full spatial extent of the BMN connectome analyzed here.

      For clarity and consistency across analyses that span multiple adjacent neuropils, we therefore use the broader functional descriptor “ventral brain”, while explicitly identifying the gnathal ganglia and other neuropils when discussing neuropil-level synapse distributions. We believe this approach most accurately reflects both the anatomical organization of the circuit and the scope of our analysis.

      Given this Reviewer’s comment, we anticipate that not mentioning the SEZ might cause similar confusion among other readers. Therefore, we now mention the SEZ and the supraesophageal zone (SPZ) at the end of the Results section entitled “Synapses of BMN partners are mostly concentrated in the ventral brain”. We also added the SEZ and the SPZ to the new final summary figure (Figure 15) to help clarify the locations of the BMNs and their second-order connectome.

      That text reads: “Thus, while most neuropils containing synapses of second-order BMN partners are located below the esophagus (in the subesophageal zone, SEZ), we found more limited involvement of neuropils in the supraesophageal zone (SPZ; above the esophagus), suggesting relatively limited direct top-down control.”

      (2) Please provide greater clarity in your use of the terms synapse-presynapse-pre- and postsynaptic partners:

      In insects, synapses are polyads. It is therefore essential to distinguish whether by presynaptic (pre) you mean (1) the number of T-bars (presynaptic sites) or (2) the number of (outgoing) synaptic contacts made by a single presynaptic T-bar site. For example, a synapse configured as a tetrad (a polyad) consists of one presynaptic T-bar opposed to four postsynaptic profiles and can be counted either as one synapse (one presynaptic site, one T-bar; in CATMAID, a presynaptic connector) or as four (outgoing) synaptic connections, since the single T-bar connects to four different postsynaptic profiles. This distinction is crucial for quantifying synaptic networks in insects. Thus, the "number of synapses" may refer to (1) the number of presynaptic sites, i.e., the number of T-bars or polyads formed by a particular neuron; (2) the number of actual outgoing synaptic contacts, a number that also reflects the degree of polyadicity; or (3) the number of postsynaptic sites (the straightforward case).

      This distinction (regarding the counts of presynapses) was reported in previous connectome studies (e.g., Horne, 2018; Gruber, 2025; Schlegel, 2023). Schlegel notes: 'Insect synapses are polyadic, i.e., each presynaptic site can be associated with multiple postsynaptic sites. In contrast to the Janelia hemibrain dataset, the synapse predictions used in FlyWire do not have a concept of a unitary presynaptic site associated with a T-bar. Therefore, presynapse counts used in this paper do not represent the number of presynaptic sites but rather the number of outgoing connections.' End of citation from Schlegel.

      We thank the Reviewer for highlighting this important distinction. We now clarify in the Materials and methods that synapse counts are based on Codex/FlyWire annotations, which report individual pre- and postsynaptic contacts rather than unitary presynaptic sites (T-bars), consistent with prior FlyWire-based connectome studies (e.g., Schlegel et al.). We also added a brief clarification in the Results indicating that presynaptic and postsynaptic numbers refer to outgoing and incoming contacts, respectively.

      We added a sentence to the first section of the Materials and methods entitled “Connectome data and neuron meshes”. This text reads: “Synapse counts throughout this study are based on FlyWire/Codex synapse annotations and represent the number of individual pre- to postsynaptic contacts (incoming or outgoing connections), rather than the number of presynaptic active sites (T-bars); thus, presynaptic counts reflect polyadic connectivity as described previously (Schlegel et al., 2023).”
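
      The difference between counting presynaptic sites (T-bars) and counting outgoing connections can be illustrated with a small toy example. The neuron names and polyads below are hypothetical; the sketch only shows how the same polyadic contacts produce different totals under each convention. Per the Schlegel et al. passage quoted above, FlyWire-based presynapse counts correspond to the second convention.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Polyad:
    """One presynaptic release site (T-bar) and the postsynaptic profiles opposed to it."""
    pre: str
    posts: tuple


def count_synapses(polyads):
    """Tally the three counting conventions for a list of polyads."""
    t_bars = Counter()    # (1) presynaptic sites: one per T-bar
    outgoing = Counter()  # (2) outgoing connections: one per pre -> post pairing
    incoming = Counter()  # (3) postsynaptic sites: one per postsynaptic profile
    for p in polyads:
        t_bars[p.pre] += 1
        outgoing[p.pre] += len(p.posts)
        for post in p.posts:
            incoming[post] += 1
    return t_bars, outgoing, incoming


# Hypothetical toy circuit: a BMN forms a tetrad and a dyad, and receives one input synapse.
polyads = [
    Polyad("BMN_A", ("X", "Y", "Z", "W")),  # tetrad: 1 T-bar, 4 outgoing connections
    Polyad("BMN_A", ("X", "Y")),            # dyad: 1 T-bar, 2 outgoing connections
    Polyad("X", ("BMN_A",)),                # e.g. presynaptic input onto the BMN
]

t_bars, outgoing, incoming = count_synapses(polyads)
print("T-bars:", t_bars["BMN_A"])
print("outgoing connections:", outgoing["BMN_A"])
print("postsynaptic sites:", incoming["BMN_A"])
print("output-to-input ratio:", outgoing["BMN_A"] / incoming["BMN_A"])
```

      In this toy case, BMN_A has 2 presynaptic sites but 6 outgoing connections, so any output-to-input ratio depends on which convention is used for the numerator.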

      (3) In your study, a potential misunderstanding of this distinction arises when comparing statements on line 168 versus line 184:

      On line 168, you state: '... each BMN type having .... more postsynaptic than presynaptic sites'. However, on line 184 you state: 'There were significantly more postsynaptic than presynaptic partners, in agreement with the BMNs containing more presynaptic than postsynaptic structures.' These statements are contradictory: the statement on line 168 seems to refer to the number of presynaptic T-bars, while the one on line 184 refers to the number of actual outgoing connections (which more accurately reflects the degree of polyadicity). Since BMNs are sensory afferents, they are indeed expected to have more outgoing synapses into the central brain.

      We thank the Reviewer for identifying this mistake. We have revised the sentence at former line 168 to now read: “In addition to differing in total synapse number, BMN types vary in their pre- versus postsynaptic composition: all BMNs contain both (Eichler et al., 2024), with presynaptic sites outnumbering postsynaptic sites by ~2× to ~9× across types (mean ≈5:1 output-to-input ratio, Figure 2 – figure supplement 1A, B, Supplementary file 2, Supplementary file 3).”

      (4) Identification of bristle sensory afferents in the brain:

      This is explained in more detail in the Eichler paper, but not here. I do not understand how you identified these neurons in the FAFB dataset. The number and distribution of bristles on the individual fly imaged for the FAFB EM dataset are not known, and because the number of bristles varies between individual flies, the true number of bristle neurons available for synaptic analysis can only be estimated. The correlative approach needed to find the bristle sensory neurons in the FAFB dataset is still unclear to me. See also my comments on Figure 1.

      We thank the Reviewer for raising this point. We agree that our original draft did not clearly explain the correlative approach used to identify head BMNs in the FAFB dataset, and we have revised the manuscript to make this workflow explicit.

      In our prior work (Eichler et al., 2024), we quantified the number of bristles in each head bristle population and assessed the extent to which populations are invariant versus variable across individuals. This established an expected range for BMN counts by bristle population and clarified the level of variability that can be expected biologically.

      We then identified BMN types corresponding to specific bristle populations using different techniques, such as dye fills and light microscopy, which allowed us to define the characteristic projection morphologies and CNS entry routes associated with each population. These light-level anatomical signatures provided the basis for locating the corresponding axons in the FAFB EM volume and reconstructing the same neuron classes in EM. Importantly, because bristles themselves are not present in the EM volume, this approach supports type-level assignment (bristle population/BMN class) rather than single-bristle resolution, and we now state this explicitly to avoid overinterpretation.

      To ensure this is clear to readers who have not read Eichler et al., we have added explanatory text in the Results and expanded the Figure 1 legend describing: (i) how BMN types were identified and matched, (ii) what can and cannot be resolved given natural bristle-number variability, and (iii) how this impacts interpretation of “completeness” at the level of BMN types rather than individual bristles.

      In paragraph 1 of the first Results section, entitled “BMN synapses are somatotopically distributed in the ventral brain”, we added text that briefly describes the previous linkage of the head BMNs to the FAFB dataset. That text reads: “In prior work (Eichler et al., 2024), we showed that head bristle populations are innervated by specific BMN types whose axons project to distinct, spatially localized regions (projection zones) in the ventral brain (Figure 1C,D, left, Figure 1 – figure supplement 2A-E). This was determined using dye fills and light-microscopy-based tracing to identify BMN types innervating defined head bristle populations and to establish their characteristic brain projection morphologies. Bristle population counts and their variability across individuals provided expectations for BMN number per type. This quantitative constraint, combined with the highly stereotyped projection morphologies, provided a correlative anatomical framework to locate and reconstruct nearly all BMNs in the FAFB serial-section EM volume and map their projections into the CNS. Because FAFB does not include the head cuticular bristles, individual BMNs could not be linked to single bristles. Therefore, these assignments are necessarily correlative and provide type-level (population) rather than single-bristle resolution. Nevertheless, this level of resolution was sufficient to define somatotopically organized projection zones.”

      (5) Results:

      (a) Line 102: explain hemilineage 23b

      We added text in the manuscript to better define hemilineages.

      In the Abstract, we revised a sentence to highlight that the LB23 neurons are developmentally related. That sentence now reads: “We identified an excitatory cholinergic hemilineage (hemilineage 23b), a developmentally related group of neurons that elicits aimed head grooming and exhibits differential connectivity with BMNs from distinct head locations, revealing a lineage-based somatotopically organized parallel circuit architecture.”

      In the Results section entitled “The entire cholinergic hemilineage 23b (LB23) is postsynaptic to BMNs”, we added a sentence that defines hemilineage at its first mention in the Results. We also made slight modifications to the preceding and following sentences. That text reads: “To identify neurons crucial for establishing the BMN-postsynaptic parallel pathways that elicit head grooming movements, we focused on secondary hemilineages. In the Drosophila CNS, a hemilineage refers to the cohort of neurons that derive from a single stem cell-like neuroblast, share a common developmental origin and stereotyped morphology, and are thought to have related functional roles within a circuit (Harris et al., 2015; Wreden et al., 2017). This focus was motivated by earlier findings that neurons whose activation elicited head grooming had morphologies consistent with specific hemilineages (Hampel et al., 2015; Seeds et al., 2014).”

      (b) Lines 151-171: it is not clear to me what a projection zone is.

      We thank the Reviewer for raising this point. We agree that the term “projection zone” benefits from a brief clarification. We have made minor edits at two locations to explicitly state that projection zones refer to spatially localized regions of BMN axonal arborization and synaptic distribution corresponding to specific head locations.

      Changes made in the manuscript:

      A sentence that first introduces the term in the fourth paragraph of the Introduction now reads: “Indeed, the BMN axon projections in the central nervous system (CNS) show a somatotopic arrangement, where distinct projection zones—spatially localized regions of axonal arborization and synaptic output—correspond to specific head and body locations (Eichler et al., 2024; Johnson and Murphey, 1985; Murphey et al., 1989; Newland, 1991; Newland et al., 2000; Tsubouchi et al., 2017).”

      In a sentence in the first paragraph of the first Results section, we added a brief clarifying definition of “projection zones” at their first mention in the Results. That sentence reads: “In prior work (Eichler et al., 2024), we showed that head bristle populations are innervated by specific BMN types whose axons project to distinct, spatially localized regions (projection zones) in the ventral brain (Figure 1C,D, left, Figure 1 – figure supplement 2A-E).”

      (c) Input-output versus presynapse-postsynapse?

      A revised sentence in the Results (former line 168) makes this distinction clear: “In addition to differing in total synapse number, BMN types vary in their pre- versus postsynaptic composition: all BMNs contain both (Eichler et al., 2024), with presynaptic sites outnumbering postsynaptic sites by ~2× to ~9× across types (mean ≈5:1 output-to-input ratio, Figure 2 – figure supplement 1A,B, Supplementary file 2, Supplementary file 3).”

      (6) Figures:

      For clarity, it would be helpful if you labeled each arrow with the name of the corresponding sensory location (antenna, eye, etc.).

      We appreciate this suggestion. Major sensory locations corresponding to different head bristle populations are indicated in Figure 1 – figure supplement 1C. We explored adding these labels directly to Figure 1A, but found that doing so made the panel overly crowded and less clear. To improve visibility while keeping the main figure uncluttered, we now explicitly direct readers to this figure supplement in the Introduction.

      Specifically, we added a reference to Figure 1 – figure supplement 1C in the following sentence in the Introduction: “Dust-induced head grooming is performed by the forelegs, starting with the eyes and progressing to other locations such as the proboscis and antennae (major head locations shown in Figure 1 – figure supplement 1C) (Seeds et al., 2014).”

      (a) Figure 1:

      A: the presence of bristle types on the head. Are the JO afferents you mention in the text reported here?

      Figure 1 does not include the JONs, which were described in detail in our previous study (Hampel et al., 2020).

      The JONs are mentioned in Figure 1 – figure supplement 1. We have added text to this legend to indicate that the JONs are not the subject of this study. This text reads: “(C) Mechanosensory neurons from different head locations project to distinct, somatotopically organized zones in the ventral brain and elicit aimed grooming of those locations, including the antennae (via JONs [Johnston’s organ neurons; not analyzed in this study] and BMNs), eyes (BMNs), and proboscis (BMNs).”

      Are the reconstructions shown 1 B-D also from the Eichler paper?

      We regret that this was not explicitly stated in the figure legend, and have revised the legend to distinguish between what was previously published and what is new to this study.

      In the Figure 1 legend, we revised the following sentence: (C, D) Reconstructed BMN projections in the ventral brain (left, previously described in (Eichler et al., 2024)) and their corresponding pre- and postsynaptic sites (right, this study), colored by type according to the bristles that they innervate.

      To make this clearer in the main text, we have rewritten the first sentence in the first paragraph of the Results: In prior work (Eichler et al., 2024), we showed that head bristle populations are innervated by specific BMN types whose axons project to distinct, spatially localized regions (projection zones) in the ventral brain (Figure 1C,D, left, Figure 1 – figure supplement 2A-E).

      Are the dots symbolic, or do they represent the number of bristles? The number of bristles cannot be identified from the FAFB dataset.

      The dots are symbolic and do not represent the number of bristles in the FAFB dataset. As noted in response to a related reviewer comment above, the numbers and variability of head bristles were quantified in our prior work (Eichler et al., 2024). We also used dye fills and light-microscopy approaches, which provided the framework for linking BMN types to bristle populations. We have clarified this point in the revised manuscript, as described in the response above.

      Synapse number of bristle afferents: number of all pre- and postsynaptic contacts?

      We have addressed this point above.

      (b) Figure 2:

      Again, does the term synapses refer to all pre- and postsynaptic contacts?

      The Figure 2 legend indicates that synapse numbers include both input and output synapses. In addition, the first reference to Figure 2 in the text now indicates that the numbers refer to both input and output synapses.

      (c) Figure 2:

      Supplement presynaptic/postsynaptic means pre- and post partner?

      Presynaptic: number of BMNs that were connected with at least 5 synapses to any given presynaptic partner (n), the numbers of synaptic inputs to BMNs (inputs), and the number of presynaptic partners (partners). Postsynaptic: number of BMNs that were connected with at least 5 synapses to any given postsynaptic partner, the numbers of synaptic outputs to postsynaptic partners, and the number of postsynaptic partners.
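The counting rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the tuple layout, the function name `summarize_side`, and the choice to count synapses and partners only over connections that meet the 5-synapse threshold are all hypothetical, and the example records are invented.

```python
THRESHOLD = 5  # minimum synapses per BMN-partner connection, as stated in the response


def summarize_side(connections, threshold=THRESHOLD):
    """Summarize one side (presynaptic or postsynaptic) of BMN connectivity.

    connections: iterable of (bmn_id, partner_id, n_synapses) tuples.
    Hypothetical assumption: only connections with n_synapses >= threshold
    contribute to all three counts.
    """
    kept = [(bmn, partner, n) for bmn, partner, n in connections if n >= threshold]
    return {
        "n": len({bmn for bmn, _, _ in kept}),           # BMNs with >= 1 thresholded partner
        "synapses": sum(n for _, _, n in kept),           # synapse count over kept connections
        "partners": len({partner for _, partner, _ in kept}),  # distinct partner neurons
    }


# Invented presynaptic (input) records, for illustration only.
inputs = [
    ("bmn1", "pre1", 7),
    ("bmn1", "pre2", 3),   # below threshold, dropped
    ("bmn2", "pre1", 5),
    ("bmn3", "pre3", 2),   # below threshold, dropped
]
result = summarize_side(inputs)
print(result)
```

With these invented records, two BMNs keep a connection at or above threshold, both onto the same presynaptic partner, so the partner count is smaller than the BMN count.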

      (d) Figure 3:

      Explain downstream-upstream

      Downstream refers to postsynaptic while upstream refers to presynaptic partners or pathways.

      Comparing the right side of the Sankey diagram in A with your diagram in B, just by judging, I see more descending (post) partners than interneuron (post) partners in A. However, in B, there are clearly more postsynaptic interneurons than descending neurons? There are no numbers in Figure 3A.

      This is a great point! Figure 3A (the Sankey diagram) summarizes the fraction of BMN synaptic output distributed across partner classes, normalized within each BMN type. In this representation, descending neurons occupy a larger fraction because, across BMN types, they collectively receive a higher proportion of BMN output synapses.

      In contrast, Figure 3B (the sunburst plot) summarizes the number of distinct postsynaptic partner neurons in each category. Here, interneurons are more numerous than descending neurons, even though individual interneurons tend to receive fewer BMN synapses on average.

      Thus, the two plots are consistent: descending neurons are fewer in number but receive more synapses per neuron, whereas interneurons are more numerous but receive fewer synapses per neuron on average. When postsynaptic synapse counts are summed (as in the bottom plots), the totals for descending neurons and interneurons can therefore appear similar, despite their different representations in the Sankey diagram.

      We have added text in the Results section entitled “BMN synaptic partners in the CNS: ascending, descending, and interneurons”. We added the text here because it also responds to another Reviewer comment below requesting more description of the postsynaptic partners. That added text reads: “Interneurons are more numerous as distinct partner neurons, whereas descending neurons receive a larger fraction of BMN output synapses across BMN types (Figure 3A,B). Thus, descending neurons are fewer in number but tend to receive more BMN synapses per neuron on average, while interneurons are more numerous but often receive fewer synapses per neuron.”
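The difference between the two summaries (fraction of output synapses per partner class, as in the Sankey diagram, versus number of distinct partner neurons per class, as in the sunburst plot) can be made concrete with a toy example. The sketch below uses invented synapse counts for a single hypothetical BMN type; it is not data from the study and does not reproduce the authors' plotting pipeline.

```python
from collections import defaultdict

# Invented (partner_id, partner_class, n_synapses_from_bmn) records for ONE BMN type.
edges = [
    ("dn1", "descending", 120),
    ("dn2", "descending", 90),
    ("in1", "interneuron", 20),
    ("in2", "interneuron", 15),
    ("in3", "interneuron", 25),
    ("in4", "interneuron", 10),
    ("in5", "interneuron", 30),
]


def summarize(edges):
    """Per partner class: distinct partner count, share of output synapses, synapses per partner."""
    partners = defaultdict(set)
    synapses = defaultdict(int)
    for partner_id, partner_class, n in edges:
        partners[partner_class].add(partner_id)
        synapses[partner_class] += n
    total = sum(synapses.values())
    return {
        cls: {
            "n_partners": len(partners[cls]),                         # sunburst-style count
            "synapse_fraction": synapses[cls] / total,                # Sankey-style share
            "synapses_per_partner": synapses[cls] / len(partners[cls]),
        }
        for cls in partners
    }


summary = summarize(edges)
for cls, stats in summary.items():
    print(cls, stats)
```

Here the two descending neurons receive about two thirds of the output synapses, while the five interneurons outnumber them as distinct partners, which is the pattern the response describes.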

      (e) Figure 10: I cannot see colored circles. I found Figure 10 very hard to understand. Is this a visualization created in CATMAID? As I mentioned before, a graphical summary highlighting the information flow and architecture of the circuits analyzed in this study would be useful. In such a diagram, you could combine the findings of your study, the open questions, and the undeciphered pathways. In short, a schematic of the current knowledge of the potentially parallel and recurrent architecture of the BMN circuitry.

      Figure 10 (now Figure 11) is intended to specifically examine neurons that are both pre- and postsynaptic to BMNs, rather than to summarize the full connectome. The goal of this figure is to highlight two features of pre/post neurons: their somatotopic connectivity with BMN types and the presence of bilaterally symmetric neuron pairs that connect to common BMN populations.

      This visualization was generated from connectome-derived connectivity data and not from CATMAID, although it uses neuron reconstructions and synapse annotations from the FAFB dataset. The colored nodes represent BMN types and are now consistently referred to as “dots” rather than “circles” to better match their appearance. We have simplified the figure legend to clarify these points.

      In response to this and related comments, we also added a new graphical summary figure (Figure 15) at the end of the manuscript that schematically summarizes the information flow and parallel, recurrent architecture of the BMN circuitry at a higher level.

      (7) Discussion:

      I found the first part of your discussion hard to read; the second part is better. You can condense the discussion by mentioning the results/hypothesis of previous work once and avoiding repetitions, such as the uniqueness of the BMN connectome/FAFB dataset.

      In response to this comment, we condensed the opening portion of the Discussion by reducing repetition of background and prior findings, particularly references to earlier BMN work and the uniqueness of the FAFB dataset. We streamlined overlapping sections, mentioned prior hypotheses and results only once, and focused the revised text more directly on the new contributions of this study—namely, the synaptic-resolution organization, somatotopic connectivity, and circuit principles revealed by the BMN connectome.

      There are several cases of vague sentences, e.g.: a) Line 827: 'Head BMNs project from bristles to somatotopically organized zones in the brain (? ventral brain ?), with those innervating neighboring populations (? of bristles ?) occupying overlapping zones (Figure 1A-D)'.

      We made this suggested change: Head BMNs project from bristles to somatotopically organized zones in the ventral brain, with those innervating neighboring bristle populations occupying overlapping zones (Figure 1A-D).

      A remark: maybe you should indicate in Figure 1D the overlapping and segregated zones. The resolution is very low in these images.

      We thank the Reviewer for this comment and agree that overlap versus segregation of projection zones was not sufficiently guided in the original presentation. Rather than adding arrows to Figure 1C,D, which we felt would reduce clarity, we now explicitly describe how overlap and segregation can be identified based on color mixing of BMN synapses in the text and figure legend. In addition, we highlight these features more clearly in Figure 1 – figure supplement 3, which provides higher-resolution, multi-view visualizations of BMN synapses where overlap and non-overlap are most evident.

      Results:

      Segregation between projection zones is apparent where synapses of distinct BMN types occupy non-overlapping regions with little or no color mixing, whereas overlap between projection zones is visible as spatial intermixing of differently colored synapses from neighboring BMN types (Figure 1C, D, right, Figure 1 – figure supplement 3A-E).

      Figure 1 legend:

      Overlapping projection zones are evident where synapses of different BMN types spatially intermingle, whereas segregated zones show little or no color mixing.

      Figure 1 – figure supplement 3 legend:

      These views highlight both overlapping projection zones, visible as intermingled synapses of different colors from neighboring BMN types, and segregated zones, where synapses from distinct BMN types remain spatially separated with minimal color mixing.

      (b) Line 860: What is: 'location groomed'?

      We added a clarification to this sentence: Thus, the location groomed (i.e. antennae) corresponds to the location of the majority of BMN inputs.

      (c) Line 944: 'The sensory to motor resolution' What do you mean, here?

      We have revised this sentence to “The spatial resolution of the sensory-to-motor transformation in this parallel circuit architecture remains to be tested.”

      (d) The term 'neighboring bristles' is unclear. Does 'neighbor' refer to members within the same bristle type (e.g., antennae), or to bristles of different types, e.g., antennae and eye bristles?

      We thank the Reviewer for raising this point. Throughout the manuscript, the term “neighboring bristles” is used primarily to refer to neighboring bristle populations (i.e., bristles from different anatomical groups that are spatially adjacent on the head). In some contexts, the term is also used more generally to describe spatial proximity, regardless of whether the bristles belong to the same or different populations. Importantly, in both cases, the usage reflects the same underlying observation: BMNs innervating bristles that are spatially closer—whether within or between populations—show greater similarity in their postsynaptic connectivity than BMNs innervating more distant bristles.

      (e) Avoid abbreviations, or briefly explain the term under discussion: line 725: BM-InOm?

      We thank the Reviewer for pointing out that the BMN nomenclature was not sufficiently clear. BMNs are named according to the bristle population they innervate (e.g., BM-Ant neurons innervate antennal bristles; BM-InOm neurons innervate interommatidial eye bristles), as defined in the Figure 1 legend. To improve clarity, we ensured that the first occurrences of these terms in the Results explicitly include the corresponding head location (e.g., “eye BM-InOm neurons”), and we added brief contextual reminders at later points where this abbreviation appears. These changes clarify the meaning of BM-InOm and related abbreviations without introducing additional terminology.

      Changes made:

      Figure 1 legend: clarified that BMNs are named according to the bristle population they innervate (e.g., BM-Taste neurons innervate Taste bristles).

      Results, early first section (second paragraph): added head-location qualifiers at first mention (e.g., “eye BM-InOm neurons,” “proboscis BM-Taste neurons”) in sentences such as: “35 BM-Taste neurons innervating Taste bristles on the proboscis…” and “405 eye BM-InOm neurons innervating the interommatidial bristles on the eyes…”.

      Later Results text where the abbreviation appears (including the sentence addressing the 5-synapse cutoff): added “eye” before BM-InOm for context (e.g., “although 555 eye BM-InOm neurons are present… only 405 meet the five-synapse threshold”).

      (f) LB23 hemilineage: what was that again?

      We added text in the manuscript to better define hemilineages. This is described above in response to another Reviewer suggestion.

      (g) Line 732: What are ascending neurons?

      We had already included a definition of ascending neurons in the second Results section entitled “The BMN connectome”. Since this was not clear to the Reviewers, we expanded on this section. There is now a new paragraph in this same section. This paragraph reads:

      “Partners were grouped into five morphological categories—interneurons, descending neurons, ascending neurons, BMNs, and motor neurons—following FlyWire annotations (Dorkenwald et al., 2024). Interneurons were defined as neurons whose soma and all neurites were confined to the brain. Descending neurons were defined as neurons whose somata are located in the CNS and whose neurites extend into the descending tracts toward the ventral nerve cord (VNC). Conversely, ascending neurons were identified as neurons whose neurites enter the brain through the cervical connective and whose somata lie outside the FAFB imaged volume, resulting in only their neurites being visible in the dataset.”

      (h) Line 896: What is lineage matching?

      We thank the Reviewer for pointing this out. We realized that this sentence did not add clarity and contributed little to the manuscript, so we removed the sentence that used “lineage matching” from the manuscript.

      (i) Line 926: The Previous work ... sentence makes no sense to me.

      The sentence was reworked and now reads: “The mechanosensory neurons hypothesized from the parallel model that elicit the Drosophila grooming sequence were identified in previous work (Eichler et al., 2024; Hampel et al., 2020a, 2017, 2015; Mueller et al., 2019; Seeds et al., 2014; Zhang et al., 2020).”

      (j) The FAFB dataset is indeed unique, but repeating this several times in your discussion does not aid understanding of the obviously complex circuit architecture potentially underlying behavior. Please keep the discussion focused and condense your arguments to the specific contribution and outcome of the data in the current manuscript.

      In response to this comment, we condensed the opening portion of the Discussion by reducing repetition of background and prior findings, particularly references to earlier BMN work and the uniqueness of the FAFB dataset. We streamlined overlapping sections, mentioned prior hypotheses and results only once, and focused the revised text more directly on the new contributions of this study—namely, the synaptic-resolution organization, somatotopic connectivity, and circuit principles revealed by the BMN connectome.

      (k) In some parts of the discussion, it is not clear to me whether you refer to results of the present study or to previous studies (Hampel, Eichler), e.g., 'Our work has shown ...' on line 872, or '...we find ... LB23 neuron elicit antennal grooming....', or line 909: Our work reveals ......

      The sentence at former line 872 was revised and now reads: “While our past and present work together reveal that a subpopulation of LB23 neurons elicits antennal grooming, we also find evidence that other LB23 neurons in the hemilineage elicit additional head grooming movements.”

      The sentence at former line 909 was revised and now reads: “Our previous work and the present study reveal that the antennal grooming circuit receives inputs from two different classes of antennal mechanosensory neurons, the BMNs and JONs.”

      Reviewer #3 (Recommendations for the authors):

      All my comments are mostly only for clarity.

      (1) It would help readers if the manuscript explicitly stated how a sensory neuron can be postsynaptic - i.e., that BMN axons receive inhibitory inputs in the CNS - since this may not be intuitive to a broader audience.

      We appreciate this comment and added the following text to the last paragraph of the first Results section: As expected for sensory afferents, BMNs provide synaptic output to downstream circuits; however, the presence of postsynaptic sites may be less intuitive, and reflects that BMNs can also receive synaptic input onto their central axons within the CNS.

      (2) Figure 1 is a helpful context, but since much of it is directly reused from Eichler et al., 2024, it would strengthen the presentation if you clarified what is new here (e.g., the synapse quantification) versus what is recap. In addition, for readers less familiar with EM connectomics, it would be valuable to spell out how bristle neurons are assigned to classes in the absence of bristles themselves in the volume - i.e., that classification rests on stereotyped nerve entry and projection zones, which allow type-level but not single-bristle resolution. Explicitly flagging these methodological boundaries up front would make it clearer what information comes from the current work, what derives from previous reconstructions, and what the limits of resolution are.

      We have addressed this recommendation above for a similar suggestion by Reviewer 2 (see above for details). In brief, we inserted an overview of the methodology used to identify BMN types in the FAFB dataset, and we now explicitly state the limitations of this correlative approach. We added a sentence in the first paragraph of the Results section that states, “Because FAFB does not include the head cuticular bristles, individual BMNs could not be linked to single bristles. Therefore, these assignments are necessarily correlative and provide type-level (population) rather than single-bristle resolution.” In addition, we revised the Figure 1 legend to more clearly distinguish panels and reconstructions that were previously reported in Eichler et al. (2024) from synapse quantification and analyses that are new to the present study.

      (3) BMNs from neighboring bristle populations converge onto shared partners, while distant BMNs remain segregated - while the overlap was clear, the segregation was not visually clear in the first figure.

      We thank the Reviewer for this suggestion. We have addressed this point in our response to a similar comment from Reviewer 2 (see above), where we clarified how overlap versus segregation can be identified in Figure 1 and strengthened the text and figure legends to guide readers to these features without adding clutter to the figure.

      (4) The identification of direct BMN → motor neuron synapses is intriguing, but since these inputs make up only a small fraction of motor neuron synapses, it would help if the authors explicitly cautioned readers that these are likely modulatory contributions rather than stand-alone reflex arcs. This would prevent over-interpretation of the sensory-motor link. Similarly with the BMN>BMN connections.

      We thank the Reviewer for this suggestion. We revised the Results section “BMN postsynaptic motor neurons” to more explicitly caution that the direct BMN → motor neuron connections are likely modulatory rather than stand-alone reflex arcs, consistent with their small contribution to total motor neuron input. The revised text reads: “However, BMN inputs accounted for only a small fraction of total synapses onto each motor neuron (≤6.28% of total inputs/BMN type, Figure 4 – figure supplement 1, Supplementary file 7), suggesting a modulatory contribution rather than direct sensory-driven motor activation.”

      (5) Since the FAFB dataset only includes the brain, it would be helpful to clarify what is meant by "ascending" and "descending" partners in this context - namely that ascending neurons are VNC-derived axons entering the brain, while descending neurons are brain-derived neurons projecting out toward the VNC. Explicitly stating this will prevent confusion, given that all BMNs themselves terminate in the SEZ.

      We had already included definitions in the second Results section entitled “The BMN connectome”. Since this was not clear to the Reviewers, we expanded on this section. There is now a new paragraph in this same section. This paragraph reads: Partners were grouped into five morphological categories—interneurons, descending neurons, ascending neurons, BMNs, and motor neurons—following FlyWire annotations (Dorkenwald et al., 2024). Interneurons were defined as neurons whose soma and all neurites were confined to the brain. Descending neurons were defined as neurons whose somata are located in the CNS and whose neurites extend into the descending tracts toward the ventral nerve cord (VNC). Conversely, ascending neurons were identified as neurons whose neurites enter the brain through the cervical connective and whose somata lie outside the FAFB imaged volume, resulting in only their neurites being visible in the dataset.

      (6) In the section titled "BMN synaptic partners in the CNS: ascending, descending, and interneurons", the balance of explanation is skewed toward presynaptic input to BMNs. It would strengthen clarity if you expanded equally on the postsynaptic side (i.e., BMN outputs) or explicitly signposted why the focus here is on inputs. That way, readers won't be left wondering whether outputs are less important or just deferred to later figures.

      We have revised the section that was previously skewed toward presynaptic BMNs. This section also addresses some confusion about interpreting Figure 3, from a critique from Reviewer 2. The section now reads: “Postsynaptic connections were predominantly interneurons (56%), with significant contributions from descending (28%) and ascending (16%) neurons (Figure 5D, F,H,J). Interneurons are more numerous as distinct partner neurons, whereas descending neurons receive a larger fraction of BMN output synapses across BMN types (Figure 3A, B). Thus, descending neurons are fewer in number but tend to receive more BMN synapses per neuron on average, while interneurons are more numerous but often receive fewer synapses per neuron. Together, these partner categories underscore the strong integration of BMNs with local brain circuitry (interneurons), and with pathways linking the brain and ventral nerve cord (VNC), through ascending neurons that provide VNC-derived synaptic input and descending neurons that carry BMN output toward the VNC.”

      (7) The network diagrams in Figure 9 convey clustering, but a complementary heatmap of BMN type × partner connectivity could highlight the parallel organization more clearly. This would make the block-like separation of dorsal, ventral, and posterior subnetworks more immediately apparent, reinforcing the conclusion of parallel somatotopy-based processing. This section would also benefit from drawing the functional message more explicitly: that BMNs form largely independent, somatotopically aligned pathways with regional overlap, supporting the idea of parallel grooming circuits. Right now, the text reads as a connectivity catalog, and the key concept of parallel regional architecture risks being underemphasized.

      We agree that connectivity matrices provide a clear and accessible representation of these data. We have moved the cosine-similarity connectivity matrices previously shown in Figure 9 – figure supplement 1 into the main manuscript, where they now appear as Figure 9. Because these matrices depict similarity among BMN postsynaptic partners only, we have retained the original network-graph representation (now Figure 10) to display the full pre- and postsynaptic connectome structure.
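For readers less familiar with cosine-similarity connectivity matrices, the sketch below shows one common way to compute them. It assumes that each BMN type is represented by a vector of synapse counts onto a shared, ordered list of postsynaptic partners; the function name, the toy counts, and this representation are illustrative assumptions, and the authors' exact preprocessing or normalization may differ.

```python
import numpy as np


def cosine_similarity_matrix(counts):
    """Pairwise cosine similarity between rows of a BMN-type x partner count matrix.

    counts: array-like of shape (n_bmn_types, n_partners) holding synapse counts.
    Rows with no synapses are left as all zeros, so they have zero similarity
    to every row, including themselves.
    """
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty rows
    unit = counts / norms            # scale each row to unit length
    return unit @ unit.T             # dot products of unit vectors = cosine similarity


# Invented counts: types A and B share partners 1-2; type C targets partners 3-4.
counts = [
    [10, 5, 0, 0],   # hypothetical BMN type A
    [8, 6, 0, 0],    # hypothetical BMN type B
    [0, 0, 9, 4],    # hypothetical BMN type C
]
sim = cosine_similarity_matrix(counts)
print(np.round(sim, 3))
```

Types that share postsynaptic partners score near 1, and types with disjoint partner sets score 0, which is what produces block-like structure when rows are ordered by head region.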

      Based on the Reviewer’s suggestion to clearly state the key concepts of the parallel architecture, we added a sentence to the end of the Results section entitled: Somatotopy-based connectivity among BMN synaptic partners in the CNS. That text reads: “Thus, the BMNs form largely independent, somatotopically aligned pathways with regional overlap, supporting the idea of parallel grooming circuits.”

      (8) It would help if the authors explained the somatotopy logic (that reciprocal inhibition preserves local head regions, ensuring that suppression and gain control act locally) more explicitly. At present, the narrative is buried in network-graph detail - a heatmap or simple region-level summary would make this organizational principle much clearer to readers.

      We thank the Reviewer for this suggestion. To make the somatotopy logic of pre/post feedback inhibition clearer and less buried in network-graph detail, we revised the text in this Results section to more explicitly distinguish (i) reciprocal, head-region–localized inhibitory loops that could support local gain control from (ii) non-reciprocal cross-type inhibitory pathways that could contribute to heterotypic suppression between head regions. In addition, we modified the figure to more clearly convey somatotopy by adding text on the plot and updating the legend to state: “Bold text indicates the general head location of BMNs on the plot, revealing somatotopy-based connectivity with pre/post neurons (i.e. ventral, dorsal, posterior, and the ventral/dorsal transition).”

      (9) Please adjust the section title, "LB23 hemilineage member neurons elicit aimed head grooming movements" to avoid implying new functional experiments. For example:

      (a) "LB23 neurons include previously defined antennal grooming command neurons" or

      (b) "LB23 hemilineage anatomically corresponds to grooming-related neurons".

      This would make it clear that the contribution here is anatomical linkage, not fresh functional data.

      We changed the section title to the Reviewer-suggested title b: LB23 hemilineage anatomically corresponds to grooming-related neurons

      (10) The current network graphs in Figure 13B are not very intuitive - it is hard to visually extract the somatotopy. A connectivity heatmap or matrix (BMN types on one axis, LB23 neurons or subgroups on the other, with synapse strength as colour) would make the block-like, region-specific mapping immediately clear. A coarse-grained version (e.g., dorsal/ventral/posterior BMNs vs LB23 subgroups) could further highlight the parallel, somatotopically organized pathways. This would better support the central claim of Figure 13 than the current spring-layout graphs. Figure 13F does this for BMN inputs onto aBN2 neurons. (But it is presented only in binary form; could the authors not add a graded colour scale proportional to synapse number?)

      The binary form was necessary because the results come from different sources (i.e., CATMAID versus FlyWire synapse counts) that yield different synapse numbers.

      We modified Figure 13B to more clearly convey somatotopy by adding text on the plot and updating the legend to state: “Bold text indicates the general head location of BMNs on the plot, revealing somatotopy-based connectivity with LB23 neurons (i.e. ventral, dorsal, and posterior head).” We hope that this modification satisfies the Reviewer.

    1. eLife Assessment

      This important study combines optogenetic manipulations with wide-field cortical imaging to investigate the neural basis of context-dependent sensory processing. It provides compelling evidence that the retrosplenial cortex modulates behavioral responses to whisker deflection depending on the behavioral context. The paper will be of strong interest to neuroscientists studying cortical mechanisms of sensorimotor processing.

    2. Reviewer #1 (Public review):

      Summary

      The strength of this manuscript lies in the behavior: mice use a continuous auditory background (pink vs brown noise) to set a rule for interpreting an identical single-whisker deflection (lick in W+ and withhold in W− contexts) while always licking to a brief 10 kHz tone. Behaviorally, animals acquire the rule and switch rapidly at block transitions and take a few trials to fully integrate the context cue. What's nice about this behavior is the separate auditory cue, which shows the animals remain engaged in the task, so it's not just that the mice check out (i.e., become disengaged in the W- context). The authors then use optical tools, combining cortex-wide optogenetic inactivation (using localized inhibition in a grid-like fashion) with widefield calcium imaging to map what regions are necessary for the task and what the local and global dynamics are. Classic whisker sensorimotor nodes (wS1/wS2/wM/ALM) behave as expected with silencing reducing whisker-evoked licking. Retrosplenial cortex (RSC) emerges as a somewhat unexpected, context-specific node: silencing RSC (and tjS1) increases licking selectively in W−, arguing that these regions contribute to applying the "don't lick" policy in that context. I say somewhat because work from the Delamater group points to this possibility, albeit in a Pavlovian conditioning task and without neural data.

      The widefield imaging shows that RSC is the earliest dorsal cortical area to show W+ vs W− divergence after the whisker stimulus, preceding whisker motor cortex, consistent with RSC injecting context into the sensorimotor flow. A "Context Off" control (continuous white noise; same block structure) impairs context discrimination, indicating that the continuous background is actually used to set the rule (an important addition!). Pre-stimulus functional-connectivity analyses suggest that some activity correlations map onto the context, presumably due to the continuous background auditory context. Simultaneous opto+imaging projects perturbations into a low-dimensional subspace that separates lick vs no-lick trajectories in an interpretable way.

      In my view, this is a clear, rigorous systems-level study that identifies an important role for RSC in context-dependent sensorimotor transformation, thereby expanding RSC's involvement beyond navigation/memory into active sensing and action selection. The behavioral paradigm is thoughtfully designed, the claims related to the imaging are well defended, and the causal mapping is strong.

      Comments on revisions:

      The authors have been responsive to the prior review and I think the manuscript is a valuable and important addition to the literature.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to understand the neural basis of context-dependent sensory processing and decision-making.

      Strengths:

      They used an innovative behavioral paradigm where the action-outcome association changes independently of the sensory stimulus. This allowed the authors to disentangle the effect of behavioral context on sensory processing in RSC. Using this approach combined with optogenetic silencing, they discover that RSC activity is necessary for suppressing a lick response when the stimulus switches to the unrewarded context. The authors provide compelling evidence that the RSC is an important node of context-dependent sensory processing.

      Weaknesses:

      Sensory processing appears to be entangled with jaw/tongue movement initiation. Nonetheless, it is clear that RSC and motor cortex convey contextual signals with a very short latency.

      Comments on revisions:

      Thank you for updating the manuscript. Good work.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The strength of this manuscript lies in the behavior: mice use a continuous auditory background (pink vs brown noise) to set a rule for interpreting an identical single-whisker deflection (lick in W+ and withhold in W− contexts) while always licking to a brief 10 kHz tone. Behaviorally, animals acquire the rule, switch rapidly at block transitions, and take a few trials to fully integrate the context cue. What's nice about this behavior is the separate auditory cue, which shows the animals remain engaged in the task, so it's not just that the mice check out (i.e., become disengaged in the W- context). The authors then use optical tools, combining cortex-wide optogenetic inactivation (using localized inhibition in a grid-like fashion) with widefield calcium imaging to map what regions are necessary for the task and what the local and global dynamics are. Classic whisker sensorimotor nodes (wS1/wS2/wM/ALM) behave as expected, with silencing reducing whisker-evoked licking. Retrosplenial cortex (RSC) emerges as a somewhat unexpected, context-specific node: silencing RSC (and tjS1) increases licking selectively in W−, arguing that these regions contribute to applying the "don't lick" policy in that context. I say somewhat because work from the Delamater group points to this possibility, albeit in a Pavlovian conditioning task and without neural data. I would still recommend the authors of the current manuscript review that work to see whether there is a relevant framework or concept (Castiello, Zhang, Delamater, 'The retrosplenial cortex as a possible 'sensory integration' area: a neural network modeling approach of the differential outcomes effect of negative patterning', 2021, Neurobiology of Learning and Memory).

      The widefield imaging shows that RSC is the earliest dorsal cortical area to show W+ vs W− divergence after the whisker stimulus, preceding whisker motor cortex, consistent with RSC injecting context into the sensorimotor flow. A "Context Off" control (continuous white noise; same block structure) impairs context discrimination, indicating that the continuous background is actually used to set the rule (an important addition!). Pre-stimulus functional-connectivity analyses suggest that there is some activity correlation that maps to the context, presumably due to the continuous background auditory context. Simultaneous opto+imaging projects perturbations into a low-dimensional subspace that separates lick vs no-lick trajectories in an interpretable way.

      In my view, this is a clear, rigorous systems-level study that identifies an important role for RSC in context-dependent sensorimotor transformation, thereby expanding RSC's involvement beyond navigation/memory into active sensing and action selection. The behavioral paradigm is thoughtfully designed, the claims related to the imaging are well defended, and the causal mapping is strong. I have a few suggestions for clarity that may require a bit of data analysis. I also outline one key limitation that should be discussed, but is likely beyond the scope of this manuscript.

      Major strengths

      (1) The task is a major strength. It asks the animal to generate differential motor output to the same sensory stimulus, does so in a block-based manner, and the Context-Off condition convincingly shows that the continuous contextual cue is necessary. The auditory tone control ensures this is more than a 'motivational' context but is decision-related. In fact, the slightly higher bias to lick on the catch trials in the W+ context is further evidence for this.

      (2) The dorsal-cortex optogenetic grid avoids a 'look-where-we-expect' approach and lets RSC fall out as a key node. The authors then follow this up with pharmacology and latency analyses to rule out simple motor confounds. Overall, this is rigorous and thoughtfully done.

      (3) While the mesoscale imaging doesn't allow for cellular resolution, it allows for mapping of the flow of information. It places RSC early in the context-specific divergence after whisker onset, a valuable piece that complements prior work.

      (4) The baseline (pre-stim) functional connectivity and the opto-perturbation projections into a task subspace increase the significance of the work by moving beyond local correlates.

      Key limitation

      The current optogenetic window begins ~10 ms before the sensory cue and extends 1 s after, which is ideal for perturbing within-trial dynamics but cannot isolate whether RSC is required to maintain the context-specific rule during the baseline. Because context is continuously available, it makes me wonder whether RSC is the locus maintaining or, instead, gating the context signal. The paper's results are fully consistent with that possibility, but causality in the pre-stimulus window remains an open question. (As a pointer for future work, pre-stimulus-only inactivation, silencing around block switches, or context-omission probe trials (e.g., removing the background noise unexpectedly within a W+ or W- context block) could help separate 'holding' from 'gating' of the rule. I am not suggesting these are needed for this manuscript, but they would be interesting for future studies.)

      We thank the reviewer for the comprehensive summary of our work.

      We also thank the reviewer for highlighting the work from the Delamater group (Castiello et al., 2021), and we now briefly discuss this paper on P. 14 Lines 434-437 writing: “RSC was shown to contribute to negative patterning in behavioral tasks requiring rats to learn that the simultaneous presentation of two stimuli lead to an opposite outcome than each individual stimulus (Castiello et al., 2021).”

      We also agree with the reviewer’s noted ‘Key limitation’ regarding the role of RSC as either maintaining context representation or serving a gating function. The reviewer proposes an exciting set of further experiments inactivating RSC at different time points to investigate when RSC activity is needed. We hope to carry out such experiments in the future. We now include a brief discussion of this interesting point on P. 14-15 Lines 455-459 writing: “First, further inactivation experiments would shed light on the timing at which RSC activity is necessary for the integration of contextual information. Specifically, it would be of great interest to inactivate RSC at different time points such as during the intertrial interval or at the transition between contexts.”

      We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section, please see below.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to understand the neural basis of context-dependent sensory processing and decision-making.

      Strengths:

      They used an innovative behavioral paradigm where the action-outcome association changes independent of the sensory stimulus. This theoretically allows the authors to disentangle the effect of behavioral context on sensory processing. Using this approach combined with optogenetic silencing, they discover that RSC activity is necessary for suppressing a lick response when the stimulus switches to the unrewarded context.

      Weaknesses:

      Sensory processing appears to be entangled with jaw/tongue movement initiation. Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information. If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate. It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.

      We thank the reviewer for the comments on our work and we agree that separating sensory processing and movement initiation is very important. In the revised manuscript, we have carried out several new analyses to specifically address the points of the reviewer. The most important point is that context-dependent activity in RSC emerges at ~50 ms after the whisker stimulus, which precedes any differences in movements of the jaw or whisker. Although sensory and motor representations become increasingly entangled after stimulus delivery, we think that the first ~100 ms after the whisker stimulus is a relatively safe period for analysing sensory processing and decision making before overt context-dependent differences in movements.

      Addressing the specific point “Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information.” - We have now directly compared the pattern of cortical activity evoked by whisker and auditory stimuli in correct trials in the W+ context (new Figure 3 – figure supplement 2). As expected, activity in wS1/wS2 and A1 is stronger in whisker and auditory trials, respectively, following their sensory modalities. However, we also observe a stronger response of wM1/wM2 in whisker trials as early as 40 to 60 ms following the stimulus, showing specificity to the whisker system. We also observe a stronger response of RSC to the whisker stimulus than to the auditory stimulus. The auditory and whisker evoked responses are therefore different.

      Addressing the specific point “If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate.” – As stated above, the responses to auditory and whisker stimuli are different.

      Addressing the specific point “It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.” - We think that the data shown in Figure 3F-H indicate that differences in S1 activity when comparing W+ and W- stimulation are not directly caused by context-sensitive sensory processing. On P. 9 Lines 270-273 we write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” Indeed, context separation in wS1/wS2 only emerged later than 100 ms, and this later separation is confounded by the difference in movement evoked by the sensory stimulus (now quantified in new Figure 3 – figure supplement 4). In contrast, RSC and wM1/2 responses to the whisker stimulus differed between W+ and W- at early time points (~50 ms for RSC and ~80 ms for wM1/2), which is consistent with context-dependent sensory processing. At least 2 hypotheses could explain the absence of an early difference in whisker-evoked activity in wS1/wS2 between W+ and W-. The first is that sensory activity in wS1/wS2 is not modulated by contextual information at all, while the alternative would imply that sensory activity is mediated by different neuronal populations depending on context, with an overall similar average response. We think this is an interesting question, which we hope to address in future experiments using Neuropixels recordings and multiphoton cellular imaging to examine the single-neuron representation of the whisker stimulus in wS1/wS2 according to context in the task presented here.

      We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section, please see below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions to strengthen the manuscript (no new data collection)

      (1) The block-switch dynamics were clearly demonstrated behaviorally. It would be very powerful to mirror this with an analysis of neural data around the block-switch: how do the various areas adjust immediately after a shift in the continuous contextual sound? Does the RSC show any evidence of changing activity patterns? How does the within-trial activity dynamic look as a function of the number of trials from the context switch? This could be done with the data collected for Figure 3 (for within-trial dynamics), but also for the pre-stimulus baseline activity data (Figure 4A-B).

      We thank the reviewer for raising this interesting point. We have now investigated the change of cortical activity at the transition between contexts (new Figure 3 – figure supplement 5). At the context transition, both to W+ and to W- contexts, we observed a rapid activation of the auditory cortex (new Figure 3 – figure supplement 5A). In addition, there appeared to be a slightly higher activation of RSC when transitioning to W- rather than to W+ (new Figure 3 – figure supplement 5A). In the future, it will be of great interest to further investigate this phenomenon.

      We also evaluated the whisker deflection-evoked responses of the different cortical regions according to the number of whisker trials from context switch (new Figure 3 – figure supplement 5B&C). This analysis revealed that while the sensory responses in wS1 and wS2 were constant over the time course of a context block, the responses of wM1/2 and especially RSC became progressively lower in the W- context, consistent with the behavioral results in Figure 1 supporting time-dependent contextual integration.

      Overall, these results strengthen the role of RSC and wM1/2 in integrating contextual information to guide the response to the whisker stimulus, and we thank the reviewer for raising this important point.

      (2) It might be useful to state 'earliest among the imaged dorsal cortical areas,' and briefly acknowledge potential subcortical contributors (since those were not explored and could be earlier than cortical areas).

      We agree with the reviewer. In the Summary, on P. 2 Lines 39-40 we now write: “Widefield calcium imaging revealed that retrosplenial cortex was the first dorsal cortical area to show context discrimination in response to whisker stimulation”. On P. 8 Lines 257-258, we now write: “To investigate the spatiotemporal neural dynamics underlying task execution, we recorded calcium activity across the dorsal cortex in transgenic mice”. On P. 13 Lines 416-420 we now write: “Functional imaging of cortical activity with two different genetically-encoded calcium indicators each showed similar spatiotemporal dynamics of whisker sensory processing with the earliest context-dependent divergence in signalling being detected in RSC, out of the imaged dorsal cortical areas (Figure 3).” On P. 15 Lines 470-473, we now write: “Finally, it is of course important to note that many subcortical regions (as well as non-dorsal cortical regions, which were not imaged) are likely to contribute importantly to context-dependent task performance.”

      (3) Fit a simple exponential/logistic to lick probability vs time-since-switch (your Figure 1H-style analysis) to report a time constant with CIs; it will help quantify the integration of the continuous cue.

      We thank the reviewer for this suggestion. We have fitted an exponential to the grand average data to quantify the time constants for integration of contextual information before the presentation of the first whisker stimulus of the block (see new Figure 1H). On P. 6 Lines 170-173 we now write: “To assess whether this temporal integration would differ between contexts we fitted an exponential to the time evolution of the lick probability. This suggested a faster transition to the W+ context than to the W- context (W+ time constant: 9.4 s, W- time constant: 15.5 s) (Figure 1H).”
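      For readers who want to reproduce this kind of time-constant estimate, the fit can be illustrated with a short sketch. This is not the authors' analysis code. It assumes a model of the form p(t) = a + b·exp(-t/τ), scans τ over a user-supplied grid, and at each grid value solves for a and b by ordinary least squares, which is exact because the model is linear in a and b once τ is fixed. The function name `fit_exponential` and the synthetic data are hypothetical.

```python
import math


def fit_exponential(times, probs, taus):
    """Least-squares fit of p(t) = a + b * exp(-t / tau).

    For a fixed tau the model is linear in (a, b), so a and b are solved
    in closed form; tau is picked from the candidate grid `taus` as the
    value with the smallest residual sum of squares.
    """
    n = len(times)
    mean_p = sum(probs) / n
    best = None
    for tau in taus:
        x = [math.exp(-t / tau) for t in times]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # degenerate regressor: a and b cannot be separated
        sxp = sum((xi - mean_x) * (p - mean_p) for xi, p in zip(x, probs))
        b = sxp / sxx
        a = mean_p - b * mean_x
        rss = sum((p - (a + b * xi)) ** 2 for xi, p in zip(x, probs))
        if best is None or rss < best[0]:
            best = (rss, tau, a, b)
    if best is None:
        raise ValueError("no usable tau in the candidate grid")
    _, tau, a, b = best
    return tau, a, b


# Synthetic example (not data from the study): lick probability rising
# from 0.2 towards 0.8 with a 10 s time constant, sampled every 2 s.
times = list(range(0, 60, 2))
probs = [0.8 - 0.6 * math.exp(-t / 10.0) for t in times]
taus = [0.5 + 0.1 * i for i in range(300)]
tau, a, b = fit_exponential(times, probs, taus)
print(f"tau = {tau:.1f} s, plateau = {a:.2f}")
```

      A grid search keeps the sketch dependency-free; a nonlinear optimizer would give the same answer more efficiently. Confidence intervals of the kind the reviewer requested could be obtained by resampling mice and refitting, which is not shown here.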

      (4) Because catch-trial false alarms are higher in W+ than W−, report per-context d′ and criterion for whisker trials (using signal detection theory); this separates sensitivity from bias and makes the behavioral shift more interpretable. It is also further proof that the behavior is contextual (versus a compound stimulus, for example).

      We have computed the d’ and criterion for the whisker trials in the W- and W+ contexts (see new Figure 1 – figure supplement 1D). As suggested by the reviewer, this further supports that the behavior is driven by contextual information.
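      The signal detection quantities mentioned here follow the standard equal-variance Gaussian definitions, d′ = z(H) - z(F) and c = -(z(H) + z(F))/2, where z is the inverse standard normal CDF. The sketch below is illustrative only. It assumes that, within each context, hits are licks on whisker trials and false alarms are licks on catch trials (as in the reviewer's suggestion), and it applies a log-linear correction to avoid infinite z-scores. The response does not say which correction, if any, was used. The function name and counts are hypothetical.

```python
from statistics import NormalDist


def dprime_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) from trial counts using equal-variance Gaussian SDT.

    A log-linear correction (add 0.5 to each count and 1 to each total)
    keeps rates away from 0 and 1, where the inverse normal is infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for one context: licks / no-licks on whisker trials
# (hits / misses) and on catch trials (false alarms / correct rejections).
d, c = dprime_criterion(hits=90, misses=10, false_alarms=50, correct_rejections=50)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

      A negative c indicates a liberal, lick-prone bias. Computing d′ and c separately per context is what allows a shift in bias, such as higher catch-trial licking in W+, to be distinguished from a change in sensitivity to the whisker stimulus.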

      (5) For the pre-stimulus seed-correlation analysis, can you regress out the pupil/jaw/whisker activity to confirm whether the context modulation is (or is not) movement-driven? It would be helpful to better understand whether the baseline correlation is driven by differences in low-level factors between the contexts, versus the higher-level decision rule/context.

      The reviewer raises an interesting point. However, we did not find a straightforward way to regress out movements, and thus we leave this point for future in-depth analysis. On P. 11 Lines 354-357 we now write: “It is important to note that these context-dependent changes in resting-state functional connectivity could relate to the overt context-dependent movements in the prestimulus baseline (Figure 1I&J) and/or a manifestation of higher-level internal rule representations.”

      (6) For the earliest divergence analysis, is this consistent across animals and across sessions within animals? Can you show per-mouse distributions of first-crossing times (d′>2) for RSC vs wM1/2/wS2? This would help provide confidence in this key finding.

      The d’ presented in Figure 3H is computed as the discriminability between contexts at the population level, meaning that at each timepoint (from Figure 3F) we compared the 2 distributions built on N=6 mice. As such, if the divergence between contexts was not consistent across animals, this d’ would be low. That said, as suggested by the reviewer, we further investigated this context divergence at the single-mouse and single-session levels. Our analysis supporting the main finding (Figure 3F-H) is shown in new Figure 3 – figure supplement 3.

      First, we show the results for a single mouse across sessions in Figure 3 – figure supplement 3A. We show the stimulus-aligned activity in correct whisker trials in both contexts for the 3 recording sessions. For each session we quantified the main effect size, defined as the difference of the trial average between contexts. Plotting the difference of mean response, we consistently observed that RSC ramps up before wM1/2 for the 3 sessions.

      Second, across all individual mice, we further aggregated the session-average responses to show discriminability between contexts for each region at the single-mouse level (Figure 3 – figure supplement 3B). We show that RSC is the first region to exhibit context separation in 4 out of the 6 mice that we recorded. In the 2 other mice, all regions seemed to show context separation but without clear temporal ordering.

      Finally, when averaging across mice, we observed a clear separation and first discrimination in RSC (Figure 3F-H and Figure 3 – figure supplement 3C).

      Overall, these further analyses suggest that the early divergence of RSC activity appears to be robust with a consistent mean difference in single sessions and single mice, as well as across the population of mice. We think this analysis has strengthened our manuscript and we thank the reviewer for the valuable suggestion.
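      The population-level d′ described in this response, and the first-crossing criterion (d′ > 2) proposed by the reviewer, can be illustrated with a short sketch. This is a reimplementation under stated assumptions, not the authors' code. It assumes d′ is the difference of means divided by the pooled standard deviation (square root of the average of the two sample variances), takes the absolute value so that either sign of divergence counts, and reuses the reviewer's threshold of 2. The function names and toy traces are hypothetical. Passing the trials of one mouse instead of the per-mouse averages would give the per-mouse first-crossing times the reviewer asked about.

```python
import math
from statistics import mean, variance


def dprime(a, b):
    """d' = (mean(a) - mean(b)) / sqrt((var(a) + var(b)) / 2)."""
    ma, mb = mean(a), mean(b)
    pooled_sd = math.sqrt((variance(a) + variance(b)) / 2)
    if pooled_sd == 0:
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / pooled_sd


def first_crossing(times, traces_a, traces_b, threshold=2.0):
    """First time at which |d'| between two conditions exceeds threshold.

    traces_a[i][k] is the response of sample i (e.g. one mouse) in
    condition A at times[k]; traces_b is the same for condition B.
    Returns None if the threshold is never crossed.
    """
    for k, t in enumerate(times):
        a = [trace[k] for trace in traces_a]
        b = [trace[k] for trace in traces_b]
        if abs(dprime(a, b)) > threshold:
            return t
    return None


# Toy data (not from the study): 6 mice whose W+ and W- responses are
# identical until 50 ms after the stimulus, then differ by 1.0.
times = list(range(0, 101, 10))  # ms after whisker stimulus
offsets = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # between-mouse variability
w_minus = [[o for _ in times] for o in offsets]
w_plus = [[o + (1.0 if t >= 50 else 0.0) for t in times] for o in offsets]
print(first_crossing(times, w_plus, w_minus))
```

      Because the samples are mice, a divergence present in only a few animals inflates the variance and lowers d′, which is the authors' argument for why a high population d′ already implies consistency across mice.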

      (7) For the opto mapping data, could you provide P(lick) effect sizes with CIs per grid site? It would also be nice to summarize the qualitative dichotomy: RSC/tjS1 increases licking in W−; canonical wS1/wS2/wM/ALM decreases licking across contexts (to my understanding).

      We now provide the P(lick) effect sizes for the main cortical areas studied in the paper in Figure 2 – figure supplement 1C. This shows the relative change in lick probability in optogenetic trials compared to control trials for each mouse.
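      The reviewer also asked for confidence intervals on these effect sizes. One common way to obtain them, shown here only as an illustration and not as the analysis used in the paper, is a percentile bootstrap over mice. The definition of relative change as (P_opto - P_ctrl) / P_ctrl is an assumption, since the response does not state the formula, and all function names and numbers below are hypothetical.

```python
import random
from statistics import mean


def relative_change(p_opto, p_ctrl):
    """Relative change in lick probability, (P_opto - P_ctrl) / P_ctrl."""
    if p_ctrl == 0:
        raise ValueError("control lick probability is zero")
    return (p_opto - p_ctrl) / p_ctrl


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, resampling mice with replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = boot_means[round(alpha / 2 * n_boot)]
    hi = boot_means[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-mouse lick probabilities for one grid site in one context.
p_ctrl = [0.20, 0.15, 0.25, 0.30, 0.10, 0.20]
p_opto = [0.45, 0.30, 0.40, 0.55, 0.35, 0.30]
effects = [relative_change(o, c) for o, c in zip(p_opto, p_ctrl)]
lo, hi = bootstrap_ci(effects)
print(f"mean relative change {mean(effects):+.2f}, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

      Resampling mice rather than trials treats the animal as the unit of replication. With only about six mice per site the interval will be wide, and an absolute difference in P(lick) may be easier to interpret than a ratio when control lick rates are low, as in the W- context.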

      Reviewer #2 (Recommendations for the authors):

      (1) Do mice move their whiskers after stimulus onset? If so, are these movements dependent on behavioral context? What causes the increase in S1 activity during auditory-evoked response trials?

      To answer the reviewer’s questions we have further investigated whisker movements following the sensory stimuli (whisker and auditory correct trials) in both contexts. The results of this analysis are presented in new Figure 3 – figure supplement 4.

      We find that mice move their whiskers shortly after the whisker stimulus in both contexts. The time course of whisker angle in correct whisker trials is similar in both contexts with a discriminability index (d’) consistently below 1. The whisker speed in response to stimulus is slightly higher in the W+ context compared to W- with a d’ slightly above 1 after ~100 ms. We also observed evoked whisker movements in auditory trials independent of context. Thus, whisker movements are indeed evoked by the sensory stimuli, but the overall context-dependent modulation of whisker movements is weak. The early differences in whisker-evoked cortical activity in W+ compared to W- contexts are therefore more likely related to the integration of contextual information than to differences in evoked movements.

      The reviewer is correct to point out that wS1 activity increases in auditory trials (Figure 3E). The response is initially very weak, but becomes more prominent after ~100 ms following the auditory tone. We do not know the underlying mechanisms, but there are several likely explanations. First, as discussed above, there are indeed some whisker movements evoked in response to the auditory stimulus (Figure 3 – figure supplement 4), which could result in sensory input to wS1. Equally, the increase could relate to licking, given the broad representation of movements in cortex and an appropriate reaction time in auditory trials (Figure 3C). Alternatively, wS1 activity in auditory trials could also be related to input connectivity from auditory cortex, top-down input from frontal cortex, or subcortical regions such as the higher-order thalamic nucleus POm.

      (2) What do the authors think is causing the W+ vs W- difference in S1/S2 activity approximately 100ms after whisker deflection?

      The late W+ vs W- difference in wS1/wS2 activity could be explained by several factors. First, this could be due to the difference in whisker movements after ~100 ms as shown in Figure 3 – figure supplement 4. Second, this could be driven by the lick vs no-lick activity (see reaction time in Figure 3C for whisker trials, ~110 ms). Finally, this could be partly due to some movement-independent top-down contextual information reaching wS1/wS2 at late time points. Overall, our claim in the paper is that there was no contextual difference in whisker primary and secondary cortices at early time points (before movement). On P. 9 Lines 270-273 we explicitly write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” In contrast, our main findings are grounded in the divergence of cortical activity in RSC and wM1/2 at early time points (<100 ms).

      (3) The choice of PC3 seems arbitrary. Is there no task-relevant information in PC1 and PC2?

      We appreciate the point raised by the reviewer and have clarified the reasoning leading to PC3 selection in the main text, where on P. 12-13 Lines 384-391 we now write: “The loadings of the first principal components were uniformly distributed and could reflect a late movement driven activation distributed across all cortical areas (Figure 4 – figure supplement 2C&D). PC2 loadings show variation along the anteroposterior axis that could reflect differences between sensory and motor regions but its time course does not separate between lick and no lick in control conditions (Figure 4 – figure supplement 2C&D). The loadings of PC3 highlighted task-related cortical regions and its time course exhibited clear differences comparing lick and no-lick trials.” In addition, we now also show the time courses for PC1 and PC2 in Figure 4 – figure supplementary 2D.

      Overall, the reasoning is the following:

      PC1 has spatially-homogeneous positive loadings (Figure 4 – figure supplementary 2C) and activity along PC1 gradually ramps up following sensory stimulation (Figure 4 – figure supplementary 2D). It is likely driven by widespread activation of the cortex following the whisker stimulus and the lick response. As such, we believe that the task-related information captured by PC1 is movement related and not necessarily informative about processing of whisker and context.

      PC2 has loadings varying along the antero-posterior axis (Figure 4 – figure supplementary 2C), which could be relevant for the task, but its time course does not discriminate between lick and no lick in either W+ or W- (Figure 4 – figure supplementary 2D).

      PC3 has both loadings that vary between several cortical regions involved in the task (Figure 4 – figure supplementary 2C) and a time course that separates between lick and no lick in both contexts (Figure 4 – figure supplementary 2D). We thus focus on PC3 to investigate the effect of optogenetic inactivation on whisker stimulus evoked activity.

      The remaining components beyond PC3 contain a very small fraction of variance and were thus not considered.

      (4) Figure 3 - Supplement 1: What explains the change in fluorescence in GFP/tdT mice during W+ stimulation? Is it brain movement on the z-dimension? Could this explain differences in calcium imaging results?

      We thank the reviewer for this question. The nature of intrinsic signals is a complex topic, but brain movement is unlikely to contribute importantly, because under similar behavioral conditions we (and others) typically find brain movements to be on the scale of a few microns. The three most widely-reported contributions to intrinsic optical changes in cortex relate to:

      (i) Light scattering – as neurons integrate synaptic inputs and fire action potentials, the neuronal elements swell slightly due to the ionic and water fluxes (see for example Vincis et al. Cell Reports 2015, doi: 10.1016/j.celrep.2015.06.016). This reduces the refractive index mismatch between the intracellular and extracellular space. This in turn reduces light scattering, which could result in fluorescence increases.

      (ii) Hemodynamics – changes in blood volume and changes in oxygenation/deoxygenation will change the absorption of light at different wavelengths, in an activity-dependent manner (also forming the basis of BOLD fMRI signals).

      (iii) Flavoproteins – endogenous fluorescent proteins, such as flavoproteins present at high levels in mitochondria, have been reported to change their fluorescence depending upon neuronal activity, presumably in relationship to increased mitochondrial activity.

      We therefore think it is very important to image GFP/tdTomato-expressing mice as controls, and we would suggest that this should be carried out more commonly in the field. Indeed, similar to our results, another study (Yogesh et al., eLife 2025, doi: 10.7554/eLife.104914) recently reported upon the importance of carefully examining intrinsic fluorescence changes, which were found to be present in both wide-field and two-photon imaging of GFP expressing mice.

      Our results reported in Figure 3 – figure supplement 1 show that GFP/tdTomato signals over the first ~120 ms following whisker stimulation were much smaller than the equivalent changes in GCaMP6f/jRGECO1a-expressing mice, and therefore would only have a minor contribution to our analyses. However, we refrained from analysing fluorescence changes at later post-stimulus times, because the intrinsic signals indeed become increasingly prominent as the mice initiate licking.

    1. eLife Assessment

      This is an important contribution that confirms prior evidence that word recognition - a cornerstone of development - improves across early childhood and is related to vocabulary growth. This study is distinguished by its use of a large, multi-study dataset that is uncommon in prior research on cognitive development. It provides compelling evidence that speed, accuracy, and consistency of word learning improve with age, and will therefore prove of interest to those studying language, and more broadly, perception and development.

    2. Reviewer #1 (Public review):

      Summary:

      The study examined the extent to which children's word recognition skill improves across early development, becoming faster, more accurate and less variable, and the extent to which word recognition skill is related to children's concurrent and later vocabulary knowledge.

      The main strength of the study comes from the dataset, which recycles previously collected data from 24 studies to examine the development of word recognition skill using data from 1963 children. This maximizes the impact of previously collected data while also allowing the study to reliably ask big-picture questions on the development of word recognition skill and its relation to chronological age and vocabulary knowledge. Data analysis is rigorous, well thought-out and very clearly described. Data and code necessary to reproduce the manuscript are shared on the project's GitHub. The limitations of the study are acknowledged and the manuscript does well to tone down the causal implications of their results.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a series of analyses of a large dataset combining many prior studies of early word recognition (Peekbank). The analyses demonstrate that the speed, accuracy and consistency of word learning improve with age. Moreover, the speed of word learning early in development was related to vocabulary growth over time.

      Strengths:

      A key strength of the paper is the use of a large multi-study dataset. This is particularly valuable in the field of early cognitive development, which has (due to practical limitations) often been based on small-scale studies that necessarily provide a shaky foundation for conclusions. The analyses are also well-motivated.

      Weaknesses:

      In an earlier version of the manuscript, the meaning of "word recognition ability" was ambiguous and could have referred to either (A) an intrinsic ability that matures, or (B) knowledge of the common, concrete words typically used in these studies that increases with experience. The revised version of the manuscript identifies these two interpretations and acknowledges that they cannot be teased apart in the current work.

    4. Author response:

      The following is the authors’ response to the original reviews.

      General note

      We have issued a new release of the general Peekbank database, 2026.1, which includes more data integrity checks and several more datasets. As a result of this release, the underlying dataset we use in our paper has shifted slightly. The shifts represent a relatively small proportion of the total data and thus these changes have caused only relatively minor changes to our numerical results. We also highlight that we now include a small amount of data regarding children younger than 12 months, increasing the developmental range of our analysis (see Figure 1).

      Reviewer 1 (Public review):

      The limitations of the study are acknowledged to some extent, but this acknowledgement needs to be strengthened and carried throughout the manuscript. In the discussion, the authors note that the approach is observational and exploratory, and they highlight what is for me a key alternative explanation of the findings, namely that faster children could be faster because of their larger vocabulary, rather than faster children learning more words. Indeed, the latter explanation is called into question, given that growth in speed was not related to growth in vocabulary. Here, the authors suggest that the null result may reflect insufficiently precise estimates of growth slopes, rather than taking seriously the alternative explanation that there may not be as causal a link between being a faster word learner and a better word learner (i.e., learning more words).

      Thank you very much for your challenging and thoughtful comments. In hindsight we did not realize that the way we were writing about our results was ambiguous between several interpretations (one of which we endorse and one of which we do not).

      We respond below to the specific suggestions about causal directionality in the longitudinal analysis, but we certainly believe that we cannot draw strong conclusions about causality from our dataset and have attempted throughout the paper to remove causal language that might have crept into our interpretation.

      In response to your comments, we have made a number of key revisions aimed at qualifying and clarifying our points:

      • The abstract now prominently notes that our design is observational: “In an observational study…”

      • The abstract notes a positive and a negative result in the relationship between word recognition and vocabulary: “Further, across a range of longitudinal models, speed, accuracy, and vocabulary were coupled. Children with overall faster word recognition tended to show faster vocabulary growth, though developmental growth in word recognition skill was not specifically associated with growth in vocabulary.”

      • The abstract removes potential causal language in the final sentence: “... these findings support the view that word recognition is a skill that develops gradually across early childhood and that this skill is deeply intertwined with early language learning.”

      • A new paragraph in the Results introduces the potential hypotheses investigated via the longitudinal models.

      • The final paragraph of the Results section sharpens the contrast between two possible growth hypotheses: “However, we did not find evidence for the stronger version of this claim: in neither the non-linear growth model nor the linear SEM did we find evidence that increases in speed were related to increases in vocabulary size. Thus, our findings do not support a ‘virtuous cycle’ model in which increases in recognition specifically lead to increases in vocabulary size.”

      We hope these changes lead to a manuscript that better aligns with the limitations of the study.

      This is especially important because, unless I’m wrong here, current vocabulary size is not taken into consideration in the model examining vocabulary growth. Given the increasing number of studies showing that current vocabulary knowledge predicts vocabulary growth (Laing, Kalinowski et al, Siew & Vitevitch), one simple alternative explanation is that current vocabulary knowledge predicts both current word recognition skill and later vocabulary knowledge. Is there anything in the data speaking against this hypothesis?

      We think the reviewer’s overall point is generally correct, as we described above, but we want to clarify a specific statistical point. The non-linear longitudinal model of vocabulary growth does in fact take into account a child’s average vocabulary size. (This point feels tricky in a non-linear model but it’s actually quite similar to a linear model for the purposes of this discussion). Basically, vocabulary (at all timepoints) is modeled as a function of age, with both main effects and interactions with age. Critically, each participant is also modeled as having a random intercept capturing their deviation from the average growth pattern across ages (as expressed by the fixed effects). In this model, the “main effect” (here captured by the intercept for the logistic curve in the model) that we observe for speed indicates that vocabulary growth for individuals is predicted to be faster (their curve is shifted left) if their RTs are fast. The presence of the random effects in this model thus “controls” for the fact that some participants have overall higher vocabularies (and are shifted up relative to the average growth curve).

      But, we note that this model does not show an “interaction effect” (here captured by the null effect of RT on the slope parameter in the logistic model). That’s one of the null effects that we now call out much more prominently in the abstract and end of the results (per our response above).

      Equally, while the SEM examines vocabulary growth controlling for age, I wonder about the other way around. What would happen to the effect of age on word recognition skill (in the LME model, S8) if one were to add concurrent vocabulary size? So does chronological age explain word recognition skill or vocabulary knowledge? Right now, the manuscript describes this effect purely related to chronological age, but is it age per se or other cognitive abilities, including a key change across development, namely, vocabulary size? Thus, the presentation of the skill learning hypothesis suggests that age is a proxy for experience, while you actually have here a very nice proxy for experience in terms of children’s vocabulary size.

      Again, thank you for engaging with this tricky set of issues. Overall, our goal is to adjust the manuscript to reflect points of agreement; in particular, we agree that age is a proxy for language experience, vocabulary, and other cognitive changes, and we have stated this explicitly now in the intro to the factor analyses: “In our prior analyses, chronological age acts as a proxy for greater language experience and larger vocabulary as well as a host of other correlated developmental changes in cognition. Now we explicitly explore relations to vocabulary growth and the triadic relationship between age, word recognition, and vocabulary.”

      On the statistical side, we do think that the NLME (non-linear mixed effects; the logistic growth model) effectively controls for average vocabulary size, as described above. The longitudinal SEM also relates vocabulary growth to growth in word recognition skill. In both models, we find no evidence for coupled growth; instead the evidence points to children with higher baseline word recognition skill showing faster growth in vocabulary (speed intercept significantly related to vocabulary slope, -.14, p < .01) but not the reverse (vocabulary intercept not strongly related to speed slope; -.01, ns).

      More generally, we hope our edits to the paper, detailed above, both clarify this tricky set of issues and also remove inappropriate causal language throughout.

      Critically, while the discussion is more nuanced, the way the abstract concludes and the way the Introduction is phrased suggest that the study is able to answer a causal question, which, as the authors themselves note, is not possible. The abstract, for instance, states that word recognition becomes faster, more accurate and less variable ... consistent with a process of skill learning, and also that this skill plays a role in supporting early language learning, which is very causal language. I don’t think you can really claim that you are testing the two hypotheses you suggest here. The work is definitely embedded in the context of these hypotheses, but are you really able to test them? My worry is that, while the discussion is more nuanced, this study will be cited down the line as showing that children learn more words because they are faster at recognizing words, and anything that you can do to temper such interpretations would be good for the literature. For me, this should not just be relegated to the discussion but should be touched upon in the abstract and Introduction.

      Thanks for pushing us to be more precise with how we frame and describe our findings. We agree with the reviewer that our findings do not warrant strong conclusions about the causal role of word recognition skill in vocabulary growth. Per our response above, we have now tried to carefully revise our language throughout the paper (in particular, in the abstract and introduction, as noted by the reviewer).

      Finally, it would help to talk more about the mechanisms at work in any relationship between word recognition and language learning. It seems to me that this would rely on some predictive processing framework, given the description on page 4, and it would be good to make this clear (the faster and more accurately you can recognize a ball, the better you can use this evidence to infer the speaker’s intended meaning).

      Thanks, this is a great point. We’ve revised this text and added references to predictive processing, unpacking a problematic paragraph into two:

      “Familiar word recognition -- as measured by LWL -- is hypothesized to play a key role in language learning (19). The idea, in a nutshell, is that the faster and more accurately a child can process incoming words, the more opportunities they have for learning. Consider a child hearing the utterance "Can you put the ball in the crate?" The better the child can recognize the word "ball", the better they can use this evidence to help infer the speaker's intended meaning, allowing possible inferences about the meaning of the less familiar word, "crate" (20).

      “Real-time language processing, including word recognition, relies heavily on predictive processing, in which comprehenders integrate expectations from prior linguistic context with noisy and ephemeral incoming signals (21, 22). The more input a child receives, the better their predictions are likely to be, and hence the more they can learn (19, 23). Indeed, measurements of children's language input at home are consistently associated with their vocabulary size (24, 25). And, in line with this predictive processing framework, one important study found that children's word recognition speed mediated the longitudinal relationship between home language input and vocabulary growth (26). Thus, word recognition is thought to be a key support for ongoing word learning.”

      Equally, when referring to word recognition, it would be good to clarify what this refers to - how well a child knows what a word refers to (and in the context of LWL, what it does not refer to) or how quickly it directs attention to what is referred to.

      Thanks, we’ve added a capsule definition in the second paragraph, and added the sentence “This procedure [LWL] measures the general construct of word recognition by operationalizing knowledge of a meaning as visual attention to a specific named referent.” We hope this clarifies the relationship between LWL and word recognition.

      With regards to the data, I wonder if there is a clustering of kids past 24 months that is happening here, looking at Figures 1 and 2, where it seems like there is less change past the 24-month point. Is there any way to look at whether the effect of age or vocabulary on word recognition is not linear but asymptotic?

      Thanks for pointing this out; we do see what you are talking about but think it’s being handled appropriately in the analysis. In Figure 1 it clearly looks like changes to RT are asymptotic – this is why we analyze the logarithm of RT throughout the paper. In Supplement S6 we show that reaction time is indeed best fit by a log-log function. Your question about Figure 2 asks whether there is further structure beyond the log-log fit; in Supplement S7 we show some analyses that suggest a polynomial fit is not better than the log-log fit; there is some small additional linear effect of age over and above the log-log fit, but it’s minor and pretty hard to interpret in our view.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Page 3. Word production may manifest in overt behaviour but need not reflect complete knowledge. A child can say the word dog and use it to refer to a cat.

      This is a good point. Since we are not able to speak to the precision of meaning representations (an important issue in its own right), we have omitted the phrase “with incomplete knowledge.”

      Page 4. The first two sentences of the paragraph beginning with word recognition ability... don’t go together. The second sentence does not support the claim that word recognition plays a role in language learning.

      Thanks, we’ve tried to smooth out this transition as part of unpacking the role of predictive processes.

      Page 4. “predicts children’s standardized test scores years later” - make clear what test scores are here.

      We added some additional details. The specific tests were the CELF (expressive language) and the KABC (IQ), but we thought too much detail might be distracting.

      Page 5. I love Table 1, but would like for the data to be weighted somehow. So, given that some studies had a lot more trials and more children, what percentage of the data did each study contribute? That would allow a clearer view of how biased the sample is toward certain studies. The x in the CDIs and longitudinal columns could be aligned to the right; I kept wondering why there was an x near some trials.

      Thanks, we’ve adjusted the table to add the percentage of the total dataset (in trials) due to each study and fixed the alignment issue.

      Page 6. 12 million individual samples: what samples are these? Individual data points per trial per time point. Making this clear would be great.

      Clarified, thanks.

      Page 9. Your accuracy measures only seem to consider the target. From what I remember of my preferential looking days, this measure usually also includes the distractor. Why do you not do this? This is especially since you have such a wide age range, so if a 12-month-old only looks for about 50 per cent of the trial and spends that time looking at the target, that is very different from a child who looks at the screen all of the trial and spends less time looking at the target here.

      Sorry for any lack of clarity: we do in fact compute accuracy as the ratio of looking to target over looking to target plus looking to distractor. We have added this information to the parenthetical referenced above: “... accuracy (more target looking; computed as the ratio of target to target plus distractor looking)”.
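      To make the accuracy definition concrete, here is a minimal sketch of the ratio described above. It assumes, hypothetically, that each trial is coded as one gaze label per time sample ("target", "distractor", or anything else for away or missing looks); the function name and labels are illustrative, not Peekbank's actual code, and any restriction to a particular analysis window is omitted. Because away samples drop out of both numerator and denominator, a child who looks at the screen for only half the trial but always at the target still scores 1.0.

```python
from collections import Counter


def lwl_accuracy(gaze_samples):
    """Hypothetical helper: target looking / (target + distractor looking).

    gaze_samples: one label per time sample; "target" and "distractor"
    are counted, any other label (e.g. "away", None) is ignored.
    Returns None when the trial has no target or distractor samples.
    """
    counts = Counter(gaze_samples)
    target = counts["target"]
    distractor = counts["distractor"]
    if target + distractor == 0:
        return None  # ratio is undefined without any on-screen looking
    return target / (target + distractor)


# Looks at the screen half the time, always at the target -> 1.0
print(lwl_accuracy(["target"] * 50 + ["away"] * 50))
# Looks at the screen throughout, 60 target vs 40 distractor samples -> 0.6
print(lwl_accuracy(["target"] * 60 + ["distractor"] * 40))
```

      The same sketch also shows why speed and accuracy still covary under this definition: a child who stays on the distractor longer before shifting to the target accumulates more distractor samples and so gets a lower ratio.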

      Page 12. I only found out that age was in this model by looking at S9.

      Thanks for mentioning this omission, we’ve clarified in the text: “We initially add age as an additional variable to our models to explore whether this factor structure relates to age; later we treat age as a predictor of latent factors.”

      Page 12. Isn’t it trivial that speed and accuracy show negative covariance, especially given how you measure accuracy? Thus, if I take longer to fixate the target, I have less time to look at the target during the trial. If, however, I included the distractor in my accuracy measure, then I could still take longer to look at the target, but still look more at the target than the distractor.

      Thanks for pointing out that this covariance is not the key result of interest; that point didn’t come through in the text. Now we note that this covariation is “... as expected since they [speed and accuracy] are derived from the same data.” Note, per above, that accuracy is computed as target looking / (target + distractor looking); even so, your observation is correct: slower looking at the target means lower accuracy, at least to some degree.

      Page 19. If you excluded data from trials with less than 50% of timepoints, how did this vary across age? Arguably, your study has to worry less about this, given your sample size, but it would be nice to know, which you could include in the percentage of data that each study contributed to the final sample.

      Thanks, we’ve added this information to a new table in S1.

      Reviewer #2 (Public review):

      First, I wasn’t entirely clear about what the authors meant by “word recognition ability”. For much of the manuscript (including the use of the term “word recognition ability” itself), this comes across as an intrinsic ability or skill that improves with development. Alternatively, the speed and accuracy metrics taken from studies in Peekbank might capture children’s increasing knowledge of the common, concrete words typically used in these studies. To me, this is a somewhat different construct from a general skill at recognizing words. It would be helpful if the authors could clarify which construct they intend to capture, or if it is not possible to distinguish between these constructs from the Peekbank data.

      In response to this comment and related comments above, we’ve added text to the first two paragraphs trying to clarify the general construct that we’re talking about – recognizing the meaning of a word in real-time language comprehension. We’ve also clarified several times throughout the introduction that we’re talking about familiar word recognition, that is, the ability to recognize specific known words. Further, we directly acknowledge the issue above in the introduction:

      “Critically, most word recognition paradigms use words that children at the target age are reported to understand and produce. They are thus not indices of vocabulary size but rather measures of how quickly and accurately the child can recognize a familiar spoken word and use it to guide their visual attention to a referent. However, the extent to which specific responses reflect an individual child's general speed of language processing versus their familiarity with specific words is unknown.”

      Second, and relatedly, if the source of the age-related improvements is increasing experience with the common concrete words used in the Peekbank studies, then one might expect word recognition and improvements with age to be related to word frequency, given that more frequent words are experienced more often. Word frequency predicts word knowledge when assessed using CDI data. Can effects of frequency be detected in Peekbank word recognition metrics? If not, why? Similarly, is the speed and accuracy of word recognition in Peekbank data related to CDI-derived word age of acquisition, and again, if not, why?

      This is a fascinating set of ideas, and one that we’ve pursued extensively using the Peekbank data. Unfortunately, we think it is out of scope for the current paper, which focuses on child-level metrics (including vocabulary and processing measures). Right now the current paper doesn’t include any analysis of individual words.

      Just to expand a bit on the problem here: unfortunately, modeling word recognition as a simple linear function of (log) word frequency is only possible in the case that distractors are held constant (e.g., “ball” always has “book” as its distractor), because distractor frequency plays an important role in the recognition process. However, in our dataset, words are paired with many different distractors across studies. This property means a fairly complex model of the LWL decision process would be necessary for a model to successfully predict effects for individual words. While such a model is an exciting research goal, it’s not something we can include in the current manuscript.

      Finally, there is a bit of a risk of the main findings of this paper coming across as a foregone conclusion. I.e., how could it be otherwise that word recognition improves with development?

      Reviewer #2 (Recommendations for the authors):

      Regarding the feedback about the risk of the findings coming across as a foregone conclusion - perhaps a primary place in the paper where it would be useful to clarify this point is on page 6, in the paragraph beginning, “We investigate two specific hypotheses here. First, one influential theory...”. Here, it might be worth clarifying whether there are alternative ideas about the emergence of word recognition in childhood that predict different patterns, so that the findings of the current paper can be framed as shedding new light on word recognition in development, rather than a confirmation of the common-sense idea that word recognition must improve over development.

      Thanks, we appreciate this feedback and it’s something we’ve struggled with in this project. Our conclusion is that this paper does not constitute a binary hypothesis test of e.g., whether word recognition is linked to vocabulary development. Instead, we lean into the idea that there are empirical issues (rather than hypotheses) that have not been quantified sufficiently. Thus, we end the revised introduction with the following paragraph:

      “Across both of these issues, the contribution of our work here lies in the detailed quantitative description of development. Nearly every theory of language learning assumes some role for continuous developmental change in word recognition, but these assumptions have not previously been anchored to specific measurements. Hence neither the functional form of the assumed changes nor their concurrent and predictive relationships to vocabulary have been quantified. We leverage the Peekbank dataset to accomplish these goals.”

    1. eLife Assessment

      This important study is the first characterization of the phenotype caused by a lack of Eml3 expression in mice. Mutant animals present a disrupted pial basement membrane, leading to focal extrusions from the cerebral cortex, called ectopias. The methodology is convincing and the conclusions are solid, although further investigations on the molecular and cellular mechanisms are required to improve the manuscript. This work would be of interest to neural development biologists and human geneticists working on brain disorders.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of the microtubule-binding protein EML3 during cortical development through the generation and characterization of an Eml3 mouse mutant. The authors focus mainly on the effects of EML3 loss on brain development, although Eml3 mouse mutants also present with developmental delay and growth restriction, and die perinatally due to respiratory distress caused by delayed maturation of the lungs. The main finding in the developing cortex is the presence of focal neuronal ectopias, which contain neurons from all cortical layers, as revealed by immunostaining. The authors use electron microscopy to show that ectopias seem to be caused by disruption to the pial basement membrane at early stages of development, which allows neurons to breach through it. To find a functional link between EML3 and the observed phenotype, studies are conducted that demonstrate expression of EML3 in radial glia cells and mesenchymal cells, both cell types involved in the formation and maintenance of the pial basement membrane. Furthermore, interaction partners for EML3 are identified through coIP-MS analysis, including tubulin beta-3, 14-3-3 proteins and cytoplasmic dynein light chain. However, mice carrying a mutant EML3 allele engineered to abolish the interaction between EML3 and cytoplasmic dynein light chain do not recapitulate any of the symptoms of complete EML3 loss.

      Strengths:

      The manuscript offers several important strengths that contribute significantly to the field. This study presents the first characterization of Eml3 knockout animals, providing novel insights into the role of Eml3 in vivo. Information on Eml3 function so far was restricted to cell culture data, so the results in this manuscript start to fill an important gap in our knowledge about this microtubule-binding protein. The experimental approach is carefully designed, with appropriate controls that ensure the reliability of the data. Moreover, the authors have addressed a key challenge in the analysis, namely the developmental delay of the knockout animals. By implementing a strategy to match developmental stages between wild-type and knockout groups, they allow for meaningful and valid comparisons between the two genotypes. Importantly, the authors have successfully generated three different Eml3 mutant mouse lines (knockout, floxed and with disrupted binding to cytoplasmic dynein light chain), which are very valuable tools for the broader scientific community to further study the roles of this gene in development and disease in the future.

      Weaknesses:

      While the manuscript presents valuable data, there are also several weaknesses that limit the overall impact of the study. Most notably, there is no clear mechanistic link established between the loss of Eml3 function and the observed phenotype, leaving the biological significance of the findings somewhat speculative, as it is not straightforward how a microtubule-associated protein can have an impact on the stability of the pial basement membrane. In this respect, but also in general for the whole manuscript, there seems to be a considerable amount of experimental work that has been conducted but is not presented, possibly due to the negative nature of the results. Additionally, the phenotype reported appears to be dependent on the genetic background, as it is absent in the CD1 strain. This observation raises concerns as to how robust the results are and how much they can be generalized to other mouse strains, but, more importantly, to humans.

    3. Reviewer #3 (Public review):

      Summary:

      This work aims to understand the role of Echinoderm Microtubule-associated Protein-like 3 (EML3) in embryogenesis and neocortical development. Importantly, this work shows that depletion of EML3 causes focal neuronal ectopias by disrupting the structural integrity of the pial basement membrane, describing a new model of cobblestone brain malformation. Another member of the EML family, EML1, has already been shown to trigger neuronal migration disorders, particularly subcortical band heterotopia, by affecting cell polarity. The results presented here point to a different mechanism of action. The authors show that EML3 is expressed in radial glia cells and mesenchymal cells in the pial region and that, upon EML3 depletion (i.e., in Eml3 mutant mice), the pial basement membrane is structurally damaged, allowing migrating neuroblasts to migrate ectopically through it. This indicates, in this case, that weakening of the pial basement membrane is a prerequisite for focal neuronal ectopias. The authors provide a meticulous characterization of the Eml3 mutant mice, strengthening the conclusions of the results.

      Strengths:

      The authors provide a very detailed analysis of the defects observed in Eml3 mutant mice, reporting results not only by inferred day of conception but also by classifying embryos by their number of somite pairs.

      Weaknesses:

      Most of the weaknesses originally raised by the reviewer have been addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      The following revisions have been made to address most of the publicly available suggestions made by the Reviewers.

      We have also corrected formatting issues in two figure panels:

      Fig.1B: embryo ages added over placenta images.

      Fig. 4D: fixed a truncated label.

      Reviewer #1 (Public review):

      The study would benefit from clearer evidence and additional experiments that would help to establish the molecular and cellular mechanisms underlying the brain phenotype, the central topic of the work.

      We agree that additional experiments are necessary to elucidate the mechanism(s) by which EML3 deficiency causes the observed developmental phenotypes. However, as no further experimentation is possible due to the closure of our laboratory, we are committed to sharing available materials including custom antibodies and cryopreserved sperm from our mouse lines. We include previously generated experimental data not presented in the original submission. While these additional data do not reveal the mechanisms, we believe that sharing hypotheses that were experimentally ruled out will benefit the scientific community.

      M&M: we have added a section listing several tissue-specific Eml3 KOs generated. All of the generated cKO mice were indistinguishable from Eml3<sup>wt</sup> controls.

      Supp. Fig. 2 with staining for major PBM components has been added. We have included antibody information in M&M.

      Reviewer #2 (Public review):

      (1) While the manuscript presents valuable data, there are also several weaknesses that limit the overall impact of the study. Most notably, there is no clear mechanistic link established between the loss of Eml3 function and the observed phenotype, leaving the biological significance of the findings somewhat speculative, as it is not straightforward how a microtubule-associated protein can have an impact on the stability of the pial basement membrane. In this respect, but also in general for the whole manuscript, there seems to be a considerable amount of experimental work that has been conducted but is not presented, possibly due to the negative nature of the results. At least some of those results could be shown, particularly (but not only) the stainings for the composition of the ECM components.

      We agree that additional experiments are necessary to elucidate the mechanisms at play. While we cannot conduct further experiments, we provide additional existing data, including a new Supp. Fig. 2 showing ECM component staining. As this reviewer rightly anticipated, these results might not clarify the mechanism but sharing the hypotheses that were already experimentally tested will be helpful.

      (2) Additionally, the phenotype reported appears to be dependent on the genetic background, as it is absent in the CD1 strain. This observation raises concerns as to how robust the results are and how much they can be generalized to other mouse strains, but, more importantly, to humans.

      Indeed, we have determined that genetic background greatly influences the manifestation of developmental defects caused by absence or mutation of the EML3 protein in mice. Modifier genes appear to play a significant role in phenotypic expression. In humans, the presence or absence of such modifiers may result in a broad spectrum of outcomes from no clinical relevance, as seen in CD1 mice, to potential intrauterine mortality. We agree that this underscores the challenge of translating mouse model findings to human implications. Future studies could include a search for EML3 non-coding regulatory mutations and expanded analysis of neuronal development defects, such as COB, as well as cases of intrauterine growth restriction (IUGR).

      (3) There is no data included in the manuscript about the generation and analysis of the Eml3AAA/AAA mouse line. This is an important omission, especially as no details on the validation or phenotypic characterization of this additional mouse line are provided. Including these elements would greatly strengthen the rigor and interpretability of the work, especially if that mouse line is to be shared with the scientific community.

      We acknowledge this oversight and have added a Materials and Methods section describing the generation of Eml3 TQT86AAA mice. Validation of the Eml3 TQT86AAA mice included showing absence of EML3-DYNLL binding in our co-IP MS data in Table 3. We state that the validated Eml3 TQT86AAA mice were phenotypically indistinguishable from Eml3<sup>wt</sup> control mice.

      Reviewer #3 (Public review):

      (1) Besides the data provided in the figures, the authors report a substantial number of experiments/results as "Data not shown". Negative data are still important data to report, and the authors may want to choose some crucial "not shown" data to include in the manuscript.

      We have incorporated key datasets previously omitted, with priority given to those specifically requested by Reviewer #2.

      (2) Results in Figure 3A apparently contradict results in 3B. A better explanation of the results should improve understanding of the data. Even though the conclusion that the "onset and progression of neurogenesis is normal in Eml3 null mice" seems logical based on the data, the final numbers are not (Figure 3A) and this should be acknowledged, as well.

      We provide further explanations for the data presented in Figures 3A and 3B to better convey that the two datasets do not contradict each other. In essence, because Eml3 null mice are developmentally delayed (as determined by the number of somites at a specific age, Fig. 1C), they reach neurogenesis milestones at a later age; thus, at embryonic age E11.5, Eml3 null mice have fewer TBR2-positive cells (Fig. 3A). However, Eml3 null mice have reached the same neurogenesis milestones as their WT counterparts when they have the same number of somites (Fig. 3B).

      Results section for Fig. 3: we provide additional explanations that reconcile the results shown in Fig. 3A and Fig. 3B.

      (3) The authors should define which cell types are identified by SOX1 and PAX6.

      We have defined the expression timing and cell identity marked by SOX1 and PAX6 in neural progenitors during cortical development.

    1. eLife Assessment

      In this important study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators coordinating both bladder contraction and the relaxation of the external urethral sphincter. Using appropriate and validated methodologies aligned with the current state of the art, the data are convincing and of generally high quality.

    2. Reviewer #1 (Public review):

      [Editor's note: this version has been assessed by the Reviewing Editor with further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      Urination requires precise coordination between the bladder and external urethral sphincter (EUS), while the neural substrates controlling this coordination remain poorly understood. In this study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators that faithfully initiate or suspend urination. Results from peripheral nerve lesions suggest that BarEsr1 neurons play independent roles in controlling bladder contraction and relaxation of the EUS. Finally, the authors performed region-specific retrograde tracing, claiming that distinct populations of BarEsr1 neurons target specific spinal nuclei involved in regulating the bladder and EUS, respectively.

      Strengths:

      Overall, the work is of high quality. The authors integrate several cutting-edge technologies and sophisticated, thorough analyses, including opto-tagged single-unit recordings and combined optogenetics and urodynamics, particularly those following distinct peripheral nerve lesions.

      Comments on revised version:

      During the revision, the authors have adequately addressed my concerns and made the suggested changes accordingly. I have no additional comments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have performed a rigorous study to assess the role of ESR1+ neurons in the PMC to control coordination of bladder and sphincter muscles during urination. This is an extension of previous work defining the role of these brainstem neurons, and convincingly adds to the understanding of their role as master regulators of urination. This is a thorough, well-done study that clarifies how the Pontine micturition center coordinates different muscle groups for efficient urination, but there are some questions and considerations that remain.

      Strengths:

      These data are thorough and convincing in showing that ESR1+ PMC neurons exert coordinated control over both the bladder and sphincter activity, which is essential for efficient urination. The anatomical distinctions in pelvic versus pudendal control are clear, and it's an advance to understand how this coordination occurs. This work offers a clearer picture of how micturition is driven.

      Weaknesses:

      The dynamics of how this population of ESR1+ neurons is engaged in natural urination events remain unclear. Not all ESR1+ neurons are always engaged, and it is not measured whether this is simply variation in population activity, or if more neurons are engaged during more intense starting bladder pressures, for instance. In particular, the response dynamics of single- and dual-projecting neurons are not defined. Additionally, the model for how these neurons coordinate with CRH+ neuron activity in the PMC is not addressed, although these cell types seem to be engaged at the same time. Lastly, it would be interesting to know how sensory input might modulate the activity of these neurons, but this is perhaps a future direction.

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. explored the role of estrogen receptor 1 (Esr1)-expressing neurons in the pontine micturition center (PMC), a brainstem region also known as Barrington's nucleus (Hou et al., 2016; Keller et al., 2018). First, the authors conducted bulk Ca<sup>2+</sup> imaging and unit recordings from PMC<sup>ESR1</sup> neurons to investigate the correlation of PMC<sup>ESR1</sup> neural activity with voiding behavior in conscious mice, and with bladder pressure and external urethral sphincter muscle activity in urethane-anesthetized mice. Next, the authors used optogenetic inactivation/activation of PMC<sup>ESR1</sup> neurons to confirm their contribution to voiding behavior, and combined peripheral nerve transection with optogenetic activation to confirm the independent control of bladder pressure and the urethral sphincter muscle.

      Comments on revised version:

      No concerns. All my major questions were addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We would like to express our deep appreciation to the editor and reviewers for their constructive comments and suggestions, which have significantly improved the quality of our manuscript. In response, we have carefully revised the manuscript, addressed all comments, and performed additional experiments and analyses to strengthen our findings.

      (1) We repeated retrograde tracing using CTB-647 to verify precise targeting of SPN and DGC neurons, as shown in the new Figure 7.

      (2) We performed dual retrograde tracing combined with fiber photometry or optogenetic activation to investigate the role of PMC dual-projecting neurons in the control of urination, as shown in Figure supplements 11 and 12.

      (3) We conducted new experiments activating PMC<sup>ESR1+</sup> neurons after PDNx to assess their role in urination, as shown in new Figure 6.

      (4) We added a more detailed analysis of the dynamics of neural responses in PMC<sup>ESR1+</sup> neurons in Figure supplements 3F-3G.

      (5) We analyzed peak Ca<sup>2+</sup> signals in the PMC during and after the onset of EMG bursting, as shown in Figure supplement 4F.

      (6) We added a comparison of spontaneous and light-induced spikes in PMC<sup>ESR1+</sup> neurons, as shown in Figure supplements 3B–3C.

      (7) We expanded the Discussion to address how PMC<sup>ESR1+</sup> neurons coordinate bladder contraction and sphincter relaxation to control both the initiation and suspension of urination.

      We hope these revisions meet the reviewers' expectations and contribute to the improvement of our manuscript.

      Reviewer #1 (Public review):

      Summary:

      Urination requires precise coordination between the bladder and external urethral sphincter (EUS), while the neural substrates controlling this coordination remain poorly understood. In this study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators that faithfully initiate or suspend urination. Results from peripheral nerve lesions suggest that BarEsr1 neurons play independent roles in controlling bladder contraction and relaxation of the EUS. Finally, the authors performed region-specific retrograde tracing, claiming that distinct populations of BarEsr1 neurons target specific spinal nuclei involved in regulating the bladder and EUS, respectively.

      Strengths:

      Overall, the work is of high quality. The authors integrate several cutting-edge technologies and sophisticated, thorough analyses, including opto-tagged single unit recordings, combined optogenetics, and urodynamics, particularly those following distinct peripheral nerve lesions.

      We are grateful for your insightful and constructive comments, which affirmed the importance and technical depth of our work. Thank you for dedicating your expertise and time to reviewing our manuscript. Guided by your suggestions, we have revised the paper as detailed below.

      Weaknesses:

      (1) My major concern is the novelty of this study. Keller et al. 2018 have shown that BarEsr1 neurons are active during urination and play an essential role in relaxing the external urethral sphincter (EUS). At a minimum, substantial content that merely confirms previous findings (e.g., Figures 1A-E; Figures 3A-E) should be moved to the supplementary datasets.

      Thank you for this valuable and constructive comment. We fully agree that the novelty of our study relative to Keller et al., 2018 must be made explicit. Keller et al. established that PMC<sup>ESR1+</sup> neurons are active during socially evoked urine-marking behavior (voluntary urination) and demonstrated their essential role in relaxing the EUS. Their study mainly focused on behavioral context and EUS relaxation. In contrast, our work addresses a distinct, mechanistic question: how these same neurons participate in reflexive, physiological urination and coordinate both bladder detrusor contraction and EUS relaxation.

      Novel aspects of the present study:

      (1) Temporal dynamics of PMC<sup>ESR1+</sup> neurons during reflexive micturition.

      Using opto-tagging and single-unit recordings, we reveal the precise firing pattern of PMC<sup>ESR1+</sup> neurons during reflexive voiding. Simultaneous fiber photometry, cystometry, and EUS-EMG recordings demonstrate that population-level activity of PMC<sup>ESR1+</sup> neurons precedes and tightly correlates with both bladder contraction and EUS relaxation, a coordination not previously demonstrated.

      (2) Causal role in reflexive urination.

      Manual closed-loop optogenetic inhibition at the onset of reflexive voiding acutely terminates EUS bursting and bladder contraction, immediately halting urine release.

      (3) Dual control of bladder and EUS.

      Optogenetic activation combined with selective pelvic or pudendal nerve transection shows that PMC<sup>ESR1+</sup> neurons drive both bladder contraction and EUS relaxation, revealing a coordinating role beyond EUS relaxation alone.

      (4) Anatomical substrate for coordinated control of bladder contraction and EUS relaxation in reflexive urination.

      Retrograde tracing identifies three spinal-projecting sub-populations: SPN-only, DGC-only, and dual-targeting neurons, providing a circuit-level explanation for the simultaneous control of bladder and EUS.

      Following your suggestion, panels that merely replicate Keller et al. (former Figures 1A–1E and Figures 3A–3E) have been moved to new Figure Supplements 1 and 7, respectively, so that the main figures now emphasize the new mechanistic findings.

      (2) I also have concerns regarding the results showing that the inactivation of BarEsr1 neurons led to the cessation of EUS muscle firing (Figures 2G and S5C). As shown in the cartoon illustration of Figure 8, spinal projections of BarEsr1 neurons contact interneurons (presumably inhibitory) that innervate motor neurons, which in turn excite the EUS. I would therefore expect that the inactivation of BarEsr1 should shift the EUS firing pattern from phasic (as relaxation) to tonic (removal of relaxation), rather than stopping their firing entirely. Could the authors comment on this and provide potential reasons or mechanisms for this finding?

      Thank you for this crucial comment. We apologize that the representative EUS-EMG traces in Figures 2G and S5C were too small to be seen clearly and that the corresponding description in the Results was not sufficiently accurate. We have now replaced these EMG traces with enlarged versions (revised Figures 2G and S5C) and revised the corresponding Results section (lines 184, 197, 340-341). The enlarged traces show that acute photoinhibition of PMC<sup>ESR1+</sup> neurons at the onset of phasic EUS-EMG bursting shifted the EUS firing pattern from large-amplitude phasic bursts to low-amplitude tonic firing. This suggests that ongoing activity of PMC<sup>ESR1+</sup> neurons is required to maintain phasic EUS bursting. A similar shift from phasic to tonic EUS-EMG activity during optogenetic silencing of PMC<sup>ESR1+</sup> neurons was reported by Keller et al., 2018 (Figure supplement 8C), confirming the reproducibility of the phenotype. We propose that this low-amplitude tonic activity may be mediated in part by a spinal reflex pathway that prevents urination (the guarding reflex): loss of the supraspinal facilitation provided by PMC<sup>ESR1+</sup> neurons reduces inhibition of spinal interneurons, enhancing the baseline excitability of EUS motor neurons in response to bladder afferent input during bladder distension (William C. de Groat et al., Comprehensive Physiology. 2015, PMID: 25589273).

      (3) Current evidence is insufficient to support the claim that the majority of BarEsr1 neurons innervate the SPN but not DGC. The current spinal images are uninformative, as the fluorescence reflects the distribution of Esr1- or Crh-expressing neurons in the spinal cord, along with descending BarEsr1 or BarCrh axons. Given the close anatomical proximity of these two nuclei, a more thorough histological analysis is required to demonstrate that the spinal injections were accurately confined to either the SPN or the DGC.

      Thank you for raising this important concern. To rigorously verify that our spinal injections were confined to either the SPN or the DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. We injected a mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 specifically into the SPN or DGC (Methods, lines 465-466). Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without detectable spread to the adjacent region, were included in the analysis (new Figures 7A and 7E). These results confirm our original observation that PMC<sup>ESR1+</sup> neurons comprise three distinct spinal-projection subpopulations: one (19.0%) targeting the SPN, one (52.2%) innervating the DGC, and a third (28.8%) projecting to both regions (Results, lines 304–306; new Figures 7F–7H). In addition, the majority of PMC<sup>CRH+</sup> neurons project to the SPN but not the DGC (new Figures 7B–7D; Results, lines 297–301). We have assembled new Figure 7 using the newly acquired spinal images and the validated data.

      Reviewer #1 (Recommendations for the authors):

      From the abstract: "Anatomically, PMCESR1+ cells possess two subpopulations projecting to either the pelvic or pudendal nerve". I don't think these neurons directly project to either nerve.

      Thank you for this precise comment. We apologize for incorrectly stating that PMC<sup>ESR1+</sup> cells project directly to the pelvic or pudendal nerves. In the revised Abstract (lines 32–36), we have rephrased the sentence to clarify the actual anatomy: “Anatomically, PMC<sup>ESR1+</sup> neurons consist of three distinct spinal-projection-based subpopulations: one targeting the sacral parasympathetic nucleus (SPN), one innervating the dorsal gray commissure (DGC), and a third that projects to both regions, thereby enforcing the coordination of bladder contraction and sphincter relaxation in a rigid temporal sequence.” We trust this revision now accurately reflects the anatomical findings.

      Reviewer #2 (Public review):

      Summary:

      The authors have performed a rigorous study to assess the role of ESR1+ neurons in the PMC to control the coordination of bladder and sphincter muscles during urination. This is an important extension of previous work defining the role of these brainstem neurons, and convincingly adds to the understanding of their role as master regulators of urination. This is a thorough, well-done study that clarifies how the Pontine micturition center coordinates different muscle groups for efficient urination, but there are some questions and considerations that remain.

      Strengths:

      These data are thorough and convincing in showing that ESR1+ PMC neurons exert coordinated control over both the bladder and sphincter activity, which is essential for efficient urination. The anatomical distinctions in pelvic versus pudendal control are clear, and it's an advance to understand how this coordination occurs. This work offers a clearer picture of how micturition is driven.

      We sincerely thank you for highlighting the rigor of our study and for recognizing the advance in understanding how PMC<sup>ESR1+</sup> neurons exert coordinated, anatomically segregated control over bladder and sphincter. We also appreciate the constructive suggestions that helped us further improve clarity, which we address point-by-point below.

      Weaknesses:

      The dynamics of how this population of ESR1+ neurons is engaged in natural urination events remain unclear. Not all ESR1+ neurons are always engaged, and it is not measured whether this is simply variation in population activity, or if more neurons are engaged during more intense starting bladder pressures, for instance. In particular, the response dynamics of single- and dual-projecting neurons are not defined. Additionally, the model for how these neurons coordinate with CRH+ neuron activity in the PMC is not addressed, although these cell types seem to be engaged at the same time. Lastly, it would be interesting to know how sensory input might modulate the activity of these neurons, but this is perhaps a future direction.

      Thank you for this insightful comment. First, we agree that not all ESR1+ neurons are consistently engaged during urination (Figure 1B). Because bladder pressure was not measured during the opto-tagging experiments, we cannot determine whether this reflects trial-to-trial variability in population activity or pressure-dependent recruitment of additional neurons. We speculate that stronger starting bladder pressures may recruit a larger subset of ESR1+ neurons, analogous to graded, pressure-dependent recruitment observed in peripheral sensory neurons (Bruns et al., J Neural Eng. 2011, PMID: 21878706; Marshall et al., Nature. 2020, PMID: 33057202).

      Second, using fiber photometry recording and optogenetic activation, we examined the dynamics of dual-projecting neurons in the PMC that were retrogradely labeled from the SPN and DGC. Their activity correlated with bladder contraction and sphincter relaxation, and optogenetic activation sequentially induced these events to trigger urination (see Recommendation #8). Although retrograde labeling captured only a subset of dual-projecting neurons, the results indicate that they coordinate bladder and sphincter activity.

      Third, previous studies suggest that PMC<sup>CRH+</sup> cells are associated with bladder contraction and likely serve as an integration center for context-dependent micturition behavior (Hou et al., Cell. 2016, PMID: 27662084; Ito et al., Elife. 2020, PMID: 32347794). We therefore propose that PMC<sup>CRH+</sup> cells establish the baseline conditions and contextual readiness for voiding, whereas PMC<sup>ESR1+</sup> cells act as the executive command to reliably initiate and execute the event.

      Finally, we agree that sensory inputs likely modulate PMC<sup>ESR1+</sup> neuron activity. Although this falls beyond the scope of the present study, it represents an important avenue for future investigation.

      Reviewer #2 (Recommendations for the authors):

      (1) In the introduction, the authors write that Keller 2018 only showed this ESR1 population to induce EUS relaxation, but those results also do show bladder contraction with photostimulation of this population. While the authors' work extends this finding in important ways, this should be acknowledged (line 60).

      Thank you for this important correction. We have now revised the Introduction to explicitly acknowledge that stimulation of neurons expressing estrogen receptor 1 (ESR1) in the PMC (PMC<sup>ESR1+</sup>) contributes to sphincter relaxation and increased bladder pressure (Introduction, lines 60-62), as originally reported by Keller et al., 2018.

      (2) I think a more detailed analysis of the dynamics of neural responses in the PMC ESR1 neurons would be valuable. For example: are the same cells always engaged before micturition, or do different populations activate on different trials? Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity. Figure 1H shows cumulative sessions, but what do single sessions look like?

      Thank you for these valuable comments. In response, we have performed refined single-trial analyses of neuronal activity, as detailed in the point-by-point replies below.

      For example: are the same cells always engaged before micturition, or do different populations activate on different trials?

      Among 11 PMC<sup>ESR1+</sup> units that showed urination-related excitation, 8 units exhibited a consistent firing increase in every voiding trial, whereas the remaining 3 increased their discharge in >78% of trials (Figure 1B; new Figure supplement 3F). Thus, the same PMC<sup>ESR1+</sup> cells are recruited repeatedly, rather than distinct populations being activated on different trials. We have added this clarification to the Results (lines 106–108).
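      To make the per-unit, per-trial criterion concrete, the sketch below computes the fraction of voiding trials in which a unit's firing rate rises above its baseline rate. It is a simplified illustration only: the function name `engagement_fraction`, the plain "voiding rate > baseline rate" rule, and the example firing rates are hypothetical and are not the statistical test or data used in the manuscript.

```python
def engagement_fraction(baseline_hz, voiding_hz):
    """Fraction of trials in which the voiding-window firing rate exceeds baseline.

    Hypothetical criterion: a trial counts as 'engaged' if the rate during voiding
    is strictly greater than the rate in the preceding baseline window.
    """
    if len(baseline_hz) != len(voiding_hz) or not baseline_hz:
        raise ValueError("need equal-length, non-empty per-trial rate lists")
    increased = sum(v > b for b, v in zip(baseline_hz, voiding_hz))
    return increased / len(baseline_hz)


# Hypothetical unit recorded over 9 voiding trials (rates in Hz, not real data)
base = [2.0] * 9
void = [6.0, 5.5, 7.1, 1.8, 6.3, 5.9, 6.8, 7.0, 6.1]
frac = engagement_fraction(base, void)  # 8 of 9 trials exceed baseline
```

      A unit engaged in every trial would return 1.0; applying such a function unit by unit is one simple way to separate consistently recruited units from occasionally recruited ones.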

      Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity.

      Approximately half of the opto-tagged PMC<sup>ESR1+</sup> cells showed no increase in firing rate during urination, yet exhibited spontaneous spikes at other times (new Figure supplement 3G), confirming their electrical competence. Because the PMC also participates in defecation, uterine activity, and other pelvic functions (Rouzade-Dominguez et al., Eur J Neurosci. 2003, PMID: 14686905; Schellino et al., Frontiers in Neuroanatomy. 2020, PMID: 33013330; Quaghebeur et al., Auton Neurosci. 2021, PMID: 34391125), these ESR1+ neurons may serve functions other than urination. We have now added this cell-by-cell analysis and discussion to the manuscript (Results, lines 108-112).

      Figure 1H shows cumulative sessions, but what do single sessions look like?

      As shown in new Figure supplements 3F–3G, single-session raster plots reveal that PMC<sup>ESR1+</sup> neurons display consistent firing patterns across individual trials. Neurons whose firing rate increased during urination did so in most trials (Figure supplement 3F), whereas neurons unrelated to voiding remained silent or showed no discernible rate change during voiding across trials (Figure supplement 3G). These single-session observations are consistent with the cumulative population analysis shown in Figure 1H (new Figure 1B).

      (3) Supplemental Figure 4: It seems clear from this figure that NVCs are only occurring when the sphincter fails to engage. Can the authors quantify how often this is the case?

      Thank you for this important point. We have now quantified the occurrence of non-voiding contractions (NVCs) across all 229 bladder contraction events from 3 mice shown in Supplemental Figure 4. NVCs were observed exclusively when the external urethral sphincter failed to relax, accounting for 62/229 events (27.1%), whereas coordinated voiding contractions (VCs) occurred in the remaining 167 events (72.9%). These new data are presented in Figure supplement 4C.
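      The reported proportions follow directly from the event counts (62 NVCs + 167 VCs = 229 contractions). The helper name `percentages` below is hypothetical; the snippet only reproduces the arithmetic, rounded to one decimal place as in the text.

```python
def percentages(counts):
    """Convert category counts into percentages of the total, rounded to 0.1%."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


events = {"NVC": 62, "VC": 167}  # counts reported for 229 contractions from 3 mice
print(percentages(events))  # {'NVC': 27.1, 'VC': 72.9}
```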

      (4) Continuing from the above point: the authors say that the insufficient top-down drive or strength of activity from PMC ESR1 neurons is why NVCs occur. In looking closely, it also seems there is a small hump and subsequent increase in the calcium signal when the EUS bursting begins (particularly clear in Supplementary Figure 4). Could this instead mean that the bursting/urethral activity itself is feeding back onto the PMC to continue/enhance its activity, and it is instead the lack of sphincter bursting that results in the NVC? Could the authors analyze the signal during and after bursting starts? This model is consistent with one of the classic reflexes defined by Barrington, in which urethral fluid flow/activation enhances bladder contraction. The Figure 4 transection experiments do not fully answer this, as the authors are driving activity in the PMC at this time, but they could test this using PDN transection with fiber photometry recording.

      Thank you for this important point. We fully agree that EUS bursting may provide excitatory feedback to the PMC that sustains or even amplifies its activity, and that the absence of such feedback could underlie NVCs. To test this possibility, we re-analyzed the fiber-photometry traces aligned to the onset and offset of each EUS bursting episode (new Figure supplement 4). A small but consistent hump in the Ca<sup>2+</sup> signal appeared before bursting onset, and the Ca<sup>2+</sup> signal continued to rise throughout bursting (Figure supplement 4B, yellow arrow). The amplitude at bursting offset was significantly higher than both the NVC peak and the level recorded at bursting onset. These observations support the interpretation that urethral fluid flow/activation supplies excitatory feedback that reinforces PMC activity and bladder contraction, consistent with Barrington’s classic reflex. We have incorporated these new analyses into the revised manuscript (lines 145–155 and Figure supplement 4F).
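      Aligning a photometry trace to burst onsets and offsets amounts to cutting fixed windows around each event time and stacking them. The sketch below shows this generic peri-event alignment with NumPy. It assumes a uniformly sampled signal and event times in seconds. The function name `peri_event_windows`, the window lengths, the sampling rate, and the synthetic test signal are all hypothetical illustrations, not the authors' analysis code.

```python
import numpy as np


def peri_event_windows(signal, fs, event_times_s, pre_s, post_s):
    """Return an (n_events, n_samples) array of snippets around each event.

    signal        : 1-D array sampled uniformly at fs (Hz)
    event_times_s : event times in seconds (e.g., burst onsets or offsets)
    pre_s, post_s : window length before and after each event, in seconds
    Events whose window would run past either edge of the recording are skipped.
    """
    pre = int(round(pre_s * fs))
    post = int(round(post_s * fs))
    snippets = []
    for t in event_times_s:
        idx = int(round(t * fs))
        if idx - pre < 0 or idx + post > len(signal):
            continue  # incomplete window at the recording edge
        snippets.append(signal[idx - pre: idx + post])
    return np.array(snippets).reshape(-1, pre + post)


# Synthetic example: a 10-s ramp sampled at 20 Hz, three hypothetical event times
sig = np.arange(200, dtype=float)
w = peri_event_windows(sig, 20.0, [2.0, 5.0, 9.5], 1.0, 1.0)  # last event is skipped
mean_trace = w.mean(axis=0)  # event-averaged trace, one value per sample in the window
```

      Running the same function separately on onset times and offset times gives two event-averaged traces whose amplitudes can then be compared, which is the kind of onset-versus-offset comparison described above.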

      We agree that the positive-feedback loop described by Barrington’s classic urethra-to-bladder reflex is an intriguing mechanism. However, the PDN-transection experiment in Figure 4 was designed to determine if bladder contractions triggered by PMC<sup>ESR1+</sup> cells can proceed in the absence of sphincter bursting, not to evaluate this reflex. Incorporating simultaneous fiber-photometry recording into the PDN-transection experiment would therefore go beyond the scope of the present study. In future work we are keen to combine PDN transection with fiber photometry to further determine whether the urethra-to-bladder reflex contributes to the sustained PMC activity observed in our paradigm.

      (5) In Figure 4, is the timing of sphincter engagement different with ChR2 stimulation from what normally occurs? It appears that the bursting happens immediately upon activation whereas bladder contraction is a bit delayed.

      Thank you for this important observation. We have carefully re-examined the EMG traces from all animals shown in Figure 4. We confirm that the onset of sphincter bursting activity during ChR2 stimulation is indeed more rapid than during natural reflex voiding; nevertheless, the onset of phasic sphincter bursting during ChR2 stimulation remained delayed relative to the intravesical pressure rise (see Figure 8B).

      The immediate sphincter discharge visible in some trials was tonic EUS discharge or rare irregular bursting, not the typical EUS bursting. This tonic pattern corresponds to the spinal guarding reflex that suppresses urine leakage (Fowler et al., Nature Reviews Neuroscience. 2008, PMID: 18490916; Keller et al., Nature Neuroscience. 2018, PMID: 30104734). These segments were identified by their amplitude and spectral content and excluded from burst-onset analysis. Our analysis protocol therefore distinguishes tonic guarding activity from true phasic bursting, ensuring that only the latter was used to determine burst timing.

      (6) The explanation on line 299 about how spinal reflexes impinge on this circuit is confusing. I agree that the bladder contraction stopping later than the EUS signal likely has something to do with spinal reflexes, but it seems this could instead be feedback from the urethral fluid flow, which continues bladder contractions (urethra-detrusor facilitative reflex). Could the authors clarify their thoughts here?

      Thank you for highlighting this ambiguity. We agree that the delayed cessation of bladder contraction could equally reflect either (1) the urethra-to-bladder facilitative reflex driven by ongoing urethral fluid flow or (2) spinal reflexes that we described. In the revised manuscript (Results, lines 343–349), we have re-worded the paragraph to make this dual possibility explicit, thereby avoiding an overly strong emphasis on spinal mechanisms alone.

      (7) A note on phrasing: the authors frequently say PMCESR1 cells drive sphincter relaxation, but then show an effect on sphincter bursting. Experienced readers might realize that relaxation and bursting are connected, but this might be confusing for readers and should be clarified in the text.

      Thank you for highlighting the potential ambiguity. We agree that the sentence “PMC<sup>ESR1</sup> cells drive sphincter relaxation” can seem paradoxical when our data show increased EUS bursting. In adult mice, the EUS does not remain continuously relaxed during voiding; instead, it generates rhythmic bursting composed of high-frequency spike clusters (active periods) alternating with low tonic activity (silent periods), resulting in rhythmic contractions and relaxations of the EUS. This phasic activity acts as a pump that facilitates urine flow through the narrow rodent urethra (Kadekawa et al., Am J Physiol Regul Integr Comp Physiol, 2016, PMID: 26818058). The EUS bursting activity we recorded is consistent with the results reported in previous studies (Keller et al., Nat Neurosci, 2018, PMID: 30104734; Ito et al., Elife, 2020, PMID: 32347794).

      Consequently, when PMC<sup>ESR1</sup> neurons initiate bursting, they simultaneously generate the relaxation phases that separate the spikes. To make this explicit we have replaced the phrase “PMC<sup>ESR1+</sup> cells drive sphincter relaxation” with “PMC<sup>ESR1</sup> neurons trigger EUS bursting, which generates rhythmic sphincter contractions and relaxations.” (Results, page 7, lines 219-221). We have applied similar clarifications throughout the revised manuscript (Results, lines 125-129). We hope this revision eliminates any apparent contradiction.

      (8) The question remains as to which neurons (dual projecting, single projecting, or all?) are active in natural urination. This is possible to do through dual injection of retrograde virus in SPN and DGC that could coordinately turn on Gcamp, but this challenging experiment is perhaps beyond the scope of this paper. Even still, the authors could discuss their model for whether the dual- and single-projecting neurons are all engaged at once in a natural urination event. Do the authors have any data that could provide insight as to when these sub-populations are active? Results from the opto-tagging in Figure 1 (and comment #2 about single neuron firing properties) might provide a foundation for hypotheses or insights.

      Thank you for this valuable suggestion. We have now performed the experiment you proposed: dual retrograde virus injections (AAV-Retro-Cre and AAV-Retro-DIO-GCaMP6s) into SPN and DGC were used to selectively label PMC dual-projecting neurons, and a 200-µm optic fiber was implanted above the PMC to record their Ca<sup>2+</sup> dynamics during natural urination (Figure supplement 11A and Methods, lines 470–474, 652-655). Dual-projecting neurons exhibited robust activation throughout the entire voiding phase, tightly correlated with the rise in intravesical pressure and with EUS bursting (Figure supplements 11A–11H). However, the technical limits of current retrograde tools preclude selective isolation of single-projecting (SPN-only or DGC-only) subsets for independent fiber-photometry recordings: an injection restricted to one target unavoidably labels both single- and dual-projecting cells. We now state this technical limitation explicitly (Discussion, lines 426-430).

      Accordingly, in the revised Discussion (lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how dual- and single-projecting PMC<sup>ESR1+</sup> neurons are engaged during natural urination: “Based on population dynamics obtained by fiber photometry (Figures 1D-1H, Figure supplements 1A-1F, and Figure supplements 11A-11H) and single-neuron firing properties recorded via optrode (Figures 1A-1C), we propose several mechanistic models for the engagement of dual- and single-projecting PMC<sup>ESR1+</sup> neurons during natural micturition. One possibility is that all three populations (dual-projecting, SPN-projecting and DGC-projecting neurons) are co-activated, with the dual-projecting subset acting as a “bridging amplifier” that sustains rising bladder pressure while coordinating EUS relaxation. Alternatively, SPN-projecting neurons may be recruited first to initiate bladder contraction, followed by DGC-projecting neurons that evoke EUS bursting and facilitate urine entry into the urethra; once flow begins, the urethro-detrusor facilitative reflex could recruit dual-projecting neurons to further enhance voiding efficiency. In addition, contextual or state-dependent urination—such as scent-marking behavior characterized by multiple voiding events with smaller volumes than reflexive urination—may predominantly rely on sequential and cooperative activation of single-projecting neurons. Other recruitment sequences remain conceivable. Future studies combining diverse urination-related behavioral paradigms with simultaneous recordings from projection-specifically labeled PMC neurons will be required to validate and refine these models.”

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al explored the role of Estrogen receptor 1 (Esr1) expressing neurons in the pontine micturition center (PMC), a brainstem region also known as Barrington's nucleus (Hou et al 2016, Keller et al 2018). First, the authors conducted bulk Ca2+ imaging/unit recording from PMCESR1 neurons to investigate the correlations of PMCESR1 neural activity with voiding behavior in conscious mice and with bladder pressure/external urethral sphincter muscle activity in urethane-anesthetized mice. Next, the authors conducted optogenetic inactivation/activation of PMCESR1 neurons to confirm their contribution to voiding behavior, and also performed peripheral nerve transection together with optogenetic activation to confirm the independent control of bladder pressure and urethral sphincter muscle.

      We sincerely thank you for providing a thoughtful summary and insightful comments on our study.

      Weaknesses:

      (1) The study demonstrates that pelvic nerve transection reduces urinary volume triggered by PMC ESR1+ cell photoactivation in freely moving mice. Could the role of pudendal nerve transection also be examined in awake mice to provide a more comprehensive understanding of neural involvement?

      Thank you for this valuable suggestion. We conducted an additional experiment to determine the contribution of the pudendal nerve to PMC<sup>ESR1+</sup> neuron-driven voiding in awake mice. Bilateral pudendal nerve transection (PDNx) reduced the optogenetically evoked urine volume compared with sham-operated controls, yet photoactivation of PMC<sup>ESR1+</sup> neurons still reliably induced urination after PDNx (new Figure 6). Thus, bilateral integrity of the pudendal nerve is required for efficient PMC<sup>ESR1+</sup> neuron-driven voiding, most likely by transmitting the signals that entrain rhythmic EUS bursting. These data and experimental details have been incorporated into Figure 6, Results (lines 272–276), and Methods (lines 542–545).

      (2) While the paper primarily focuses on PMCESR1+ cells in bladder-sphincter coordination, the analysis of PMCESR1+-DGC/SPN neural circuits - given their distinct anatomical projections in the sacral spinal cord - feels underexplored. How do these circuits influence bladder and sphincter function when activated or inhibited? Also, do you have any tracing data to confirm whether bladder-sphincter innervation comes from distinct spinal nuclei?

      Thank you for this critical comment. To determine how PMC<sup>ESR1+</sup> neurons that target distinct sacral nuclei influence bladder–sphincter coordination, we first focused on the dual-projecting subset in a new experiment (Figure supplement 11 and Methods, lines 470–477, 652-655, 669-673). Dual retrograde virus injections into SPN and DGC selectively labelled PMC dual-projecting neurons, a subset of which are ESR1+. Fiber-photometry recordings showed that these cells were active during bladder contraction and sphincter relaxation (Figure supplements 11E-11H), and optogenetic activation reliably initiated urination: bladder pressure rose immediately and was followed by rhythmic EUS bursting (Figure supplements 11I-11N and 12B; Results, lines 309-313, 332-335). Thus, the dual-projecting sub-population is sufficient to coordinate bladder contraction with sphincter relaxation. Current retrograde tools do not allow selective isolation of single-projecting (SPN-only or DGC-only) subsets; injecting only one target unavoidably labels both single- and dual-projecting cells. Consequently, we cannot yet compare the functional impact of pure SPN-only versus DGC-only PMC populations. This limitation is now stated explicitly in the revised Discussion (lines 426–430).

      In our 2025 paper (Yan et al., Commun Biol, 2025, PMID: 40259086), we used PRV-based retrograde tracing to show that SPN and DGC constitute two separate spinal nuclei controlling the bladder and the EUS, respectively. Previous studies have reached the same conclusion (Yao et al., Nat Neurosci, 2018, PMID: 30361547; Karnup & De Groat, IBRO Reports, 2020, PMID: 32775758; Karnup, Auton Neurosci, 2021, PMID: 34391124). These citations and a concise summary have been added to the Results (lines 289–294).

      (3) Although the paper successfully identifies the physiological role of PMCESR1+ cells in bladder-sphincter coordination, the study falls short in examining the electrophysiological properties of PMC ESR1+-DGC/SPN cells. A deeper investigation here would strengthen the findings.

      Thank you for this thoughtful suggestion. While a detailed electrophysiological characterization of PMC<sup>ESR1+-DGC/SPN</sup> neurons would provide complementary information, the primary goal of the present study was to define the in vivo functional dynamics and behavioral role of these neurons during natural urination. As you suggested, further electrophysiological analysis of PMC<sup>ESR1+-DGC/SPN</sup> neurons will be an important direction for our future work.

      (4) The parameters for photoactivation (blue light pulses delivered at 25 Hz for 15 ms, every 30 s) and photoinhibition (pulses at 50 Hz for 20 ms) vary. What drove the selection of these specific parameters? Moreover, for photoactivation experiments, the change in pressure (ΔP = P<sub>5 sec</sub> - P<sub>0 sec</sub>) is calculated differently from photoinhibition (Δpressure = P<sub>peak</sub> - P<sub>min</sub>). Can you clarify the reasoning behind these differing approaches?

      Thank you for this opportunity to clarify our experimental design. The photoactivation protocol (25 Hz, 15 ms pulses) was chosen because PMC<sup>ESR1+</sup> neurons faithfully follow this frequency without depolarisation block and it reliably triggers voiding (Keller et al., Nat Neurosci, 2018, PMID:30104734). For photoinhibition we originally stated “50 Hz, 20 ms pulses”, but this was an error. Consistent with the same study (Keller et al., Nat Neurosci, 2018, PMID:30104734), we used continuous light (constant illumination) to maintain sustained suppression. The Methods section has been corrected (lines 659-661, 690-691).

      The ΔP formula was tailored to the temporal profile of each manipulation. For activation, ΔP (P<sub>5 sec</sub> - P<sub>0 sec</sub>) captures the rapid pressure rise after light onset; the same window was used by Hou et al. (Cell, 2016, PMID: 27662084). For inhibition, because saline infusion produces rhythmic reflex voiding, we delivered light at the onset of EUS bursting (i.e., when pressure was already near its peak). Inhibition abruptly stops the bladder contraction, so the bladder cannot return to its pre-void baseline. The Δpressure (P<sub>peak</sub> – P<sub>min</sub>) was therefore used to quantify the extent to which the ongoing pressure wave was aborted by photoinhibition. P<sub>min</sub> is the lowest value reached before the next infusion-driven upswing, making the metric insensitive to the slow baseline drift produced by continuous infusion. These clarifications have been added to the Methods (lines 676-677, 679-680, 692-693).
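      As an illustration only, the two pressure metrics described above can be sketched as follows. The sampling rate, the index-based light onset, and the `rise_tol` threshold used to detect the next infusion-driven upswing are hypothetical choices for this sketch, not parameters reported in the manuscript; units follow whatever the input trace uses.

```python
from typing import Sequence


def delta_p_activation(pressure: Sequence[float], fs_hz: float, onset_idx: int,
                       window_s: float = 5.0) -> float:
    """Activation metric: ΔP = P(onset + 5 s) - P(onset)."""
    end_idx = onset_idx + round(window_s * fs_hz)
    if end_idx >= len(pressure):
        raise ValueError("trace is too short for the requested window")
    return pressure[end_idx] - pressure[onset_idx]


def delta_p_inhibition(pressure: Sequence[float], onset_idx: int,
                       rise_tol: float = 0.5) -> float:
    """Inhibition metric: Δpressure = P_peak - P_min.

    P_min is the lowest value reached after light onset and before the trace
    rises more than `rise_tol` above that running minimum, which this sketch
    treats (as an assumption) as the start of the next infusion-driven upswing.
    """
    trough_idx = onset_idx
    for i in range(onset_idx, len(pressure)):
        if pressure[i] < pressure[trough_idx]:
            trough_idx = i
        elif pressure[i] - pressure[trough_idx] > rise_tol:
            break  # the next upswing has begun
    p_min = pressure[trough_idx]
    # Light is delivered near the pressure peak, so P_peak is searched between onset and trough.
    p_peak = max(pressure[onset_idx:trough_idx + 1])
    return p_peak - p_min


if __name__ == "__main__":
    ramp = [0.5 * i for i in range(60)]  # synthetic rising trace sampled at 10 Hz
    print(delta_p_activation(ramp, fs_hz=10.0, onset_idx=0))
    aborted = [10, 20, 30, 28, 22, 15, 14, 14.2, 16, 20]  # synthetic aborted contraction
    print(delta_p_inhibition(aborted, onset_idx=2))
```

      Anchoring P<sub>min</sub> to the trough before the next upswing, rather than to a fixed pre-void baseline, mirrors the rationale given above for making the inhibition metric insensitive to slow infusion-driven drift.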

      (5) The discussion could further emphasize how PMCESR1+ cells coordinate bladder contraction and sphincter relaxation to control urination, highlighting their central role in the initiation and suspension of this process.

      Thank you for this valuable comment. We have revised the Discussion to emphasize that PMC<sup>ESR1+</sup> neurons coordinate urination by sequentially driving bladder contraction and then sphincter relaxation through their dual projections to the SPN and DGC. We also emphasize that this coordination is essential for the initiation and effective execution of voiding (Discussion, lines 369-388). In addition, in the revised Discussion (lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how PMC<sup>ESR1+</sup> cells are engaged during natural urination.

      (6) In Figure 8, the authors analyze the temporal sequence of bladder pressure and EUS bursting during natural voiding and PMC activation-induced voiding. It would be acceptable to consider the existence of a lower spinal reflex circuit; however, the interpretation of the data contains speculation. Bladder pressure measurement can hardly be said to reflect efferent pelvic nerve activity in real time. (As a biological system, bladder contraction is mediated by smooth muscle and does not reflect real-time efferent pelvic nerve activity. As an experimental set-up, bladder pressure measurement has some delay in reflecting bladder pressure because of the tubing, whereas EUS bursting has no delay.) Especially for the inactivation experiment, these factors would affect the interpretation of the data. This reviewer recommends rewriting the section in light of these limitations. Most of the section is suitable for the Results.

      We agree with the reviewer that bladder pressure, mediated by smooth muscle contraction, provides an indirect measure of efferent pelvic nerve activity and is subject to both physiological and experimental delays. Regarding potential delay from the tubing system, pressure propagates in fluid at approximately 1000 m/s (Kela & Pekka, Proceedings of World Academy of Science Engineering & Technology, 2009, DOI: 10.5281/zenodo.1080526). Given that the total tubing length in our setup is 0.5-1 m, the estimated transmission delay is only 0.5-1 ms, which is negligible compared with the observed time difference (~700 ms) between the cessation of EUS bursting and the termination of bladder contraction. The tubing is therefore not expected to introduce a meaningful temporal delay. However, we cannot exclude a physiological delay, because bladder pressure, which is generated by smooth muscle, does not necessarily reflect efferent pelvic nerve activity in real time. Future studies using simultaneous recordings of bladder pressure and pelvic nerve discharges will help clarify whether a true temporal delay exists. We also agree that additional physiological or peripheral factors may contribute to this difference in timing. As suggested by the reviewer, we have revised this section to consider the potential influence of such factors, including the urethro-detrusor facilitative reflex (Results, lines 343-349).
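      The transmission-delay estimate in this response is simple arithmetic (delay = tubing length / propagation speed). The short sketch below reproduces it from the values stated above (about 1000 m/s, 0.5-1 m of tubing, ~700 ms observed lag); the function and constant names are illustrative only.

```python
OBSERVED_LAG_MS = 700.0  # lag between end of EUS bursting and end of bladder contraction (from the text)


def transmission_delay_ms(tubing_length_m: float, wave_speed_m_per_s: float = 1000.0) -> float:
    """Time for a pressure wave to travel the tubing, in milliseconds."""
    return tubing_length_m / wave_speed_m_per_s * 1000.0


if __name__ == "__main__":
    for length_m in (0.5, 1.0):
        delay = transmission_delay_ms(length_m)
        print(f"{length_m} m -> {delay:.1f} ms ({delay / OBSERVED_LAG_MS:.2%} of the observed lag)")
```

      This prints "0.5 m -> 0.5 ms (0.07% of the observed lag)" and "1.0 m -> 1.0 ms (0.14% of the observed lag)", i.e. the tubing accounts for well under 1% of the ~700 ms difference.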

      Reviewer #3 (Recommendations for the authors):

      (1) In opto-tag experiments, a comparison of average AP waveform during behavior and during light stimulation should be included as criteria. It should be mostly the same waveform.

      Thank you for bringing this to our attention. We have now added this comparison as an inclusion criterion in the revised manuscript. Figure supplement 3B shows representative examples of the average waveforms, and Figure supplement 3C displays the distribution of correlation coefficients between spontaneous and light-evoked spikes for all recorded PMC<sup>ESR1+</sup> units, all of which exhibited r > 0.8.
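      One way to implement the waveform-matching criterion described above is sketched below: average the spontaneous and light-evoked spike snippets separately, compute the Pearson correlation between the two mean waveforms, and accept the unit if r exceeds 0.8. The snippet format (equal-length, aligned lists of samples) and the function names are assumptions for illustration; the manuscript does not specify how snippets were extracted or aligned.

```python
import math
from typing import List, Sequence


def mean_waveform(snippets: Sequence[Sequence[float]]) -> List[float]:
    """Sample-by-sample average of aligned, equal-length spike snippets."""
    n = len(snippets)
    return [sum(s[k] for s in snippets) / n for k in range(len(snippets[0]))]


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def passes_waveform_criterion(spontaneous, evoked, threshold: float = 0.8) -> bool:
    """True if mean spontaneous and mean light-evoked waveforms correlate above threshold."""
    return pearson_r(mean_waveform(spontaneous), mean_waveform(evoked)) > threshold


if __name__ == "__main__":
    spont = [[0.0, -1.0, -3.0, 1.0, 0.5, 0.0], [0.0, -1.2, -2.8, 1.1, 0.4, 0.0]]
    evoked = [[0.0, -0.9, -2.9, 0.9, 0.5, 0.1]]
    print(passes_waveform_criterion(spont, evoked))
```

      Correlating mean waveforms rather than single spikes reduces the influence of noise on individual snippets; the 0.8 threshold is the value reported in the response.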

      (2) Optical fiber implantation seems to be done in two different methods. In Figure 1 and Figure 2, the fiber tip is positioned just above PMC but in Figure 3 it seems to be angled. The information should be included in the Methods section.

      Thank you for this important comment. We have now clarified in the Methods that for Figures 1 and 2, the optical fibers were implanted vertically above the PMC, whereas for Figure 3, the left optical fiber was implanted at a 33° lateral angle targeting the PMC (Methods, lines 499-503).

      (3) In the closed-loop inhibition experiments of Figure 2, the parameters to start closed-loop photo-inactivation were not described in the method. If it is a manual closed loop, it should be described clearly.

      Thank you for raising this important point. We apologize for omitting these details in the original Methods. We have now added a complete description of the manual closed-loop photo-inhibition protocol, including the triggering criteria and operator-controlled timing, in the revised Methods section (lines 602–605).

      (4) In Figure 7A/E the authors provide a spinal cord image to show the injection site, but the image is misleading. The figure only shows AAV-infected CRH/ESR1 neurons in the spinal cord section. It does not indicate the AAV injection site or the terminal distribution.

      Thank you for your important comment. We apologize for providing a spinal cord image that did not accurately depict the injection site. To rigorously verify that our spinal injections were confined to SPN or DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. A mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 was injected specifically into SPN or DGC. Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without spread to the adjacent region, were included (new Figures 7A and 7E). These data confirmed our original observations and have been pooled in Figure 7. The manuscript and figure have been updated accordingly (Results, lines 297-301, 304-306; Methods, lines 465–466).

    1. eLife Assessment

      This manuscript applies a theoretical analysis to two published datasets on yeast and bacterial evolution to compare different ways of quantifying fitness. It makes an important advance by clarifying how discrepancies can arise by using different approaches and provides recommendations for best practices. Overall, this is an impressive and highly beneficial study that is based on convincing evidence and has the potential of setting standards in this rapidly growing field.

    2. Reviewer #1 (Public review):

      The authors point out that the fitness estimates obtained from different experimental assays (monoculture, pairwise competition or bulk competition) are not generally equivalent, not even with regard to the fitness ranking of different genotypes. Using a computational model based on experimentally measured growth phenotypes for knockout strains in yeast, as well as data from Lenski's Long Term Evolution Experiment (LTEE), they derive a set of best practice rules aimed at extracting the optimal amount of information from such experiments.

      The study is very complete on a technical level, and the conceptual weaknesses raised in the first round of reviews have been fully addressed in the revision.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript "Quantifying microbial fitness in high-throughput experiments" provides a comprehensive analysis of the various approaches to quantifying fitness in microbial evolution, focusing on three primary factors: encoding of relative abundance, time scale of measurement, and the choice of reference subpopulation. The authors systematically explore how these choices impact fitness statistics and provide recommendations aimed at standardizing practices in the field. This manuscript aims to highlight the impact of differing fitness definitions and the methodologies utilized for analysis and how that can significantly alter interpretations of mutant fitness, affecting evolutionary predictions and the overall understanding of genetic interactions in the experiments.

      Strengths:

      The choices for quantifying fitness in evolution experiments are critical and highly relevant given the increasing prevalence of high-throughput experiments in evolutionary biology. The authors methodically categorize fitness statistics and their implications, providing clarity on a complex subject. This structured approach aids in understanding the nuances of fitness measurement. The manuscript effectively highlights how different choices in fitness measurement can influence fitness rankings and the understanding of epistasis, which is important for modeling evolutionary dynamics.

      Comments on revisions:

      The authors have comprehensively addressed all previous comments and suggestions. In particular, the addition of the new Methods section, 'A guide to calculate pairwise relative fitness under the logit encoding from bulk competition data', significantly improves the clarity of the implementation and helps in the overall interpretation of the framework.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present analyses of different fitness measures derived from empirical data from yeast knock-out mutants and the long-term evolution experiment (LTEE) with Escherichia coli to explore discrepancies and identify preferred methods to estimate relative fitness in high-throughput experiments. Their work has three components. They first discuss the different "encodings" of relative abundance data and conclude that logit-transformations are preferred, because they transform nonlinear abundance trajectories into linear trajectories with greater predictive power. Next, they compare per-generation with per-growth cycle relative fitness estimates inferred from simulations of pairwise competitions based on published growth traits for the yeast strains and on published pairwise competition measurements for the LTEE data. Both data sets show quantitative and qualitative (i.e. rank order) discrepancies of estimates across different time scales, which are highlighted by considering possible underlying causes (i.e. trade-offs between growth traits) and consequences (i.e. epistasis among mutations affecting different growth traits). Finally, the authors compare simulated pairwise and bulk (i.e. where many mutants compete during a growth cycle in a single environment) competition assays based on the yeast knock-out mutants and demonstrate an optimal ratio of collective mutants to wild-type strains that minimizes both sampling error and overestimation of fitness estimates when compared with pairwise competitions.
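      To make the linearization argument concrete: if a mutant's relative abundance follows logistic dynamics, x(t) = x0 e^(st) / (1 - x0 + x0 e^(st)), then logit x(t) = ln[x/(1 - x)] = logit x0 + s t, so a straight-line fit to logit-encoded data recovers s regardless of the starting frequency, whereas a straight-line fit to raw x(t) does not. The sketch below assumes noise-free data and a constant s; it illustrates the encoding argument summarized above and is not the authors' inference code.

```python
import math
from typing import List, Sequence


def logit(x: float) -> float:
    return math.log(x / (1.0 - x))


def logistic_trajectory(x0: float, s: float, times: Sequence[float]) -> List[float]:
    """Relative abundance under a constant relative fitness s (logistic dynamics)."""
    return [x0 * math.exp(s * t) / (1.0 - x0 + x0 * math.exp(s * t)) for t in times]


def ols_slope(ts: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against ts."""
    mean_t = sum(ts) / len(ts)
    mean_y = sum(ys) / len(ys)
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    den = sum((t - mean_t) ** 2 for t in ts)
    return num / den


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4, 5]
    for x0 in (0.1, 0.5):
        xs = logistic_trajectory(x0, s=0.3, times=times)
        print(x0, ols_slope(times, [logit(x) for x in xs]), ols_slope(times, xs))
```

      The logit slope equals 0.3 (up to rounding) for both starting frequencies, while the raw-abundance slope is about 0.046 for x0 = 0.1 and about 0.064 for x0 = 0.5, so the linear encoding mixes the fitness effect with the initial frequency.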

      Strengths:

      The study deals with a highly relevant topic. Fitness is central to general evolutionary theory, but also poorly defined and implies different traits for different organisms and conditions. For microbes, which are often used in evolution experiments, high-throughput experiments may yield different measures to quantify abundance over time, from individual growth traits to bulk competition experiments. Hence, it is relevant to consider discrepancies among those measures and identify preferred measures with respect to predicting population dynamic and evolutionary processes. The present study contributes to this aim by (i) making readers aware of differences among commonly used fitness estimates, (ii) showing that simulated (yeast) and calculated (E. coli) competitive fitness may differ across time scales, and (iii) showing that bulk competitions may yield relative fitness estimates that are systematically higher than pairwise competitions. The study is rather thorough on the theory side, with extensive derivations and analyses of various fitness measures using their resource competition model in the Supplementary Information. The study ends with a few practical recommendations for preferred methods to infer relative fitness estimates, that may be useful for experimentalists and stimulate further investigations.

      Weaknesses:

      The study has a few limitations. Perhaps the most apparent limitation is the lack of a clear answer to the question which fitness measure is best "in the light of first principles". The authors show clear discrepancies between fitness estimates across different time scales or using different reference genotypes in bulk competition and provide useful recommendations based on practical considerations (e.g. using pairwise competitions as "golden standard"), but it remains unclear whether these measures provide the greatest value for the questions researchers may want to answer with them (e.g. predict shifts in genotype frequencies). -- The authors have convinced me in their response that their recommendations were fundamentally related to the resource competition model, and the changes in introduction and discussion help to appreciate the choice of fitness measure in relation to the research question.

      A second limitation is that the authors analyse fitness differences arising solely from resource competition, whereas microbes often interact via other mechanisms, e.g. the production of anticompetitor toxins, cross-feeding of metabolites or lack of growth to enhance their persistence in stress conditions. Without simulations of these processes, understanding discrepancies among fitness measures is necessarily limited. In addition, the analysis of trade-offs between growth traits causing these discrepancies during resource competition seems confounded by biases in measurement error or parameter estimation, at least for growth rate and lag time (Fig. 2B), where the replicate estimates for the wildtype show a similar negative correlation. -- The motivation to use a resource competition model for fitness inference is generally well motivated now. I accept their argument that resource competitive differences are most important for microbial strains with small genetic differences (e.g. from mutant libraries or from the same evolution experiment). However, it is relevant to note that this ignores situations that are rather common, where the wild-type strain produces an anticompetitor toxin or causes growth inhibition through metabolite products that lower the pH (and derived strains will likely contain resistant mutations).

      Third, the study does not validate relative fitness predictions from growth traits (as is done for the yeast mutants) with measured relative fitness estimates using competition assays, while such data are available, e.g. for the LTEE. This would strengthen their inferences about preferred fitness measures. -- In their response, the authors explain that their aim was different, i.e. to provide "proof of principle" that the choices of fitness measure can produce discrepancies even when they follow the same growth model.

      Fourth, the analysis of epistasis between mutations affecting different growth traits (shown in Fig. 3) based on the LTEE data could be better introduced and analysed more comprehensively. Now, the examples given in panels C-F seem rather idiosyncratic and readers may wonder how general these consequences of using fitness estimates based on different time scales are. -- The authors have made extensive improvements to address how different growth parameters, especially lag and growth rate, differently affect apparent epistasis based on measures at different time scale (per generation vs per cycle). These provide a more comprehensive analysis of down-stream consequences for epistasis detection.

      Finally, the study is generally less accessible to experimentalists due to the extensive and principled treatment of specific population dynamic models and fitness inferences. This may distract from the overarching aim to identify fitness measures that are most accurate and useful for predictions of population dynamic and evolutionary processes. In this light, the motivation for the initial discussion of the importance of how to best encode relative abundance (Fig. 1) is unclear. Also, the conclusion, that logit encoding is preferred, because it linearizes logistic growth dynamics and "improves the quality of predictions", is not further motivated. Experimentalists using non-linear models to infer fitness from growth curves or competition assays may miss the relevance of this discussion. -- Thanks for this explanation (indeed, I confused "logistic dynamics" with "logistic growth model"); the additional explanations and text reductions have improved accessibility for experimentalists.

      Comments on revisions:

      I appreciate the thorough and effective response to all recommendations and have no further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank both editors and the three reviewers for their constructive criticism of our work. As a result of these comments, we have made several significant revisions to the paper that we believe strengthen and clarify our major results:

      (1) Following suggestions from Reviewers #1 and #3, we have improved our introduction to the different fitness concepts (lines 105–148) and streamlined the discussion of the logit encoding (lines 175–190). In particular, we have moved the most technical points to the SI (Sec. S3).

      (2) Based on criticisms of our usage of the population dynamics model from Reviewers #1 and #3, we significantly revised our explanation of the motivation and interpretation of this model (lines 284–310 and 323–336) and our discussion of the generalizability of these results (lines 678–728), including the possible effects of interactions besides resource competition.

      (3) Following a request from Reviewer #3, we have expanded our analysis of epistasis to systematically test all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 344–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).

      (4) Following concerns from Reviewers #2 and #3 about the limited empirical data, we have expanded our analysis of the LTEE data (new main text Fig. 4, revised text on lines 416–439, and revised SI Figs. S16–S18) and have analyzed two new benchmarking datasets for bulk fitness to test our predictions (new main text Fig. 6, new Results subsection on lines 561–590, and new SI Figs. S24 and S25).

      (5) Following the criticism of Reviewer #3 about the lack of a clear recommendation on fitness quantification that provides the greatest value for a given scientific question, we have better explained what we think the scientific consequences of fitness are as a motivation for our analysis (lines 82–88, 319–322, and 615–630) and replaced the final flowchart figure with a step-by-step guide in the Methods to implement our recommendations in practice (lines 964–982).

      Reviewer #1 (Public review):

      The authors point out that the fitness estimates obtained from different experimental assays (monoculture, pairwise competition, or bulk competition) are not generally equivalent, not even with regard to the fitness ranking of different genotypes. Using a computational model based on experimentally measured growth phenotypes for knockout strains in yeast, as well as data from Lenski’s Long Term Evolution Experiment (LTEE), they derive a set of best practice rules aimed at extracting the optimal amount of information from such experiments.

      The study is very complete on a technical level and I have no suggestions for further analyses. However, I feel the readability and the conceptual focus of the manuscript could be significantly improved by rearranging the material with regard to the contents of the main text vs. the Methods and the Supplement. Detailed recommendations:

      (1) Regarding readability, the large number of references to material in the Methods and Supplement fragment the main text and make it difficult to follow.

      We understand the challenges these references pose to the flow of the main text; we have attempted to keep those references to a minimum, while ensuring that technical details of the work are fully documented and referenced for completeness.

      (2) Conceptually, it seems to me that the current presentation obscures the reasons why we should care about fitness in the first place. In the first paragraph of Results, the authors define fitness “as any number that is sufficient to predict the genotype’s relative abundance x(t) over a short-time horizon”. To me, this seems like an extremely narrow and not very interesting definition. Instead, I view fitness as an intrinsic property of a genotype that allows us to predict its performance under a range of conditions, including in particular conditions that are different from the experimental setup that was used to obtain the fitness estimates. The latter viewpoint is well expressed in Supplementary Section S1, where the authors discuss the notion of fitness potential. I would recommend to move at least part of this discussion to the main text.

      We appreciate the reviewer’s viewpoint and have moved that conceptual discussion from the SI to the beginning of the Results section to give readers a broader perspective on fitness (lines 105–148). We use “potential” in analogy with potential energy in physics and have clarified this on lines 126–135.

      What we call fitness potential, like the other notions of fitness we discuss in this paper (relative and absolute fitness), is still specific to an environmental condition. Fitness as a property intrinsic to a genotype and independent of any environment, as the reviewer mentions, is an interesting concept but beyond the scope of this paper, which is focused on analyzing fitness measurements that are inevitably environment-specific and we have clarified this on lines 142–148. While it is true that this definition of fitness is narrow, it is what can be empirically measured directly, and thus we believe it is crucial to understand how to best interpret that data.

      By comparison, the arguments in favor of the logit encoding that currently open the Results section are rather straightforward and could be shortened significantly.

      We agree and have condensed this section (lines 175–192).

      (3) Similarly, the modeling strategy used in this work is quite subtle and needs to be explained more fully in the main text. The authors use growth traits (lag time, growth rate, and yield) extracted from monoculture experiments on a yeast knockout collection and feed them into a specific mathematical model to simulate pairwise and bulk competition scenarios. Since a key claim of the work is that monoculture experiments are generally poor predictors of competitive fitness, the basis for this conclusion and the assumptions on which it is based need to be described clearly in the main text. In the current version of the manuscript, this information has been largely relegated to the Methods section.

      We agree that our motivation for the population dynamics model and growth curve data was not clearly explained. We have significantly revised this section of the Results in the main text (lines 284–310).

      In particular, we recognize the potential for misunderstanding this material: we do not intend the relative fitness values calculated from this model to be interpreted as predictions of the true relative fitness between yeast deletion strains. Rather, we use the population dynamics model for our proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). We have added a statement to highlight existing work on monoculture predictors for competition outcomes [32, 34, 36, 37] on lines 453–459.
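      To make the ingredients listed above concrete, here is a minimal sketch of one pairwise competition cycle, assuming each strain has a lag time, an exponential growth rate, and a yield (cells produced per unit of resource), and that all growth stops once a shared resource pool is exhausted. The function names, dictionary keys, and parameter values are hypothetical; this illustrates this class of model and is not the authors' implementation. It reports the per-cycle logit fitness and a per-generation version obtained by dividing by wild-type doublings.

```python
import math


def abundance(strain, t):
    # No growth during lag, then exponential growth: N(t) = N0 * exp(g * (t - lag))
    return strain["n0"] * math.exp(strain["rate"] * max(0.0, t - strain["lag"]))


def consumed(strains, t):
    # Each newly produced cell uses 1 / yield units of the shared resource
    return sum((abundance(s, t) - s["n0"]) / s["yield"] for s in strains)


def saturation_time(strains, resources):
    # Bracket, then bisect for the time at which the resource pool runs out
    lo, hi = 0.0, 1.0
    while consumed(strains, hi) < resources:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if consumed(strains, mid) < resources:
            lo = mid
        else:
            hi = mid
    return hi


def simulate_pairwise(mutant, wild_type, resources):
    t_sat = saturation_time([mutant, wild_type], resources)
    m_final = abundance(mutant, t_sat)
    w_final = abundance(wild_type, t_sat)
    # Per-cycle fitness, logit encoding: change in ln(N_mut / N_wt) over the cycle
    s_cycle = math.log(m_final / w_final) - math.log(mutant["n0"] / wild_type["n0"])
    # Wild-type doublings in this cycle, used for a per-generation statistic
    wt_generations = math.log2(w_final / wild_type["n0"])
    return s_cycle, s_cycle / wt_generations


wild_type = {"n0": 0.01, "lag": 2.0, "rate": 1.0, "yield": 1.0}
mutant = {"n0": 0.01, "lag": 1.5, "rate": 1.0, "yield": 1.0}  # shorter lag only
s_cycle, s_gen = simulate_pairwise(mutant, wild_type, resources=1.0)
print(f"per-cycle s = {s_cycle:.3f}, per-generation s = {s_gen:.3f}")
```

      With equal growth rates, a lag advantage of 0.5 time units gives a per-cycle fitness of exactly rate × 0.5, whereas the per-generation value also depends on how much the wild-type grew in that particular cycle.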

      Reviewer #1 (Recommendations for the authors):

      In the discussion of the LTEE in Section S8, the authors write on page 8 that “we couldn’t fit the fitted values a,b in ref. 29 so we were unable to check it”. I don’t understand this sentence - is the claim that the fit in ref. 29 was incorrect?

      We have clarified this point in the SI (now Sec. S9). Our point was not that the fit in Wiser et al. 2013 is incorrect, but merely that we could not find the exact values of the fitted parameters they obtained documented in their paper, so we could not compare our own fitted parameters directly to theirs.

      Also, at the end of the section, the authors refer to theory work on the long-term fitness trend in the LTEE. Here, two early references arguing for a logarithmic increase in fitness could be mentioned as well:

      Paolo Sibani, Michael Brandt, and Preben Alstrøm. Evolution and Extinction Dynamics in Rugged Fitness Landscapes. International Journal of Modern Physics B 12:361–391 (1998).

      Su-Chan Park and Joachim Krug. Evolution in random fitness landscapes: the infinite sites model. J. Stat. Mech. (2008) P04014.

      We thank the reviewer for providing these two references and have added them to the list of previous works on long-term fitness trends at the end of the section (now Sec. S9).

      Reviewer #2 (Public review):

      Summary:

      The manuscript “Quantifying microbial fitness in high-throughput experiments” provides a comprehensive analysis of the various approaches to quantifying fitness in microbial evolution, focusing on three primary factors: encoding of relative abundance, time scale of measurement, and the choice of reference subpopulation. The authors systematically explore how these choices impact fitness statistics and provide recommendations aimed at standardizing practices in the field. This manuscript aims to highlight the impact of differing fitness definitions and the methodologies utilized for analysis and how that can significantly alter interpretations of mutant fitness, affecting evolutionary predictions and the overall understanding of genetic interactions in the experiments. Although this manuscript focuses on a critical issue in the quantification of fitness in high-throughput experiments, it relies heavily on a single experimental dataset (Warringer et al. 2003) and a single organism, yeast (Saccharomyces cerevisiae), grown in a defined medium, so the environmental influence is not completely captured. While the theoretical framework is strong, including more experimental examples from more organisms (i.e., more datasets) in the analysis and comparison would enhance the manuscript, especially its conclusion.

      We have expanded our analysis of competition data from the Long-Term Evolution Experiment in E. coli (lines 416–439), including adding a main text figure (Fig. 4) along with three SI figures (Figs. S16–S18). We have also added two completely different data sets that directly test our predicted discrepancies in fitness estimates from bulk competition experiments. From these data we have added a new main text figure (Fig. 6), two new SI figures (Figs. S24 and S25), and a new section at the end of the Results (lines 563–590).

      We wish to clarify, though, that the aim of this study is to develop theory on fitness quantification choices and minimal examples to demonstrate the potential for discrepancies between these choices. While we appreciate the reviewer’s interest in understanding how discrepancies in fitness statistics vary across organisms and environments, that is an empirical question beyond the scope of this paper.

      Strengths:

      The choices for quantifying fitness in evolution experiments are critical and highly relevant given the increasing prevalence of high-throughput experiments in evolutionary biology. The authors methodically categorize fitness statistics and their implications, providing clarity on a complex subject. This structured approach aids in understanding the nuances of fitness measurement. The manuscript effectively highlights how different choices in fitness measurement can influence fitness rankings and the understanding of epistasis, which is important for modeling evolutionary dynamics.

      Weaknesses:

      The theoretical framework is robust, but the manuscript could benefit from more empirical examples to illustrate how different fitness quantification methods lead to varied conclusions in experiments.

      Please see our response to the previous comment on this point.

      The discussion on the choice of reference subpopulation could be expanded with the influence of the environment or the condition. Different types of reference groups might yield different implications for fitness calculations, and further elaboration would enhance this section.

      While we agree that studying how environmental conditions affect fitness is an important and interesting problem, it goes beyond the scope of this paper, which focuses on the basic theory of quantifying microbial fitness from high-throughput experiments. Applications of this theory to empirical questions about environmental variation would be best served by their own studies. We have added a statement clarifying this goal (lines 144–148).

      We are unsure how the choice of reference subpopulation is related to this issue. In our view, if the goal of a mutant fitness measurement is to predict how that mutant would behave when arising spontaneously and competing against its immediate ancestor, the gold-standard reference subpopulation must always be the mutant’s immediate ancestor, or another mutant that is known to be phenotypically equivalent to the ancestor (e.g., neutral mutants in the case of a large mutant library). Other choices of reference subpopulations would not provide directly meaningful information in this regard.
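      As an illustration of why the reference matters, here is a minimal sketch that computes three per-cycle logit statistics for one mutant from bulk-competition counts at the start and end of a cycle: fitness against the wild-type, fitness against a pool of neutral proxies, and fitness against the rest of the library as a whole. The strain names and counts are made up, chosen so that most of the library is deleterious.

```python
import math


def delta_log_ratio(start, end, focal, reference):
    # Per-cycle change in ln(n_focal / n_reference): the logit of the focal
    # strain's relative abundance within the {focal, reference} subpopulation
    ref_start = sum(start[r] for r in reference)
    ref_end = sum(end[r] for r in reference)
    return math.log(end[focal] / ref_end) - math.log(start[focal] / ref_start)


# Hypothetical counts: wild-type, two neutral barcodes, one beneficial mutant,
# and three deleterious mutants
start = {"wt": 1000, "neutral_1": 1000, "neutral_2": 1000, "mut_fast": 1000,
         "mut_slow_1": 1000, "mut_slow_2": 1000, "mut_slow_3": 1000}
end = {"wt": 8000, "neutral_1": 8000, "neutral_2": 8000, "mut_fast": 16000,
       "mut_slow_1": 2000, "mut_slow_2": 2000, "mut_slow_3": 2000}

vs_wt = delta_log_ratio(start, end, "mut_fast", ["wt"])
vs_neutral = delta_log_ratio(start, end, "mut_fast", ["neutral_1", "neutral_2"])
rest_of_library = [k for k in start if k != "mut_fast"]
vs_library = delta_log_ratio(start, end, "mut_fast", rest_of_library)
print(f"vs wild-type {vs_wt:.3f}, vs neutral pool {vs_neutral:.3f}, "
      f"vs rest of library {vs_library:.3f}")
```

      Here the wild-type and the pooled neutral proxies give the same answer (ln 2 per cycle), while using the mostly deleterious library as the reference inflates the estimate to ln 3.2.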

      The authors overgeneralize some findings; for instance, the implications of fitness measurement choices could vary significantly across different microbes or experimental conditions. A more detailed discussion would strengthen the conclusion.

      We certainly agree that the consequences of fitness quantification choices could vary significantly across organisms and environments; our goal for this paper is to demonstrate what discrepancies are possible in principle and in particular how they depend on basic features of microbial population dynamics (e.g., variation in yield). We have added two separate paragraphs in the Discussion section to address the generalizability of our results in the context of pairwise (lines 678–710) and bulk fitness measurements (lines 711–728).

      Overall, this manuscript is a significant contribution to the field of evolutionary biology, addressing a critical issue in the quantification of fitness, but it lacks the experimental support needed for a broader claim. By systematically exploring the factors that influence fitness measurements, the authors provide valuable insights that can guide future research; the framework is computationally thorough but needs a more detailed explanation of concepts instead of generalizing.

      We have improved our explanation of several of the important concepts. In particular, we have significantly revised our explanation of the population dynamics model (lines 284–310) to emphasize its role as a null model to demonstrate how fundamental aspects of microbial growth are sufficient to cause discrepancies between fitness statistics. We have also revised two paragraphs on the generalizability of our results in the Discussion section (lines 678–728).

      Further work is needed, particularly to incorporate empirical examples and to expand certain discussions to include environmental variation and its impact, which would improve clarity and applicability.

      We have added a sentence at the beginning of the Results section to acknowledge the environmental dependence of fitness (lines 142–148). We believe further discussion of that issue is beyond the scope of this paper, as it would require a significant amount of additional data and/or environmental modeling.

      Reviewer #2 (Recommendations for the authors):

      In addition to the comments from the previous sections, other specific comments:

      (1) Figure 5 needs to be populated with additional parameter details. For example, include brief descriptions of each parameter involved in the encoding, time scale, and reference choices. This will help users understand the implications of each choice. Adding these details will make the flow diagram more comprehensive, aiding researchers in implementing these steps more clearly.

      Following this comment and another comment about this figure from Reviewer #3, we decided to replace this figure with a new Methods section with step-by-step instructions (lines 964–982).

      (2) Duplication in Line 620: “Nevertheless, the fact that we see the fact that we see...” This redundancy needs to be corrected.

      We thank the reviewer for pointing this out; we have rewritten this paragraph.

      (3) More experimental data comparisons and their assessment concerning various microbial systems and multiple environmental conditions are recommended to support the claim.

      Please see our responses to the related public comments.

      Reviewer #3 (Public review):

      Summary:

      The authors present analyses of different fitness measures derived from empirical data from yeast knockout mutants and the long-term evolution experiment (LTEE) with Escherichia coli to explore discrepancies and identify preferred methods to estimate relative fitness in high-throughput experiments. Their work has three components. They first discuss the different “encodings” of relative abundance data and conclude that logit transformations are preferred because they transform nonlinear abundance trajectories into linear trajectories with greater predictive power. Next, they compare per-generation with per-growth cycle relative fitness estimates inferred from simulations of pairwise competitions based on published growth traits for the yeast strains and on published pairwise competition measurements for the LTEE data. Both data sets show quantitative and qualitative (i.e. rank order) discrepancies of estimates across different time scales, which are highlighted by considering possible underlying causes (i.e. trade-offs between growth traits) and consequences (i.e. epistasis among mutations affecting different growth traits). Finally, the authors compare simulated pairwise and bulk (i.e. where many mutants compete during a growth cycle in a single environment) competition assays based on the yeast knock-out mutants and demonstrate an optimal ratio of collective mutants to wild-type strains that minimizes both sampling error and overestimation of fitness estimates when compared with pairwise competitions.

      Strengths:

      The study deals with a highly relevant topic. Fitness is central to general evolutionary theory, but also poorly defined and implies different traits for different organisms and conditions. For microbes, which are often used in evolution experiments, high-throughput experiments may yield different measures to quantify abundance over time, from individual growth traits to bulk competition experiments. Hence, it is relevant to consider discrepancies among those measures and identify preferred measures with respect to predicting population dynamics and evolutionary processes. The present study contributes to this aim by (i) making readers aware of differences among commonly used fitness estimates, (ii) showing that simulated (yeast) and calculated (E. coli) competitive fitness may differ across time scales, and (iii) showing that bulk competitions may yield relative fitness estimates that are systematically higher than pairwise competitions. The study is rather thorough on the theory side, with extensive derivations and analyses of various fitness measures using their resource competition model in the Supplementary Information. The study ends with a few practical recommendations for preferred methods to infer relative fitness estimates, that may be useful for experimentalists and stimulate further investigations.

      Weaknesses:

      The study has several limitations. Perhaps the most apparent limitation is the lack of a clear answer to the question of which fitness measure is best “in the light of first principles”. The authors show clear discrepancies between fitness estimates across different time scales or using different reference genotypes in bulk competition and provide useful recommendations based on practical considerations (e.g. using pairwise competitions as the “golden standard”), but it remains unclear whether these measures provide the greatest value for the questions researchers may want to answer with them (e.g. predict shifts in genotype frequencies).

      We agree on the importance of considering the scientific questions researchers want to answer in determining the best way to quantify fitness. We have revised both the Introduction (lines 82–88) and the Discussion (lines 615–630) to more clearly explain possible downstream questions researchers may wish to answer with fitness data, and thus why discrepancies in that data based on analysis choices may be important.

      We believe that the text does provide a specific recommendation (second subsection of the Discussion, lines 635–658) for how to quantify relative fitness: using the logit encoding (rather than other encodings), measuring fitness per-cycle (rather than per-generation), and using the wild-type or a phenotypically equivalent proxy as reference subpopulation to calculate pairwise fitness in a bulk competition (rather than using the mutant library as a whole). This recommendation is based on first principles: the logit encoding is based on the principle of the logistic equation as the null model of relative abundance dynamics (lines 635–637), the choice of the per-cycle timescale is based on the principle that in non-steady state environments the time scale for measuring selection should not depend on the wild-type growth (lines 640–645), and the choice of reference population is based on the principle that a mutant’s fitness should serve as a predictor of its dynamics when arising de novo at low frequency and competing against its wild-type (lines 648–653).
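      The three recommended choices can be combined in a few lines. The sketch below assumes nonzero counts of one mutant and of the wild-type reference sampled at the end of each of several growth cycles. It encodes each time point as ln(n_mut / n_wt), which is the logit of the mutant's relative abundance within the mutant/wild-type pair, and takes the least-squares slope against cycle number, so the result is per cycle with no rescaling by wild-type generations. The function name and counts are hypothetical.

```python
import math


def per_cycle_fitness(mut_counts, wt_counts):
    """Least-squares slope of ln(n_mut / n_wt) against cycle number."""
    ys = [math.log(m / w) for m, w in zip(mut_counts, wt_counts)]
    xs = list(range(len(ys)))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var


# Hypothetical counts: the mutant-to-wild-type ratio doubles every cycle
s = per_cycle_fitness([100, 200, 400, 800], [1000, 1000, 1000, 1000])
print(f"s = {s:.4f} per cycle")
```

      Because only the mutant and wild-type counts enter, the estimate does not depend on how the rest of a bulk library behaves.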

      A second limitation is that the authors analyse fitness differences arising solely from resource competition, whereas microbes often interact via other mechanisms, e.g. the production of anticompetitor toxins, cross-feeding of metabolites, or lack of growth to enhance their persistence in stress conditions. Without simulations of these processes, understanding discrepancies among fitness measures is necessarily limited.

      We agree that other interactions are important in many microbial ecosystems and could affect measurements of fitness. We discuss the possibility of these other interactions and their potential consequences for fitness on lines 697–710.

      We focus on resource competition in this paper, however, for two reasons. One is that we are using it as a null model: resource competition is always present, and thus it provides an important baseline for discrepancies in fitness statistics in the absence of any other assumptions. Indeed, our results are that this minimal assumption alone is sufficient to produce a wide range of significant discrepancies, which provides the proof of principle that choices of fitness quantification matter. We have clarified this in a revised explanation of the population dynamics model on lines 294–304.

      The second reason is that fitness measurements of the type discussed in this paper are typically performed on mutants that have only small genetic differences with their ancestor (e.g., a point mutation or gene deletion). While more complex interactions between such similar genotypes are not impossible, we expect them to be rare, in which case resource competition is the only interaction. Explicit modeling of other interactions is an important question for future work, but would require more detailed models and data of those phenomena, and thus would go beyond the scope of the present study. We have added a sentence to explain our emphasis on resource competition on lines 298–301 and 690–697.

      In addition, the analysis of trade-offs between growth traits causing these discrepancies during resource competition seems confounded by biases in measurement error or parameter estimation, at least for growth rate and lag time (Figure 2B), where the replicate estimates for the wildtype show a similar negative correlation.

      The tradeoff between growth traits was only an incidental observation and is not necessary for the fitness statistic discrepancies we analyze in this paper; the only important pattern in the growth traits is the existence of mutants with reduced yields (so as to reduce the wild-type log fold-change in a competition) as well as variation in one other trait under selection (lag time or growth rate in this model). We have clarified this mechanism on lines 328–336, which is demonstrated by Fig. S7. Since these tradeoffs are not relevant to the results and we agree that their significance may be unreliable due to the noisiness of the data, we have removed mention of them.

      Third, the study does not validate relative fitness predictions from growth traits (as is done for the yeast mutants) with measured relative fitness estimates using competition assays, while such data are available, e.g. for the LTEE. This would strengthen their inferences about preferred fitness measures.

      The goal of our modeling with the yeast growth trait data is not to test the ability to predict competition experiments from monoculture data; that has been the focus of previous studies [32, 34, 36, 37]. Rather, we use the population dynamics model for a proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). The yeast growth curve data merely provides realistic parameters for this model, to ensure we are studying a biologically relevant regime of the dynamics. To avoid this misconception, we have revised our explanation of this model and the data on lines 284–310.

      Fourth, the analysis of epistasis between mutations affecting different growth traits (shown in Figure 3) based on the LTEE data could be better introduced and analysed more comprehensively. Now, the examples given in panels C-F seem rather idiosyncratic and readers may wonder how general these consequences of using fitness estimates based on different time scales are.

      We agree that this analysis was incomplete and missed an opportunity to emphasize this important consequence of fitness quantification. We have thus expanded this analysis into a systematic test of all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 346–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).
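      A single hypothetical double mutant shows how the time scale can create epistasis on its own. The sketch below uses a minimal lag/exponential-growth/saturation model with a shared resource (an assumption in the spirit of the model described above, not the authors' code or their systematic analysis). Mutation A shortens the lag, mutation B only lowers the yield, and epistasis is taken as ε = s_AB − s_A − s_B. All parameter values are made up.

```python
import math


def abundance(strain, t):
    # No growth during lag, then exponential growth
    return strain["n0"] * math.exp(strain["rate"] * max(0.0, t - strain["lag"]))


def saturation_time(strains, resources):
    def consumed(t):
        # Each new cell uses 1 / yield units of the shared resource
        return sum((abundance(s, t) - s["n0"]) / s["yield"] for s in strains)
    lo, hi = 0.0, 1.0
    while consumed(hi) < resources:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if consumed(mid) < resources else (lo, mid)
    return hi


def fitness(mutant, wild_type, resources=1.0):
    # (per-cycle logit fitness, per-generation fitness) from one pairwise cycle
    t = saturation_time([mutant, wild_type], resources)
    s_cycle = (math.log(abundance(mutant, t) / abundance(wild_type, t))
               - math.log(mutant["n0"] / wild_type["n0"]))
    wt_generations = math.log2(abundance(wild_type, t) / wild_type["n0"])
    return s_cycle, s_cycle / wt_generations


wt = {"n0": 0.01, "lag": 2.0, "rate": 1.0, "yield": 1.0}
a = {**wt, "lag": 1.5}                   # mutation A: shorter lag (under selection)
b = {**wt, "yield": 0.5}                 # mutation B: lower yield only
ab = {**wt, "lag": 1.5, "yield": 0.5}    # double mutant

(sa_c, sa_g), (sb_c, sb_g), (sab_c, sab_g) = (fitness(m, wt) for m in (a, b, ab))
eps_cycle = sab_c - sa_c - sb_c
eps_gen = sab_g - sa_g - sb_g
print(f"epistasis per cycle: {eps_cycle:.4f}, per generation: {eps_gen:.4f}")
```

      In this toy case the two mutations are additive per cycle, but the lower yield of AB shortens the wild-type's growth, so dividing by wild-type generations produces positive epistasis.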

      Finally, the study is generally less accessible to experimentalists due to the extensive and principled treatment of specific population dynamic models and fitness inferences. This may distract from the overarching aim to identify fitness measures that are most accurate and useful for predictions of population dynamics and evolutionary processes.

      We appreciate this concern as we do hope to make the paper as broadly accessible as possible, especially to experimentalists who measure microbial fitness. To this end, we have reduced the technical discussion of encodings in the first section of the Results (lines 164–187); revised explanations of the population dynamics model (lines 284–310), importance of growth trait variation (lines 328–336), and epistasis (lines 346–395) to better emphasize the conceptual intuition of these parts; and added a step-by-step guide for our recommended best practices of quantifying fitness in bulk competition experiments (lines 964–982).

      In this light, the motivation for the initial discussion of the importance of how to best encode relative abundance (Figure 1) is unclear. Also, the conclusion, that logit encoding is preferred, because it linearizes logistic growth dynamics and “improves the quality of predictions”, is not further motivated. Experimentalists using non-linear models to infer fitness from growth curves or competition assays may miss the relevance of this discussion.

      The motivation for the discussion of encodings is that it is one of the choices made differently by researchers, mainly using either the logit (more common in experimental evolution and population genetics studies) or log encoding (more common in TnSeq analyses). As such we believe it is important to explain where this choice comes from (a transformation of relative abundance data to make it approximately linear in time, and thus amenable to characterization by a single slope parameter) and why we believe the logit encoding is more logical in most cases. We have streamlined and revised this subsection to make it clearer (lines 164–187).

      Our argument for favoring the logit encoding in most cases is based on the logistic model being a null model for relative abundance dynamics (Sec. S3). In light of the reviewer’s comments, we have realized this may be confusing because there are two common usages of logistic dynamics that are biologically distinct. What we mean by logistic model is the dynamics of relative abundance x of a mutant in competition with other genotypes:

      $$\frac{dx}{dt} = s\,x(1-x). \qquad (1)$$

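      Here s turns out to be the relative fitness under the logit encoding, since Eq. (1) is equivalent to $d\,\mathrm{logit}(x)/dt = s$. On the other hand, researchers also use a logistic ODE to describe the dynamics of absolute abundance N of a single strain in monoculture (e.g., as in a growth curve), with growth rate r and carrying capacity K:

      $$\frac{dN}{dt} = r\,N\left(1-\frac{N}{K}\right). \qquad (2)$$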

      We believe the reviewer’s last point refers to Eq. (2), whereas our argument about the logit encoding is based on Eq. (1). We have added a note to clarify this distinction for the reader (lines 192–196).
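      A quick numerical check of the logit argument, assuming Eq. (1) with constant s (parameter values are arbitrary): under the closed-form solution of Eq. (1), the logit of x changes by exactly s per unit time, while the slope of the plain log encoding, s(1 − x), shrinks as the mutant takes over.

```python
import math


def relative_abundance(x0, s, t):
    # Closed-form solution of dx/dt = s * x * (1 - x)
    return 1.0 / (1.0 + (1.0 - x0) / x0 * math.exp(-s * t))


def logit(x):
    return math.log(x / (1.0 - x))


x0, s = 0.01, 0.5
xs = [relative_abundance(x0, s, t) for t in range(21)]
logit_slopes = [logit(b) - logit(a) for a, b in zip(xs, xs[1:])]
log_slopes = [math.log(b) - math.log(a) for a, b in zip(xs, xs[1:])]
print("logit slopes:", [round(v, 3) for v in logit_slopes[:3]], "...")
print("log slopes:  ", [round(v, 3) for v in log_slopes[:3]], "...")
```

      A single slope parameter therefore summarizes the whole logit trajectory, whereas a log-encoded slope depends on when during the sweep it is measured.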

      Reviewer #3 (Recommendations for the authors):

      In addition to my general comments in the public review, I have several more specific recommendations:

      (1) Line 183-189: unclear why logit-based relative fitness is preferred. Abundance data are not typically binomial.

      We agree this claim about abundance data was incorrect and have removed it. We have revised the section to focus on motivating the logit encoding from logistic dynamics of relative abundance as a null model for most systems (main text lines 175–187 and Sec. S3).

      (2) Line 205: it may be mentioned that s(logit) is the same as the “selection rate constant” often used in microbial studies.

      We have added a sentence clarifying the equivalence of the logit-encoded relative fitness to the selection coefficient in population genetics (lines 188–190).

      (3) Line 368: why do mutations that increase biomass yield also increase WT LFC? Is this, because they grow slower and hence allow the WT more time to grow?

      Mutants with higher yield allow the wild-type to achieve higher log fold-change because those mutants consume fewer resources per cell, which frees up more resources for the wild-type to consume and increase its overall growth. It’s not about growth rate or time, as this would occur even for mutants whose growth rates are identical to the wild-type’s. We have revised our explanation of how variation in growth traits differentially affects fitness statistics (lines 323–340) and epistasis (lines 361–378).
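      The yield effect can be seen in closed form in a simplified case. Assume (for illustration only) that the mutant and wild-type share the same lag and growth rate, so neither gains in relative abundance, and that growth stops when a shared resource is used up. Every strain then grows by the same factor F, with (F − 1) × Σ n0/yield = resources, so the wild-type log fold-change rises with the mutant's yield. The function name and values are hypothetical.

```python
import math


def wild_type_lfc(n0_wt, yield_wt, n0_mut, yield_mut, resources):
    # Identical lag and growth rate: all strains grow by the same factor F until
    # the resource runs out, so (F - 1) * (n0_wt / yield_wt + n0_mut / yield_mut) = resources
    demand_coefficient = n0_wt / yield_wt + n0_mut / yield_mut
    return math.log(1.0 + resources / demand_coefficient)


lfcs = [wild_type_lfc(0.01, 1.0, 0.01, y_mut, resources=1.0) for y_mut in (0.5, 1.0, 2.0)]
for y_mut, lfc in zip((0.5, 1.0, 2.0), lfcs):
    print(f"mutant yield {y_mut}: wild-type LFC = {lfc:.3f}")
```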

      (4) Line 382-386: you may want to cite Ram et al. (2019, 10.1073/pnas.1902217116), who also did such analyses for experimental data from E. coli.

      We have cited this work as Ref. [34].

      (5) Line 415: perhaps use “bulk relative fitness” instead of “total relative fitness”, to contrast with “pairwise relative fitness”.

      We acknowledge the language in this section can be subtle. However, “bulk” is not a sufficient identifier for the concept of total relative fitness as bulk competition experiments (with many genotypes competing simultaneously) can be used to measure either total relative fitness or pairwise relative fitness. (In pairwise competition experiments with only two genotypes, these two types of fitness are identical.) As such we adhere to our original language but have added words to clarify which type of experiment (bulk or pairwise) we are talking about in a given context (e.g., on lines 495–504).

      (6) Line 451-453: why does a population in bulk competition consume resources more slowly than in pairwise competitions?

      Mutant libraries used in bulk competition experiments usually include a large number of deleterious mutants, which grow more slowly than the wild-type. Thus these populations typically consume resources more slowly than a population in a pairwise competition would, where a large part of the population is the wild-type.
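      A toy calculation shows one consequence of slower resource consumption. The sketch ignores lag and gives every strain the same yield (both simplifying assumptions). It compares a pairwise culture (half wild-type, half focal mutant) with a bulk culture in which the same two strains are a small minority among slow mutants, at the same total starting density. All rates and densities are hypothetical.

```python
import math


def saturation_time(fractions, rates, n_total, yield_, resources):
    # No lag, shared yield; bisection for the time the resource pool is used up
    def consumed(t):
        return sum(f * n_total * (math.exp(g * t) - 1.0) / yield_
                   for f, g in zip(fractions, rates))
    lo, hi = 0.0, 1.0
    while consumed(hi) < resources:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if consumed(mid) < resources else (lo, mid)
    return hi


g_wt, g_focal, g_slow = 1.0, 1.1, 0.7
# Pairwise: half wild-type, half focal mutant
t_pair = saturation_time([0.5, 0.5], [g_wt, g_focal], 0.01, 1.0, 1.0)
# Bulk: the same two strains are now a small minority among slow mutants
t_bulk = saturation_time([0.05, 0.05, 0.9], [g_wt, g_focal, g_slow], 0.01, 1.0, 1.0)

# Per-cycle logit fitness of the focal mutant vs. wild-type: (g_focal - g_wt) * t_sat
s_pair = (g_focal - g_wt) * t_pair
s_bulk = (g_focal - g_wt) * t_bulk
print(f"t_sat pairwise {t_pair:.2f}, bulk {t_bulk:.2f}")
print(f"s pairwise {s_pair:.3f}, bulk {s_bulk:.3f}")
```

      In this toy setting the bulk culture takes longer to exhaust the resource, so the same growth-rate difference accumulates into a larger per-cycle fitness than in the pairwise culture.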

      (7) Line 565: I don’t understand how one can compare relative fitness to other timescales.

      Relative fitness, as we’ve defined it, has units of rate, since it describes the rate of change of relative abundance (or an encoding of it) over some time scale (e.g., a batch growth cycle or a generation). Therefore it can be compared to other time scales of the system, such as the rate of new mutations arising or the rate of genetic drift fluctuations, as long as they are measured in the same units. This comparison is important to population genetics analyses, such as determining whether the population is in the strong selection-weak mutation limit or the clonal interference regime.
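      The unit bookkeeping can be sketched as follows, assuming the number of generations per cycle is taken as wild-type doublings. All values are hypothetical, and the comparisons in the comments are only rough order-of-magnitude heuristics (selection dominates drift when N·s ≫ 1; successive beneficial mutations rarely overlap when N·U is small, ignoring logarithmic factors).

```python
import math


def per_generation(s_per_cycle, wt_fold_change_per_cycle):
    # Convert per-cycle fitness to per-generation units, counting generations
    # per cycle as wild-type doublings
    return s_per_cycle / math.log2(wt_fold_change_per_cycle)


# Hypothetical values
s_cycle = 0.05        # relative fitness per growth cycle
fold_change = 100.0   # wild-type grows 100-fold per cycle
pop_size = 1e6        # effective population size
mut_rate = 1e-7       # beneficial mutation rate per genome per generation

s_gen = per_generation(s_cycle, fold_change)
n_s = pop_size * s_gen      # compare with 1: selection vs. drift
n_u = pop_size * mut_rate   # compare with 1: mutation supply per generation
print(f"s per generation = {s_gen:.4f}, N*s = {n_s:.0f}, N*U = {n_u:.2f}")
```

      Comparing s_cycle directly with a per-generation mutation rate would mix units; converting first keeps both quantities as rates per generation.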

      (8) Line 620 repeats text.

      Thank you, we have revised this paragraph and removed the typo.

      (9) Figure 1C+D: the link between the scenarios on the left and the graphs on the right may be better explained. For example, it may help to make explicit that the 4 scenarios in panel C show the same relative fitness per cycle and that mutant and wildtype have the same growth rate, but different growth periods in both scenarios in panel D. It is also unclear whether the grey dot links to the upper scenario in D.

      We have clarified this issue in the caption and changed the colors to avoid this confusion.

      (10) Figure 2E: it is unclear why “mutants with equal fitness are assigned the lowest rank”.

      This was a technical comment about how to handle ties in our analysis of mutant rankings, but it is moot since no exact ties actually occur in our simulations. We have removed this remark to avoid confusion.

      (11) Figure 2F: the axis labels are confusing, as for the WT estimates no LFC mutant exists. It would also help to make explicit in the legend against which WT replicate/reference strain each strain has competed.

      We agree the inclusion of wild-type replicates in this plot was confusing and unnecessary, so we have removed them. The mutants compete against a wild-type with traits defined by their median values across all wild-type replicates; this is noted in Fig. 2A and the Methods section on our analysis of this data (lines 809–813).
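      A minimal sketch of building such a median wild-type reference from replicate growth-trait estimates. The replicate values are hypothetical (lag in hours, growth rate per hour, yield in arbitrary units). Each trait's median is taken independently, so the resulting reference need not match any single replicate.

```python
from statistics import median

# Hypothetical wild-type replicate growth traits
wt_replicates = [
    {"lag": 2.1, "rate": 0.42, "yield": 1.05},
    {"lag": 1.9, "rate": 0.45, "yield": 0.98},
    {"lag": 2.4, "rate": 0.40, "yield": 1.01},
    {"lag": 2.0, "rate": 0.44, "yield": 0.97},
    {"lag": 2.2, "rate": 0.43, "yield": 1.03},
]


def median_reference(replicates):
    # Trait-by-trait median across replicates
    return {trait: median(r[trait] for r in replicates) for trait in replicates[0]}


reference_wt = median_reference(wt_replicates)
print(reference_wt)
```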

      (12) Figure 5: I am not sure this is needed, as its information is rather limited.

      We agree and have removed this figure.

    1. eLife Assessment

      This is a valuable study presenting solid data indicating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions. The study elegantly bridges the gap between the non-physiological aspects of the previous two-step reconstitution method and the extract-dependent iSAT system to enable assembly of highly functional ribosomes under translation-compatible conditions. The reported findings represent progress towards achieving a bottom-up reconstruction of the translation machinery from synthetic parts.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents evidence that addition of the two GTPases EngA and ObgE to reactions comprised of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg2+ concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This represents a significant development in the long-term effort to produce synthetic cells.

      Weaknesses:

      - The authors carried out additional experiments indicating that ~60% of the reconstituted ribosomes are functional and that a significant proportion are capable of synthesizing GFP from the correct initiation codon to the correct stop codon, and also of producing an enzymatically active protein at appreciable levels. Their SDS-PAGE and MS analyses of N-terminally tagged GFP are also quite useful but did not assess the frequency of initiation at the wrong start codon, termination at the incorrect stop codon, or the frequency of frameshifting during elongation. This would require examining additional reporters designed to examine dependence on a Shine-Dalgarno sequence or the impact of an in-frame stop codon to assess the fidelity of initiation and termination events, respectively, and one with a programmed frameshift site to assess the elongation fidelity of their reconstituted ribosomes.

      - Past reconstitution studies have succeeded using entirely recombinant, individually purified RPs. Applying that approach here, if successful, would have eliminated the possibility that one or more unknown ribosome assembly factors co-purifying with native ribosomes were added to the reconstitution reactions.

    3. Reviewer #2 (Public review):

      This study has developed a single-step method to assemble active bacterial ribosomes under near-physiological conditions by using the GTPase factors EngA and ObgE. These factors eliminate the need for the traditional, harsh manipulations of temperature and magnesium levels. This integration is an important step toward the bottom-up construction of synthetic cells.

      Comments on revisions:

      The authors have addressed my concerns in the previous round of review.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a useful study presenting solid data indicating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions. The study elegantly bridges the gap between the non-physiological aspects of the previous two-step reconstitution method and the extract-dependent iSAT system to enable ribosome assembly under translation-compatible conditions; however, it is limited by reliance on rRNA and proteins extracted from native ribosomes and does not achieve a true bottom-up reconstruction from all synthetic components. The evidence is incomplete in not characterizing the spectrum of reporter polypeptides produced and not comparing their rate and yield of synthesis from reconstituted ribosomes to that obtained with pure native ribosomes; and the impact of the study is limited by not including reporters to examine the fidelity of initiation, elongation or termination achieved with the reconstituted ribosomes.

      As described below, based on the comments from the public reviewers, we have summarized at the end of the Discussion how this study contributes toward true bottom-up reconstruction from fully synthetic components, as well as the aspects that will require further development. In addition, we have newly provided data characterizing the reporter polypeptides from multiple perspectives, demonstrating that the assembled ribosomes do not exhibit issues such as reduced fidelity (Fig. 6, 7, Supplementary Data 2, 3). We believe that these data adequately address the limitations that were pointed out in the eLife Assessment.

      Public Reviews:

      Reviewer #1 (Public review):

      This study presents evidence that the addition of the two GTPases EngA and ObgE to reactions composed of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg2+ ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This work potentially represents an important development in the long-term effort to produce synthetic cells.

      Weaknesses:

      While much of the evidence is solid, the analysis is incomplete in certain respects that detract from the scientific quality and significance of the findings:

      (1) The authors do not describe how the native ribosomal proteins (RPs) were purified, and it is unclear whether all subassemblies of RPs have been disrupted in the purification procedure. If not, additional chaperones might be required beyond the two GTPases described here for functional ribosome assembly from individual RPs.

      Native ribosomal proteins (RPs) were prepared from native ribosomes according to the well-established protocol described by Dr. Knud H. Nierhaus [Nierhaus, K. H. Reconstitution of ribosomes in Ribosomes and protein synthesis: A Practical Approach (Spedding G. eds.) 161-189, IRL Press at Oxford University Press, New York (1990)]. In this method, ribosomal proteins are subjected to dialysis in 6 M urea buffer, a strongly denaturing condition that may completely disrupt ribosomal structure and dissociate all ribosomal protein subassemblies. To make this point clear, we now describe the detailed RP preparation procedure in the manuscript rather than merely referring to the book.

      In addition, we would like to clarify one point related to this comment. The focus of the present study is to show that these two factors are required for single-step ribosome reconstitution under translation-compatible, cell-free conditions. We do not intend to claim that these two factors are sufficient on their own for ribosome reconstitution. Hence, we have revised the manuscript to state more explicitly what this work does and does not conclude.

      (2) Reconstitution studies in the past have succeeded by using all recombinant, individually purified RPs, which would clearly address the issue in the preceding comment and also eliminate the possibility that an unknown ribosome assembly factor that co-purifies with native ribosomes has been added to the reconstitution reactions along with the RPs.

      As noted in our response to Comment (1), the focus of the present study is the requirement of the two factors for functional ribosome assembly. Therefore, we do not consider it necessary to completely exclude the possibility that unknown ribosome assembly factors are present in the RP preparation. Nevertheless, we agree that it is important to clarify what factors, if any, are co-present in the RP fraction. To address this, we performed proteomic analysis of the TP70 preparation (Supplementary Data 3) and noted the possibility that other factors are present.

      We also agree that additional, as-yet-unidentified components, including factors involved in rRNA modification, could plausibly further improve assembly efficiency. Such studies may also help extend the system to in vitro-transcribed rRNA and fully recombinant ribosomal proteins, which would be a natural next step for this study. We have noted the possibility of as-yet-unidentified components, along with these future perspectives, in the Discussion.

      (3) They never compared the efficiency of the reconstituted ribosomes to native ribosomes added to the "PURE" in vitro protein synthesis system, making it unclear what proportion of the reconstituted ribosomes are functional, and how protein yield per mRNA molecule compares to that given by the PURE system programmed with purified native ribosomes.

      Following this suggestion, we measured the sfGFP synthesis rate from the increase in fluorescence over time under conditions of excess template mRNA and compared this rate directly between reconstituted and native ribosomes. We consider that this comparison indicates what fraction of the ribosomes reconstituted in our system is functionally active (Fig. 6).

      As noted in the provisional responses, quantifying protein yield per mRNA molecule is substantially more challenging. The translation system is complex, and the apparent yield per mRNA can vary depending on factors such as differences in polysome formation efficiency. In addition, the PURE system is a coupled transcription–translation setup that starts from DNA templates, which further complicates rigorous normalization on a per-mRNA basis. Because the main focus of this study is to determine how many functionally active ribosomes can be reconstituted under translation-compatible conditions, we addressed this comment by carrying out the experiment comparing sfGFP synthesis rates.

      (4) They also have not examined the synthesized GFP protein by SDS-PAGE to determine what proportion is full-length.

      We added an affinity tag to the sfGFP reporter, purified the synthesized product from the reaction mixture, and analyzed it by SDS–PAGE (Fig. 7a).

      (5) The previous development of the PURE system included examinations of the synthesis of multiple proteins, one of which was an enzyme whose specific activity could be compared to that of the native enzyme. This would be a significant improvement to the current study. They could also have programmed the translation reactions containing reconstituted ribosomes with (i) total native mRNA and compared the products in SDS-PAGE to those obtained with the control PURE system containing native ribosomes; (ii) specific reporter mRNAs designed to examine dependence on a Shine-Dalgarno sequence and the impact of an in-frame stop codon in prematurely terminating translation to assess the fidelity of initiation and termination events; and (iii) an mRNA with a programmed frameshift site to assess elongation fidelity displayed by their reconstituted ribosomes.

      Following the recommendation, we selected DHFR as an enzymatically active reporter protein and confirmed that it exhibited enzymatic activity comparable to that observed when synthesized by native ribosomes (Fig. 7c). In addition, MS analysis of the purified sfGFP used for SDS-PAGE showed that nearly all peptide fragments were detected, covering almost the entire sequence from the initiator amino acid to the amino acid immediately preceding the stop codon (Fig. 7b, Supplementary Data 2). These results suggest that protein synthesis by the newly assembled ribosomes proceeds smoothly from initiation to termination, with no apparent problem in fidelity, and therefore indicate that functional ribosomes were successfully reconstituted.

      Reviewer #2 (Public review):

      This study presents a significant advance in the field of in vitro ribosome assembly by demonstrating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions, specifically at 37 °C and with total Mg2+ concentrations below 10 mM.

      This achievement directly addresses a long-standing limitation of the traditional two-step in vitro assembly protocol (Nierhaus & Dohme, PNAS 1974), which requires non-physiological temperatures (44-50 °C) and high Mg2+ concentrations (~20 mM). Inspired by the integrated Synthesis, Assembly, and Translation (iSAT) platform (Jewett et al., Mol Syst Biol 2013), which leverages E. coli S150 crude extract to supply essential assembly factors, the authors hypothesize that specific ribosome biogenesis factors, particularly GTPases present in such extracts, may be responsible for enabling assembly under mild conditions. Through systematic screening, they identify EngA and ObgE as the minimal pair sufficient to replace the need for temperature and Mg2+ shifts when using phenol-extracted (i.e., mature, modified) rRNA and purified TP70 proteins.

      However, several important concerns remain:

      (1) Dependence on Native rRNA Limits Generalizability

      The current system relies on rRNA extracted from native ribosomes via phenol, which retains natural post-transcriptional modifications. As the authors note (lines 302-304), attempts to assemble active 50S subunits using in vitro transcribed rRNA, even in the presence of EngA and ObgE, failed. This contrasts with iSAT, where in vitro transcribed rRNA can yield functional (though reduced-activity, ~20% of native) ribosomes, presumably due to the presence of rRNA modification enzymes and additional chaperones in the S150 extract. Thus, while this study successfully isolates two key GTPase factors that mimic part of iSAT's functionality, it does not fully recapitulate iSAT's capacity for de novo assembly from unmodified RNA. The manuscript should clarify that the in vitro assembly demonstrated here is contingent on using native rRNA and does not yet achieve true bottom-up reconstruction from synthetic parts. Moreover, given iSAT's success with transcribed rRNA, could a similar systematic omission approach (e.g., adding individual factors) help identify the additional components required to support unmodified rRNA folding?

      We fully recognize the reviewer’s point that our current system has not yet achieved a true bottom-up reconstruction. Although we intended to state this clearly in the manuscript, the fact that this concern remains indicates that our description was not sufficiently explicit. We have therefore added a paragraph to the Discussion to ensure that this limitation is clearly communicated to readers.

      (2) Imprecise Use of "Physiological Mg2+ Concentration"

      The abstract states that assembly occurs at "physiological Mg2+ concentration" (<10 mM). However, while this total Mg2+ level aligns with optimized in vitro translation buffers (e.g., in PURE or iSAT systems), it exceeds estimates of free cytosolic [Mg2+] in E. coli (~1-2 mM). The authors should clarify that they refer to total Mg2+ concentrations compatible with cell-free protein synthesis, not necessarily intracellular free ion levels, to avoid misleading readers about true physiological relevance.

      We agree that this is a very reasonable point and have revised the manuscript to clarify that we are referring to the total Mg2+ concentration compatible with cell-free protein synthesis, rather than the intracellular free Mg2+ level under physiological conditions. We also changed the term “physiological” to “near-physiological” to avoid misunderstanding.

      In summary, this work elegantly bridges the gap between the two-step method and the extract-dependent iSAT system by identifying two defined GTPases that capture a core functionality of cellular extracts: enabling ribosome assembly under translation-compatible conditions. However, the reliance on native rRNA underscores that additional factors, likely present in iSAT's S150 extract, are still needed for full de novo reconstitution from unmodified transcripts. Future work combining the precision of this defined system with the completeness of iSAT may ultimately realize truly autonomous synthetic ribosome biogenesis.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Recommendations for improvement:

      (1) Assess the length distribution of GFP polypeptides being produced using SDS-PAGE.

      SDS-PAGE was performed in response to Comment 4 of Reviewer #1 (Fig. 7a). Please refer to our response addressing that comment.

      (2) Compare the rate and yield of GFP synthesized per mRNA using their reconstituted ribosomes to that obtained with pure native ribosomes.

      The efficiency of the reconstituted ribosomes was compared to that of native ribosomes in response to Comment 3 of Reviewer #1 (Fig. 6). Please refer to our response addressing that comment.

      (3) Expand the panel of reporter mRNAs being examined to compare the fidelity of initiation, elongation or termination achieved with reconstituted ribosomes to that obtained using native ribosomes.

      DHFR synthesis was examined, and MS analysis of the synthesized sfGFP was performed, in response to Comment 5 of Reviewer #1 (Fig. 7b, c). Please refer to our response addressing that comment.

      (4) Revise the manuscript to clarify that the in vitro assembly demonstrated here is contingent on using native rRNA and thus does not achieve a true bottom-up reconstruction from synthetic parts.

      We added to the Discussion a paragraph summarizing the findings, limitations, and future perspectives of this study in response to Comments 1 and 2 of Reviewer #1 and Comment 1 of Reviewer #2. Please refer to our responses addressing these comments.

      (5) Revise the manuscript to clarify that they are referring to total Mg2+ concentrations compatible with cell-free protein synthesis, not necessarily intracellular free ion levels, to avoid misleading readers about the physiological relevance of the reconstitution.

      We revised the manuscript to clarify this point in response to Comment 2 of Reviewer #2. Please refer to our response addressing that comment.

      (6) Revise the text to fully describe how the native ribosomal proteins (RPs) were purified and indicate whether all subassemblies of RPs were disrupted in the purification procedure.

      In response to Comment 1 of Reviewer #1, we revised the Methods section to clarify how the native RPs were purified and that all subassemblies of RPs were disrupted.

      (7) Revise the text to indicate that achieving ribosome reconstitutions using all recombinant, individually purified RPs is required to achieve a true bottom-up reconstruction from all synthetic components.

      As in our response to Comment 4, we have added this point at the end of the Discussion as a future perspective toward true bottom-up reconstruction from all synthetic components.

      (8) Consider conducting a similar systematic omission approach (e.g., adding individual factors) to help identify the additional components required to support unmodified rRNA folding.

      As in our responses to Comments 4 and 7, we have added this point at the end of the Discussion as a future perspective toward identifying additional essential factors for true bottom-up reconstruction.

      Reviewer #1 (Recommendations for the authors):

      (1) Assessing the spectrum of GFP polypeptides being produced by SDS-PAGE and comparing the rate and yield of GFP produced to that obtained with pure native ribosomes would seem to be essential additional measurements needed to bolster the evidence supporting the main conclusions of the work.

      SDS-PAGE and MS analysis of the synthesized sfGFP were performed (Fig. 7a, b). A comparison between the assembled and native ribosomes was also performed (Fig. 6).

      (2) Examining translation of other reporter mRNAs designed to compare the fidelity of initiation, elongation or termination achieved with reconstituted ribosomes to that produced by native ribosomes in the PURE system would be required to elevate the scientific quality of the work and its significance to the field.

      DHFR synthesis and its activity measurement were performed (Fig. 7c). Also, MS analysis of the purified sfGFP showed that nearly all peptide fragments were detected, covering almost the entire sequence from the initiator amino acid to the amino acid immediately preceding the stop codon (Fig. 7b). We consider that these findings indicate that there is no apparent problem with fidelity.

    1. eLife Assessment

      This is an important study that develops multiple human iPSC-based models to study the consequences of DNMT3A mutations in Tatton-Brown-Rahman Syndrome. Convincing evidence shows dysregulation of GABAergic interneuron development and function, and the authors identify some of the key signaling mechanisms underlying these changes. This study will be of interest for understanding the functions of DNMT3A in brain development and the causes of neurological dysfunction in Tatton-Brown-Rahman Syndrome.

    2. Reviewer #1 (Public review):

      Summary:

      This is an important study that describes the consequences of DNMT3A mutation in human neuronal development for the first time. The selective impact of DNMT3A function on GABAergic interneurons is interesting and an important consideration for future therapeutics. The claims made in the manuscript are, for the most part, supported by strong evidence, and the data are generally of high quality and well presented.

      Strengths:

      The strengths of the work include the characterization of multiple DNMT3A loss-of-function alleles, including two missense variants (R882H and P904L) and a deletion allele. Both missense mutation lines include an ideal control with the same genetic background. CRISPRi-mediated DNMT3A knockdown is also included. The study identifies the mTOR-PI3K pathway as a contributor to the overgrowth found in the mutant organoids. Bulk mRNA sequencing and whole-genome bisulfite sequencing identify hypomethylated genomic regions associated with repression of gene expression. Again, this effect is more pronounced in ventral than in dorsal organoids. In addition, the extensive electrophysiological characterization with a high-density microelectrode array supports the more mature state of mutant interneurons.

      Weaknesses:

      Although a strong study overall, some weaknesses are noted. These include:

      (1) The lack of validation data for the generated iPSCs and hESCs, such as the chromosomal contents, ploidy, and pluripotency states.

      (2) Other weaknesses relate to data interpretation and insufficient discussion of related matters, as detailed in the recommendations to the authors.

      (3) Also, some errors are noted and detailed in the recommendation section.

    3. Reviewer #2 (Public review):

      Summary:

      Chapman, Determan et al. investigate how pathogenic mutations in DNMT3A, which cause Tatton-Brown-Rahman Syndrome (TBRS), disrupt human cortical developmental processes using a comprehensive panel of human pluripotent stem cell models spanning DNMT3A loss-of-function severity. The authors aim to identify the cellular and molecular mechanisms underlying TBRS-associated brain overgrowth and intellectual disability, and to test whether mechanistic convergence exists between TBRS and other overgrowth-intellectual disability disorders (OGIDs) caused by mutations in EZH2 (Weaver syndrome) or PIK3CA pathway components. Their central conclusion is that GABAergic interneuron development is selectively vulnerable to DNMT3A mutation, where reduced DNA methylation causes premature de-repression of neuronal and synaptic genes, driving precocious neuronal maturation and hyperactivity sufficient to disrupt neuronal network synchrony. This report adds to a growing literature supporting the vulnerability of GABAergic interneurons in NDDs and further provides a mechanistic view of this vulnerability, potentially convergent across OGIDs. The mechanistic claims around H3K27me3 compensation and mTOR-based therapeutic convergence, while promising, rest on more preliminary evidence and would benefit from the distinction between correlation and mechanism being made more explicit in the text. Overall, this is a compelling study with a rigorous experimental design and novel findings with a potential impact on a better understanding of the OGID pathophysiology.

      Strengths:

      (1) A major strength of this work is the breadth and rigor of the disease modeling approach. Four independent TBRS model systems are used in tandem: a patient-derived iPSC line with isogenic CRISPR-corrected control (R882H), a knock-in hESC model (P904L) with its wild-type isogenic, patient deletion iPSC lines (Del1/2), and CRISPRi knockdown models (G1/G2), collectively spanning a range of DNMT3A loss-of-function that correlates with phenotypic severity. This allelic series design substantially strengthens causal inference beyond what any single isogenic pair could provide.

      (2) The multi-omic integration across matched developmental stages provides a strong mechanistic foundation for the cellular phenotyping and significantly enhances novelty. RNA-seq, whole-genome bisulfite sequencing, and H3K27me3 CUT&Tag are combined in the same cell types and timepoints to show that DNMT3A loss reduces CG methylation at neuronal and synaptic gene loci, leading to premature transcriptional activation.

      (3) The selective vulnerability of ventral (GABAergic) versus dorsal (glutamatergic) progenitors is one of the study's most important findings. This lineage specificity is consistently observed across all model systems and in both 2D and organoid formats, where ventral NPCs show increased proliferation, premature neuronal gene expression, and increased neurogenesis, while dorsal NPCs are largely unaffected at the transcriptomic and cellular level despite exhibiting comparable DNA methylation changes. This adds to a body of emerging work showing GABAergic interneuron vulnerability in NDDs where ubiquitously expressed genes such as chromatin modifiers are perturbed, and provides additional molecular insights into potential mechanisms of "resilience" of dorsal populations.

      (4) The functional characterization follows a logical progression from single-neuron electrophysiology (demonstrating GABAergic hyperactivity with increased action potential amplitude and firing rate) to network-level analysis using high-density multi-electrode arrays. The HD-MEA experimental design, which pairs TBRS or control GABAergic neurons with a constant background of control iGlut neurons, cleanly isolates GABAergic dysfunction as the driver of network hypersynchrony.

      Weaknesses:

      (1) The concomitant induction of proliferation and differentiation in TBRS V-NPCs is conceptually striking, since these are generally considered antagonistic developmental programs. The authors partially address this tension by noting that DNMT3A LOF alone is insufficient to initiate neuronal differentiation, i.e., V-NPCs upregulate neuronal and synaptic genes while retaining progenitor identity, implying that transcriptomic priming and commitment to differentiation are decoupled. However, the relationship between the proliferative phenotype and the epigenetic priming phenotype remains mechanistically unresolved. The manuscript documents mTOR pathway upregulation at the protein level and identifies shared DEGs that include proliferative regulators, but it does not establish whether mTOR-driven proliferation and mCG-loss-driven neuronal gene de-repression/enhanced differentiation are causally linked or represent two independent consequences of DNMT3A LOF.

      (2) Relatedly, the rapamycin rescue experiment is a valuable proof-of-concept for the PIK3/AKT/mTOR convergence but is limited to a single dose in a single model (882) with a single readout (Ki67+ proliferation). Given the prominence of mTOR pathway convergence in the manuscript as a potential shared therapeutic avenue across OGIDs, the data supporting this claim are somewhat preliminary. It remains unknown whether mTOR inhibition rescues downstream phenotypes (neurogenesis, gene expression, neuronal maturation) or whether less severe TBRS models respond similarly. This might also help address the first comment above; for example, if mTOR inhibition rescued proliferation but not the transcriptomic priming, that would support two independent mechanisms.

      (3) The claim that H3K27me3 compensates for mCG loss is an important mechanistic point, but the current data do not distinguish between active compensation, in which EZH2 is recruited in response to methylation loss, and functional redundancy, in which H3K27me3 is independently established and becomes the dominant repressive mark once DNA methylation is reduced. The EZH2 knockdown/inhibition experiments show that H3K27me3 is sufficient to maintain repression at hypo-DMR sites, but they do not establish that H3K27me3 gain is itself a response to methylation loss. Because H3K27me3 profiling was performed only in the severe 882 model, it is also unclear whether H3K27me3 gain scales with DNMT3A LOF severity, as a compensatory model would predict. Finally, the EZH2 overexpression rescue is performed in V-NPCs, whereas the compensation model is developed primarily in D-NPCs, making it difficult to assess whether the same mechanism operates in the lineage where it was originally inferred.

      (4) The narrative framing of dorsal neuron development as unaffected by DNMT3A LOF is somewhat at odds with the data presented. The 882 D-NPCs show substantial DNA methylation changes, and TBRS D-INs exhibit what the authors describe as "substantive transcriptomic differences" involving persistent expression of pluripotency and progenitor genes, which seems to be a distinct but potentially significant phenotype. The impact of DNMT3A loss between ventral and dorsal lineages might be more accurately framed as divergent in nature rather than specific to a certain population.

      (5) The SST staining is not entirely convincing. It appears mostly nuclear and, in some instances, localized to rosettes in organoids, whereas the protein is largely confined to processes and is expected to be found outside progenitor-rich zones such as rosettes.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigated TBRS etiology using new human pluripotent stem cell models that capture varying levels of TBRS-associated loss of DNMT3A function. They identified increased lineage-specific proliferation of precursors in TBRS ventral MGE-like progenitors, which they propose is related to increased signaling through the PIK3/AKT/mTOR pathway. Furthermore, they show that reduced DNA methylation during MGE-like progenitor differentiation into GABAergic interneurons can cause premature expression of neuronal and synaptic genes, triggering precocious neuronal maturation. In conclusion, they propose that TBRS-derived GABAergic neurons exhibit hyperactivity that can alter the development and structure of neuronal networks.

      Strengths:

      Overall, the data presented are convincing from an early developmental point of view, given that the iPSC-derived 2D cultures and organoids used do not reach a mature state. Nonetheless, the data clearly show the effects that deleterious TBRS mutations can have during the period of neurogenesis, which was missing in the field.

      Weaknesses:

      (1) Li et al., 2022 (referred to in the manuscript) seems to already show the interplay between H3K27me3 and Dnmt3a discussed in this study, i.e., that in the absence of DNA methylation, there is an expansion of polycomb-like repression. These data should be better acknowledged in the paragraph 'Repressive H3K27me3 compensates for severe loss of DNA methylation' (page 9), given that they support the data presented in this manuscript and suggest a common mechanism in the interplay between these two repressive marks, as is well established in the literature.

      (2) The authors should acknowledge that the omics data come from a mixed population of cells.

      (3) The authors are encouraged to further discuss whether the overgrowth observed in ventral GABAergic cultures or organoids compares to the overgrowth observed in diseased patients. One expects MRIs to have been performed in patients and that these could be harnessed to discern if overgrowth occurs in the cortex or ventral regions of the brain.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is an important study that describes the consequences of DNMT3A mutation in human neuronal development for the first time. The selective impact of DNMT3A function on GABAergic interneurons is interesting and an important consideration for future therapeutics. The claims made in the manuscript are, for the most part, supported by strong evidence, and the data are generally of high quality and well presented.

      Strengths:

      The strengths of the work include the characterization of multiple DNMT3A loss-of-function alleles, including two missense variants (R882H and P904L) and a deletion allele. Both missense mutation lines include an ideal control with the same genetic background. CRISPRi-mediated DNMT3A knockdown is also included. The study identifies the mTOR-PI3K pathway as a contributor to the overgrowth found in the mutant organoids. Bulk mRNA sequencing and whole-genome bisulfite sequencing identify hypomethylated genomic regions associated with repression of gene expression. Again, this effect is more pronounced in ventral than in dorsal organoids. In addition, the extensive electrophysiological characterization with a high-density microelectrode array supports the more mature state of mutant interneurons.

      Weaknesses:

      Although a strong study overall, some weaknesses are noted. These include:

      (1) The lack of validation data for the generated iPSCs and hESCs, such as the chromosomal contents, ploidy, and pluripotency states.

      We thank the reviewer for their constructive feedback. We previously validated our 882 models with whole genome sequencing and teratoma formation upon mouse fat pad injection, while the parental human embryonic stem cell line (WA01 hESCs) used for P904L variant knock-in was validated by our Genome Engineering Stem Cell (GESC) core upon derivation of that variant knock-in model. We have now added both karyotyping and pluripotency staining (SOX2/OCT4) for all other hPSC lines as (new) Supplementary Figure S17 and included further description in our Methods section under “hPSC Model Generation and Culture”.

      New Data: Supplemental Figure S17 (SOX2/OCT4 staining in hPSCs and karyotyping of all lines used)

      Text edits: Additional language confirming hPSC line validation will be added to the Methods section under “hPSC Model Generation and Culture” on page 18.

      (2) Other weaknesses relate to data interpretation and insufficient discussion of related matters, as detailed in the recommendations to the authors.

      We thank the reviewer for their insightful suggestions and have detailed our responses in the “recommendations to the authors” section.

      (3) Also, some errors are noted and detailed in the recommendation section.

      We thank the reviewer for catching these errors and have since corrected them, with detailed responses below.

      Reviewer #2 (Public review):

      Summary:

      Chapman, Determan et al. investigate how pathogenic mutations in DNMT3A, which cause Tatton-Brown-Rahman Syndrome (TBRS), disrupt human cortical developmental processes using a comprehensive panel of human pluripotent stem cell models spanning DNMT3A loss-of-function severity. The authors aim to identify the cellular and molecular mechanisms underlying TBRS-associated brain overgrowth and intellectual disability, and to test whether mechanistic convergence exists between TBRS and other overgrowth-intellectual disability disorders (OGIDs) caused by mutations in EZH2 (Weaver syndrome) or PIK3CA pathway components. Their central conclusion is that GABAergic interneuron development is selectively vulnerable to DNMT3A mutation, where reduced DNA methylation causes premature de-repression of neuronal and synaptic genes, driving precocious neuronal maturation and hyperactivity sufficient to disrupt neuronal network synchrony. This report adds to a growing literature supporting the vulnerability of GABAergic interneurons in NDDs and further provides a mechanistic view of this vulnerability, potentially convergent across OGIDs. The mechanistic claims around H3K27me3 compensation and mTOR-based therapeutic convergence, while promising, rest on more preliminary evidence and would benefit from the distinction between correlation and mechanism being made more explicit in the text. Overall, this is a compelling study with a rigorous experimental design and novel findings with a potential impact on a better understanding of the OGID pathophysiology.

      Strengths:

      (1) A major strength of this work is the breadth and rigor of the disease modeling approach. Four independent TBRS model systems are used in tandem: a patient-derived iPSC line with isogenic CRISPR-corrected control (R882H), a knock-in hESC model (P904L) with its wild-type isogenic, patient deletion iPSC lines (Del1/2), and CRISPRi knockdown models (G1/G2), collectively spanning a range of DNMT3A loss-of-function that correlates with phenotypic severity. This allelic series design substantially strengthens causal inference beyond what any single isogenic pair could provide.

      (2) The multi-omic integration across matched developmental stages provides a strong mechanistic foundation for the cellular phenotyping and substantially enhances the study's novelty. RNA-seq, whole-genome bisulfite sequencing, and H3K27me3 CUT&Tag are combined in the same cell types and timepoints, and together show that DNMT3A loss reduces CG methylation at neuronal and synaptic gene loci, leading to premature transcriptional activation.

      (3) The selective vulnerability of ventral (GABAergic) versus dorsal (glutamatergic) progenitors is one of the study's most important findings. This lineage specificity is consistently observed across all model systems and in both 2D and organoid formats, where ventral NPCs show increased proliferation, premature neuronal gene expression, and increased neurogenesis, while dorsal NPCs are largely unaffected at the transcriptomic and cellular level despite exhibiting comparable DNA methylation changes. This adds to a body of emerging work showing GABAergic interneuron vulnerability in NDDs where ubiquitously expressed genes such as chromatin modifiers are perturbed, and provides additional molecular insights into potential mechanisms of "resilience" of dorsal populations.

      (4) The functional characterization follows a logical progression from single-neuron electrophysiology (demonstrating GABAergic hyperactivity with increased action potential amplitude and firing rate) to network-level analysis using high-density multi-electrode arrays. The HD-MEA experimental design (pairing TBRS or control GABAergic neurons with a constant background of control iGlut neurons) cleanly isolates GABAergic dysfunction as the driver of network hypersynchrony.

      Weaknesses:

      (1) The concomitant induction of proliferation and differentiation in TBRS V-NPCs is conceptually striking, since these are generally considered antagonistic developmental programs. The authors partially address this tension by noting that DNMT3A LOF alone is insufficient to initiate neuronal differentiation, i.e., V-NPCs upregulate neuronal and synaptic genes while retaining progenitor identity, implying that transcriptomic priming and commitment to differentiation are decoupled. However, the relationship between the proliferative phenotype and the epigenetic priming phenotype remains mechanistically unresolved. The manuscript documents mTOR pathway upregulation at the protein level and identifies shared DEGs that include proliferative regulators, but it does not establish whether mTOR-driven proliferation and mCG-loss-driven neuronal gene de-repression/enhanced differentiation are causally linked or represent two independent consequences of DNMT3A LOF.

      We thank the reviewer for their comment and agree that this phenotype, whereby progenitors exhibited both increased proliferation and hallmarks of gene expression associated with neuronal differentiation, is striking and interesting, given that these are typically antagonistic programs during normal development.

      We documented that these phenotypes involve upregulated expression of both neuronal/synaptic and proliferative genes in V-NPCs (Figure 2d), with concomitant loss of repressive DNA methylation at regulatory elements associated with these genes (Figure 2f, Supplemental Data 5). In this work, DNMT3A mutation had a more prominent role in de-repressing neuronal and synaptic gene expression to promote hallmarks of neuron differentiation, while playing a relatively less central role in direct regulation of proliferation genes, as seen from the relative prominence of neuronal/synaptic- versus proliferation-related GO terms in our Supplemental Data 5 table.

      To examine the mechanisms underlying increased V-NPC proliferation in our TBRS models, we assessed a potential relationship with the PIK3/AKT/mTOR pathway, as this is implicated in increased proliferation resulting from DNMT3A-associated mutation in myeloid leukemia (Dai et al., 2017, PMID: 28461508). In our work, DNMT3A mutation increased the expression and/or phosphorylation of mTOR signaling pathway targets specifically in V-NPCs (Figure 1q-r, Supplemental Figure S3a-d). However, while TBRS mutation directly affected repressive DNA methylation at a suite of cell proliferation-related genes, these did not include the PIK3/AKT/mTOR pathway genes themselves, suggesting an indirect relationship between altered DNA methylation and increased mTOR signaling.

      Text Edits: We will incorporate further discussion of how DNMT3A-mediated gene repression and levels of PIK3/AKT/mTOR pathway signaling may be interacting, providing a framework for future studies to identify how these related OGID gene mutations may converge mechanistically.

      (2) Relatedly, the rapamycin rescue experiment is a valuable proof-of-concept for the PIK3/AKT/mTOR convergence but is limited to a single dose in a single model (882) with a single readout (Ki67+ proliferation). Given the prominence of mTOR pathway convergence in the manuscript as a potential shared therapeutic avenue across OGIDs, the data supporting this claim are somewhat preliminary. It remains unknown whether mTOR inhibition rescues downstream phenotypes (neurogenesis, gene expression, neuronal maturation) or whether less severe TBRS models respond similarly. This might also help tackle the first comment above. e.g., if mTOR inhibition rescued proliferation but not the transcriptomic priming, that would support two independent mechanisms.

      We thank the reviewer for their comment. We explored both the overall levels and phosphorylation of proteins involved in PIK3/AKT/mTOR signaling in the 882, 904, Del1, Del2, and KO V-NPC models (Figure 1q-r, Supplementary Figure S3a-d), finding specific increases in all proteins examined. We showed that rapamycin addition reversed the increased proportion of KI67+ proliferating cell nuclei resulting from 882 mutation in V-NPCs (Figure 1s), and that rapamycin also reduced the proportion of KI67+ nuclei in the less severe 904 and Del1 V-NPC models (Supplementary Figure S3e-f).

      We agree that understanding whether rapamycin treatment can rescue TBRS neuronal phenotypes would be very interesting, as previous work on Tuberous Sclerosis Complex has utilized rapamycin and other mTOR inhibitors to effectively reverse TSC-related alterations of neuronal morphology and neuronal hyperexcitability (Buttermore et al., 2025, PMID: 40792287). Future studies examining convergent mechanisms and therapeutics for OGIDs should examine how similarly targeting this and related pathways rescues altered neuronal morphology, maturation, and function, as we have demonstrated that TBRS mutation has subsequent consequences for V-IN differentiation, maturation, and function. This point has been detailed in the discussion section on pages 15-16.

      (3) The claim that H3K27me3 compensates for mCG loss is an important mechanistic point, but the current data do not distinguish between active compensation, in which EZH2 is recruited in response to methylation loss, and functional redundancy, in which H3K27me3 is independently established and becomes the dominant repressive mark once DNA methylation is reduced. The EZH2 knockdown/inhibition experiments show that H3K27me3 is sufficient to maintain repression at hypo-DMR sites, but they do not establish that H3K27me3 gain is itself a response to methylation loss. Because H3K27me3 profiling was performed only in the severe 882 model, it is also unclear whether H3K27me3 gain scales with DNMT3A LOF severity, as a compensatory model would predict. Finally, the EZH2 overexpression rescue is performed in V-NPCs, whereas the compensation model is developed primarily in D-NPCs, making it difficult to assess whether the same mechanism operates in the lineage where it was originally inferred.

      We thank the reviewer for the opportunity to clarify our findings and experimental reasoning. A previous study using a conditional Dnmt3a knockout mouse model (Li et al., 2022, PMID: 35604009) demonstrated increased expression of multiple PRC2 components following the loss of Dnmt3a. This study demonstrated that sites which lost DNA methylation gained H3K27me3 in postnatal neurons upon Dnmt3a loss. Therefore, we hypothesize that the gain of H3K27me3 likely occurs in response to the loss of DNMT3A-mediated methylation.

      While we did not perform CUT&Tag for H3K27me3 in our less severe models, we did assess gene expression changes following EZH2 knockdown and inhibition in both the R882H (Figure 4g-h) and P904L (Supplementary Figure S8b) models, finding that gene expression was unchanged in the model with the less severe DNMT3A mutation (P904L). Based upon these findings, we hypothesized that compensatory H3K27me3 may occur only upon severe DNMT3A loss, as seen in the dominant-negative R882H model. Furthermore, as H3K27me3 compensation was more prominent in D-NPCs, we hypothesized that it might be sufficient to prevent de-repression and aberrant neuronal gene expression upon loss of DNMT3A-mediated repression in D-NPCs. However, since TBRS mutation caused the most prominent de-repression of neuronal gene expression in V-NPCs, we also tested whether EZH2 overexpression could reverse this, finding that it partially suppressed this dysregulated neuronal gene expression. To better clarify this logic and these findings, we will revise the text of this results section.

      Text edits: We will clarify the reasoning for performing the EZH2 overexpression experiments in V-NPCs and reference Li et al., 2022 in both the results (pg. 9-10) and discussion.

      (4) The narrative framing of dorsal neuron development as unaffected by DNMT3A LOF is somewhat at odds with the data presented. The 882 D-NPCs show substantial DNA methylation changes, and TBRS D-INs exhibit what the authors describe as "substantive transcriptomic differences" involving persistent expression of pluripotency and progenitor genes, which seems to be a distinct but potentially significant phenotype. The impact of DNMT3A loss between ventral and dorsal lineages might be more accurately framed as divergent in nature rather than specific to a certain population.

      We thank the reviewer for their comment. While TBRS mutations appear to have a significantly stronger effect on V-NPCs and subsequently V-INs, both transcriptomic and methylation alterations do also occur upon TBRS mutation in D-NPCs and D-INs, as noted in Supplemental Figure S4d, S11, and Supplemental Data 2. However, we observed substantially greater molecular alterations in V-NPCs/V-INs, a lack of overt cellular phenotypes in D-NPCs where assayed, and a lack of functional consequences in matured D-INs, suggesting a more significant requirement for DNMT3A in regulating the differentiation and subsequent maturation of cortical inhibitory interneurons during embryonic and early pre-natal development, the developmental periods that we can readily model in hPSC-derived neurons.

      It should also be noted that these hPSC differentiation models do not recapitulate post-natal deposition of non-CpG (mCA) DNA methylation, a mechanism disrupted postnatally by TBRS-associated mutations in our prior work in murine models (Harrison Gabel; e.g. Beard et al., 2023, PMID: 37952155). Therefore, we hypothesize that if we could sufficiently mature D-INs to a state that modeled postnatal development and recapitulated this non-CpG methylation, we might be able to detect cellular and functional phenotypes in later stage D-INs. To avoid misinterpretation, we will alter the language in the results section to confirm that there are both transcriptomic and methylation changes in our D-NPCs/D-INs, but that these are not accompanied by cellular phenotypes or neuronal dysfunction.

      Text edits: We will better clarify that there are transcriptomic and methylation changes in D-NPCs/D-INs, but that these changes are minimal compared to those in V-NPCs/V-INs, as supported by the lack of cellular and functional phenotypes seen in D-NPCs/D-INs.

      (5) SST stainings are not entirely convincing. They appear mostly nuclear and, in some instances, localized to rosettes in organoids, whereas the protein is largely confined to processes and is expected to be found outside progenitor-rich zones like rosettes.

      We agree that the perinuclear SST staining detected in these young ventral telencephalic-patterned organoids at day 30 differs somewhat from the more process-localized and cytosolic signal seen in later stage organoids in other studies. This may be related to the use of different commercial SST antibodies across studies but also likely reflects SST immunoreactivity in newborn neurons near the onset of SST expression. For example, immature SST-immunoreactive neurons in the early postnatal rat cortex exhibit predominant SST staining in perinuclear cytoplasm and short processes (e.g. Fig. 3 in Lee et al., PMID: 9664223) while acquiring more cytosolic and process-localized staining as postnatal neuron maturation occurs. Evaluation of immunopositivity for other markers of neurogenesis (ASCL1) and immature neurons (TUJ1) is also congruent with these findings for SST, with TBRS-associated mutations increasing the fraction of cells in V-NPCs/V-ORGs that express these three markers.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigated TBRS etiology by using new human pluripotent stem cell models, modeling varying levels of TBRS-associated loss of DNMT3A function. They identified increased lineage-specific proliferation of precursors in TBRS ventral MGE-like progenitors, which they propose was related to increased signaling through the PIK3/AKT/mTOR pathway. Furthermore, they show that reduced DNA methylation during MGE-like progenitor differentiation into GABAergic interneurons can cause a premature expression of neuronal and synaptic genes, triggering precocious neuronal maturation. In conclusion, they propose that TBRS-derived GABAergic neurons exhibit hyperactivity that can alter the development and structure of neuronal networks.

      Strengths:

      Overall, the data presented are convincing from an early developmental point of view, given that the iPSC-derived 2D cultures and organoids used do not reach a mature state. Nonetheless, the data clearly show the effects that deleterious TBRS mutations can have during the period of neurogenesis, which was missing from the field.

      Weaknesses:

      (1) Li et al., 2022 (referred to in the manuscript) seems to already show the interplay between H3K27me3 and Dnmt3a discussed in this study i.e., that in the absence of DNA methylation, there is an expansion of polycomb-like repression. These data should be better acknowledged in the paragraph 'Repressive H3K27me3 compensates for severe loss of DNA methylation' (page 9), given it supports the data presented in this manuscript and suggests this as a common mechanism in the interplay between these two repressive marks, as it is well established in the literature.

      We thank the reviewer for this suggestion and will incorporate this reference into both the results and the discussion when discussing the respective roles of DNMT3A- and PRC2-mediated repression.

      Text edits: We will add Li et al., 2022 to both the results section (pg. 9-10) and our discussion section.

      (2) The authors should acknowledge that the omics data come from a mixed population of cells.

      We thank the reviewer for their comment. We have validated that the established 2-D differentiation methods we used in this study generate cell populations with >85-90% enrichment for the desired progenitor and neuronal cell type, based upon marker expression, but acknowledge that these are bulk -omics data obtained from cells that may represent a mixed population and have now detailed this in the methods section under “Sequencing”.

      Text edits: We will add language acknowledging that our bulk omics data were generated from mixed populations of cells.

      (3) The authors are encouraged to further discuss whether the overgrowth observed in ventral GABAergic cultures or organoids compares to the overgrowth observed in diseased patients. One expects MRIs to have been performed in patients and that these could be harnessed to discern if overgrowth occurs in the cortex or ventral regions of the brain.

      We thank the reviewer for their suggestion and do note that at least one published study documents increased cortical thickness in the MRIs of TBRS patients (Jiménez de la Peña et al., 2024, PMID: 37795572); however, to our knowledge studies have not examined regional or cell type-selective overgrowth of cortical tissue in TBRS patients. Future clinical studies examining the nature of the neuronal progenitor overgrowth and resulting consequences for patient brain imaging would be of interest to better understand TBRS-associated etiology of brain overgrowth and its manifestations.

    1. eLife Assessment

      This is a useful study investigating the role of peristalsis in the elongation of the gut, using the chick ceca as a model. The work employs optogenetics together with embryological approaches to establish links between peristaltic muscle contractions and downstream cell behaviors that lead to tube elongation. However, the work is somewhat incomplete, limited in mechanistic insights that would extend beyond prior work in the literature, which has already suggested a role for smooth muscle contractility in avian gut elongation.

    2. Reviewer #1 (Public review):

      Kawamura et al. investigated the role of circumferential smooth muscle contractions in chick gut tube elongation, addressing the hypothesis that "peristaltic activity generated by the gut promotes its own elongation during embryogenesis". Although not acknowledged in the current manuscript, this interesting premise was, in fact, previously demonstrated.

      Indeed, the experiments in the present manuscript closely parallel a previous study (Khalipina et al, 2019: "Smooth muscle contractility causes the gut to grow anisotropically") that also cultured chick gut tissue and performed time-lapse analyses to quantify peristalsis. Both studies showed that inhibiting peristalsis with Ca-channel blockers induces a switch from elongational to radial growth in the gut.

      However, one of the main strengths of the current study is the innovative use of optogenetic manipulation to rescue gut lengthening in drug-inhibited gut tissue by re-stimulating peristaltic contractions. In addition, the authors use aphidicolin to show that peristalsis-mediated gut elongation is independent of cell division. They also track individual smooth muscle cells and show that they divide circumferentially, but become redistributed along the length of the gut tube with peristalsis.

      While these data are solidly quantitative, they do not provide mechanistic insight into how peristaltic contractions cause smooth muscle cells to be redistributed.

      The evidence presented in this manuscript supports the main conclusion that peristalsis plays a critical role in embryonic gut elongation, but this conclusion itself is not novel. In addition to corroborating previous work, this manuscript provides some useful additions to our existing knowledge of the role of mechanical forces in embryonic gut morphogenesis and illustrates the utility of a previously published optogenetic manipulation technique.

    3. Reviewer #2 (Public review):

      Summary:

      This study uses the chicken caecum ex vivo culture to show that embryonic peristaltic activity is a key mechanical factor for gut elongation. It is shown that pharmacological inhibition arrests intestinal growth, while optogenetic restoration rescues longitudinal elongation. The authors propose a two-step mechanism in which circular smooth muscle cells proliferate circumferentially, but peristalsis pushes them toward longitudinal rearrangement, which explains the anisotropic growth of the gut.

      Strengths:

      The experiments combine loss-of-function (peristalsis inhibition) with gain-of-function (optogenetic rescue) experiments and quantifiable readouts in an embryonic gut culture model. The work is clearly presented with nice microscopy videos and offers a potentially valuable conceptual framework linking tissue-scale mechanics to smooth muscle cell behaviors during development.

      Weaknesses:

      Some results appear conceptually inconsistent with the claim of peristalsis-essential rearrangement (e.g., longitudinal separation of daughter cells even without peristalsis), and the mechanistic link would benefit from clearer quantification and reconciliation. The study largely overlooks contributions from other gut layers and the ECM (and aphidicolin affects all proliferating cells), limiting interpretation of how smooth muscle rearrangement translates into whole-wall elongation.

    4. Reviewer #3 (Public review):

      Summary:

      The authors noted a steep increase in the rate of growth with the onset of more frequent peristaltic-like movements and hypothesized that peristaltic activity rearranges the orientation of cell growth from circumferential to longitudinal. This study sought to alter peristalsis and then (1) carefully examine the growth of the chick cecum relative to the frequency of peristaltic-like movements and (2) examine the orientation of cells relative to the circumferential and longitudinal axes to determine whether peristalsis is required for cecum lengthening. To alter peristaltic-like movements, contraction was inhibited through treatment with nifedipine (a calcium channel blocker that acts to relax smooth muscle) or Ani9 (an inhibitor of Ca-activated chloride channels), and contractions were induced through activation of the blue light-activatable channelrhodopsin-2 (introduced through electroporation).

      Strengths:

      (1) Use of multiple methods to alter peristalsis in initial studies.

      (2) Live imaging.

      (3) Careful measurements.

      (4) Nicely presented figures.

      Weaknesses:

      (1) Only Nifedipine inhibition was examined for cell positional changes.

      (2) Ki67 was not carefully analysed, and apoptosis was not shown at all.

      (3) The results shown are suggestive of a role for peristalsis in the lengthening of the cecum. Demonstration that increased peristalsis could further increase lengthening would be helpful.

      (4) The novelty of this work is incremental for the field in that the reagents used and the model of smooth muscle driving gut lengthening in mouse and chick small intestines have both previously been published. This manuscript does suggest that the role of smooth muscle in longitudinal growth may extend to other tubular organs (chick cecum).

    5. Author response:

      We sincerely appreciate the efforts of the Senior and Reviewing Editors, as well as the three reviewers, for their careful evaluation of our manuscript and their insightful comments. Previous studies have suggested that smooth muscle activity contributes to gut elongation; however, these studies do not directly demonstrate that peristaltic movements per se drive elongation. For example, studies in mouse have primarily focused on residual stress of smooth muscle (Yang et al., 2021), rather than the dynamic spatiotemporal nature of peristalsis. In chickens, inhibition of peristalsis by nifedipine has been interpreted as evidence for a role of peristalsis in gut elongation (Khalipina et al., 2019). However, because nifedipine broadly affects calcium-dependent cellular processes, these experiments cannot distinguish whether the observed effects arise specifically from loss of peristalsis or from other cellular perturbations. In our current study, we aimed to overcome this limitation by combining pharmacological inhibition with optogenetic reactivation. This approach allows us to selectively restore peristaltic movements under conditions in which endogenous peristalsis is suppressed. Based on these experiments, we provide evidence supporting a causal contribution of peristalsis to anisotropic gut growth. We agree with the reviewers that the positioning of our study relative to previous work should be clarified. In a revised manuscript, we will more clearly distinguish between static mechanical tension and endogenous peristaltic movements, and better define the conceptual advance of our study. In addition to macroscopic growth analysis, we identified cellular dynamics associated with elongation, including circumferentially oriented cell division and peristalsis-dependent longitudinal cell rearrangement. We agree that the mechanistic link between peristalsis and downstream cellular behaviors remains incompletely understood. In the revised manuscript, we will clarify this limitation and outline future directions, including experiments to test the role of mechanical cues (e.g., mechanical perturbation and pharmacological manipulation of mechanotransduction pathways).

      Public Reviews:

      Reviewer #1 (Public review):

      The mechanism by which peristalsis and the cell rearrangement are mediated

      We appreciate this important point. As suggested, the possibility that mechanical aspects of peristalsis contribute to gut elongation is highly plausible. To address this, we plan to perform additional experiments aimed at isolating the mechanical component of peristalsis. Furthermore, we will investigate the involvement of mechanotransduction pathways, including Piezo-mediated pathways, using pharmacological approaches. We will revise the manuscript to better discuss these possibilities and clarify the current limitations of our study.

      The novelty and positioning of our study

      We appreciate this comment and have addressed this point in the General response above. In the revised manuscript, we will more clearly position our study relative to the previous studies.

      Reviewer #2 (Public review):

      Longitudinal separation of daughter cells even without peristalsis

      We appreciate this insightful and important comment. As noted, daughter cells can exhibit longitudinal separation even under nifedipine treatment, whereas the divergence index (DI) shows a clear increase only in the control (with peristalsis) condition. We interpret this as follows: immediately after cell division, two daughter cells occupy nearly identical positions along the longitudinal axis, and stochastic fluctuations may cause them to separate from each other. Such local separation does not necessarily reflect population-level cell rearrangement. In contrast, DI captures collective dispersion of a cell population, which reflects organized tissue-level rearrangement associated with elongation. We will revise the manuscript to clarify this distinction between local cell behavior and population-level dynamics, and to better explain how DI reflects elongation-related processes.

      Contributions from other gut layers and ECMs

      We agree that contributions from other tissue layers and extracellular matrix (ECM) components might be important. To address this, we plan additional experiments including targeted ablation of specific tissue layers and pharmacological manipulation of ECM remodeling (e.g., using MMP modulators). We will also expand the Discussion to better acknowledge these factors.

      Reviewer #3 (Public review):

      (1) We agree that experiments based solely on nifedipine treatment cannot fully exclude potential off-target effects. To address this limitation, we plan to perform additional experiments that rescue the mis-rearrangement of cells by applying mechanical forces.

      (2) We agree that more elaborate analyses of cell proliferation and apoptosis are needed. In the revised manuscript, we will incorporate additional analyses using appropriate markers and methods suitable for developing gut tissue.

      (3) In Figure 2, we had already shown that increasing the frequency of peristaltic contractions (30 s intervals; Fig. 2i, j, k, n) did not significantly increase elongation or widening compared to the control condition (120 s intervals). This suggests that the effect of peristalsis on elongation may reach a plateau at a certain frequency. We will revise the manuscript to clarify this interpretation and discuss its implications.

      (4) We appreciate this important comment and have addressed the issue of novelty and positioning in the General response shown above.

      Reference

      Yang, Y. et al. Ciliary Hedgehog signaling patterns the digestive system to generate mechanical forces driving elongation. Nat. Commun. 12, 7186 (2021).

      Khalipina, D., Kaga, Y., Dacher, N. & Chevalier, N. R. Smooth muscle contractility causes the gut to grow anisotropically. J. R. Soc. Interface 16, 20190484 (2019).

    1. eLife Assessment

      This valuable study builds a novel auditory-motor paradigm to investigate how the brain learns associations between movements and their auditory consequences. Solid evidence is provided for early ERPs (50-100 ms latency) reflecting violations of established key-pitch mappings. The writing, however, could be streamlined to better emphasize the paper's key contribution, and some statistical analyses might be improved.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. report on an ambitious study that investigates multiple aspects of the neural and behavioral underpinnings of auditory-motor surprisal in the context of an auditory-motor learning paradigm (piano keyboard). Using an intricate design comprising several sub-parts and control procedures, they report that early ERPs (50-100 ms latency) reflect violations of established key-pitch mappings.

      Strengths:

      This is a carefully devised and executed study. The paradigm is quite intricate and, at the same time, addresses multiple aspects of auditory-motor learning, and does so in a rigorous way.

      Weaknesses:

      Perhaps because of the exhaustive approach, it is sometimes difficult to follow which parts of the experimental design the results come from; there are some questions regarding appropriate statistical methods, the inclusion/treatment of musical background in participants, and the nature (latency & extent) of the identified neural components that detect auditory-motor violations.

    3. Reviewer #2 (Public review):

      Summary:

      Zhang et al. report an EEG study (n=18) of participants playing a keyboard where the correspondence between keys and pitches is varied to introduce sensory-motor mismatches (discrepancies between sensory inputs and expected sensory consequences of motor commands). They find that the auditory N100 amplitude is enhanced for the initial keystroke following a mapping switch but rapidly attenuates for subsequent keystrokes (showing rapid updating of the forward model), whereas the motor-related P50 amplitude only differentiates trained versus untrained mappings after 30 minutes of goal-directed practice (potentially showing timescales of inverse model updating). Using parallel univariate and mTRF decoding analyses, they conclude that forward models (mapping action to predicted sound) update almost instantly to track short-term context, while inverse models (mapping sound to motor commands) update slowly and require extended, targeted practice.


      Strengths

      (1) Methodological innovation: The study utilizes an interesting, continuous auditory-motor paradigm that moves beyond standard trial-by-trial oddball designs, offering a more ecologically valid measure of trial-to-trial adaptation.

      (2) Analytical elegance and rigor: The combination of traditional univariate ERP analyses with multivariate temporal response function (mTRF) decoding is elegant, allowing the authors to successfully dissociate overlapping auditory and motor variance streams.

      (3) The dissociation between the rapid adaptation of the N100 forward model and the slower adaptation of the P50 inverse model is interesting.

      Weaknesses

      (1) Confounded passive listening baseline: The passive listening control condition lacks an orthogonal behavioural task (e.g., an occasional oddball detection task). Active playing inherently requires focused attention on auditory feedback to monitor performance, whereas passive playback does not. The globally weaker stimulus-evoked pattern at electrode Fz during passive listening strongly suggests that the absence of an N100 effect in this condition may simply reflect a lower state of attention, rather than isolating the absence of a motor-driven forward prediction. This matters in particular because pure sensory surprisal was also higher for "first" notes, which by itself could produce a stronger N1; such an effect may be masked here.

      (2) Overclaimed theoretical novelty: The conceptual framing leans excessively on the authors' specific "MirrorNet" framework, presenting foundational, decades-old tenets of the motor control literature (i.e., unsupervised exploration for forward models vs. supervised skill acquisition for inverse models; Wolpert, Jordan, both in the nineties) as their own novel "conjectures." This theory-heavy introduction obscures the paper's actual empirical contributions: the design, and the interesting question of the distinct temporal adaptation scales of forward versus inverse models. I think some rewriting could improve the paper.

      (3) Misplaced surprisal terminology: In a similar vein, I find the term "auditory-motor surprisal" more theoretical grandstanding than genuinely useful. The significance statement claims to "extend this principle from sensory processing", but the concept of sensorimotor unexpectedness is again a staple of the motor literature on forward models. Moreover, nowhere in the paper do the authors actually estimate sensorimotor surprisal. While they compute surprisal for their auditory baseline using IDyOM, their central sensorimotor analysis relies entirely on a simple categorical mismatch (first vs. subsequent keystrokes). The phenomenon can equally be referred to by its established nomenclature: "sensorimotor mismatch" or "sensorimotor unexpectedness".

      (4) Incremental conceptual advance regarding the N100: The paper frames the N100 finding as a major discovery, but as far as I know, the attenuation of the auditory N1 to self-generated sounds via accurate motor prediction, and its enhancement during sensorimotor mismatch, is one of the most heavily documented phenomena in the auditory-motor literature (e.g., Timm et al., 2013; Bendixen et al., 2012; 2013). In my view, the authors should clarify that the novelty lies in the elegant design, which provides a new way to correct for non-sensory-specific motor-induced attenuation, and in characterizing the distinct adaptation timescales of forward versus inverse models, not in demonstrating N100 modulation by sensorimotor mismatch, which is well documented.

    1. eLife Assessment

      This useful study asked whether the behaviour of motor units from a hand muscle changed across the two mechanical actions it performs. The authors used high-density intramuscular electrodes to record the activity of several motor units and reported changes in motor unit recruitment order across tasks that were not dependent on motor unit properties, suggesting differential spinal contributions to the two actions. However, the evidence supporting their main claims is incomplete, and some of the conclusions are based on unsubstantiated assumptions: the authors should correct several key analyses and temper claims that are not directly backed up by their data.

    2. Reviewer #1 (Public review):

      Summary:

      Osswald and colleagues aim to show how motor units of the first dorsal interosseous (FDI) are flexibly recruited across two functionally different movements: index finger abduction and index finger flexion. They motivate this by arguing that FDI is the prime mover in abduction but acts as a synergist in flexion, alongside flexor digitorum profundus (FDP) and flexor digitorum superficialis (FDS) as the prime movers. This is a worthwhile question because it speaks to how descending neural inputs to the spinal cord flexibly control movement.

      The authors claim that recruitment order and recruitment threshold of FDI motor units differ between abduction and flexion, and that beta-band intramuscular coherence is reduced when FDI acts as a synergist. However, there are significant methodological concerns that undermine the results and conclusions.

      Strengths:

      The study certainly aims to address a central question in motor neuroscience - how flexible recruitment of motor units occurs across movements where the same muscle changes its functional role. They correctly identify the FDI as a multi-functional muscle and use intramuscular high-density EMG arrays to record several motor units simultaneously, which is a major technical strength. They also track individual motor units between conditions and, therefore, have generated a potentially valuable dataset for studying spinal motor control across different movements.

      Weaknesses:

      The key limitation comes from the authors' interpretation of "neural drive" to FDI. The authors acknowledge that global EMG during flexion is smaller than that during abduction (for the same force), and surmise that the FDI receives different amounts of neural drive between these two movements, which is a potential confound for their analyses. To match the neural drive (i.e., global EMG), the authors ask participants to generate the same global EMG in flexion as in abduction; the resulting forces generated by FDI differ significantly (2-3 N for abduction and 1.8-6.2 N for flexion). From this, they find changes in recruitment order, recruitment threshold, and beta coherence. However, different FDI motor units (and different muscle fibres) are active during abduction versus flexion. Using global EMG as a proxy for neural drive ignores this spatial separation of EMG generation during abduction and flexion, such that some amount of global EMG generated by one part of FDI (during abduction) is considered the same (from a neural drive perspective) as the same amount of EMG generated by a completely different part of FDI (during flexion). But these two global EMGs (during abduction and flexion) are not biologically equivalent because they are generated by different motor units and muscle fibres. Consequently, neural drive during flexion and abduction is not equivalent, which makes biological interpretation less clear. Furthermore, it is difficult to tell whether abduction-versus-flexion differences are due to task role (prime mover vs synergist) or to differences in force/mechanical demands, multi-muscle coordination, and the spatial sampling limits of intramuscular recordings.

      As mentioned, we think that the question asked is a very interesting one and framed appropriately to investigate the behaviour of motor units during prime mover and synergist roles. Simultaneously recording the prime movers for index flexion (FDP and FDS) would significantly improve the completeness of the study and allow for multi-muscle comparisons that are more relevant to how the motor system resolves prime mover vs synergist roles.

      The authors use motor unit action potential as a proxy for motor unit size. This is not suitable because muscle fibres closer to the electrode will appear larger, independent of their true size. We advise that the authors remove analyses pertaining to motor unit size if it cannot be accurately measured.

      Finally, several mechanistic interpretations in the discussion (e.g., spinal interneuronal suppression, reduced corticospinal input, proprioceptive mechanisms) read as more speculative than the current data can support without added controls or citations.

    3. Reviewer #2 (Public review):

      In this study, the authors examine whether the structure of motor unit (MU) recruitment and firing varies across movement directions in the human first dorsal interosseous (FDI) muscle. While task-dependent changes in MU recruitment have been reported previously (e.g., Thomas et al. 1986), these findings were largely based on recordings from a limited number of isolated single motor units. By applying high-density intramuscular electromyography and decomposition techniques, the authors demonstrate similar phenomena at the level of larger MU populations, thereby providing a useful consolidation of prior observations. In addition, they show that recruitment thresholds shift across tasks while the inverse relationship between discharge rate and recruitment threshold (the "onion-skin" organization) is preserved, suggesting that the overall structure of inputs to the motoneuron pool remains stable despite changes in recruitment order. Furthermore, by analyzing intramuscular coherence across MU firing, the authors attempt to characterize differences in the extent of synchronization among frequency components of neural inputs between abduction and flexion of the index finger. In particular, they report reduced beta-band coherence during flexion compared to abduction, indicating decreased synchronization in this frequency range (13-30Hz). This observation is noteworthy, as it points to potential differences in the neural inputs underlying these task-dependent changes.

      A key strength of the study is that it extends prior work on task-dependent MU recruitment to larger populations using state-of-the-art recording and decomposition approaches. This represents a meaningful technical and conceptual advance over earlier studies limited to small numbers of units. The finding that recruitment shifts between flexion and abduction occur consistently across MUs, independent of motor unit size, further strengthens the robustness and generality of the observed phenomenon. Together, these results provide convincing evidence that MU recruitment is not strictly fixed by a rigid size principle across functional contexts and thus make a valuable contribution to the literature on motor control.

      However, several aspects of the mechanistic interpretation are less well supported. The authors interpret their findings as reflecting a "redistribution" of net excitatory input to the motoneuron pool across tasks. While this is a plausible interpretation of the observed changes in recruitment thresholds and recruitment order, it is not directly demonstrated by the analyses presented. The current data do not clearly distinguish redistribution of inputs from alternative explanations, such as task-dependent modulation of shared versus independent inputs, or changes in the effective gain of existing pathways. As such, the evidence for a specific redistribution of input remains incomplete.

      The interpretation of the intramuscular coherence analysis represents a further key weakness. By computing frequency-specific coherence across MUs during abduction (as a prime mover) and flexion (as a synergist), the authors report reduced beta-band coherence during flexion and interpret this as evidence for attenuated corticospinal input and increased involvement of spinal circuits. However, the relationship between changes in downstream coherence and the magnitude of upstream neural drive is not well established. Coherence reflects the synchronization of inputs rather than their net strength, and therefore, a reduction in coherence cannot be directly interpreted as a decrease in input from a specific source. Moreover, coherence measures alone do not permit identification of the origin of the inputs, and thus do not provide sufficient evidence to attribute the observed differences to descending or spinal pathways. While the difference between tasks is clear and potentially informative, the mechanistic interpretation appears overstated and should be treated more cautiously.
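      A toy illustration of the point that coherence reflects shared synchronization rather than net input strength may help. The stdlib-only sketch below is not the authors' pipeline: the simulated signals, the 20 Hz beta-band test frequency, and all function names are hypothetical, and real motor unit coherence analyses use cumulative spike trains, windowing, and confidence limits. It estimates magnitude-squared coherence at one frequency by averaging single-bin spectra over non-overlapping segments (a simplified Welch estimate). Scaling both signals tenfold, a stand-in for stronger overall input, leaves coherence unchanged, whereas lowering the fraction of power the two signals share lowers it.

```python
import cmath
import math
import random


def dft_at(segment, freq_hz, fs_hz):
    """Single-bin discrete Fourier coefficient of `segment` at `freq_hz`."""
    return sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(segment))


def coherence_at(x, y, freq_hz, fs_hz, seg_len):
    """Magnitude-squared coherence |Sxy|^2 / (Sxx * Syy) at one frequency,
    averaging spectra over non-overlapping, unwindowed segments."""
    sxx = syy = 0.0
    sxy = 0j
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft_at(x[start:start + seg_len], freq_hz, fs_hz)
        fy = dft_at(y[start:start + seg_len], freq_hz, fs_hz)
        sxx += abs(fx) ** 2
        syy += abs(fy) ** 2
        sxy += fx * fy.conjugate()
    return abs(sxy) ** 2 / (sxx * syy)


def simulate_pair(shared_amp, noise_sd, n, fs_hz, freq_hz, rng):
    """Two hypothetical signals: a common oscillation plus independent Gaussian noise."""
    common = [shared_amp * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]
    x = [c + rng.gauss(0, noise_sd) for c in common]
    y = [c + rng.gauss(0, noise_sd) for c in common]
    return x, y


rng = random.Random(0)
FS, F_BETA, SEG = 1000, 20, 500  # 1 kHz sampling, 20 Hz test frequency, 0.5 s segments

x, y = simulate_pair(1.0, 1.0, 20000, FS, F_BETA, rng)
c_base = coherence_at(x, y, F_BETA, FS, SEG)
# Same shared fraction, ten times the amplitude: coherence is unchanged.
c_scaled = coherence_at([10 * v for v in x], [10 * v for v in y], F_BETA, FS, SEG)
# Much smaller shared component relative to independent noise: coherence drops.
x_w, y_w = simulate_pair(0.05, 1.0, 20000, FS, F_BETA, rng)
c_weak = coherence_at(x_w, y_w, F_BETA, FS, SEG)
```

Because the estimator is normalised by both auto-spectra, it cannot by itself distinguish "less descending input" from "a smaller shared fraction of input", nor say where a shared component originates.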

      A related issue concerns the interpretation of the preserved RT-DR relationship. While this finding supports the presence of a stable common input structure across tasks, the additional claim that proprioceptive feedback contributes significantly to maintaining this organization is not clearly justified by the presented data. No direct evidence is provided to dissociate afferent from descending inputs, and the absence of task-dependent differences in lower-frequency coherence further limits support for this interpretation. As such, the proposed role of proprioceptive feedback appears speculative.

      Overall, the authors successfully achieve their primary aim of demonstrating task-dependent flexibility in MU recruitment at the population level, and the results provide useful empirical support for this phenomenon using modern techniques. The study is likely to be of interest to researchers in motor control and neuromuscular physiology, particularly given the increasing relevance of MU-level analyses in both basic and applied contexts. However, the broader mechanistic conclusions regarding the nature and origin of the underlying neural inputs are not fully supported by the data and would benefit from more cautious interpretation or additional experimental evidence.

    1. eLife Assessment

      The authors make the valuable observation that directional memory during epithelial cell migration is enhanced compared to single-cell migration. They attribute this effect to adherens junctions and vinculin dimerization. In the work, central measures should be defined more precisely, and the support for their claims about the roles of adherens junctions and vinculin dimerization in memory enhancement remains incomplete.

      [Editors' note: this paper was previously reviewed by another journal.]

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors study the migration of isolated cells and of cells in ensembles. They quantify several aspects of the corresponding migration patterns and investigate how these quantities depend on molecules that are known to play an important role in migration. Furthermore, they study the effect of external cues on these migration processes.

      Strengths:

      The authors provide a clean and uniform setting for comparing the migration of isolated cells and of cells in an ensemble in control and mutant conditions, and in the presence and absence of external cues. This allows for a meaningful comparison between different conditions. In this way, the authors obtain useful data that link the migration of isolated cells to that of cells in collectives.

      Weaknesses:

      A major weakness of the manuscript is that the authors do not properly introduce the quantities and concepts they are working with. As a result, the manuscript is hardly accessible to a reader who does not have a thorough background in cell migration and anomalous transport. In addition, the manuscript uses some notions that are not standard, for example, vinculin or FA stability, and these are not properly introduced. Most strikingly, "collective directional memory" is not defined.

      The authors infer relationships between different quantities, but they remain qualitative, even though the authors use a language that suggests otherwise. For example, "The combination of Focal Adhesion stability and force transmission from the cytoskeleton predicts the migration speed of single cells" (p 2). I am not sure what is meant by prediction, but this heading suggests that knowledge of FA stability and force transmission yields the migration speed. Reading this line, I expect that if I give you values for FA stability and force transmission, you would give me a value for the migration speed. Such a quantitative mapping is not provided. In fact, it cannot be provided, because - as mentioned before - these quantities are not properly defined, so I would not know how to measure them. I do not even know their units.

      Furthermore, the authors do interpret some of their results without explaining or justifying the basis for their interpretation. For example, they use the FRET index of vinculin - another notion that is not properly introduced - to make statements about mechanical stress.

      It also seems that the figures could be improved. Some of the sketches are, in my opinion, not helpful. Examples are Figure 3A (how could a cell move while the hexagonal arrangement of the cells is maintained?) or Figures 2F, 4F, and 6F (what do the colored ellipses indicate?). In Figures 1B, 1D, 2A, 2E, 3B, 3D-F, 4A, 4F, 5B-D, it is not clear which lines merely connect data points and which lines are fits to the data.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Canever et al was assessed by three Referees at another journal, who brought up a range of critical points. I will not repeat a summary of the work; this can be found in the first-round reviews.

      Strengths:

      In their revised manuscript, the authors include substantial changes and additional reasoning. Along with their rebuttal letter, I think they make a very convincing case. While the claims are well supported by the analysis, I do not see that the findings need to be universal to be relevant. It might be rather surprising to me if there existed such a universality, in fact. I think that the findings are solid and interesting in their own right and are worthy of publication, especially with the amended discussion in this revision.

      Weaknesses:

      However, while the more bio-oriented parts are not fully accessible to me, I do have a few points from the data analysis point of view that need amendment.

      (1) The mathematical models used need to be specified more precisely. First, the authors confuse Levy flights and Levy walks. These are distinct processes in the sense that a Levy flight does not have a finite variance and thus no finite speed. The proper model here would be Levy walks. Here, as in a large body of the literature, the two notions are used interchangeably. Second, the authors speak about a "superdiffusive model", for which I do not find a proper definition. There exists an entire range of superdiffusive models, each with a different physical background, so this needs more clarity. The authors may consult one of the standard reviews for more details, e.g., Soft Matter 8, 9043 (2012) or Phys Chem Chem Phys 16, 24128 (2014). Overall, a few equations (perhaps in the Supplement) would help to be more specific.
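      To make the distinction concrete, the standard scaling results for the simplest one-dimensional versions of these models can be summarised as below. This is a sketch: the symbols λ, ψ, μ, γ, and K_H are generic and not taken from the manuscript, prefactors are omitted, and logarithmic corrections at the boundary cases γ = 1 and γ = 2 are ignored.

```latex
% Levy flight: instantaneous jumps whose lengths have a heavy-tailed distribution
\lambda(x) \sim |x|^{-1-\mu},\quad 0<\mu<2
\;\Longrightarrow\; \langle x^2\rangle = \infty \quad \text{(no finite variance, no finite speed)}

% Levy walk: motion at finite speed v, with heavy-tailed flight durations
\psi(\tau) \sim \tau^{-1-\gamma}
\;\Longrightarrow\;
\langle x^2(t)\rangle \propto
\begin{cases}
t^{2}, & 0<\gamma<1 \quad \text{(ballistic)}\\
t^{3-\gamma}, & 1<\gamma<2 \quad \text{(superdiffusive)}\\
t, & \gamma>2 \quad \text{(diffusive)}
\end{cases}

% Fractional Brownian motion with Hurst exponent H
\langle x^2(t)\rangle = 2K_H\, t^{2H},\qquad \alpha = 2H,\quad \tfrac12 < H < 1 \ \text{(superdiffusive)}
```

Only the Levy walk and FBM yield a finite mean squared displacement that can be compared with measured cell trajectories.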

      (2) For fractional Brownian motion (FBM), the authors should check the displacement correlation function; it should show slowly decaying, positive correlations. More details on the practical analysis of FBM can be found, e.g., in Phys Chem Chem Phys 27, 14350 (2025). These correlations should decay as a function of the bin time, as discussed, e.g., for the opposite case of subdiffusion in Phys Rev E 88, 010101(R) (2013) [cf. Fig 3b]. In general, FBM has been found to be a highly relevant process for a number of systems, including amoeba cells at shorter times; see the detailed analysis in Phys Rev Res 4, 033055 (2022). That paper also details different ways to characterise the motion in terms of scaling exponents.
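      The check requested above can be sketched as follows, assuming 1D positions sampled at equal intervals; function names are hypothetical and this is not the authors' analysis. For FBM with Hurst exponent H (MSD proportional to t^(2H), so α = 2H), the correlation between displacements over two successive equal bins is 2^(2H-1) - 1: negative for H < 1/2, zero for Brownian motion, and positive for H > 1/2, decaying only as H(2H-1)k^(2H-2) at large lag k. The empirical estimator takes displacements over non-overlapping bins of a chosen number of frames, so the bin time can be varied.

```python
import math


def fgn_autocorrelation(lag: int, hurst: float) -> float:
    """Theoretical normalised correlation between FBM displacements measured over
    two equal, non-overlapping bins that are `lag` bins apart."""
    k = abs(lag)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def displacement_autocorrelation(positions, max_lag, bin_frames=1):
    """Empirical normalised autocorrelation of displacements taken over consecutive,
    non-overlapping bins of `bin_frames` frames (lags 1..max_lag, in units of bins)."""
    incs = [positions[i + bin_frames] - positions[i]
            for i in range(0, len(positions) - bin_frames, bin_frames)]
    mean = sum(incs) / len(incs)
    centred = [d - mean for d in incs]
    var = sum(d * d for d in centred)  # assumes displacements are not all identical
    return [sum(centred[i] * centred[i + k] for i in range(len(centred) - k)) / var
            for k in range(1, max_lag + 1)]


# Theoretical curves for anti-persistent, Brownian, and persistent (superdiffusive) FBM.
theory = {h: [fgn_autocorrelation(k, h) for k in range(1, 6)] for h in (0.3, 0.5, 0.7)}

# Toy track that reverses direction every frame: successive displacements anti-correlate.
zigzag = [float(i % 2) for i in range(11)]
zigzag_corr = displacement_autocorrelation(zigzag, max_lag=2)
```

For a superdiffusive track consistent with FBM, the empirical values should be positive and track the H > 1/2 curve for the H inferred from the MSD exponent.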

      (3) Some relevant approaches from the literature should be discussed in the context of this work: eLife 9, e52224 (2020); Rep Prog Phys 86, 126601 (2023); Chaos 35, 023145 (2025). In the context of non-Gaussianity for active particles: Phys Rev E 104, 064615 (2021); New J Phys 25, 013010 (2023).

      (4) In the abstract, I am having some issues with the formulation in the sentence: "This directional memory emerges from fractional Brownian motion". It sounds as if FBM were a fully clarified phenomenon. I would prefer some statement along the lines that the data are consistent with such a mathematical modelling approach.

      After fixing these points, I think the manuscript will clearly warrant being shared.

    4. Reviewer #3 (Public review):

      This manuscript focuses on the presence/origin of directional memory during epithelial cell migration. It starts by analyzing single cells and then moves to more complex systems (confluent layers and scratch assays). The paper first demonstrates that the migration in all of these systems is well described by persistent random walks, which likely emerge from fractional Brownian motion. This is an important demonstration, as it implies orientation memory in the systems. The paper then attempts to discern the origin of this memory and claims to establish key roles for adherens junctions and vinculin dimerization. While the manuscript is for the most part well written, there are some significant overinterpretations of experimental results. The largest issue concerns the demonstration of the role of vinculin dimerization, a phenomenon that is not well studied inside living cells, since all data rely on a single point mutation (Y1065E). Additionally, the authors seem to be over-interpreting several of the assays; the statistical analysis does not seem to encompass all comparisons made, and the molecular model proposed does not clearly explain the observed results. The discussion could also be strengthened by considering other aspects of vinculin behavior (e.g., vinculin catch bonding) as well as by discussing other recent, similar papers.

      (1) Likely the most significant issue with the manuscript is the interpretation of the vinculin Y1065E variant and the assumption that the only defect the mutation causes is a lack of dimerization. Vinculin dimerization is mediated by a conformational change in the vinculin tail domain induced by F-actin binding (Thompson, FEBS Letters, 2013). Dimerization of the vinculin tail domain has been clearly demonstrated in in vitro systems involving purified proteins, as the authors point out in the manuscript. However, the dimerization of full-length vinculin has not been well characterised in living cells. There are several reasons to suspect that dimerization is not prevalent in cells. For instance, in the presence of other actin-binding proteins, there may not be sufficient binding sites available on neighboring actin filaments to facilitate dimerization. Additionally, pY1065 vinculin and vinculin Y1065E have been associated with increased vinculin activation (Huang, JBC, 2014), so other effects seem possible. While the Y1065E variant clearly has an effect on the tension sensor readout and vinculin dynamics, further experimental evidence is needed to show that these effects are due to a lack of dimerization in living cells. To justify the definitive claims made in the manuscript, the authors likely need to develop, or employ, an assay for detecting vinculin dimerization in living cells. The authors could choose between intermolecular FRET, proximity labeling assays (e.g., antibodies with DNA for signal amplification), bimolecular fluorescence complementation-based approaches (e.g., split GFP), or some other approach. It should be noted that working with full-length vinculin, not just Vt, and designing an assay that can incorporate vinculin Y1065 variants (Y1065E and potentially Y1065A/F) would strengthen the results.
Also, the authors should be aware that the observation of strong dimerization may invalidate the use of FRET-based tension sensors in this system or at least necessitate intermolecular FRET control experiments.

      (2) The authors seem to assume that FRAP and adhesion stability are interchangeable. To this reviewer's knowledge, this is not standard in the field. FRAP informs about molecular dynamics. Stability assays, which probe the spatial position of an entire focal adhesion over time (Zaidel-Bar, JCS, 2007, although other approaches are equally suitable), are typically used to assess adhesion stability. If the authors wish to make strong claims about the stability of the adhesions, non-FRAP-based assays should be employed. Alternatively, the authors could interpret the FRAP data simply in terms of vinculin dynamics.

      (3) A major conclusion in the manuscript is that in response to overexpression of a specific vinculin construct, focal adhesions behave the same in single cells, confluent cells, and collectively migrating cells for all the mutants but Y1065E. However, outside of the FRET measurements, there is not much evidence to support this claim. The authors should perform a greater comparison of the focal adhesions between the systems used in the manuscript (single cell, confluent cells, collectively migrating cells). Key measurements would include focal adhesion number per cell, focal adhesion size, focal adhesion orientation, vinculin dynamics (e.g., FRAP), focal adhesion stability, and some indicators of focal adhesion composition. For the last aspect, focusing on focal adhesion components that also have roles in adherens junctions, such as VASP, seems appropriate. Without such characterization, it is an overinterpretation to assume that focal adhesions are the same in each system and, therefore, effects are due to vinculin behavior in the adherens junctions.

      (4) What is shown in Figure 3G is not clear. How are P/Po and alpha shown on different areas of the same plot?

      (5) It seems that an insufficient statistical test was used in many experiments. There are comparisons being made between systems (cell migration speed, FRET index...) that are not directly compared in a statistical test. Statistical tests are limited to differences from control (over-expression of full-length vinculin), and consistent increases or decreases (not quantitative values) are taken as evidence of similarity across systems. It seems that a more rigorous and standard approach would be to use an ANOVA/MANOVA with a suitable post-hoc test to perform all of these.
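      As a minimal sketch of the omnibus step the reviewer recommends, the stdlib function below computes the one-way ANOVA F statistic across all condition groups at once, instead of testing each condition only against control. The group values are hypothetical, and the function name is illustrative. A p-value would come from the F distribution (e.g., `scipy.stats.f.sf(F, df_between, df_within)`), pairwise post-hoc comparisons from a procedure such as Tukey's HSD (e.g., `scipy.stats.tukey_hsd`), and a design crossing cell context with vinculin construct, or with several outcome measures, would call for a two-way ANOVA or MANOVA rather than this one-way layout.

```python
def one_way_anova_f(groups):
    """Omnibus one-way ANOVA: return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variability: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: scatter of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements (e.g., migration speed) for three conditions, analysed together.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```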

      (6) It is unclear how a lack of vinculin dimerization at adherens junctions perturbs epithelial migration, whereas the complete lack of the vinculin tail, which also cannot dimerize, does not. In other words, how can TL "have no other role in cell migration at confluence than those at FAs as in single cells"? Notably, the authors do not include the tailless variant in the schematic model figures. These results should be included and explained.

    5. Author response:

      [Editors' note: The authors included an author response to reviews from another journal]

      Reviewer #1 (Comments to the Authors):

      In this manuscript, the authors describe how cells in collective movements adopt a superdiffusive behavior to outpace individual cells. This behavior is regulated by cell-cell junctional stability and force transmission. The authors state that speed is regulated by vinculin through mechanosensitivity.

      While it makes intuitive sense that cells may move more efficiently collectively, as this reduces their exploratory space and therefore increases their efficiency of movement,

      We agree that this is an intuitive explanation. However, previous literature had shown that confluent cells may or may not migrate depending on conditions that do not solely depend on the space available per cell, but also involve the intrinsic activity of the cell, its cortical tension, and its adhesion with its neighbors, with sometimes counterintuitive effects (doi: 10.1016/J.CEB.2021.07.011). This was the reason that motivated us to investigate how these various ingredients affected space exploration efficiency on different time scales.

      Our results indeed refute the intuition that cells move more efficiently when their exploratory space is reduced, by showing that the outcome depends on the time scale considered (Fig. S3B). Specifically, on short time scales (less than 3 hours), the area explored by individual MDCK cells is larger than that explored by MDCK cells at confluence. On longer time scales (greater than 3 hours), however, the area explored by confluent MDCK cells is larger. This switch is a direct consequence of the change in migratory behavior from a persistent random walk to superdiffusion. Moreover, its position in time depends on the cell line: extrapolation of our results for RPE-1 cells suggests that it should theoretically occur after approximately 300 h, if this time scale were experimentally accessible (Fig. S3F).
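      The switch described above rests on how the MSD scales with lag time. The stdlib sketch below is not the authors' pipeline; it assumes a 2D track sampled at equal intervals, and its function names are hypothetical. It computes a time-averaged MSD, fits the anomalous exponent α in MSD ~ lag^α on log-log axes, and evaluates the textbook (Fürth) persistent-random-walk MSD, 2S²P[t - P(1 - e^(-t/P))], with speed S and persistence time P. For a persistent random walk, α falls from about 2 below P to about 1 beyond it, whereas a superdiffusive process keeps α > 1 at long lags.

```python
import math


def time_averaged_msd(xs, ys, max_lag):
    """Time-averaged MSD of one 2D track sampled at equal intervals, for lags 1..max_lag (frames)."""
    n = len(xs)
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [(xs[i + lag] - xs[i]) ** 2 + (ys[i + lag] - ys[i]) ** 2 for i in range(n - lag)]
        msd.append(sum(sq) / len(sq))
    return msd


def loglog_slope(lags, msd):
    """Least-squares slope of log(MSD) against log(lag), i.e. alpha in MSD ~ lag**alpha."""
    lx = [math.log(t) for t in lags]
    ly = [math.log(m) for m in msd]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def prw_msd(t, speed, persistence):
    """2D persistent-random-walk (Fürth) MSD: 2 S^2 P [t - P (1 - exp(-t/P))]."""
    # t + P*expm1(-t/P) equals t - P(1 - exp(-t/P)) but avoids cancellation at small t.
    return 2.0 * speed ** 2 * persistence * (t + persistence * math.expm1(-t / persistence))


# Straight-line (ballistic) track: MSD = lag**2, so alpha = 2.
track_x = [float(i) for i in range(50)]
track_y = [0.0] * 50
lags = list(range(1, 11))
alpha_ballistic = loglog_slope(lags, time_averaged_msd(track_x, track_y, 10))

# Persistent random walk with P = 1: alpha near 2 well below P, near 1 well above it.
short_t = [0.001 * k for k in range(1, 11)]
long_t = [100.0 * k for k in range(1, 11)]
alpha_prw_short = loglog_slope(short_t, [prw_msd(t, 1.0, 1.0) for t in short_t])
alpha_prw_long = loglog_slope(long_t, [prw_msd(t, 1.0, 1.0) for t in long_t])
```

Fitting α over separate lag windows, rather than one global fit, is what exposes such a crossover.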

      …the role of junctions specifically is less clear.

      We are sorry that we were not able to clearly convey the roles of junctions. We have substantially rewritten our text to address this and all the changes are highlighted in orange. As summarized in Fig. 6F, junctions have three roles. The first role is on persistence, through velocity coordination between neighbors, the second is on speed, through the stability of junctions, and the third role is on directionality, through the sensitivity of the monolayer to the wound edge.

      The first role is evidenced by the comparison of the MSD between single-cell and confluent migration assays and by the use of the alpha-catenin KD cell line. Alpha-catenin depletion is known to be the most potent disruptor of adherens junctions (DOI: 10.1091/mbc.e06-05-0471, DOI: 10.1126/science.aaf7119, DOI: 10.1073/pnas.1002662107, DOI: 10.1073/pnas.1119313109), and we show that it significantly alters the superdiffusive behavior that emerges in the confluent migration assay (Fig. 3E,F, 5C). Therefore, junction integrity is critical for the control of cell persistence.
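
      For readers less familiar with this analysis, the sketch below shows one generic way to compute a time-averaged MSD from a single tracked trajectory and to estimate its exponent alpha from the log-log slope (alpha close to 1 for diffusion at long lags, above 1 for superdiffusion, 2 for ballistic motion). It is an illustration in plain Python, not the authors' analysis code; the function names are hypothetical, and in practice the lag range used for the fit strongly affects the estimate.

```python
import math

def time_averaged_msd(xs, ys, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames)."""
    n = len(xs)
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [(xs[i + lag] - xs[i]) ** 2 + (ys[i + lag] - ys[i]) ** 2
              for i in range(n - lag)]
        msd.append(sum(sq) / len(sq))
    return msd

def msd_exponent(lags, msd):
    """Least-squares slope of log(MSD) versus log(lag)."""
    lx = [math.log(l) for l in lags]
    ly = [math.log(m) for m in msd]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# A straight, constant-speed track is ballistic, so its exponent should be 2.
xs = [2.0 * i for i in range(50)]
ys = [0.0] * 50
msd = time_averaged_msd(xs, ys, 10)
alpha = msd_exponent(list(range(1, 11)), msd)
```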

      Moreover, alpha-catenin depletion induces a loss of velocity coordination between neighbors (Fig. S3E), and our numerical simulations show that this velocity coordination is what induces superdiffusion (Fig. 3G). By contrast, E-cadherin KO and vinculin mutants have no effect on the superdiffusion of confluent cells (Fig. 3E, 4A). Therefore, the critical molecular ingredient is the link to the cytoskeleton provided by alpha-catenin, which ensures junction integrity.
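
      Velocity coordination between neighbors can be quantified in several ways, and the exact measure behind Fig. S3E is not specified here. As one common option, the hypothetical helper below averages the cosine of the angle between the velocity vectors of all cell pairs closer than a chosen cutoff distance (an assumption standing in for a proper neighbor definition such as a Voronoi tessellation): values near 1 mean aligned neighbors, values near 0 mean uncorrelated directions.

```python
import math

def neighbor_alignment(positions, velocities, cutoff):
    """Mean cos(angle) between velocities of cell pairs closer than `cutoff`.

    positions, velocities: lists of (x, y) tuples, one per cell, same order.
    Returns NaN if no pair with non-zero velocities is within the cutoff.
    """
    total, count = 0.0, 0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            if math.hypot(dx, dy) > cutoff:
                continue  # not neighbors under this simple distance criterion
            vi, vj = velocities[i], velocities[j]
            ni, nj = math.hypot(*vi), math.hypot(*vj)
            if ni > 0 and nj > 0:
                total += (vi[0] * vj[0] + vi[1] * vj[1]) / (ni * nj)
                count += 1
    return total / count if count else float("nan")
```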

      The second role of junctions is evidenced by the comparison of cell speeds between single and confluent migration assays with the vinculin mutants (Fig. S4A). Results show that confluence reduces cell speed by about 10 µm/h for every mutant except YE, whose only difference from the other mutants is its lower stability (Fig. 4F). This supports the conclusion that junction stability, and not the other effects of the mutants, controls cell speed (we provide a detailed demonstration in the response to the following question). As expected, junction integrity is required as well, as seen from the higher cell speed of the alpha-catenin KD cell line compared to WT (first MSD point in Fig. 3B, E).

      The third role of junctions is evidenced by the comparison between confluent and directed migration assays (Fig. 6A). Results show that the wound healing rate is proportional to cell speed at confluence for every mutant except YE, which displays no junctional tension gradient from front to back cells (Fig. 6C). This supports the idea that such a gradient is required for cells to identify on which side the wound edge lies. As expected, junction integrity is required as well, as seen from the loss of directional bias of the alpha-catenin KD cell line (Fig. 5F).

      The authors chose vinculin as the basis by which to manipulate tensions at cell-cell junctions, but this comes with considerable drawbacks. Namely, since vinculin appears at both cell-cell and cell-matrix junctions, its role and the role of its mutations is not clear here. The authors state that the collective migration speed is related to junctional stability, but because vinculin is also at FA, how can this be concluded?

      We apologize for the lack of clarity. We hope that the highlighted changes in the revised manuscript will improve this point. As exemplified above, comparing cell migration between isolated cells and confluent cells is essential to enable us to distinguish between the contributions of AJs and FAs. Indeed, since isolated cells lack AJs, the impact of vinculin mutants on single cell migration can only be explained by their effects on FAs. This is how we first determine the effects of vinculin mutants on migration that depend on FAs. Because confluent cells also have FAs, we expect that the effects of vinculin mutants on the migration of isolated cells will still be present in confluent cells, to which will be added the effects of these mutants on AJs and their consequences on migration, if any.

      Therefore, when compared to WT cells, if a given mutant decreases or increases migration speed in individual cells, and does so in confluent cells in the same proportion, then its effects at confluence can be entirely explained by its effects in individual cells, and there are no additional effects of that mutant from AJs. This is indeed what we observe for all mutants except the YE mutant (Fig. S4C), leading us to conclude that none of the vinculin mutants, except the YE mutant, have an effect on migration at confluence that results from AJs. In contrast, the YE mutant has effects on migration at confluence that cannot be explained by its effect on individual cell migration. Therefore, the effects of YE at confluence depend on AJs, whether they result from alterations in AJs, FAs, or both. To distinguish between these scenarios, we proceed by elimination, comparing the effects of YE to those of other mutants on force transmission and adhesion stability, and how these two factors associate with migration speed, as explained below. In FAs, YE alters force transmission differently in individual cells and at confluence, but we already know from Fig. 2 that force transmission in FAs cannot alone explain the speed of migration. This result rules out an indirect effect of AJs on cell migration at confluence through FAs. Furthermore, in AJs, YE affects stability and force transmission, but TL has the same effect on force transmission as YE and we already know that none of the effects of TL on migration depend on AJs (Fig. 3, S4C). This result rules out an effect of force transmission in AJs on migration speed at confluence. We conclude that stability at the AJ level, which is the remaining property specifically impaired by YE, is what regulates migration speed at confluence.
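
      The proportionality criterion described above can be written as a small check: compute each line's confluent-to-single speed ratio, divide it by the WT ratio, and flag lines whose relative ratio departs from 1 by more than a tolerance. This is a hedged sketch of the reasoning only; the function name, the 15% tolerance and the example speeds are hypothetical, not the measured values in Fig. S4C.

```python
def aj_specific_effects(single, confluent, reference="WT", tol=0.15):
    """Return {line: relative ratio} for lines whose confluent/single speed ratio
    differs from the reference line's ratio by more than `tol` (fractional).

    A relative ratio near 1 means the confluent speed change is fully explained
    by the change already seen in isolated cells (no extra AJ-dependent effect).
    """
    ref_ratio = confluent[reference] / single[reference]
    flagged = {}
    for name in single:
        if name == reference:
            continue
        rel = (confluent[name] / single[name]) / ref_ratio
        if abs(rel - 1.0) > tol:
            flagged[name] = rel
    return flagged

# Hypothetical speeds (µm/h): line A scales like WT, line B does not.
single = {"WT": 30.0, "A": 45.0, "B": 30.0}
confluent = {"WT": 20.0, "A": 30.0, "B": 30.0}
flagged = aj_specific_effects(single, confluent)
```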

      The manuscript's logic and flow are not clear in some places, making the story hard to follow. As one example, the FRAP data, which the authors suggest is used to investigate vinculin's combined role does not help in this capacity as the interpretation and its connection to the bigger story are not clear.

      We are sorry again for the lack of clarity. We used FRAP data to evaluate the effects of vinculin mutants on adhesion stability. Indeed, mutants have different effects on adhesion stability (Fig. 2E, 4F). In addition, they also have different effects on force transmission (Fig. 2D, 4D,E). The partial overlap in functional alterations caused by the mutants allows us to test the involvement of the overlapping function (here stability) in the overall migration outcome. For example, if two mutants have a similar effect on adhesion stability but different effects on migration speed (such as TL and T12), we can rule out that speed results from adhesion stability alone. Similarly, if two mutants have different effects on stability but a similar effect on speed (such as TL and YE), we can also rule out that speed results from stability alone. We applied the same reasoning to force transmission to conclude that neither adhesion stability nor force transmission alone is sufficient for cells to migrate rapidly. However, the combination of the two enables rapid migration.
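
      The pairwise elimination logic can be sketched as a search for counterexamples: for a given property, list the mutant pairs with a similar property but different speeds, and the pairs with different properties but similar speeds. Either kind of pair argues against that property alone determining speed. The mutant names, property values and 10% similarity tolerance below are hypothetical placeholders, not the measured FRAP or tension data.

```python
from itertools import combinations

def _close(x, y, tol):
    """True if x and y differ by at most a fraction `tol` of the larger magnitude."""
    return abs(x - y) <= tol * max(abs(x), abs(y))

def elimination_test(props, speed, key, tol=0.1):
    """Return (similar property / different speed pairs,
               different property / similar speed pairs) for property `key`."""
    same_prop_diff_speed, diff_prop_same_speed = [], []
    for a, b in combinations(sorted(props), 2):
        similar_prop = _close(props[a][key], props[b][key], tol)
        similar_speed = _close(speed[a], speed[b], tol)
        if similar_prop and not similar_speed:
            same_prop_diff_speed.append((a, b))
        elif not similar_prop and similar_speed:
            diff_prop_same_speed.append((a, b))
    return same_prop_diff_speed, diff_prop_same_speed

# Hypothetical relative values for three mutants.
props = {"A": {"stability": 1.0, "force": 1.0},
         "B": {"stability": 1.0, "force": 2.0},
         "C": {"stability": 2.0, "force": 2.0}}
speed = {"A": 10.0, "B": 20.0, "C": 20.0}
```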

      As another example, the information derived from the use of the mutants is not clear in the context of the message in the manuscript, since they affect both cell-cell and cell-matrix junctions and in some places show results that are counterintuitive and not well explained; the authors admit these results are surprising but then do not explain their meaning.

      As such, it is very hard to follow the logic with regard to the information resulting from the mutant experiments.

      We provide above a detailed breakdown of our strategy to analyze the results. We regret that our manuscript did not adequately convey our conclusions and we hope that the new version of the manuscript improves this point.

      Proliferation has been shown to play a role in wound healing. Does proliferation change in the various conditions?

      This is an important point. The average speed of cells at confluence is approximately 20 µm/h (Fig. 4B), which means that each cell moves approximately its own size in one hour. During this time, assuming a 16-hour cell cycle, about 6% (1/16) of the cells would have divided, each of them likely pushing one of its neighbors a distance equivalent to the size of a cell. Therefore, cell proliferation accounts for at most a few percent of the total cell movement, and we can assume that it does not account for a large part of the movement we observe. This is consistent with previous work showing that proliferation does not contribute significantly to wound healing (DOI: 10.1073/pnas.0705062104, DOI: 10.1083/jcb.201207148).
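
      The back-of-the-envelope estimate above can be reproduced in a few lines. The sketch keeps the same simplifying assumptions stated in the paragraph (distances in cell diameters, migration of about one diameter per hour, divisions spread uniformly over a 16-hour cycle, each division pushing one neighbor by about one diameter); the function name is hypothetical.

```python
def division_share_of_displacement(cell_cycle_h, window_h=1.0,
                                   migration_per_window=1.0, push_per_division=1.0):
    """Share of cell displacement attributable to divisions over one time window.

    Distances are in cell diameters; divisions are assumed uniform over the cycle.
    """
    fraction_dividing = window_h / cell_cycle_h  # cells dividing during the window
    return fraction_dividing * push_per_division / migration_per_window

share = division_share_of_displacement(16.0)  # 1/16 = 0.0625, i.e. about 6%
```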

      Minor comments:

      The authors should provide a better description of the mutants: what does a tailless mutant not bind, or bind differently? More context is needed to help interpret the results. While the mutants have all been published on before, it would be helpful to have more information here so that the manuscript is easier to follow.

      We are sorry that the information we provided was insufficient. We have now detailed the mutations to help the reader understand how interactions are altered.

      Figure 1A is not necessary. Figure 1 overall is fairly predictable as there have been many papers using the persistent random walk as the best model for single-cell migration (dating back to the early 1990s). The authors define a new term, angular memory, which they show decreases with increasing delta t as one would predict.

      We acknowledge that persistent random walks have already been observed for individual cells, as in references 3-4 cited in the introduction. Nevertheless, we believe that Figure 1 is important because not all cells migrate as persistent random walkers when isolated. Some migrate in a more exotic manner, resulting in superdiffusive behavior, as in references 5-8 cited in the introduction. Since we observe superdiffusive behavior at confluence (Figure 2), it was therefore necessary to show whether or not single cells were superdiffusive too. We also use this figure to introduce angular memory, a measure that, to our knowledge, has never been used before. As intuition suggests, it decreases to 0 for persistent random walkers, just as a related measure, the velocity autocorrelation, does. However, the angular memory of fractional Brownian walkers does not vanish with increasing delta t (Fig. 3D), whereas their velocity autocorrelation does, just like that of persistent random walkers. This difference makes angular memory much more appropriate for distinguishing between the two migration behaviors, and prompted us to introduce it in the first figure as a reference.
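
      The point that velocity autocorrelation cannot separate the two behaviors can be illustrated with textbook formulas: a persistent random walk has an exponentially decaying velocity autocorrelation, and the increments of fractional Brownian motion (fractional Gaussian noise, superdiffusive for a Hurst exponent above 0.5) have an autocorrelation that decays as a power law. Both tend to zero at long lags. The sketch below does not implement angular memory itself, whose precise definition is given in the manuscript rather than here; the parameter values are arbitrary.

```python
import math

def vacf_persistent_random_walk(lag_h, persistence_h):
    """Normalized velocity autocorrelation of a persistent random walk: exp(-|lag|/P)."""
    return math.exp(-abs(lag_h) / persistence_h)

def vacf_fractional_gaussian_noise(k, hurst):
    """Normalized autocorrelation of fractional Gaussian noise at integer lag k:
    0.5 * (|k+1|^2H - 2|k|^2H + |k-1|^2H); decays like H(2H-1) k^(2H-2)."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)

# Both curves start at 1 and approach 0, even though H = 0.75 is superdiffusive.
prw = [vacf_persistent_random_walk(0.1 * k, 1.0) for k in (0, 10, 100)]
fgn = [vacf_fractional_gaussian_noise(k, 0.75) for k in (0, 10, 100, 10000)]
```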

      In the wound healing assay, which cells were measured? Leading edge or interior, and does it matter?

      Figure 5A shows that cells behave differently depending on their distance from the wound. This is because the traces shown correspond to the first few hours of the movie, during which the cells at the front begin to move first. Figure S5A shows the speed of the cells over time after the wound and indicates that the cells reach a stable speed after approximately 3 to 4 hours. Figure S5B shows the speed of the cells as a function of distance from the wound at steady state. These results show that the speed of the cells no longer depends on the distance from the wound at this stage. As indicated in the “Materials and Methods” section, we only considered time points beyond this stage for subsequent analyses of population-averaged MSD and velocity presented in Figure 5, so the location of cells at the front or rear was irrelevant.

      Reviewer #2 (Comments to the Authors):

      To migrate, cells must spatially explore their environments, a process that is guided by intrinsic signals (adhesive and mechanical properties, etc.) and extrinsic signals (gradient cues). This exploration can occur on the single or multicellular level. In this study, the authors examine the effect of cell-cell interactions, guidance cues, and cell mechanics in the exploratory capacity of MDCK cells. The authors show that cell-cell adhesion provides an "infinite directional memory for migration" and that cell speed is dependent upon focal adhesion stability, cell mechanics, and the mobility of adherens junctions; these processes are modulated by vinculin.

      My three major concerns with the manuscript are as follows:

      (1) While there is potential new information about the role cell-cell junctions and guidance cues play in cell migration, there is not enough NEW insight presented. Rather, the role of vinculin in these processes is expected given what is already known about its ability to control focal adhesion stability, mechanics, and adherens junctions.

      We agree that our cell migration results make sense based on the effects of vinculin mutants on the stability and force transmission of adhesions. Nevertheless, we argue that this was not the only possible scenario. Indeed, we find that none of the effects of vinculin mutants on AJs (except YE) have an impact on cell migration (Fig. S4C). One might have expected that the increased stability provided by the TL and T12 mutants would reduce the speed of collective cell migration, just as the YE mutant increased cell speed due to its altered stability. This is not what we found, and this reveals a nonlinear relationship between AJ stability and migration speed that could be investigated more thoroughly in future studies. Another example is that the effects of the mutants on force transmission in AJs do not impact migration speed at confluence but do impact directed collective migration (Fig. 6). One might have expected that vinculin-mediated force transmission in AJs would impact collective migration, whether directed or not.

      More importantly, we show that the role of intercellular adhesion in cell migration is more complex than expected. Indeed, it depends on the timescale considered: intercellular adhesion is detrimental to short-term spatial exploration and beneficial in the long term (Fig. S3B). Such a timescale-dependent behavior is impossible to predict from previously known effects of the mutants or other molecular considerations. Furthermore, we show that this behavior can be fully explained by the coordination of velocities between neighbors, which depends on intact connections between AJs and the cytoskeleton via alpha-catenin, but is independent of vinculin mutants that connect AJs to the cytoskeleton in parallel with alpha-catenin. One might have expected these connections to also have an impact on velocity coordination, and thus on spatial exploration, but we show that this is not the case (Fig. 3). Finally, we show that directed collective migration has a negligible impact on cell exploration at our experimental timescale (Fig. 5), whereas we initially expected the wound to make migration more ballistic. This reveals that such a directional signal affects spatial exploration at much longer timescales than expected.

      Overall, our results quantify the outcome of competing effects and provide timescales at which one effect outweighs the other in influencing cell migration. We believe this is an original approach that provides substantial new insights into collective cell migration.

      (2) The phenotypes of the cells expressing the mutant vinculins vary greatly. These phenotypes are not addressed despite the fact that they could potentially complicate the analyses. For example, there are dramatic differences between focal adhesion numbers and sizes in the cells expressing the different vinculin mutants; cell spreading is also dramatically altered. Likewise, the T12 mutant vinculin has previously been reported to have increased adhesive strength, increased traction forces, and cell spreading. How does this knowledge change the interpretation?

      We agree that vinculin mutants may have effects on the size and number of FAs, cell spreading, and traction forces that we do not examine here. These consequences can be explained by the effects of these mutants on force transmission in FAs and on their stability, which we report in our work. They do not affect our interpretations. Here, we provide a predictive model of migration speed based on the combination of two consequences of vinculin function, namely stability and force transmission. An interesting avenue for future research would be to assess whether these combinations can be reduced to a single higher-level effect of vinculin on the cellular phenotype that would be sufficient to predict migration speed. This work remains to be done, as neither FA size and number, cell spreading, adhesion force, nor traction forces alone are sufficient to predict migration speed.

      Along the same lines, it has previously been established that tagged versions of vinculin do not efficiently integrate into adherens junctions. Published work from the Nelson laboratory suggests that GFP-vinculins do not localize to cell-cell junctions and work from other laboratories suggests localization occurs only when the endogenous vinculin is silenced.

      We are aware that some GFP-vinculin constructs may not localize as well as the endogenous protein at AJs. This is due to the localization of the GFP tag on the head of vinculin and depends on the length of the linker between GFP and the head of vinculin. The longer the linker, the easier the interaction with AJ partners. Unlike these constructs, the vinculinTSMod sensors we use in our work do not carry a GFP on the head and do not suffer from the same limitations.

      Furthermore, vinculin recruitment to AJs depends on force, with little or no recruitment when tension on the AJs is relaxed (DOI: 10.1038/ncb2055). Vinculin recruitment has in fact already been used as an indicator of AJ tension in Drosophila (DOI: 10.1038/s41467-018-07448-8). Consequently, the amount of vinculin visible at the AJs varies depending on the tension exerted on the AJs, which our results confirm: vinculin is more difficult to detect at the AJs in cells located at the front of a wound than in those located at the back (Fig. 6B), which is consistent with the difference in vinculin tension between front and back cells (Fig. 6C) and with the E-cadherin tension gradient between front and back cells (DOI: 10.1083/jcb.201706013). Overall, these results show that vinculin is not always easy to detect at AJs, but this is due to the properties of vinculin, which the constructs we use reproduce better than previous constructs (see also below).

      The images in figure S2 and the prebleach images in figure S4 do not show convincing localization of the mutant vinculins to cell-cell adhesions. This is of special concern given that the YE mutant protein hardly has any discernible localization to cell-cell junctions; additionally, none of the mutant proteins were tested for their ability to co-localize with adherens junction components. This raises the question of whether the parameters being examined and the conclusions drawn from them are affected by a difference in localization.

      We agree that the recruitment of vinculin at intercellular contacts may be difficult to see.

      Besides the force-dependent effects mentioned above, other factors are involved. The images shown in Figures S2 and S4 are from live cells in which cytoplasmic vinculin is still present, at a level proportional to the mobility of vinculin. Indeed, the TL and T12 mutants show a more marked contrast between intercellular contacts and the cytoplasm, which is consistent with their greater stability at AJs (Fig. 4F). Conversely, YE shows lower contrast, which is consistent with the lower stability of this construct at AJs (Fig. 4F). The FL construct lies between the two. As a result, the cytoplasmic content can variably mask vinculin recruitment at the AJs depending on the mutant.

      We have now performed additional quantifications of mutant recruitment at intercellular contacts as a function of distance from the basal surface of the cells and relative to their recruitment in FAs, in live cells. Results are shown in the new Fig. S4F. We find that all the constructs are recruited to intercellular contacts with a density that is at most half of that in FAs and that varies along the height. FL shows the highest density, localized more apically, consistent with the localization of an AJ-bound actin belt. The mutants appear to be more homogeneously distributed along the height of the lateral surface, which may be explained by their impaired autoinhibition (TL, T12) or impaired mechanosensitivity (YE). This variability also contributes to the difficulty in seeing vinculin recruitment in all cells in a single z-slice.

      To confirm the proper recruitment of vinculin constructs to AJs, we have performed immunofluorescence against alpha-catenin, together with phalloidin staining of F-actin, on each of the stable cell lines. Results are shown in the new Fig. S4D and E. In these experiments, cell permeabilization releases part of the cytoplasmic pool of vinculin, which highlights the recruitment of all vinculin constructs to intercellular contacts. There, all vinculin constructs colocalize with alpha-catenin and F-actin, as expected. Additionally, the images displayed are maximum intensity projections, to mitigate recruitment variability along the height.
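
      The two image operations mentioned in these responses, a junction-to-FA intensity ratio as a function of height and a maximum intensity projection, can be sketched with NumPy as follows. This is not the authors' pipeline: the masks are assumed to come from a prior segmentation step that is not shown, slice 0 is assumed to be the basal plane where FAs are measured, and the constant background subtraction is a simplifying assumption.

```python
import numpy as np

def junction_to_fa_profile(stack, junction_mask, fa_mask, background=0.0):
    """Ratio of mean junctional intensity to mean FA intensity for each z-slice.

    stack         : array (nz, ny, nx) of intensities; slice 0 taken as basal plane
    junction_mask : bool array (nz, ny, nx) marking junction pixels in each slice
    fa_mask       : bool array (ny, nx) marking FA pixels in the basal slice
    background    : constant offset subtracted from every pixel (assumed)
    """
    stack = np.asarray(stack, dtype=float) - background
    fa_level = stack[0][fa_mask].mean()
    return np.array([stack[z][junction_mask[z]].mean() / fa_level
                     for z in range(stack.shape[0])])

def max_intensity_projection(stack):
    """Maximum intensity projection along z."""
    return np.asarray(stack).max(axis=0)
```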

      Overall, our results clearly support the localization of vinculin at intercellular contacts, and the differences between the constructs are consistent with the effects of their mutations.

      (3) There is a lack of new mechanistic insight. Conclusions are made about a role of vinculin dimerization. This conclusion appears to be based upon the usage of the mutant version of vinculin Y1065. Did the authors directly measure the ability of this mutant protein to dimerize? Is actin binding also affected?

      The binding properties of the Y1065E mutant, including its dimerization and binding to actin, have already been characterized by other researchers (ref. 40 in our manuscript, as well as DOI: 10.1111/j.1432-1033.1997.01136.x or DOI: 10.1016/j.febslet.2013.02.042). We assumed that these properties are now well established and can be used to explain higher-level phenotypes that we show for the first time, to our knowledge.

      Reviewer #3 (Comments to the Authors):

      Canever et al. tracked two epithelial cell lines on collagen-coated glass and showed that isolated (non-confluent) cells move as persistent random walkers, whereas confluent monolayers migrate superdiffusively, with long-range directional memory. By systematically perturbing adhesion machinery, they found that focal adhesion mutations mainly tune the speed of single-cell tracks but cannot create long-range memory, while force-bearing adherens junctions are essential for the superdiffusive regime: genetically perturbing them collapses collective memory. These interesting results identify junctional tension as important to switch epithelial cells/sheets between individual and collective search modes, an important quantitative insight that is of clear relevance to cell biologists.

      - The presented data is nicely quantitative and convincing, but I have subtle concerns about the generality of the findings. While the authors show that the differential behavior they describe is not cell-line specific (MDCK, RPE), there are no experiments evaluating the generality of their conclusions across different matrix conditions. How are the measured migration parameters affected by matrix stiffness? Cell migration on collagen-coated glass coverslips is a relatively narrow and artificial condition. How is the collective directional memory expected to behave on softer substrates? The generality of the conclusions could be strengthened by repeating measurements using hydrogels of varying stiffness. Further, it should be discussed to which tissues in the body the selected matrix conditions and migration modes plausibly apply.

      We agree that the generality of our results and the relevance of glass-rigid substrates are important points. In vivo, epithelial cells rest on a basement membrane with a typical stiffness of approximately 10 MPa, as demonstrated by experimental evaluations on various tissue explants, including renal glomeruli and Bruch's membrane, which are relevant to MDCK and RPE-1 cells (DOI: 10.1111/j.1742-4658.2007.05823.x, DOI: 10.1172/JCI106898, DOI: 10.1038/eye.1987.35). We have added these references to the manuscript to support our experimental strategy. In vitro, the most significant effects of substrate stiffness on FAs and cell migration generally occur at much lower stiffnesses, between 0.2 and 100 kPa, and cell phenotypes generally plateau at levels comparable to those observed on glass, even below 100 kPa (DOI: 10.1242/jcs.133645, DOI: 10.1038/ncb3268, DOI: 10.1039/c5ib00307e, DOI: 10.1039/c9sm01893j). Furthermore, substrate stiffness has much more moderate effects on confluent cells than on isolated cells. For example, it has been previously demonstrated that confluent layers of MCF10A epithelium showed no change in velocity coordination in the range of 3 to 65 kPa (DOI: 10.1083/jcb.201207148). Therefore, collagen-coated glass appears to be a reasonable model for the basement membrane. Overall, we believe that we have conducted our experiments under physiological conditions, and that our results apply to a wide range of substrate stiffnesses.

      - It would be nice to see how long it takes confluent cell layers to close rectangular wounds of defined size when cells migrate as individual (adherens junctions perturbation) versus collective (wt) (on substrates of different stiffness). Presumably, there should be faster wound closure under the collective regime, at least for simple shaped wounds.

      This is an interesting question, which our results indirectly address. In our study, we measured the wound healing speed of the WT MDCK cell line as well as lines expressing mutant vinculin constructs (Fig. 6A). These results show that this speed ranges from 5 to 15 µm/h depending on the construct expressed (for reasons that we explain in the manuscript). These values make it easy to estimate the time required to close a wound based on its width. For example, for the WT cell line, whose two wound edges each advance at 10 µm/h, it would take 5 hours to close a 100 µm wide wound.
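
      The estimate above follows from a simple relation for a straight wound whose two edges advance toward each other at the same rate: closure time = width / (2 × edge rate). The sketch below encodes that relation; it assumes constant, symmetric edge speeds and ignores any initial lag before cells reach a steady speed.

```python
def closure_time_h(width_um, edge_rate_um_per_h):
    """Time (h) to close a straight wound whose two edges each advance at edge_rate."""
    return width_um / (2.0 * edge_rate_um_per_h)

wt_time = closure_time_h(100.0, 10.0)  # WT example from the text: 5 h
```

Under the same assumptions, the 5 to 15 µm/h range reported above would correspond to roughly 10 h down to about 3.3 h for a 100 µm wound.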

      Wound closure for cells with disrupted adhesive junctions has already been documented (DOI: 10.1083/jcb.200910041). The results show that wound closure is indeed slower than with WT cells. Although this previous study does not reveal the underlying causes, our work now shows that there are two factors: weaker directional memory due to impaired intercellular coordination and, in the longer term, an additional lack of sensitivity to the guidance signal provided by the wound.

      - Akin to substrate stiffness variation, I am missing experiments that test the effect of cytoskeletal tension on these migration modes. Experiments with Rho kinase or myosin inhibitors could meaningfully broaden the scope of this study.

      Rho kinase or myosin inhibitors applied to cells during the time required to assess migration patterns (a movie recorded overnight is necessary to obtain a statistically reliable calculation of MSD over 3 to 4 hours) are likely to affect many other cellular processes in addition to the cytoskeletal tension directly involved in migration. We believe that the accumulation of these effects would make interpretation of the results very difficult. For example, it has been shown that inhibition of ROCK by Y-27632 promotes healing of corneal endothelial lesions by affecting proliferation through cyclin D and p27 (DOI: 10.1167/iovs.13-12225), or by improving respiration, which would provide the energy necessary for migration (DOI: 10.1096/fj.202101442RR). Consistently, another study on HaCaT epidermal cells confirms that myosin phosphatase accelerates wound healing through proliferation (DOI: 10.1016/j.bbadis.2018.07.013). In contrast, in HUVECs, ROCK inhibition significantly impaired the proliferation and migration of vascular endothelial cells in vitro in a dose-dependent manner (DOI: 10.1097/ICO.0000000000000493).

      Furthermore, previous studies have highlighted that differential contractility at the subcellular level is important for collective migration (DOI: 10.1038/ncb2133, DOI: 10.1083/jcb.201706013), which is not possible to examine with global activation or inhibition of contractility. This prompts the development of more refined and specific measurement and disruption strategies to assess the respective impact of cytoskeletal tension on cell-cell and cell-matrix adhesion mechanisms. Our work, which uses biosensors to assess how this tension differentially affects cell-cell and cell-matrix adhesions, is a step in this direction. The localized spatio-temporal activation or inhibition of myosin subtypes or Rho GTPase regulators specific to these adhesion structures will likely answer these questions in the future, but we believe that the development and application of these approaches will require a substantial amount of work that goes beyond the scope of our study.

    1. eLife Assessment

      This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. The work rests on a solid methodological base. Some limitations remain, including uncertainty introduced by pooling different tooth positions, limited dietary interpretation, and the predominantly herbivorous taxonomic focus, which narrows the ecological scope of the conclusions. However, the manuscript provides a substantially strengthened and well-supported contribution, while appropriately inviting further work to clarify dietary trends, broader ecological context, and links between dental trait evolution and environmental change.

    2. Reviewer #2 (Public review):

      [Editors' note: this version has been assessed by the Senior Editor without further input from the original reviewers. The authors have addressed the minor comments raised in the previous round of review.]

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis -- mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper and I think the results will be of interest to a broad audience.

      Weaknesses:

      For the original draft of the manuscript, I had four major concerns with the study, especially related to the sampling, diet, and evidence for the 'brawn before bite' hypothesis. I still believe that the original issues that I raised may be weaknesses of the study. For example, there is still limited discussion on diets (even though the dental topographic analyses used in the study are designed for inferring diets). And I find the results a little challenging to interpret because teeth of multiple positions are included in the same samples, which seems problematic. That said, the authors have addressed each of my previous concerns and have made major revisions, including running new analyses, and thus I support the paper.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis: mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper and I think the results will be of interest to a broad audience.

      Weaknesses:

      For the original draft of the manuscript, I had four major concerns with the study, especially related to the sampling, diet, and evidence for the 'brawn before bite' hypothesis. I still believe that the original issues that I raised may be weaknesses of the study. For example, there is still limited discussion on diets (even though the dental topographic analyses used in the study are designed for inferring diets). And I find the results a little challenging to interpret because teeth of multiple positions are included in the same samples, which seems problematic. That said, the authors have addressed each of my previous concerns and have made major revisions, including running new analyses, and thus I support the paper.

      This revised submission includes only minor changes aimed at clarifying the main text.

      Reviewer #2 (Recommendations for the authors):

      I appreciate that the authors made many improvements to their study based on reviewers' comments. I don't have any remaining major issues with the paper, but I do have several minor comments.

      Thank you for taking the time to provide additional helpful feedback on our study. We have made minor revisions to the manuscript based on your suggestions. Please see our point-by-point response below.

      Lines 48-50. I reiterate my suggestion in my previous review to explicitly state which clade is being discussed, which is important because several major mammal groups beyond placentals (metatherians, multituberculates, dryolestoids, gondwanatherians) survived the K-Pg and had very different diversification patterns. You mention "mammal taxonomic diversity" but in the next sentence say "This initial placental mammals diversification ..." and later mention "stem placental/eutherian lineages." To stay consistent, you might replace "mammal" (L48) and "placental mammals" (L50) with "eutherian(s)" (usually defined as stem + crown placentals). If you follow this suggestion, then elsewhere in the paper I recommend replacing "mammals" with "eutherians" for consistency.

      Thank you for this suggestion. We modified the use of “mammals” throughout the text to general reference to the group only; specific mentions of the dataset analyzed are revised to “eutherians.”

      Lines 75-83. I respect the authors' hesitancy to reconstruct specific diets for the fossil taxa, especially considering that dental topographic analyses (DTAs) often struggle to differentiate diets in extant taxa (e.g., Pineda-Munoz et al. 2016 Methods Ecol Evol). I still think that the authors might be able to interpret dietary trends from their results (e.g., an increase in average OPCR values indicating a shift toward more herbivorous diets), and discussing such trends would be an interesting topic later in the paper. That said, I also recognize that different DTA results seem to show conflicting dietary trends (based on my limited knowledge of those metrics), so maybe that complicates things too much.

      We concur with Reviewer 2 that dietary inferences from DTA data are premature, especially given the ongoing controversies over its use in studies of extant mammal teeth. We have therefore kept the current scope of our discussion unchanged.

      Lines 75-77. "early mammals ... are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction." But your fossils (eutherians) are certainly within 'phylogenetic brackets' of modern clades (therians, i.e. Eutheria + Metatheria). Maybe you're alluding to the fossils being stem lineages of extant subgroups like Ungulata, which means we can't bracket them specifically within those eutherian subgroups? So, I recommend revising or expanding your statement for clarity. Also, the considerable phylogenetic uncertainty for Paleocene groups (e.g., Halliday et al. 2015) complicates this issue, which you could mention.

      We modified the sentence to now say “Additional complications with ecomorphological analysis of these stem eutherians include the uncertainty in their dietary ecology, having diverged prior to the crown radiation, and uncertainty in phylogenetic positions of Paleocene taxa [7]; thus, they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction.”

      Line 84. "We investigated dental topography-performance shifts ...". You haven't introduced dental topography or even mentioned teeth yet, and "performance shifts" is vague. So, this phrase might confuse readers. Maybe you can just erase it and start the sentence with "We investigated the timing of ecomorphological ..."?

      We made the recommended revision.

      Lines 104-105 (and elsewhere). "Dental traits paralleled Paleocene global and regional environmental conditions" and "We found that dental topographic trait variability in Paleocene mammals in south China tracked global and regional climatic changes". These conclusions seem a little too assertive to me. Your sample is grouped into 3 rough time bins (of somewhat uncertain ages) and is from a relatively small geographic range - that seems like very limited information for inferring links between dental patterns and climatic changes, especially global patterns. I think it's worth HYPOTHESIZING that dental traits are linked to environmental/climatic changes (with results like those in Figure 2A & B as evidence to support that hypothesis), but I wouldn't make that claim with any confidence. So, I recommend that you temper your relevant conclusion statements. For example, for Line 105, you could replace "We found ..." with "We posit ..." (L105). I would make similar changes to similar statements throughout the paper (e.g., L243).

      Thank you for this suggestion to temper our phrasing. We edited throughout the text to make our interpretations less assertive.

      Figure 1 (and your response to reviewers). Why was the timescale changed to 65.5 Ma for the K-Pg boundary? The K-Pg is 66 Ma (not 65.5), which is the age you mention in the text (e.g. Pg 3 L39) and is well established in the literature - see recent papers from the Paul Renne lab for a more exact age.

      We revised the figure to have the K-Pg at 66 Ma.

    1. eLife Assessment

      This valuable paper describes the regulation of the association of meiotic chromosome axis proteins on chromosome ends with sub-telomeric elements in budding yeast. The genome-wide analyses of binding of chromosome components as well as chromatin regulators, complemented with the mapping of meiotic DNA double-strand breaks on chromosome ends, provide solid evidence to support the authors' conclusion. The results in the paper are of interest to researchers in meiotic recombination and the structure of genomes and chromosomes.

    2. Reviewer #1 (Public review):

      The revised manuscript includes several useful additions, and I appreciate the efforts to clarify parts of the analysis. The dataset remains valuable. However, several key issues raised previously are not yet fully resolved and continue to limit the clarity of the main conclusions.

      (1) I appreciate that the authors guide the reader to the relevant regions in the analysis of chromosome fusions (Fig. 2b). However, these subtelomeric regions are not clearly visualized, making it difficult to compare fused and unfused profiles, even though the conclusions rely largely on visual inspection of them. A more direct comparison between fused and unfused ends, together with quantitative summaries (e.g., binned Red1 enrichment and comparisons with internal regions), would make this experiment more convincing.

      (2) The SK1/S288c comparison (Fig. 2c) is an excellent approach, but it is currently presented only as profiles, which again requires substantial effort from the reader to extract the relevant information. A systematic analysis across all informative chromosome ends (for example, comparing Red1 levels in syntenic regions using binned log2 fold-change) would more directly test the proposed in cis effect (L168) and clarify the contribution and range of Y'-associated effects. Other factors (e.g., distance from chromosome ends) could also be assessed within this framework.

      Related to this, it is unclear if Y' elements themselves exhibit lower Red1 binding than the genome average. Providing the mean Red1 signal per Y' element would clarify this point and may also aid interpretation of the relationship between coding density and Red1 enrichment.
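The binned log2 fold-change comparison suggested in point (2) could take roughly the following form. This is a minimal sketch, not the authors' pipeline: it assumes that per-base Red1 enrichment tracks (e.g., ChIP/input) for SK1 and S288c have already been placed on shared syntenic coordinates, and the function names, bin size, and pseudocount are hypothetical choices made for illustration.

```python
import math


def bin_means(signal, bin_size):
    """Mean signal in consecutive, non-overlapping bins (a final partial bin is kept)."""
    return [
        sum(signal[i:i + bin_size]) / len(signal[i:i + bin_size])
        for i in range(0, len(signal), bin_size)
    ]


def binned_log2_fc(sig_a, sig_b, bin_size, pseudocount=0.01):
    """Per-bin log2((A + pc) / (B + pc)) for two tracks on the same syntenic coordinates.

    The pseudocount (a hypothetical default) avoids division by zero in empty bins.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("tracks must cover the same syntenic coordinates")
    a = bin_means(sig_a, bin_size)
    b = bin_means(sig_b, bin_size)
    return [math.log2((x + pseudocount) / (y + pseudocount)) for x, y in zip(a, b)]


# Toy example: the first bin is twofold higher in strain A, the second is unchanged.
print(binned_log2_fc([2, 2, 4, 4], [1, 1, 4, 4], bin_size=2, pseudocount=0))
```

Summarizing the resulting per-bin values by distance from the chromosome end, or by proximity to a Y' element, would be one way to address the range question raised above.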

      (3) The Dot1-Sir3 section is now simpler. However, I still find it difficult to follow the underlying rationale. In particular, it is unclear why a Dot1 function dependent on H3K79 methylation is introduced, given that the data in the previous section suggest H3K79 methylation is dispensable for subtelomeric Red1 depletion. A clearer statement of the authors' working model would be helpful.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Raghavan and his colleagues sought to identify cis-acting elements and/or protein factors that limit meiotic crossover at chromosome ends. This limitation is important for avoiding chromosome rearrangements and preventing chromosome mis-segregation.

      By comparing axis protein recruitment in SK1 and S288C backgrounds, which differ in their number and distribution of Y' elements, the authors show that Y' elements have a limited impact on axis protein enrichment. Genetic analyses coupled with ChIP experiments revealed that the differential binding of the Red1 protein in subtelomeric regions requires the methyltransferase Dot1. Interestingly, the lack of Red1 depletion in subtelomeric regions in this mutant does not impact DSB formation. Another surprising finding is that deleting DOT1 has no effect on Red1 loading in the absence of the silencing factor Sir3. Unlike Dot1, Sir3 directly impacts DSB formation, probably by limiting Spo11 access to promoters. As now clearly stated in the abstract and the discussion, this explains only a small part of the low levels of DSBs forming in subtelomeric regions, and the main mechanisms suppressing crossover close to the ends of chromosomes remain to be deciphered.

      Strengths:

      This work provides intriguing observations, such as the impact of Dot1 and Sir3 on Red1 loading and the uncoupling of Red1 loading and DSB induction in subtelomeric regions.

      The separation of axis protein deposition and DSB induction observed in the absence of Dot1 is interesting because it rules out the possibility that the binding pattern of these proteins is sufficient to explain the low level of DSB in subtelomeric regions.

      The demonstration that Sir3 suppresses the induction of DSBs by limiting the openness of promoters in subtelomeric regions is convincing.

      Weaknesses:

      The section examining the impact of Dot1 and Sir3 remains complex, which is partly inherent to the intricate relationship between Dot1 and Sir3. However, the authors conclude that Dot1 acts independently of its catalytic activity based on the phenotype of the H3K79R mutant. Although this is possible, it is not fully demonstrated, as the H3K79R mutant may exhibit its own phenotype independently of Dot1. Unless the authors test the impact of the catalytically dead mutant Dot1-G401R on axis protein enrichment at subtelomeres, they cannot claim that Dot1 acts independently of its catalytic activity.

      Sir3's impact on DSB induction is compelling, yet it only accounts for a small proportion of DSB depletion in subtelomeric regions. Thus, the main mechanisms suppressing crossover close to the ends of chromosomes remain to be deciphered.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Meiotic recombination at chromosome ends can be deleterious, and its initiation (the programmed formation of DSBs) has long been known to be suppressed. However, the underlying mechanisms of this suppression remained unclear. A bottleneck has been the repetitive sequences embedded within chromosome ends, which make them challenging to analyze using genomic approaches. The authors addressed this issue by developing a new computational pipeline that reliably maps ChIP-seq reads and other genomic data, enabling exploration of previously inaccessible yet biologically important regions of the genome.

      In budding yeast, chromosome ends (~20 kb) show depletion of axis proteins (Red1 and Hop1) important for recruiting DSB-forming proteins. Using their newly developed pipeline, the authors reanalyzed previously published datasets and data generated in this study, revealing heretofore unseen details at chromosome ends. While axis proteins are depleted at chromosome ends, the meiotic cohesin component Rec8 is not. Y' elements play a crucial role in this suppression. The suppression does not depend on the physical chromosome ends but on cis-acting elements. Dot1 suppresses Red1 recruitment at chromosome ends but promotes it in interior regions. Sir complex renders subtelomeric chromatin inaccessible to the DSB-forming machinery.

      The high-quality data and extensive analyses provide important insights into the mechanisms that suppress meiotic DSB formation at chromosome ends. To fully realise this value, several aspects of data presentation and interpretation should be clarified to ensure that the conclusions are stated with appropriate precision and that remaining future issues are clearly articulated.

      (1) To assess the effects of chromosome fusion on overall subtelomeric suppression, the authors should guide readers on how to look at the data presented in Figure 2b-c. Based on the authors' definition of the terminal 20 kb as the suppressed region, SK1 chrIV-R and S288c chrI-L would be affected by the chromosome fusion, if at all. In addition, I find it somewhat challenging to draw clear conclusions from inspecting profiles to compare subtelomeric and internal regions. Perhaps applying a quantitative approach, such as a bootstrap-based analysis similar to those presented earlier, would be easier to interpret.
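A bootstrap-based comparison of the kind the reviewer proposes could look roughly like the sketch below. It assumes that per-bin Red1 enrichment values have already been extracted for subtelomeric and internal regions; the function name, the number of resamples, and the use of a simple percentile interval are illustrative assumptions, not the procedure used in the manuscript.

```python
import random
import statistics


def bootstrap_mean_diff(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Difference in mean enrichment (A - B) with a percentile bootstrap interval.

    Each group is resampled with replacement independently; a fixed seed keeps
    the result reproducible.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    return observed, (lo, hi)


# Hypothetical per-bin values: subtelomeric bins versus internal bins.
subtelomeric = [0.1, 0.2, 0.3, 0.4]
internal = [1.0, 1.1, 1.2, 1.3]
print(bootstrap_mean_diff(subtelomeric, internal, n_boot=500))
```

An interval that excludes zero would indicate depletion of the fused (or unfused) ends relative to internal regions; applying the same function to fused and unfused ends separately would allow the comparison the reviewer requests.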

      The reviewer is correct that we could not simply fuse two ends but had to create translocations that also removed variable amounts of subtelomeric sequence. Targeted translocations require unique sequences, and thus the extent to which telomeric sequences were deleted varied based on the availability of such sequences. As noted by the reviewer this necessarily limits the conclusions that can be drawn. We have expanded the description of this experiment and also explicitly state the limitations of this assay. To improve clarity, we have also included a schematic to better highlight which chromosomal sequences were removed.

      To further probe our finding that subtelomeric axis protein enrichment may largely be encoded in cis, we now compared axis protein enrichment between S288c and SK1, as suggested by reviewer 2. For this analysis, we took advantage of a dataset we had produced previously that measures Red1 enrichment in SK1/S288c hybrid strains, which provide a powerful internally controlled setup that eliminates effects caused by differential timing and synchrony between samples. As now shown in Supplementary Fig. 5, SK1 and S288c differ substantially in their subtelomeric architecture at many ends, including extensive differences in the number and distribution of Y’ elements. Importantly, axis protein distribution was very consistent between SK1 and S288c when correcting for the differences in length of individual chromosome ends, supporting the conclusion that axis protein enrichment levels are primarily encoded in cis. This analysis is now shown in Fig. 2c. These data also indicate that the presence of a Y’ element does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior.

      (2) The relationship between coding density and Red1 signal needs clarification. An important conclusion from Figure 3 is that the subtelomeric depletion of Red1 primarily reflects suppression of the Rec8-dependent recruitment pathway, whereas Rec8-independent recruitment appears similar between ends and internal regions. Based on the authors' previous papers (references 13, 16), I thought coding (or nucleosome) density primarily influences the Rec8-independent pathway. However, the correlations presented in Figure 2d-e (also implied in Figure 3a) appear opposite to my expectation. Specifically, differences in axis protein binding between chromosome ends and internal regions (or within chromosome ends), where the Rec8-dependent pathway dominates, correlate with coding density. In contrast, no such correlation is evident in rec8Δ cells, where only the Rec8-independent pathway is active and end-specific depletion is absent. One possibility is that masking coding regions within Y' elements influences the correlation analysis. Additional analysis and a clearer explanation would be highly appreciated.

      Thank you for pointing this out. We have now also included Y’ elements in the analysis in Fig. 2d. Including the Y’ elements yielded an increase in average coding density near the very ends of the chromosomes. This increase matches the higher level of axis protein binding seen in rec8 mutants in Fig. 3a and is consistent with the previously noted link between coding density and axis protein deposition. We now provide further description in the text and the figure legends.

      We do not have an explanation for why there is no correlation with coding density in the EARs but assume that this reflects the unique regulation of this region (as also implied by Supplementary Fig. 4d). At present, the signals that establish the EARs remain unknown although our data indicate that the Hop1-CBR as well as Dot1 are important for axis protein enrichment in the EARs.

      (3) The Dot1-Sir3 section starting from L266 should be clarified. I found this section particularly difficult to follow. It begins by stating that dot1∆ leads to Sir complex spreading, but then moves directly to an analysis of Red1 ChIP in sir3∆ without clearly articulating the underlying hypothesis. I wonder if this analysis is intended to explain the differences observed between dot1∆ and H3K79R mutants in the previous section. I also did not understand the concluding statement that Dot1 counteracts Sir3 activity. As sir3Δ alone does not affect subtelomeric suppression, it is unclear what Dot1 counteracts. Explicitly stating the authors' working model at the outset of this section would greatly clarify the rationale, results, and conclusions.

      Thank you for this comment. We reworked the introduction to this paragraph to be more focused on Sir3 rather than Dot1. We hope that this introduction is less confusing and more in line with the data presented in this paragraph. We also expanded the conclusion to suggest the alternative possibility that the Sir complex only becomes a regulator of axis proteins in the absence of Dot1.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Raghavan and his colleagues sought to identify cis-acting elements and/or protein factors that limit meiotic crossover at chromosome ends. This is important for avoiding chromosome rearrangements and preventing chromosome missegregation.

      By reanalyzing published ChIP datasets, the researchers identified a correlation between low levels of axis protein binding (which is known to modulate homologous recombination) and the presence of cis-acting elements such as the subtelomeric element Y' and low gene density. Genetic analyses coupled with ChIP experiments revealed that the differential binding of the Red1 protein in subtelomeric regions requires the methyltransferase Dot1. Interestingly, Red1 depletion in subtelomeric regions does not impact DSB formation. Another surprising finding is that deleting DOT1 has no effect on Red1 loading in the absence of the silencing factor Sir3. Unlike Dot1, Sir3 directly impacts DSB formation, probably by limiting Spo11 access to promoters. However, this explains only a small part of the low levels of DSBs forming in subtelomeric regions.

      Strengths:

      (1) This work provides intriguing observations, such as the impact of Dot1 and Sir3 on Red1 loading and the uncoupling of Red1 loading and DSB induction in subtelomeric regions.

      (2) The separation of axis protein deposition and DSB induction observed in the absence of Dot1 is interesting because it rules out the possibility that the binding pattern of these proteins is sufficient to explain the low level of DSB in subtelomeric regions.

      (3) The demonstration that Sir3 suppresses the induction of DSBs by limiting the openness of promoters in subtelomeric regions is convincing.

      Weaknesses:

      (1) The impact of the cis-encoded signal is not demonstrated. Y' containing subtelomeres behave differently from X-only, but this is only correlative. No compelling manipulation has been performed to test the impact of these elements on protein axis recruitment or DSB formation.

      Thank you for this comment. Our data indeed appeared contradictory because XY’ ends showed overall lower axis protein enrichment, yet our analysis of chromosome fusions, which also eliminated Y’ elements at some of the fused ends, provided no evidence for an effect of Y’ elements at those ends. As also noted in the response to reviewer 1, we now compared axis protein enrichment between S288c and SK1, which differ substantially in their number and distribution of Y’ elements (Supplementary Fig. 5). We found that axis protein distribution and enrichment were very consistent between SK1 and S288c when correcting for the displacement caused by the presence of Y' elements and other subtelomeric sequences (now shown in Fig. 2d). These data support the conclusion that axis protein enrichment levels are primarily encoded in cis and indicate that the presence of Y’ elements does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior (giving rise to the apparently lower axis protein enrichment on XY’ ends).

      (2) The mechanism by which Dot1 and Sir3 impact Red1 loading is missing.

      Although we do not yet understand the precise molecular details of these effects, we nevertheless believe we have obtained several important insights into this mechanism. First, our data indicate that the suppressive effect of the ends primarily impacts the Rec8-dependent loading of Red1, whereas loading via the Hop1-CBR is largely unaffected. The effect of Dot1 thus likely occurs via the Rec8-Red1 interaction. Second, the increase in Red1 recruitment is fully rescued by deletion of Sir3, suggesting that Sir3 becomes a promoter of axis protein recruitment in the absence of Dot1. These dependencies are now outlined in the model in Fig. 9. We would also like to note that the Sir complex was previously shown to impact cohesin in mitotic cells. Thus, a connection between the Sir complex and cohesin is not without precedent.

      (3) Sir3's impact on DSB induction is compelling, yet it only accounts for a small proportion of DSB depletion in subtelomeric regions. Thus, the main mechanisms suppressing crossover close to the ends of chromosomes remain to be deciphered.

      Thank you, we absolutely agree. We had discussed this point in the discussion but now also explicitly state this point in the abstract and expanded the discussion of these findings in the results and discussion.

      Reviewer #3 (Public review):

      Summary:

      The paper by Raghavan et al. describes pathways that suppress the formation of meiotic DNA double-strand breaks (DSBs) for interhomolog recombination at the end of chromosomes. Previously, the authors' group showed that meiotic DSB formation is suppressed in a ~20kb region of the telomeres in S. cerevisiae by suppressing the binding of meiosis-specific axis proteins such as Red1 and Hop1. In this study, by precise genome-wide analysis of binding sites of axis proteins, the authors showed that the binding of Red1 and Hop1 to sub-telomeric regions with X and Y' elements is dependent on Rec8 (cohesin) and/or Hop1's chromatin-binding region (CBR). Furthermore, Dot1 functions in a histone H3K79 trimethylation-independent manner, and the silencing proteins Sir2/3 also regulate the binding of Red1 and Hop1 and also the distribution of DSBs in sub-telomeres.

      Strengths:

      The experiments were conducted with high quality and included nice bioinformatic analyses, and the results were mostly convincing. The text is easy to read.

      Weaknesses:

      The paper did not provide any new mechanistic insights into how DSB formation is suppressed at sub-telomeres.

      We respectfully disagree with this assessment. We show that the Sir complex suppresses DSB formation at a number of cryptic hotspots in the X elements and the adjacent subtelomeric sequences by causing chromatin compaction. The role of the Sir complex in transcriptional silencing, chromatin accessibility, and DSB formation had not previously been analyzed in the meiotic subtelomeres. That being said, Sir-dependent suppression is clearly not the only mechanism that suppresses DSBs in the subtelomeres, as we only observed DSB formation at a small number of hotspots. This was in and of itself a surprise, in particular given the large-scale effect on chromatin compaction. In the abstract and in the discussion, we now more strongly emphasize that additional layers of regulation must exist.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Evidence for cis-acting suppression by Y' elements requires further support. The authors propose that Y' elements act in cis to suppress axis protein association at chromosome ends. While this is an attractive model, the current analyses do not yet provide sufficient support for it.

      Thank you for this comment. Our data indeed appeared contradictory, because XY’ ends showed overall lower axis protein enrichment, yet our analysis of chromosome fusions, which also eliminated Y’ elements at some of the fused ends, provided no evidence for an effect of Y’ elements at those ends. As noted above, we now compared axis protein enrichment between S288c and SK1, which differ substantially in their number and distribution of Y’ elements (Supplementary Fig. 5). We found that axis protein distribution and enrichment were very consistent between SK1 and S288c when correcting for the displacement caused by the presence of Y' elements and other subtelomeric sequences. These data support the conclusion that axis protein enrichment levels are primarily encoded in cis and indicate that the presence of Y’ elements does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior (giving rise to the apparently lower axis protein enrichment on XY’ ends).

      (1a) In Figure S4c, the authors masked Y' elements to rule out the possibility that reduced binding within Y' elements themselves accounts for the overall underrepresentation in subtelomeric regions. However, since the authors propose that Y' elements suppress axis protein binding in surrounding regions in cis, it is appropriate to perform this analysis specifically on chromosome ends containing XY'.

      Thank you for this suggestion. We agree that this would specifically affect the XY’ ends. However, given that we did not see a change even with all the ends, we do not expect a change with just the XY’ ends.

      (1b) In Figure 2b-c, the authors conclude that removal of Y' elements by chromosome fusion does not reveal a long-range suppressive effect. However, the spatial extent of Y'-mediated suppression is not defined, making it unclear whether this experiment can test the proposed model. Perhaps plotting the averaged axis protein profile as a function of distance from Y' elements could help define the effective range of suppression and clarify whether the fusion experiment is informative in this context.

      Thank you. As noted above we now compared SK1 and S288c ends, which provided further evidence that Y’ elements do not affect axis protein enrichment beyond displacing binding sites further into the chromosome interior. In addition, we substantially expanded the description of the chromosome fusion experiment to more clearly outline the setup and the limitations of this experiment.

      (2) L402: "one of the first pieces of direct evidence that nucleosomes block meiotic DSB formation in vivo" sounds overstated, considering past publications (e.g., ref 45 and S. pombe ade6-M26 papers).

      We toned this down and added the references.

      (3) Figure 2e and other scatter plots: Correlation coefficients are reported without p-values. If the authors prefer to use confidence intervals from linear regression instead, they should justify this approach.

      We added p-values to all scatter plots.

      (4) Figure 7f. Explain blue dots.

      We apologize for this oversight (also applies to Supplementary Fig. 10). The blue dots are measurements within 5 kb of an X element. The red dots are the rest of the genome. We now included a legend in the panel to clarify this notation.

      (5) Figure 8d. To assess whether the conclusion can be generalized, the authors could plot the MNase and TrAEL-seq signal fold changes (sir3Δ/SIR3) for hotspots within 5 kb of X elements.

      We attempted various analyses in this direction. However, the range of the MNase-seq effect in sir3 mutants is much greater than the effect on DSBs, making it difficult to make any correlative statements. There are clearly additional layers of DSB suppression in the telomeric regions, and loss/gain of nucleosomes is not sufficient to switch hotspots on/off at most hotspots. We now included a statement to this end in the abstract and also further discuss this notion in the discussion.

      (6) Figure S1c. The apparent difference in X-element distribution may be influenced by bin size. This could be tested by repeating the analysis using smaller bins, comparable in size to the X elements, for all regions.

      We thank the reviewer for this thoughtful suggestion. To address this concern, we repeated the analysis using smaller bins comparable in size to X elements (450 bp) across all region types. Specifically, X elements were analyzed per annotated element, while Y′ elements, subtelomeric 20 kb regions, and internal regions were subdivided into fixed 450 bp windows, and mean input coverage was calculated for each window using the same width-weighted approach.

      This reanalysis did not materially alter the overall distribution patterns observed in Figure S1c. We observed only minor shifts in absolute values, which are expected when changing bin granularity.

      Any residual differences likely reflect the underlying copy number of X elements at chromosome ends. Importantly, all ChIP signals in the manuscript are normalized to their corresponding input (ChIP/Input), which mitigates potential biases arising from local copy number variation.
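      For readers who want to reproduce the kind of windowed comparison described in this response, the width-weighted averaging can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes coverage is supplied as bedGraph-style half-open intervals (start, end, value), that bases without a covering interval are ignored rather than counted as zero, and the function names fixed_windows and width_weighted_mean are hypothetical. A per-element value (as for X elements) is obtained by passing the element's own coordinates instead of a 450 bp window.

```python
from typing import List, Tuple

# Hypothetical bedGraph-style coverage record: (start, end, value), half-open [start, end).
Interval = Tuple[int, int, float]


def fixed_windows(start: int, end: int, width: int = 450) -> List[Tuple[int, int]]:
    """Tile [start, end) with consecutive windows of `width` bp; the last one may be shorter."""
    return [(s, min(s + width, end)) for s in range(start, end, width)]


def width_weighted_mean(win_start: int, win_end: int, intervals: List[Interval]) -> float:
    """Mean signal over [win_start, win_end), weighting each interval by its overlap length.

    Assumption: bases not covered by any interval are ignored (not treated as zero).
    Returns NaN when no interval overlaps the window.
    """
    weighted_sum = 0.0
    covered_bp = 0
    for s, e, value in intervals:
        overlap = min(e, win_end) - max(s, win_start)
        if overlap > 0:
            weighted_sum += overlap * value
            covered_bp += overlap
    return weighted_sum / covered_bp if covered_bp else float("nan")


if __name__ == "__main__":
    # Toy input coverage: 300 bp at 2x followed by 600 bp at 4x.
    coverage = [(0, 300, 2.0), (300, 900, 4.0)]
    for ws, we in fixed_windows(0, 900):
        print(ws, we, width_weighted_mean(ws, we, coverage))
```

      With this toy coverage, the two 450 bp windows average to about 2.67 and 4.0. Dividing by covered width rather than by window width is a design choice that avoids diluting windows that partly overlap sequence without coverage records.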

      (7) Figure S2. X elements are difficult to find (e.g., chrVII-L).

      We have now added arrowheads at locations with full-length X elements. Partial X elements are marked with stars.

      (8) Figure S7. Please indicate the endpoints of spreading.

      As is apparent in this figure, and as also indicated by the quantification in Supplementary Fig. 9a, spreading of the Sir complex is in most cases quite limited. The example in Supplementary Fig. 9b is one of the largest spreads we observed. The extent of spreading is hard to visualize meaningfully in Supplementary Fig. 8, given the relatively large genomic distances shown in these profiles. We therefore refer the reader to Supplementary Fig. 9a, which shows the chromosome-resolved extent of spreading.

      Reviewer #2 (Recommendations for the authors):

      To go beyond the correlation between the presence of Y' elements and low levels of axis protein binding, subtelomeres could easily be truncated. Analyzing strains with different distributions of Y' elements would also be informative. The correlative analysis could also be expanded to examine how far the influence of Y' elements extends and whether the number of Y' elements affects the extent of axis protein depletion.

      We respectfully disagree with the assertion that subtelomeres could easily be truncated. The high repetitiveness of these sequences makes targeted manipulation of the extreme chromosome ends, where the Y’ elements are located, essentially impossible; this is also the main reason for the limitations associated with the analysis of chromosome fusions, as outlined in our response to Reviewer 1.

      However, we would like to thank the reviewer for their suggestion to analyze different strain backgrounds. We now compared axis protein enrichment between S288c and SK1. For this analysis, we took advantage of a dataset we had produced previously that measures Red1 enrichment in SK1/S288c hybrid strains, which provide a powerful internally controlled setup that eliminates effects caused by differential timing and synchrony between samples. As now shown in Supplementary Fig. 5, SK1 and S288c differ substantially in their subtelomeric architecture at many ends, including extensive differences in the number and distribution of Y’ elements. Importantly, axis protein distribution was very consistent between SK1 and S288c when correcting for the differences in length of individual chromosome ends, supporting the conclusion that axis protein enrichment levels are primarily encoded in cis. This analysis is now shown in Fig. 2c. These data also indicate that the presence of Y’ elements does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior.

      Given the separation between protein axis loading and DSB induction, it would be interesting to test whether the presence of Y' elements influences the frequency and position of DSB induction.

      We agree that this experiment would be very interesting. However, given the experimental challenges associated with targeted manipulation of Y’ elements as outlined above, we believe that this experiment lies outside the scope of this study. Our observation that Y’ elements do not grossly influence axis protein enrichment in their vicinity also suggests that an effect on DSB formation is less likely.

      The effect of Dot1 on Red1 loading is intriguing because it is at least partially independent of its main known target, H3K79, yet fully dependent on Sir3. However, this effect extends far beyond Sir3 binding as detected by ChIP. This is surprising because Dot1 has a limited effect on Sir3 binding as detected by ChIP, and SIR3 deletion has no impact on Red1 binding. Notably, Dot1 was shown to limit Sir3 spreading to 20 kb on average when Sir3 is overexpressed (Katan-Khaykovich and Struhl, 2005; Hocher et al., 2018). It would be interesting to test whether the regions affected by DOT1 deletion coincide with the zone covered by Sir3 upon overexpression (extended silent domains, ESDs; Hocher et al., 2018).

      We agree that this would be an interesting analysis. Unfortunately, the available data on the extended silent domains were not obtained in SK1 and, as noted above, the chromosome end structure differs substantially between the strains, preventing direct comparisons without repeating all the relevant analyses in S288c. In addition, the available data were collected in vegetative cells, although this may be less of an issue given that our analyses show similar spreading in vegetative and meiotic cells. However, short of repeating SIR3 overexpression in meiosis (which would also require a different overexpression regimen, as galactose interferes with meiosis), we are not in a position to perform this analysis.

      As mentioned in the manuscript, the interplay between the Sir complex and Dot1 has been shown to affect checkpoint regulation during meiotic recombination. However, a discussion of how this relates to the observations reported here is missing.

      Thank you. We have now included a discussion of this role and its relation to our observations.

      Also, it is unclear why the authors did not investigate the impact of Dot1 and Sir3 on the binding of Hop1 rather than Red1, given that Hop1 is currently “the most upstream regulator of recombination known to be depleted about 20 kb from chromosome ends.”

      We changed this statement in the introduction to avoid confusion and also included a model figure that specifically highlights the Rec8-dependent recruitment as a regulatory target.

      Our data show that most of the telomere-proximal effects seem to act through the Rec8-dependent recruitment pathway for which Red1 is the most upstream regulator known. So, although the most upstream factor known before this study was Hop1, our data now identify the interaction between Red1 and Rec8 as the most upstream regulatory node.

      Sir3's impact on DSB induction is compelling, yet it only accounts for a small proportion of DSB depletion in subtelomeric regions. Thus, the main mechanisms suppressing crossovers close to chromosome ends remain to be deciphered. This should be acknowledged and discussed.

      In addition to the explicit statement of this conclusion in the results, we now added another statement in the abstract and also expanded the discussion of the fact that there are clearly additional levels of regulation that remain to be discovered.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      It would be nice to show a schematic summary of the authors' main conclusion.

      Thank you. We have now included a model schematic as Fig. 9.

      Minor points:

      (1) Supplemental Figure 2: The small box for the X element is marked with the same color as the Y' element, so it is very hard to find the X element. Please use a clearer color; it would also be nice to indicate the chromosome ends that lack an X element (lines 129-130).

      We have now added arrowheads at locations with full-length X elements. Partial X elements are marked with stars. This notation also makes it obvious which ends lack annotated X elements.

      (2) Lines 156-163, Figure 2b: The main text should mention the "chromosome fusion between chromosome IV right arm and chromosome I left arm". Moreover, the rationale for showing data in the S288C background is unclear, because the fusion points differ between S288C and SK1 (the structures of these ends are quite different). Please explain the logic in the text.

      To improve clarity, we have included a schematic to better highlight which chromosomal sequences were removed. We have also substantially expanded the description of this experiment and explicitly state the limitations of this assay.

      (3) Supplemental Figure 6: Since the sir3 mutation affects Red1 binding at EARs (and centromeres), it would be nice to show similar plots for the HML, MAT, and HMR loci (and intergenic regions as a control).

      We are unfortunately statistically underpowered to perform a meta-analysis of just HML, HMR and MAT. However, we have now indicated the positions of HML and HMR in Supplementary Fig. 2 and 8, so the binding of the axis proteins and Sir3 can be inspected directly. MAT is not within 50 kb of a chromosome end and thus was not captured in these analyses.

      (4) Line 322 onward: In this section, the authors switch their story from sir3 to sir2. It would be nice to explain the logic with a brief introduction to the relationship between Sir2 and Sir3.

      We apologize for this confusion. We are not switching our story to Sir2 but rather are taking advantage of an available dataset that analyzed DSBs in sir2 mutants. We then return to Sir3 to also analyze DSBs in the sir3 mutant and analyze its interaction with a dot1 mutation. To better support the logic, we now briefly reiterate that Sir2 and Sir3 are part of the same complex at the beginning of this section.

      (5) Lines 330-331, Figure 8a (and also Supplemental Fig. 8c): Could you explain the matched strains in more detail in the text or figure legend? Does each dot represent a strain? If so, please indicate the strains used here.

      Each dot refers to an individual X or Y’ element that is shown matched in WT and mutant to highlight the trends at the level of individual elements. This is noted in the figure legend.

      (6) Supplemental Figure 7 (and 2): It would be nice to show the position of the HML, MAT, and HMR loci as well as the centromeres in the Figure.

      We have now indicated the positions of HML and HMR in Supplementary Fig. 2 and 8. MAT and the centromeres are not located within 50 kb of chromosome ends.

    1. eLife Assessment

      This valuable study demonstrates that self-motion strongly affects neural responses to visual stimuli, comparing humans moving through a virtual environment to passive viewing. The evidence for visuomotor mismatch responses is solid, although the interpretation in terms of prediction remains somewhat preliminary. This study bridges human and rodent studies on the role of prediction in sensory processing, and is therefore expected to be of interest to a large community of neuroscientists.

    2. Reviewer #1 (Public review):

      In this paper, Solyga, Zelechowski & Keller study human visuomotor mismatch responses as an alternative instantiation of prediction errors to classic oddball paradigms. Using VR, they created a condition in which participants moved around, thereby creating a visuomotor coupling between physical movement and visual flow. To attempt to isolate the contribution of specifically movement-related predictions in this condition, they contrasted it with a condition in which participants were seated and rewatched their movement trajectory from the 'active' condition. Visuomotor mismatches were created by temporarily decoupling movement and visual experience, halting the VR display while participants continued to move.

      The core finding of the paper is that participants exhibit a positively valenced response to the visuomotor decoupling in the active but not in the passive condition. Since walking speed slows only non-significantly following decoupling events in the active condition, the authors argue that this difference cannot be accounted for by "changes in participants' behavior or to simple visual offset responses", the latter being equal across both conditions. The subsequent reinstatement of the coupling, in turn, does not differ between the two conditions. The authors additionally show that this mismatch response differs from visual onset responses elicited by checkerboard inversions and that it is "qualitatively" stronger than the more commonly studied auditory oddball mismatch responses.

      The design, with its focus on ecological validity, is impressive and well rationalized, and the results are well illustrated. I additionally appreciate the control analyses with regard to changes in walking speed and playback DOF, as well as the newly added group of participants who experience the passive condition before the active one. I have a couple of questions and comments.

      My main question in round 1 concerned the isolation of visuomotor mismatch. Although the comparison with a seated control seems like a very sensible way to control for simple visual responses, there seem to be more differences between the conditions than just a break in visuomotor coupling. I therefore wondered whether the reduced offset response in the seated condition might, in part, be explained differently. For example, given that participants always completed the active condition before rewatching their movement in the seated condition, it seemed likely that participants learn over the session that flow will sometimes be halted. The analyses confirm this. The explanation that there is a visuomotor component here is given further weight by the addition of a group of participants who perform the conditions in the reverse order, which has strengthened the manuscript considerably. However, this remains an imperfect control, because the visual stimulus now differs between the conditions for these participants. It is the best that can be achieved with this type of paradigm, though, and the paradigm yields a great deal of ecological validity.

      I was also wondering whether the authors might consider the findings in frontal electrodes more closely, given that the title of the paper focuses on a specifically occipital effect. Their further analyses have confirmed that there are likely interesting frontal effects. From a theoretical point of view, the spatial dissociation in adaptation effects, which were stronger in frontal and weaker in occipital areas, seems interesting and perhaps worth discussing, especially given the interpretation that "mismatch processing may initially arise in sensory visual areas before engaging higher-order frontal regions." Why is the frontal decrease in responses not accompanied by an analogous decrease in its supposed occipital source? Could these two responses reflect different kinds of prediction error signals (i.e., objective vs. subjective)?

      I remain concerned that the authors argue too defensively that they have completely isolated visuomotor prediction mechanisms with this paradigm. It is a nice, informative study, but it seems odd to argue that there are no other possible explanations. A design optimizes some features, but this always comes at some cost to others. Prioritising ecological validity, which is a justifiable aim, usually weakens control over some confounds.

      To outline my reasoning fully: my concerns with respect to generic influences of action on perception are reflected in Fig. 1. The P1 is smaller when walking than when sitting. It seems likely that the mismatch response reflects something about extrapolation or prediction, because it is larger when walking. However, it is not necessarily sensorimotor prediction. Even if action is removed from the equation, the flow can be extrapolated or predicted most of the time in a way it cannot be when the video is halted. Of course, the sitting condition partly controls for this, but because it came second, the visual flow disruptions were more predictable there. A reduction in effects over time is indeed confirmed by the authors' analyses. They have now conducted a study with the conditions in the reverse order and find the same result. But this necessitates non-identical visual flow, because the sitting condition plays back the previous participant's flow. So it is likely that, across all of these comparisons, the visuomotor mismatch is especially salient; it is just that each comparison is somewhat messy or confounded. The manuscript would be strengthened by some consideration of the other processes likely at play here.

      As a more minor point in response to our previous review: whether particular accounts represent an 'orthodox' view at present does not determine whether they raise logical issues in need of consideration. The authors may have missed that the papers in question consider mechanisms underlying the attenuation of particular pieces of information *from perception*, not perceptual processing. We have one percept at any one moment in time and must understand how different population types synergistically generate that percept.

      Similarly, it is a little strange that the authors so aggressively defend the position that self-generated motion is 'the strongest' type of prediction. Sure, we probably experience the effects of our actions more often than ambulances. But what about objects obeying the laws of gravity, or others' faces being structured and moving in systematic ways? Such a claim is hard to quantify, many scientists would presumably be skeptical of it, and it is not logically needed to justify the importance of examining the mechanisms by which action shapes perceptual processing. I would suggest that it is better to fight the battles you need to (and can) fight, so that the robust claims carry more weight.

      Hope these comments are helpful.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates whether visuomotor mismatch responses can be detected in humans. By adapting paradigms from rodent studies, the authors report EEG evidence of mismatch responses during visuomotor conditions and compare them to visual-only stimulation and mismatch responses in other modalities.

      Strengths:

      - Authors use a creative experimental design to elicit visuomotor mismatch responses in humans.

      - The study provides an initial dataset and analytical framework that could support future research on human visuomotor prediction errors.

      Weaknesses:

      - Methodological issues (e.g., volume conduction) make it difficult to confidently attribute the observed mismatch responses to activity in visual cortical regions. This could be alleviated by increasing the number of channels.

      The authors successfully demonstrate that visuomotor mismatch paradigms can, in principle, be applied in human EEG. This approach provides a translational bridge between rodent and human work on predictive processing.