  1. Jan 2026
    1. eLife Assessment

      This important study reports convincing evidence of associations between 35 polygenic indices (PGIs) for social, behavioural, and psychological traits, as well as other health conditions (e.g., BMI) and all-cause mortality, based on data from Finnish population-based surveys and a twin cohort linked to administrative registers. PGIs for education, depression, alcohol use, smoking, BMI, and self-rated health showed the strongest associations with all-cause mortality, in the order of ~10% increment in risk per PGI standard deviation. Effect sizes from twin-difference analyses tended to be slightly larger than those from population cohorts, a pattern opposite that generally observed when testing PGI associations with their target phenotypes, and supporting the robustness of findings to confounding by population stratification.

    2. Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype.

      Comments on revised version:

      The authors answered my concerns well. I don't have any further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Comments on revised version:

      I am happy with the revision. No further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype. I have two main comments:

      (1) The authors should clarify whether the p-value reported in the text will remain significant after multiple testing adjustment. Some of the large effects might be significant; for example, Figure 2C

      We have now added Benjamini-Hochberg multiple-testing adjusted p-values in the text each time we present nominal p-values. Additionally, supplementary tables S5 and S6 provide multiple-adjusted p-values for all analysed PGIs.

      Although this was not always the case, many comparisons remained significant after multiple testing adjustments, especially in Figure 2C that the reviewer commented on. In the revised version, we have placed more emphasis on describing these HRs that have low p-values after multiple-test adjustment. The revised text for Figure 2C in the Results section now reads:

      Panel C analyses mortality in three age-specific follow-up periods. The PGIs were more predictive of death in younger age groups, although the difference between the 25–64 and 65–79 age groups was small, except for the PGI of ADHD (HR=1.14, 95% CI 1.08; 1.21 for 25–64-year-olds; HR=1.04, 95% CI 1.00; 1.08 for 65–79-year-olds; p=0.008 for difference, p=0.27 after multiple-testing adjustment). PGIs predicted death only negligibly among those aged 80+, and the largest differences between the age groups 25–64 and 80+ were for PGIs of self-rated health (HR 0.87, 95% CI 0.82; 0.93 for 25–64-year-olds, HR 1.00, 95% CI 0.94; 1.04 for 80+ year-olds, p=2×10⁻⁴ for difference, p=0.006 after multiple-testing adjustment), ADHD (HR 1.14, 95% CI 1.08; 1.21 for 25–64-year-olds, HR 0.99, 95% CI 0.95; 1.03 for 80+ year-olds, p=7×10⁻⁴ for difference, p=0.012 after multiple-testing adjustment) and depressive symptoms (HR 1.12, 95% CI 1.06; 1.18 for 25–64-year-olds, HR 1.00, 95% CI 0.96; 1.04 for 80+ year-olds, p=0.002 for difference, p=0.032 after multiple-testing adjustment). Additionally, the difference in HRs between these age groups achieved significance after multiple testing adjustment at the conventional 5% level for PGIs of cigarettes per day, educational attainment, and ever smoking.
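      For readers who want to see how a "p for difference" between two hazard ratios can be approximated from published estimates alone, the following is a minimal sketch. It is not taken from the manuscript: it assumes that the two estimates are independent, that the confidence intervals are symmetric on the log scale, and that a simple two-sided z-test is applied. The function names are hypothetical. Applied to the rounded ADHD estimates quoted above, it gives a p-value of the same order as the reported nominal one; exact agreement is not expected because the inputs are rounded.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(HR), recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def hr_difference_test(hr1, ci1, hr2, ci2):
    """Two-sided z-test for the difference of two log hazard ratios.

    Assumes the two estimates are independent (an assumption of this sketch).
    """
    diff = math.log(hr1) - math.log(hr2)
    se = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = diff / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# ADHD PGI: ages 25-64 versus ages 65-79, values as quoted in the text.
z, p = hr_difference_test(1.14, (1.08, 1.21), 1.04, (1.00, 1.08))
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```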

      We have also included the recent study by Argentieri et al. (2025) in the literature review, which was missing from our previous version. We appreciate the reference. Other references mentioned were already included in the previous version of the manuscript.

      (note that the small prediction accuracy of PGI in older age groups has been extensively studied, see Jiang, Holmes, and McVean, 2021, PLoS Genetics).

      We would like to thank the reviewer for suggesting the relevant reference by Jiang et al. We have now expanded on the discussion of age-specific differences in the discussion section and included this reference.

      (2) The authors might check if PGI+Phenotype has improved performance over Phenotype only. This is similar to Model 2 in Table 1, but slightly different.

      The reviewer raises an interesting angle from which to approach the analysis. We have now added an analysis assessing the information criteria and the significance of improvement between nested models in Supplementary table S8. All the tested PGI+phenotype models show an improvement over the phenotype-only model that is statistically significant at all conventional levels when tested by likelihood-ratio tests between nested models. Additionally, improvement was found when using the Akaike and Bayesian (Schwarz) information criteria (albeit sometimes modest in size). We have added a passage in the results section briefly summarising this analysis:

      Supplementary table S8 presents information criteria and significance tests on corresponding models. Models with PGI+phenotype (Models 2a–f) showed improvement over models with the phenotype only (Models 1a, 1c, 1e, 1g, 1i, 1k, with a p=0.0006 or lower) in terms of both Akaike information criterion (AIC) as well as Bayesian (Schwarz) information criterion (BIC) with a p=0.0006 or lower in all comparisons. The full Model 4 again showed improvement over the model with all PGIs jointly (Model 3b, with a p=0.0002 or p=0.00002, depending on continuous/categorical phenotype measurement), which had a lower AIC but not BIC.
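      The nested-model comparison described above can be sketched as follows. This is an illustration only: the log-likelihoods, parameter counts, and number of events are hypothetical and are not values from Supplementary table S8. The sketch assumes a one-degree-of-freedom likelihood-ratio test (adding a single PGI term to a phenotype-only model) and, as is common for Cox models, uses the number of events as the sample size in BIC.

```python
import math


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models differing by one parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Chi-square survival function with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_events):
    return n_params * math.log(n_events) - 2.0 * loglik


# Hypothetical inputs, chosen only for illustration.
ll_phenotype, k_phenotype = -10500.0, 3   # phenotype-only model
ll_combined, k_combined = -10493.0, 4     # PGI + phenotype model
n_events = 2000

stat, p = lr_test_1df(ll_phenotype, ll_combined)
print(f"LR statistic = {stat:.1f}, p = {p:.5f}")
print("AIC:", aic(ll_phenotype, k_phenotype), "->", aic(ll_combined, k_combined))
print("BIC:", round(bic(ll_phenotype, k_phenotype, n_events), 1),
      "->", round(bic(ll_combined, k_combined, n_events), 1))
```

      Because BIC penalises each extra parameter by log(n) rather than 2, a model can improve AIC without improving BIC, which is the pattern the passage reports for Model 4.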

      Reviewer #2 (Public review): 

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Weaknesses:

      It is unclear whether the PGIs used for each trait represent the most current or optimal versions based on the latest GWAS data.

      To our reading, this comment is closely related to the “recommendations for the author” number 3 by reviewer 2, and we thus address them together. 

      If the Finnish data used in this study also contributed to the development of some of the PGIs, there is a risk of overestimating their associations with mortality due to overfitting or "double-dipping." Similar inflation of effect sizes has been observed in studies using the UK Biobank, which is widely used for PGI construction.

      To our reading, this comment is closely related to the “recommendations for the author” 4 by reviewer 2, and we thus address them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) Cited reference 1 also investigated the PRS association with life span; cited reference 8 explains PRS association with healthy lifespan. Can authors be clearer about what is new in the context of these references? Specifically, what are the PGIs studied here that were not analyzed in the cited analyses?

      Although some previous studies on the topic do exist, our analysis arguably has novelty in touching upon several unstudied or scarcely studied themes. These include:

      A set of PGIs focusing on social, psychological, and behavioural phenotypes or PGIs for typically non-fatal health conditions.

      An assessment of direct genetic effects/ confounding with a within-sibship design.

      An assessment of potential heterogeneous effects by several socio-demographic characteristics.

      An analysis of external causes of deaths (which can be hypothesised to be particularly relevant here, given the choice of our PGIs not focusing directly on typical causes of death).

      A detailed assessment of the interplay of the most predictive PGIs with their corresponding phenotypes.

      We have substantially revised the Introduction section focusing on making these novel contributions more explicit.

      (2) In the Methods section, it is not very clear why the authors specifically study the "within-sibship" samples. Is this for avoiding nurturing effects from parental genotypes or for controlling assortative mating? The authors should clarify the rationale behind the design.

      The substance-related rationale behind this approach was briefly discussed in the Introduction section while in the Methods section, we focused more on the technical description of our analyses. However, it is certainly worthwhile to clarify to the reader why within-sibship methods have been used. The revised passage in the methods section now states:

      “In addition to this population sample, we used a within-sibship analysis sample to assess the extent of direct and indirect genetic associations captured by the PGIs, as discussed in the introduction.”

      (3) "Residual correlations of PGIs were no more than 0.050..." As a minor comment, since PGIs is a noisy variable, the correlation would be low; however, I don't think there are better ways to evaluate Cox assumptions, and in many cases, this assumption is not correct for strong predictors.

      Yes, these points are true. Overall, it is often implausible that empirical distributions exactly match distributional assumptions in statistical models. For example, it may not be realistic to expect that the mortality hazards across categories of independent variables stay exactly proportional during long mortality-follow-ups; some deviations from constant proportions are almost inevitable. However, there are reasonable grounds to argue that in case of moderate violations of the proportional hazards assumption, the estimates still remain interpretable for practical uses. They can be read as approximating average relative hazards over the study period (for discussion, see pages 42–47 in Allison P. 2014. Event history and survival analysis: Regression for longitudinal event data (second edition). Thousand Oaks: SAGE).

      (4) "PGI of ADHD (HR=1.08 95%CI 1.04;1.11 among men; HR=1.01 95%CI 0.97;1.05 among women; p=0.012 for difference)." Is this difference significant after multiple testing correction?

      We have presented multiple-testing adjusted p-values together with nominal ones in this and in all other instances where they are mentioned in the text. Additionally, Supplementary tables S5–S6 present multiple-testing adjusted p-values for each PGI studied.

      (5) "Panel D displays that most PGIs had stronger associations with external (accidents, violent, suicide, and alcohol related deaths) than natural causes of death." Similar to the comment above, are there any results that are significantly different between internal and external?

      We have added the p-values of those variables that had larger differences in the revised text. Quoting from the revised article: “The HR differences between external and natural causes of death were nominally significant at the conventional 5% level for cannabis use (p=0.016), drinks per week (p=0.028), left out of social activity (p=0.029), ADHD (p=0.031), BMI (p=0.035) and height (p=0.049), but none of these differences remained significant after adjusting for 35 multiple tests.”
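      As an illustration of why nominal p-values of this size do not survive adjustment across 35 tests, the following sketch applies the Benjamini-Hochberg step-up adjustment to the six nominal p-values quoted above. The remaining 29 p-values are not given in the text, so they are replaced by placeholders of 1.0; this is an assumption of the sketch, and the printed numbers are therefore illustrative and are not the authors' adjusted values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Nominal p-values quoted in the response (external vs natural causes).
reported = {
    "cannabis use": 0.016,
    "drinks per week": 0.028,
    "left out of social activity": 0.029,
    "ADHD": 0.031,
    "BMI": 0.035,
    "height": 0.049,
}
# Placeholder values for the 29 PGIs whose p-values are not quoted (assumption).
placeholders = [1.0] * (35 - len(reported))

adjusted = benjamini_hochberg(list(reported.values()) + placeholders)
for name, adj in zip(reported, adjusted):
    print(f"{name}: adjusted p = {adj:.3f}")
```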

      (6) Table 1: The effect of the phenotype is stronger than the PGI; this is expected as PGI is a weak predictor and can be considered as "noised" measurement of true genetic value (Becker 2021 Nature Human behavior). Is there a way to adjust for the impact of noise in PGI at tagging genetic value and compare if the PGI effect is different from the phenotype effect?

      PGIs are certainly imperfect measures that contain a lot of noise. However, extracting new information from what is unknown is an extremely demanding exercise, further complicated by, for example, the fact that we do not know the exact benchmark for the total genetic effect that we should be aiming at. Different methods of heritability estimation, for instance, often give dramatically differing results, for reasons that are still under scrutiny.

      We are thus not aware of a method that could provide a satisfactory answer to this challenging question.

      Reviewer #2 (Recommendations for the authors):

      (3) Justification and Selection of PGIs:

      For several traits, such as BMI, multiple polygenic indices (PGIs) are currently available. The criteria used to select specific PGIs for this study are not clearly described. A more systematic and reproducible approach-for example, leveraging metadata from the PGS Catalog-could strengthen the justification for PGI selection and enhance the study's generalizability.

      There are numerous PGIs developed in the extensive GWAS literature, but a finite set of PGIs always needs to be chosen for any analysis. The rationale behind our decision to include every PGI from the repository of Becker et al. 2021 (full reference in the manuscript, see also https://www.thessgac.org/pgi-repository) that was available for the Finnish data (including the possibility to exclude overlapping samples, see our response to the next comment for more discussion) was to provide a rigorous analysis by limiting the researchers' degrees of freedom in arbitrarily choosing PGIs. Although it would have been tempting to omit some PGIs that were not expected to correlate substantially with mortality, we believe that our conservative strategy increases the credibility of the reported p-values; in particular, the multiple-testing adjustment should now work as intended.

      We now also mention this rationale when discussing the chosen PGIs in the methods section: “As the independent variables of main interest, we used 35 different PGIs in the Polygenic Index repository by Becker et al., which were mainly based on GWASes using UK Biobank and 23andMe, Inc. data samples, but also other data collections. They were tailored for the Finnish data, i.e., excluding overlapping individuals between the original GWAS and our analysis and performing linkage-disequilibrium adjustment. We used every single-trait PGI defined in the repository (except for subjective well-being, for which we were unable to obtain a meta-analysis version that excluded the overlapping samples). By limiting the researchers’ freedom in selecting the measures, this conservative strategy should increase the validity of our estimates, particularly with regards to multiple-testing adjusted p-values.”

      (4) Overlap Between PGI Training Data and Study Sample:

      The authors should describe any overlap between the data used to develop the PGIs and the current study sample. If such overlap exists, it may lead to overestimation of effect sizes due to "double-dipping." A discussion of this issue and its potential implications is warranted, as similar concerns have been raised in studies using UK Biobank data.

      This is, fortunately, not a concern of our analysis. Overlapping samples were excluded in creating the PGIs that we used. We have now described this more clearly in the revised methods section.

      (1) Clarify the Methodology for Family-Based Cox Analysis:

      It is unclear what specific method was used to perform Cox regression in the family-based analysis. Please provide additional methodological details.

      We have described the method further and added an additional reference in the revision. The text now stands:

      “We compared these models to the corresponding within-sibship models, using the sibship identifier as the strata variable. This method employs a sibship-specific (instead of a whole-sample-wide baseline hazard in the population models) baseline hazard, and corresponds to a fixed-effects model in some other regression frameworks (e.g., linear model with sibship-specific intercepts)”
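      To make the idea of a sibship-specific baseline hazard concrete, the following is a minimal sketch of a stratified Cox partial log-likelihood for a single covariate. It is not the authors' code; the function name and the toy data are hypothetical, and tied event times are handled in the Breslow manner. The point it illustrates is that each risk set contains only members of the same sibship, so a sibship whose members share the same covariate value contributes a constant that does not depend on the coefficient.

```python
import math
from collections import defaultdict


def stratified_partial_loglik(beta, rows):
    """Cox partial log-likelihood with one baseline hazard per stratum.

    rows: iterable of (stratum, time, event, x), where event is 1 for a
    death and 0 for censoring, and x is the covariate (e.g. a PGI).
    """
    by_stratum = defaultdict(list)
    for stratum, time, event, x in rows:
        by_stratum[stratum].append((time, event, x))

    total = 0.0
    for members in by_stratum.values():
        for time_i, event_i, x_i in members:
            if not event_i:
                continue
            # Risk set: members of the same stratum still under observation.
            denom = sum(math.exp(beta * x_j)
                        for time_j, _, x_j in members if time_j >= time_i)
            total += beta * x_i - math.log(denom)
    return total


# Hypothetical toy data: sibship "A" is discordant in x, sibship "B" is not.
rows_a = [("A", 5.0, 1, 1.0), ("A", 9.0, 0, -0.5)]
rows_b = [("B", 4.0, 1, 0.3), ("B", 7.0, 1, 0.3)]

for beta in (0.0, 0.5, 1.5):
    print(beta,
          round(stratified_partial_loglik(beta, rows_a), 4),
          round(stratified_partial_loglik(beta, rows_b), 4))
```

      In the printed output, the contribution of sibship "B" stays the same for every value of beta, whereas that of sibship "A" changes; only within-sibship differences in the covariate inform the estimate.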

      (2) Clarify Timing of Measured Risk Factors Relative to Follow-Up:

      The main text should provide more detailed information regarding the timing of data collection for directly measured risk factors. Specifically, it should be clarified whether the measurements used correspond to the first available data for each individual after the start of follow-up, or if a different criterion was applied.

      BMI, self-rated health, alcohol consumption and smoking status were measured at the baseline survey of each dataset. Education was registered as the highest completed degree up to the end of 2019. Depression was a composite of survey self-report (at the time of the baseline survey), as well as depression-related medicine purchases and hospitalizations over a two-year period before the start of the individual’s follow-up.

      We have added more comprehensive information on the measurement of the phenotypes of interest in Supplementary table 2, including the timing of the measurement.

    1. eLife Assessment

      This work significantly advances our understanding of chromatin organization within regions of repetitive sequences in the parasitic protozoan Trypanosoma brucei. Using cutting-edge interdisciplinary tools, the authors provide compelling evidence for two discrete types of repetitive DNA element-associated proteins: one set involved in essential centromere function, and the other involved in glycoprotein antigenic variation via homologous recombination. Thus, these fundamental findings have implications for this parasite's biology, and for therapeutic targeting in kinetoplastid diseases. This work will be exciting to those in the centromere/mitosis and parasite immunity fields.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Strengths and Weaknesses:

      The manuscript was previously reviewed through Review Commons. As noted there, the experiments are well controlled, the claims are well supported, and the methods are clearly described. The conclusions are convincing. All concerns I raised have been addressed except one (minor point #8):

      "The way the authors mapped the ChIP-seq data is potentially problematic when analyzing the same repeat type in different genomic regions. Reads with multiple equally good mapping positions were assigned randomly. This is fine when analyzing repeats by type, independent of genomic position, which is what the authors do to reach their main conclusions. However, several figures (Fig. 3B, Fig. 4B, Fig. 5B, Fig. 7) show the same repeat type at specific genomic locations." Due to the random assignment, all of these regions merely show the average signal for the given repeat. I find it misleading that this average is plotted out at "specific" genomic regions.

      Initially, I suggested a workaround, but the authors clarified why the workaround was not feasible, and their explanation is reasonable to me. That said, the figures still show a signal at positions where they can't be sure it actually exists. If this cannot be corrected analytically, it should at least be noted in the figure legends, Results, or Discussion.

      Importantly, the authors' conclusions do not hinge on this point; they are appropriately cautious, and their interpretations remain valid regardless.
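      The averaging effect of random multi-mapper assignment described above can be illustrated with a small simulation. This is a toy sketch, not the authors' mapping pipeline: the locus names and read counts are hypothetical, and it assumes that every read from the repeat matches all copies equally well and is assigned to one copy uniformly at random.

```python
import random


def assign_multimappers(true_counts, seed=0):
    """Assign each read to a random repeat copy, ignoring its true origin."""
    rng = random.Random(seed)
    loci = list(true_counts)
    observed = dict.fromkeys(loci, 0)
    for _origin, n_reads in true_counts.items():
        for _ in range(n_reads):
            # The read sequence is identical across copies, so the mapper
            # cannot recover the origin and picks one copy at random.
            observed[rng.choice(loci)] += 1
    return observed


# Hypothetical true enrichment: strong at one copy, weak or absent elsewhere.
true_counts = {"repeat_copy_1": 900, "repeat_copy_2": 100, "repeat_copy_3": 0}
observed = assign_multimappers(true_counts)

for locus in true_counts:
    print(f"{locus}: true = {true_counts[locus]}, observed = {observed[locus]}")
```

      Under these assumptions, each copy ends up with roughly one third of the reads regardless of where the signal originated, which is why per-locus tracks of identical repeats reflect the repeat-type average rather than locus-specific binding.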

      Significance:

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with mini-chromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Comments on revised version:

      All my recommendations have been addressed.

    3. Reviewer #2 (Public review):

      The Trypanosoma brucei genome, like that of other eukaryotes, contains diverse repetitive elements. Yet, the chromatin-associated proteome of these regions remains largely unexplored. This study represents a very important conceptual and technical advancement by employing synthetic TALE DNA-binding proteins fused to YFP to selectively capture proteins associated with specific repetitive sequences in T. brucei chromatin. The data presented here are convincing, supported by appropriate controls and a well-validated methodology, aligned with current state-of-the-art approaches.

      The authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the T. brucei genome, including centromere- and telomere-associated repeats and those of a transposon element. The aim was to identify specific proteins that bind to these repetitive sequences in T. brucei chromatin. Validation of the approach was done using a TALE protein designed to target the telomere repeat (TelR-TALE) that detected many of the proteins that were previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and a protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      This study represents a significant conceptual and technical advancement. To the best of our knowledge, it is the first report of employing TALE-YFP for affinity-based detection of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of the organization of these important regions of the trypanosomal chromatin and provides the foundation for investigating the functional roles of associated proteins in parasite biology. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as to scientists investigating the roles of repetitive genomic elements in chromatin structure and their functional role in higher eukaryotes.

      Importantly, any essential or unique interacting partners identified using the approach employed here could serve as potential targets for therapeutic intervention in severe tropical diseases caused by kinetoplastids.

    4. Author response:

      Point-by-point description of the revisions:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      In this article, the authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the Trypanosoma brucei genome, including centromere- and telomere-associated repeats and those of a transposon element. The aim was to detect and identify, using YFP pulldown, specific proteins that bind to these repetitive sequences in T. brucei chromatin. Validation of the approach was done using a TALE protein designed to target the telomere repeat (TelR-TALE) that detected many of the proteins that were previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and the protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      Major comments:

      Are the key conclusions convincing?

      The authors reported that they have successfully used TALE-based affinity selection of proteins associated with repetitive sequences in the T. brucei genome. They claimed that this study has provided new information regarding the relevance of the repetitive regions in the genome to chromosome integrity, telomere biology, chromosomal segregation and immune evasion strategies. These conclusions are based on high-quality research, which basically merits publication, provided that the major concerns raised below are addressed before acceptance.

      (1) The authors used the TALE-YFP approach to examine the proteome associated with five different repetitive regions of the T. brucei genome and confirmed the binding of TALE-YFP with ChIP-seq analyses. Ultimately, they obtained the list of proteins that bound to the synthetic proteins, by affinity purification and LC-MS analysis, and concluded that these proteins bind to different repetitive regions of the genome. There are two control proteins, one is TRF-YFP and the other KKT2-YFP, used to confirm the interactions. However, there is no experiment confirming that the analysis gives some insight into the role of any putative or new protein in telomere biology, VSG gene regulation or chromosomal segregation. The proteins that have already been reported by other studies are mentioned. Although the authors discovered many proteins in these repetitive regions, their role is as yet unknown. It is recommended to take one or more of the new putative proteins from the repetitive elements and (1) show whether or not they bind directly to the specific repetitive sequence (e.g., by EMSA); and (2) knock down one or a small sample of the newly discovered proteins, which may shed light on their function at the repetitive region, as a proof of concept.

      The main request from Referee 1 is for individual evaluation of protein-DNA interaction for a few candidates identified in our TALE-YFP affinity purifications, particularly using EMSA to identify binding to the DNA repeats used for the TALE selection. In our opinion, such an approach would not actually provide the validation anticipated by the reviewer. The power of TALE-YFP affinity selection is that it enriches for protein complexes that associate with the chromatin that coats the target DNA repetitive elements rather than only identifying individual proteins or components of a complex that directly bind to DNA assembled in chromatin.

      The referee suggests we express recombinant proteins and perform EMSA for selected candidates, but many of the identified proteins are unlikely to directly bind to DNA – they are more likely to associate with a combination of features present in DNA and/or chromatin (e.g. specific histone variants or histone post-translational modifications). Of course, a positive result would provide some validation but only IF the tested protein can bind DNA in isolation – thus, a negative result would be uninformative.

      In fact, our finding that KKT proteins are enriched using the 177R-TALE (minichromosome repeat sequence) identifies components of the trypanosome kinetochore known (KKT2) or predicted (KKT3) to directly bind DNA (Marciano et al., 2021; PMID: 34081090), and likewise the TelR-TALE identifies the TRF component that is known to directly associate with telomeric (TTAGGG)n repeats (Reis et al 2018; PMID: 29385523). This provides reassurance on the specificity of the selection, as does the lack of cross selectivity between different TALEs used (see later point 3 below). The enrichment of the respective DNA repeats quantitated in Figure 2B (originally Figure S1) also provides strong evidence for TALE selectivity.

      It is very likely that most of the components enriched on the repetitive elements targeted by our TALE-YFP proteins do not bind repetitive DNA directly. The TRF telomere binding protein is an exception – but it is the only obvious DNA binding protein amongst the many proteins identified as being enriched in our TelR-TALE-YFP and TRF-YFP affinity selections.

      The referee also suggests that follow up experiments using knockdown of the identified proteins found to be enriched on repetitive DNA elements would be informative. In our opinion, this manuscript presents the development of a new methodology previously not applied to trypanosomes, and referee 2 highlights the value of this methodological development which will be relevant for a large community of kinetoplastid researchers. In-depth follow-up analyses would be beyond the scope of this current study but of course will be pursued in future. To be meaningful such knockdown analyses would need to be comprehensive in terms of their phenotypic characterisation (e.g. quantitative effects on chromosome biology and cell cycle progression, rates and mechanism of recombination underlying antigenic variation, etc) – simple RNAi knockdowns would provide information on fitness but little more. This information is already publicly available from genome-wide RNAi screens (www.tritrypDB.org), with further information on protein location available from the genome-wide protein localisation resource (Tryptag.org). Hence basic information is available on all targets selected by the TALEs after RNAi knock down but in-depth follow-up functional analysis of several proteins would require specific targeted assays beyond the scope of this study.

      (2) NonR-TALE-YFP does not have a binding site in the genome, but YFP protein should still be expressed by T. brucei clones with NLS. The authors have to explain why there is no signal detected in the nucleus, while a prominent signal was detected near kDNA (see Fig.2). Why is the expression of YFP in NonR-TALE almost not shown compared to other TALE clones?

      The NonR-TALE-YFP immunolocalisation signal indeed is apparently located close to the kDNA and away from the nucleus. We are not sure why this is so, but the construct is sequence validated and correct. However, we note that artefactual localisation of proteins fused to a globular eGFP tag, compared to a short linear epitope V5 tag, near to the kinetoplast has been previously reported (Pyrih et al, 2023; PMID: 37669165).

      The expression of NonR-TALE-YFP is shown in Supplementary Fig. S2 in comparison to other TALE proteins. Although it is evident that NonR-TALE-YFP is expressed at lower levels than other TALEs (the different TALEs have different expression levels), it is likely that in each case the TALE proteins would be in relative excess.

      It is possible that the absence of a target sequence for the NonR-TALE-YFP in the nucleus affects its stability and cellular location. Understanding these differences is tangential to the aim of this study.

      However, importantly, NonR-TALE-YFP is not the only control used for specificity in our affinity purifications. Instead, the lack of cross-selection of the same proteins by different TALEs (e.g. TelR-TALE-YFP, 177R-TALE-YFP) and the lack of enrichment of any proteins of interest by the well expressed ingiR-TALE-YFP or 147R-TALE-YFP proteins each provide strong evidence for the specificity of the selection using TALEs, as does the enrichment of similar protein sets following affinity purification of the TelR-TALE-YFP and TRF-YFP proteins, which both bind telomeric (TTAGGG)n repeats. Moreover, control affinity purifications to assess background were performed using cells that completely lack an expressed YFP protein, which further supports specificity (Figure 6).

      We have added text to highlight these important points in the revised manuscript:

      Page 8:

      “However, the expression level of NonR-TALE-YFP was lower than other TALE-YFP proteins; this may relate to the lack of DNA binding sites for NonR-TALE-YFP in the nucleus.”

      Page 8:

      “NonR-TALE-YFP displayed a diffuse nuclear and cytoplasmic signal; unexpectedly the cytoplasmic signal appeared to be in the vicinity of the kDNA of the kinetoplast (mitochondria). We note that artefactual localisation of some proteins fused to an eGFP tag has previously been observed in T. brucei (Pyrih et al, 2023).”

      Page 10:

      “Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4). Thus, the most enriched proteins are specific to TelR-TALE-YFP-associated chromatin rather than to the TALE-YFP synthetic protein module or other chromatin.”

      (3) As a proof of concept, the author showed that the TALE method determined the same interacting partners enrichment in TelR-TALE as compared to TRF-YFP. And they show the same interacting partners for other TALE proteins, whether compared with WT cells or with the NonR-TALE parasites. It may be because NonR-TALE parasites have almost no (or very little) YFP expression (see Fig. S3) as compared to other TALE clones and the TRF-YFP clone. To address this concern, there should be a control included, with proper YFP expression.

      See response to point 2, but we reiterate that the ingiR-TALE-YFP and 147R-TALE-YFP proteins are well expressed (western blot, original Fig. S3, now Fig. S2) but few proteins are detected as being enriched or correspond to those enriched in TelR-TALE-YFP or TRF-YFP affinity purifications (see Fig. S9). Therefore, the ingiR-TALE-YFP and 147R-TALE-YFP proteins provide good additional negative controls for specificity as requested. To further reassure the referee we have also included additional volcano plots which compare TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP to the ingiR-TALE-YFP affinity selection (new Figure S8). As with No-YFP or NonR-TALE-YFP controls, the use of ingiR-TALE-YFP as a negative control demonstrates that known telomere-associated proteins are enriched in TelR-TALE-YFP affinity purification, RPA subunits are enriched with 70R-TALE-YFP and kinetochore KKT proteins are enriched with 177R-TALE-YFP. These analyses demonstrate specificity in the proteins enriched following affinity purification of our different TALE-YFPs and provide support to strengthen our original findings.

      We now refer to use of No-YFP, NonR-TALE-YFP, and ingiR-TALE-YFP as controls for comparison to TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP in several places:

      Page 10:

      “Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4).”

      Page 11:

      “Thus, the nuclear ingiR-TALE-YFP provides an additional chromatin-associated negative control for affinity purifications with the TelR-TALE-YFP, 70R-TALE-YFP and 177R-TALE-YFP proteins (Fig. S8).”

      “Proteins identified as being enriched with 70R-TALE-YFP (Figure 6D) were similar in comparisons with either the No-YFP, NonR-TALE-YFP or ingiR-TALE-YFP as negative controls.”

      Top Page 12:

      “The same kinetochore proteins were enriched regardless of whether the 177R-TALE proteomics data was compared with No-YFP, NonR-TALE or ingiR-TALE-YFP controls.”

      Discussion Page 13:

      “Regardless, the 147R-TALE and ingiR-TALE proteins were well expressed in T. brucei cells, but their affinity selection did not significantly enrich for any relevant proteins. Thus, 147R-TALE and ingiR-TALE provide reassurance for the overall specificity for proteins enriched in TelR-TALE, 70R-TALE and 177R-TALE affinity purifications.”

      (4) After the artificial expression of repetitive sequence binding five-TALE proteins, the question is if there is any competition for the TALE proteins with the corresponding endogenous proteins? Is there any effect on parasite survival or health, compared to the control after the expression of these five TALEs YFP protein? It is recommended to add parasite growth curves, for all the TALE proteins expressing cultures.

      Growth curves for cells expressing TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP are now included (New Fig S3A). No deficit in growth was evident while passaging 70R-TALE-YFP, 147R-TALE-YFP, NonR-TALE-YFP cell lines (indeed they grew slightly better than controls).

      The following text has been added page 8:

      “Cell lines expressing representative TALE-YFP proteins displayed no fitness deficit (Fig. S3A).”

      (5) Since the experiments were performed using whole-cell extracts without prior nuclear fractionation, the authors should consider the possibility that some identified proteins may have originated from compartments other than the nucleus. Specifically, the detection of certain binding proteins might reflect sequence homology (or partial homology) between mitochondrial DNA (maxicircles and minicircles) and repetitive regions in the nuclear genome. Additionally, the lack of subcellular separation raises the concern that cytoplasmic proteins could have been co-purified due to whole cell lysis, making it challenging to discern whether the observed proteome truly represents the nuclear interactome.

      In our experimental design, we confirmed bioinformatically that the repeat sequences targeted were not represented elsewhere in the nuclear or mitochondrial genome (kDNA). The absence of subcellular fractionation could result in some cytoplasmic protein selection, but this is unlikely since each TALE targets a specific DNA sequence but is otherwise identical, such that cross-selection of the same contaminating protein set would be anticipated if there was significant non-specific binding. We have previously successfully affinity selected 15 chromatin modifiers and identified associated proteins without major issues concerning cytoplasmic protein contamination (Staneva et al 2021 and 2022; PMID: 34407985 and 36169304). Of course, the possibility that some proteins are contaminants will need to be borne in mind in any future follow-up analysis of proteins of interest that we identified as being enriched on specific types of repetitive element in T. brucei. Proteins that are also detected in negative control affinity selections such as No-YFP, NonR-TALE-YFP, ingiR-TALE or 147R-TALE must be disregarded.

      (6) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      As mentioned earlier, the authors claimed that this study has provided new information concerning telomere biology, chromosomal segregation mechanisms, and immune evasion strategies. But there are no experiments that provide a role for any unknown or known protein in these processes. Thus, it is suggested to select one or two proteins of choice from the list and validate their direct binding to repetitive region(s), and their role in that region of interaction.

      As highlighted in response to point 1 the suggested validation and follow up experiments may well not be informative and are beyond the scope of the methodological development presented in this manuscript. Referee 2 describes the study in its current form as “a significant conceptual and technical advancement” and “This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology.”

      The Referee’s phrase ‘validate their direct binding to repetitive region(s)’ here may also mean to test if any of the additional proteins that we identified as being enriched with a specific TALE protein actually display enrichment over the repeat regions when examined by an orthogonal method. A key unexpected finding was that kinetochore proteins including KKT2 are enriched in our affinity purifications of the 177R-TALE-YFP that targets 177bp repeats (Figure 6F). By conducting ChIP-seq for the kinetochore specific protein KKT2 using YFP-KKT2 we confirmed that KKT2 is indeed enriched on 177bp repeat DNA but not flanking DNA (Figure 7). Moreover, several known telomere-associated proteins are detected in our affinity selections of TelR-TALE-YFP (Figure 6B, Fig. S6; see also Reis et al, 2018 Nuc. Acids Res. PMID: 29385523; Weisert et al, 2024 Sci. Reports PMID: 39681615).

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The answer to this question depends on what the authors want to present as the achievements of the present study. If the achievement of the paper is the creation of a new tool for discovering new proteins associated with the repeat regions, I recommend that they add proof of direct interactions between a sample of the newly discovered proteins and the relevant repeats, as the proof of concept discussed above. However, if the authors would like to claim that the study achieved new functional insights for these interactions, they will have to expand the study, as mentioned above, to support the proof of concept.

      See our response to point 1 and the point we labelled ‘6’ above.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      I think that they are realistic. If the authors decide to check the capacity of a small sample of proteins (not previously known as repetitive-region binding proteins) to interact directly with the repeated sequence, it will add substantially to the study (e.g., by EMSA; estimated time: 1 month). If the authors also decide to check the function of at least one such newly detected protein (e.g., by KD), I estimate that this will take 3-6 months.

      As highlighted previously the proposed EMSA experiment may well be uninformative for protein complex components identified in our study or for isolated proteins that directly bind DNA in the context of a complex and chromatin. RNAi knockdown data and cell location data (as well as developmental expression and orthology data) are already available through tritrypDB.org and tryptag.org.

      Are the data and the methods presented in such a way that they can be reproduced? Yes

      Are the experiments adequately replicated, and statistical analysis adequate?

      The authors did not mention replicates. There is no statistical analysis mentioned.

      The figure legends indicate that all volcano plots of TALE affinity selections were derived from three biological replicates. Cutoffs used for significance: P < 0.05 (Student's t-test).
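
      As a minimal sketch of how one point on such a volcano plot could be derived, the block below assumes a two-sample Student's t-test on log2-transformed intensities with three replicates per group. The function name, the example intensities in the usage note, and the use of a tabulated critical value (2.776 for 4 degrees of freedom) in place of an exact P value are illustrative assumptions and not the authors' pipeline.

```python
import math
from statistics import mean, variance

# Two-sided critical value of Student's t for alpha = 0.05 and
# df = 4 (3 + 3 replicates - 2), taken from standard t tables.
T_CRIT_DF4 = 2.776


def volcano_point(ip, control):
    """Return (log2 fold change, t statistic, significant) for one protein.

    `ip` and `control` are hypothetical lists of three positive
    intensities, one per biological replicate.
    """
    if len(ip) != 3 or len(control) != 3:
        raise ValueError("this sketch assumes three replicates per group")
    ip_log = [math.log2(x) for x in ip]
    ctrl_log = [math.log2(x) for x in control]
    log2_fc = mean(ip_log) - mean(ctrl_log)
    # Student's t-test uses a pooled variance; with equal group sizes this
    # is the average of the two sample variances.
    pooled = (variance(ip_log) + variance(ctrl_log)) / 2
    se = math.sqrt(pooled * (1 / 3 + 1 / 3))
    if se == 0:
        # Degenerate case: no spread between replicates.
        t_stat = 0.0 if log2_fc == 0 else math.copysign(math.inf, log2_fc)
    else:
        t_stat = log2_fc / se
    return log2_fc, t_stat, abs(t_stat) > T_CRIT_DF4
```

      For example, `volcano_point([800, 1000, 1250], [100, 125, 80])` describes a protein with a tenfold difference in geometric mean intensity, which this sketch would place to the right of the plot and above the significance cutoff.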

      For ChIP-seq, two biological replicates were analysed for each cell line expressing the specific YFP tagged protein of interest (TALE or KKT2). This is now stated in the relevant figure legends – apologies for this oversight. The resulting data are available for scrutiny at GEO: GSE295698.

      Minor comments:

      Specific experimental issues that are easily addressable.

      The following suggestions can be incorporated:

      (1) Page 18, in the material method section author mentioned four drugs: Blasticidine, Phleomycin and G418, and hygromycin. It is recommended to mention the purpose of using these selective drugs for the parasite. If clonal selection has been done, then it should also be mentioned.

      We erroneously added information on several drugs used for selection in our laboratory. In fact, all TALE-YFP constructs carry the bleomycin resistance gene, which we select for using phleomycin. Also, clones were derived by limiting dilution immediately after transfection. We have amended the text accordingly:

      Page 17/18:

      “Cell cultures were maintained below 3 x 10^6 cells/ml. Phleomycin 2.5 µg/ml was used to select transformants containing the TALE construct BleoR gene.”

      “Electroporated bloodstream cells were added to 30 ml HMI-9 medium and two 10-fold serial dilutions were performed in order to isolate clonal Phleomycin resistant populations from the transfection. 1 ml of transfected cells were plated per well on 24-well plates (1 plate per serial dilution) and incubated at 37°C and 5% CO2 for a minimum of 6 h before adding 1 ml media containing 2X concentration Phleomycin (5 µg/ml) per well.”

      (2) In the method section the authors mentioned that there is only one site for binding of NonR-TALE in the parasite genome. But in Fig. 1C, the authors showed zero binding site. So, there is one binding site for NonR-TALE-YFP in the genome or zero?

      We thank the reviewer for pointing out this discrepancy. We have checked the latest Tb427v12 genome assembly for predicted NonR-TALE binding sites and there are no exact matches. We have corrected the text accordingly.

      Page 7:

      “A control NonR-TALE protein was also designed which was predicted to have no target sequence in the T. brucei genome.”

      Page 17:

      “A control NonR-TALE predicted to have no recognised target in the T. brucei genome was designed as follows: BLAST searches were used to identify exact matches in the TREU927 reference genome. Candidate sequences with one or more match were discarded.”
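
      The screening rule quoted above (discard any candidate with one or more exact match) can be sketched as a plain string search over both strands. This is a simplified stand-in for the BLAST step and is not the authors' script; the function names and the toy sequences in the usage note are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_exact_matches(candidate, genome):
    """Count exact occurrences of `candidate` in `genome` on either strand.

    Overlapping occurrences are counted. A set is used so that a
    palindromic candidate is not counted twice.
    """
    genome = genome.upper()
    candidate = candidate.upper()
    queries = {candidate, reverse_complement(candidate)}
    total = 0
    for query in queries:
        start = genome.find(query)
        while start != -1:
            total += 1
            start = genome.find(query, start + 1)
    return total


def filter_candidates(candidates, genome):
    """Keep only candidates with no exact match anywhere in the genome."""
    return [c for c in candidates if count_exact_matches(c, genome) == 0]
```

      With a toy genome such as `"TTAGGGTTAGGGACGTACGT"`, the candidates `"TTAGGG"` and `"CCCTAA"` would both be discarded (forward and reverse-strand matches respectively), whereas `"GATTACA"` would be retained.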

      (3) The authors used two different anti-GFP antibodies, one from Roche and the other from Thermo Fisher. Why were two different antibodies used for the same protein?

      We have found that only some anti-GFP antibodies are effective for affinity selection of associated proteins, whereas others are better suited for immunolocalisation. The respective suppliers’ antibodies were optimised for each application.

      (4) Page 6: in the introduction, the authors give the number of total VSG genes as 2,634. Is it known how many of them are pseudogenes?

      This value corresponds to the number reported by Cosentino et al. 2021 (PMID: 34541528) for subtelomeric VSGs, which is similar to the value reported by Muller et al 2018 (PMID: 30333624) (2486), both in the same strain of trypanosomes as used by us. Based on the earlier analysis by Cross et al (PMID: 24992042), 80% of the identified VSGs in their study (2584) are pseudogenes. This approximates to the estimation by Cosentino of 346/2634 (13%) being fully functional VSG genes at subtelomeres, or 17% when considering VSGs at all genomic locations (433/2872).

      (5) I found several typos throughout the manuscript.

      Thank you for raising this. We have read through the manuscript several times and hopefully corrected all outstanding typos.

      (6) Fig. 1C: Table: below TOTAL 2nd line: the number should be 1838 (rather than 1828)

      Corrected- thank you.

      - Are prior studies referenced appropriately? Yes

      - Are the text and figures clear and accurate? Yes

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Suggested above

      Reviewer #1 (Significance):

      Describe the nature and significance of the advance (e.g., conceptual, technical, clinical) for the field:

      This study represents a significant conceptual and technical advancement by employing a synthetic TALE DNA-binding protein tagged with YFP to selectively identify proteins associated with five distinct repetitive regions of T. brucei chromatin. To the best of my knowledge, it is the first report to utilize TALE-YFP for affinity-based isolation of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology. Importantly, any essential or unique interacting partners identified could serve as potential targets for therapeutic intervention.

      - Place the work in the context of the existing literature (provide references, where appropriate). I agree with the information already described in the submitted manuscript regarding the potential contribution of the resulting data and the established technology to the study of VSG expression, kinetochore mechanisms and telomere biology.

      - State what audience might be interested in and influenced by the reported findings. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as scientists investigating chromatin structure and the functional roles of repetitive genomic elements in higher eukaryotes.

      - (1) Define your field of expertise with a few keywords to help the authors contextualize your point of view. Protein-DNA interactions/ chromatin/ DNA replication/ Trypanosomes

      - (2) Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. None

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Major Comments

      None. The experiments are well-controlled, claims well-supported, and methods clearly described. Conclusions are convincing.

      Thank you for these positive comments.

      Minor Comments

      (1) Fig. 2 - I couldn't find an uncropped version showing multiple cells. If it exists, it should be linked in the legend or main text; Otherwise, this should be added to the supplement.

      The images presented represent reproducible analyses and were independently verified by two of the authors. Although wider field of view images do not provide the resolution to be informative on cell location, as requested we have provided uncropped images in new Fig. S4 for all the cell lines shown in Figure 2A.

      In addition, we have included as supplementary images (Fig. S3B) additional images of TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP localisation to provide additional support for their observed locations presented in Figure 1. The sets of cells and images presented in Figure 2A and in Fig. S3B were prepared and obtained by different authors, independently and reproducibly validating the location of the tagged protein.

      (2) I think Suppl. Fig. 1 is very valuable, as it is a quantification and summary of the ChIP-seq data. I think the authors could consider making this a panel of a main figure. For the main figure, I think the plot could be trimmed down to only show the background and the relevant repeat for each TALE protein, leaving out the non-target repeats. (This relates to minor comment 6.) Also, I believe, it was not explained how background enrichment was calculated.

      We are grateful for the reviewer’s positive view of original Fig. S1 and appreciate the suggestion. We have now moved these analyses to part B of main Figure 2 in the revised manuscript – now Figure 2B. We have also provided additional details in the Methods section on the approaches used to assess background enrichment.

      Page 19:

      “Background enrichment calculation

      The genome was divided into 50 bp sliding windows, and each window was annotated based on overlapping genomic features, including CIR147, 177 bp repeats, 70 bp repeats, and telomeric (TTAGGG)n repeats. Windows that did not overlap with any of these annotated repeat elements were defined as "background" regions and used to establish the baseline ChIP-seq signal. Enrichment for each window was calculated using bamCompare, as log₂(IP/Input). To adjust for background signal amongst all samples, enrichment values for each sample were further normalized against the corresponding No-YFP ChIP-seq dataset.”
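
      The windowing and normalisation steps described in this Methods text can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: the read counts are supplied as plain dictionaries instead of bamCompare output, the pseudocount of 1 is an assumption, and normalisation against No-YFP is assumed to be a subtraction in log2 space, which the quoted text does not specify. All function names are hypothetical.

```python
import math


def label_window(start, end, repeats):
    """Return the repeat type overlapping [start, end), else 'background'.

    `repeats` maps a repeat name to a list of (start, end) intervals.
    """
    for name, intervals in repeats.items():
        for r_start, r_end in intervals:
            if start < r_end and r_start < end:
                return name
    return "background"


def log2_ratio(ip, inp, pseudocount=1.0):
    """log2(IP/Input) with a pseudocount to avoid division by zero."""
    return math.log2((ip + pseudocount) / (inp + pseudocount))


def window_enrichment(windows, repeats, sample, no_yfp):
    """Label each window and normalise its enrichment against No-YFP.

    `sample` and `no_yfp` map a window (start, end) to (IP, Input) counts.
    Returns a list of (window, label, normalised log2 enrichment).
    """
    rows = []
    for window in windows:
        raw = log2_ratio(*sample[window])
        baseline = log2_ratio(*no_yfp[window])
        rows.append((window, label_window(*window, repeats), raw - baseline))
    return rows
```

      Windows labelled "background" then give the baseline distribution against which the repeat-containing windows are compared.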

      Note: While revising the manuscript we also noticed that the script had a normalization error. We have therefore included a corrected version of these analyses as Figure 2B (old Fig. S1).

      (3) Generally, I would plot enrichment on a log2 axis. This concerns several figures with ChIP-seq data.

      Our ChIP-seq enrichment is calculated by bamCompare. The resulting enrichment values are indeed log2 (IP/Input). We have made this clear in the updated figures/legends.

      (4) Fig. 4C - The violin plots are very hard to interpret, as the plots are very narrow compared to the line thickness, making it hard to judge the actual volume. For example, in Centromere 5, YFP-KKT2 is less enriched than 147R-TALE over most of the centromere with some peaks of much higher enrichment (as visible in panel B), however, in panel C, it is very hard to see this same information. I'm sure there is some way to present this better, either using a different type of plot or by improving the spacing of the existing plot.

      We thank the reviewer for this suggestion; we have elected to provide a Split-Violin plot instead. This improves the presentation of the data for each centromere. The original violin plot in Figure 4C has been replaced with this Split-Violin plot (still Figure 4C).

      (5) Fig. 6 - The panels are missing an x-axis label (although it is obvious from the plot what is displayed).

      Maybe the "WT NO-YFP vs" part that is repeated in all the plot titles could be removed from the title and only be part of the x-axis label?

      In fact, to save space the X axis was labelled inside each volcano plot but we neglected to indicate that values are a log2 scale indicating enrichment. This has been rectified – see Figure 6, and Fig. S7, S8 and S9.

      (6) Fig. 7 - I would like to have a quantification for the examples shown here. In fact, such a quantification already exists in Suppl. Figure 1. I think the relevant plots of that quantification (YFP-KKT2 over 177bp-repeats and centromere-repeats) with some control could be included in Fig. 7 as panel C. This opportunity could be used to show enrichment separated out for intermediate-sized, mini-, and megabase-chromosomes. (relates to minor comment 2 & 8)

      The CIR147 sequence is found exclusively on megabase-sized chromosomes, while the 177 bp repeats are located on intermediate- and mini-sized chromosomes. Due to limitations in the current genome assembly, it is not possible to reliably classify all chromosomes into intermediate- or mini- sized categories based on their length. Therefore, original Supplementary Fig. S1 presented the YFP-KKT2 enrichment over CIR147 and 177 bp repeats as a representative comparison between megabase chromosomes and the remaining chromosomes (corrected version now presented as main Figure 2B). Additionally, to allow direct comparison of YFP-KKT2 enrichment on CIR147 and 177 bp repeats we have included a new plot in Figure 7C which shows the relative enrichment of YFP-KKT2 on these two repeat types.

      We have added the following text, page 12:

      “Taking into account the relative number of CIR147 and 177 bp repeats in the current T. brucei genome (Cosentino et al., 2021; Rabuffo et al., 2024), comparative analyses demonstrated that YFP-KKT2 is enriched on both CIR147 and 177 bp repeats (Figure 7C).”

      (7) Suppl. Fig. 8 A - I believe there is a mistake here: KKT5 occurs twice in the plot, the one in the overlap region should be KKT1-4 instead, correct?

      Thanks for spotting this. It has been corrected.

      (8) The way that the authors mapped ChIP-seq data is potentially problematic when analyzing the same repeat type in different regions of the genome. The authors assigned reads that had multiple equally good mapping positions to one of these mapping positions, randomly.

      This is perfectly fine when analysing repeats by their type, independent of their position on the genome, which is what the authors did for the main conclusions of the work.

      However, several figures show the same type of repeat at different positions in the genome. Here, the authors risk that enrichment in one region of the genome 'spills' over to all other regions with the same sequence. Particularly, where they show YFP-KKT2 enrichment over intermediate- and mini-chromosomes (Fig. 7) due to the spillover, one cannot be sure to have found KKT2 in both regions.

      Instead, the authors could analyze only uniquely mapping reads / read-pairs where at least one mate is uniquely mapping. I realize that with this strict filtering, data will be much more sparse. Hence, I would suggest keeping the original plots and adding one more quantification where the enrichment over the whole region (e.g., all 177bp repeats on intermediate-/mini-chromosomes) is plotted using the unique reads (this could even be supplementary). This also applies to Fig. 4 B & C.

      We thank the reviewer for their thoughtful comments. Repetitive sequences are indeed challenging to analyze accurately, particularly in the context of short read ChIP-seq data. In our study, we aimed to address YFP-KKT2 enrichment not only over CIR147 repeats but also on 177 bp repeats, using both ChIP-seq and proteomics using synthetic TALE proteins targeted to the different repeat types. We appreciate the referee's suggestion to consider uniquely mapped reads; however, in the updated genome assembly, the 177 bp repeats are frequently immediately followed by long stretches of 70 bp repeats which can span several kilobases. The size and repetitive nature of these regions exceed the resolution limits of ChIP-seq. It is therefore difficult to precisely quantify enrichment across all chromosomes.

      Additionally, the repeat sequences are highly similar, and relying solely on uniquely mapped reads would result in the exclusion of most reads originating from these regions, significantly underestimating the relative signals. To address this, we used Bowtie2 with settings that allow multi-mapping, assigning reads randomly among equivalent mapping positions, but ensuring each read is counted only once. This approach is designed to evenly distribute signal across all repetitive regions and preserve a meaningful average.
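
      The effect of this assignment strategy can be illustrated with a toy simulation; it does not call Bowtie2, and the function name and locus labels are hypothetical. Each read is counted exactly once at a randomly chosen one of its equally good positions, so identical repeat copies converge on the same average signal. This also shows why such data cannot separate the true occupancy of one copy from another, which is the 'spill-over' concern raised by the reviewer.

```python
import random
from collections import Counter


def assign_multimappers(reads, seed=0):
    """Assign each read to one of its equally good mapping positions.

    `reads` is a list in which each element lists the candidate positions
    of a single read. Every read contributes exactly one count.
    """
    rng = random.Random(seed)
    counts = Counter()
    for positions in reads:
        counts[rng.choice(positions)] += 1
    return counts


# 3000 reads that match three identical repeat copies equally well,
# plus 10 reads that map uniquely elsewhere.
repeat_reads = [["copyA", "copyB", "copyC"] for _ in range(3000)]
unique_reads = [["unique_locus"] for _ in range(10)]
coverage = assign_multimappers(repeat_reads + unique_reads)
```

      In this simulation each repeat copy receives roughly a third of the multi-mapping reads regardless of which copy the fragments actually came from, while uniquely mapping reads are unaffected.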

      Single molecule methods such as DiMeLo (Altemose et al. 2022; PMID: 35396487) will need to be developed for T. brucei to allow more accurate and chromosome specific mapping of kinetochore or telomere protein occupancy at repeat-unique sequence boundaries on individual chromosomes.

      Reviewer #2 (Significance):

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with minichromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Thank you for supporting the novelty and broad interest of our manuscript.

      My field of expertise / Point of view:

      I'm a computer scientist by training and am now a postdoctoral bioinformatician in a molecular parasitology laboratory. The laboratory is working on antigenic variation in T. brucei. The focus of my work is on analyzing sequencing data (such as ChIP-seq data) and algorithmically improving bioinformatic tools.

    1. eLife Assessment

      This important study examines the role of map3k1, a MAP3K family member that has both kinase and ubiquitin ligase domains, in the differentiation of progenitors in the flatworm Planaria. The convincing analyses demonstrate that map3k1 acts within progenitors to restrict their premature differentiation and to prevent formation of teratomas. This work would be of interest to researchers in the fields of regeneration, developmental biology, and aging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors assess the role of map3k1 in adult Planaria through whole body RNAi for various periods of time. The authors' prior work has shown that neoblasts (stem cells that can regenerate the entire body) for various tissues are intermingled in the body. Neoblasts divide to produce progenitors that migrate within a "target zone" to the "differentiated target tissues" where they differentiate into a specific cell type. Here the authors show that map3k1-i animals have ectopic eyes that form along the "normal" migration path of eye progenitors, ectopic neurons and glands along the AP axis and pharynx in ectopic anterior positions. The rest of the study shows that positional information is largely unaffected by loss of map3k1. However, loss of map3k1 leads to premature differentiation of progenitors along their normal migratory route. They also show that "long-term" whole body depletion of map3k1 results in mis-specified organs and teratomas. In short, this study convincingly demonstrates that in planaria, map3k1 maintains progenitor cells in an undifferentiated state, preventing premature fate commitment until they encounter the appropriate signals, either positional cues within a designated region or contact-dependent inputs from surrounding tissues.

      Strengths:

      (1) The study has appropriate controls, sample sizes and statistics.

      (2) The work is high-quality.

      (3) The conclusions are supported by the data.

      (4) Planaria is a good system to analyze the function of map3k1, which exists in mammals but not other invertebrates.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public review):

      Summary:

      The flatworm planarian Schmidtea mediterranea is an excellent model for understanding cell fate specification during tissue regeneration and adult tissue maintenance. Planarian stem cells, known as neoblasts, are continuously deployed to support cellular turnover and repair tissues damaged or lost due to injury. This reparative process requires great precision to recognize the location, timing, and cellular fate of a defined number of neoblast progeny. Understanding the molecular mechanisms driving this process could have important implications for regenerative medicine and enhance our understanding of how form and function are maintained in long-lived organisms such as humans. Unfortunately, the molecular basis guiding cell fate and differentiation remains poorly understood.

      In this manuscript, Canales et al. identified the role of the map3k1 gene in mediating the differentiation of progenitor cells at the proper target tissue. The map3k1 function in planarians appears evolutionarily conserved as it has been implicated in regulating cell proliferation, differentiation, and cell death in mammals. The results show that the downregulation of map3k1 with RNAi leads to spatial patterning defects in different tissue types, including the eye, pharynx, and the nervous system. Intriguingly, long-term map3k1-RNAi resulted in ectopic outgrowths consistent with teratomas in planarians. The findings suggest that map3k1 mediates signaling, regulating the timing and location of cellular progenitors to maintain correct patterning during adult tissue maintenance.

      Strengths:

      The authors provide an entry point to understanding molecular mechanisms regulating progenitor cell differentiation and patterning during adult tissue maintenance.

      The diverse set of approaches and methods applied to characterize map3k1 function strengthens the case for conserved evolutionary mechanisms in a selected number of tissue types. The creativity using transplantation experiments is commendable, and the findings with the teratoma phenotype are intriguing and worth characterizing.

      Weaknesses:

      The authors have satisfactorily addressed our previous concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors assess the role of map3k1 in adult Planaria through whole body RNAi for various periods of time. The authors' prior work has shown that neoblasts (stem cells that can regenerate the entire body) for various tissues are intermingled in the body. Neoblasts divide to produce progenitors that migrate within a "target zone" to the "differentiated target tissues" where they differentiate into a specific cell type. Here the authors show that map3k1-i animals have ectopic eyes that form along the "normal" migration path of eye progenitors (Fig. 1), ectopic neurons and glands along the AP axis (Fig. 2) and pharynx in ectopic anterior positions (Fig. 3). The rest of the study shows that positional information is largely unaffected by loss of map3k1 (Fig. 4,5). However, loss of map3k1 leads to premature differentiation of progenitors along their normal migratory route (Fig. 6). They also show that an ill-defined "long-term" whole body depletion of map3k1 results in mis-specified organs and teratomas.

      Strengths:

      (1) The study has appropriate controls, sample sizes and statistics.

      (2) The work appears to be high-quality.

      (3) The conclusions are supported by the data.

      (4) Planaria is a good system to analyze the function of map3k1, which exists in mammals but not in other invertebrates.

      Weaknesses:

      (1) The paper is largely descriptive with no mechanistic insights. 

      The mechanistic insights we aim to address are primarily at the cellular systems level – how adult progenitor cells produce pattern. Specifically, we uncovered strong evidence that regulation of differentiation is an active process occurring in migratory progenitors and that this regulation is a major component of pattern formation during the adult processes of tissue turnover and regeneration. The map3k1 phenotype provided a tool used to reveal the existence of this regulation, and to understand the patterning abnormalities prevented by this regulatory mechanism. We updated the text in several places to make clearer some of this emphasis. For example, in the Discussion: "We suggest that differentiation is restricted during migratory targeting as an essential component of pattern formation, with the map3k1 RNAi phenotype indicating the existence and purpose of this element of patterning." 

      Naturally, identifying a particular molecule involved in this process is of interest for understanding molecular mechanism; this would allow for comparison to other cellular systems in other organisms and would focus future molecular inquiry. Future molecular studies into the mechanism of Map3k1 regulation and its downstream signaling will be fascinating as next steps towards understanding the process at the molecular level more deeply. We also added some discussion considering the types of upstream activation cues that could potentially be associated with Map3k1 regulation to suppress differentiation. 

      (2) Given the severe phenotypes of long-term depletion of map3k1, it is important that this exact timepoint is provided in the methods, figures, figure legends and results. 

      We removed the use of the term “long-term” and instead added timepoints used to all figure legends. We also added a summary of timepoints used in the methods section and included RNAi timepoint labels in figures where a phenotype progression over time is relevant to interpretation. For timecourses, we also added suitable time information to text in the results. 

      (3) Figure 1C, the ectopic eyes are difficult to see, please add arrows. 

      To improve visualization, we replaced the example animal in the original Figure 1C with one that has a stronger phenotype, including arrows pointing to every ectopic event. Additionally, we included magnified images of optic cup cells and photoreceptor neurons in the trunk and tail region. This is now Figure 1B.

      (4) line 217 - why does the n=2/12 animals not match the values in Figure 3B, which is 11/12 and 12/12. The numbers don't add up. Please correct/explain. 

      Figure 3B in the submitted version (3/18 had cells in the tail) included more scored animals than the count given in the text (2/12 had cells in the tail): the figure incorporated 6 animals from a replicate experiment (1/6 showed cells in the tail), which had not been added to the text during writing. The results are the same; only the sample sizes noted in those two locations differed, and we have fixed this issue. In the updated Figure 3, the order of presentation has shifted (e.g., prior 3B is now in 3C and Figure 3_figure supplement 1). We made sure to include numbers in all figure panels. 

      (5) Figure panels do not match what is written in the results section. There is no Figure 6E. Please correct.

      Thank you for catching this. We have gone through figures and text after editing to make sure that text callouts are appropriately matched to the figures. 

      Reviewer #2 (Public review):

      Summary:

      The flatworm planarian Schmidtea mediterranea is an excellent model for understanding cell fate specification during tissue regeneration and adult tissue maintenance. Planarian stem cells, known as neoblasts, are continuously deployed to support cellular turnover and repair tissues damaged or lost due to injury. This reparative process requires great precision to recognize the location, timing, and cellular fate of a defined number of neoblast progeny. Understanding the molecular mechanisms driving this process could have important implications for regenerative medicine and enhance our understanding of how form and function are maintained in long-lived organisms such as humans. Unfortunately, the molecular basis guiding cell fate and differentiation remains poorly understood.

      In this manuscript, Canales et al. identified the role of the map3k1 gene in mediating the differentiation of progenitor cells at the proper target tissue. The map3k1 function in planarians appears evolutionarily conserved as it has been implicated in regulating cell proliferation, differentiation, and cell death in mammals. The results show that the downregulation of map3k1 with RNAi leads to spatial patterning defects in different tissue types, including the eye, pharynx, and the nervous system. Intriguingly, long-term map3k1-RNAi resulted in ectopic outgrowths consistent with teratomas in planarians. The findings suggest that map3k1 mediates signaling, regulating the timing and location of cellular progenitors to maintain correct patterning during adult tissue maintenance.

      Strengths:

      The authors provide an entry point to understanding molecular mechanisms regulating progenitor cell differentiation and patterning during adult tissue maintenance.

      The diverse set of approaches and methods applied to characterize map3k1 function strengthens the case for conserved evolutionary mechanisms in a selected number of tissue types. The creativity using transplantation experiments is commendable, and the findings with the teratoma phenotype are intriguing and worth characterizing.

      Thank you to the reviewer for the positive feedback.

      Weaknesses:

      The article presents a provocative idea related to the importance of positional control for organs and cells, which is at least in part regulated by map3k1. Nonetheless, the role of map3k1 or its potential interaction with regulators of the anterior-posterior, mediolateral axes, and PCGs is somewhat superficial. The authors could elaborate or even speculate more in the discussion section and the different scenarios incorporating these axial modulators into the map3k1 model presented in Figure 8.

      First, to strengthen the support for our finding that positional information is largely unaffected in map3k1 RNAi animals, we added data regarding the expression of additional relevant position control genes (PCGs) –ndl-4, ptk7, sp5, and wnt11-1 – to the PCG panel in Figure 5. The expression domain of ndl-4, an FGF receptor-like protein family member that contributes to head patterning and anterior pole maintenance, was normal in map3k1 RNAi. wnt11-1, a PCG with expression concentrated in the posterior end of the animal and with expression dependent on general Wnt activity, was also normal in map3k1 RNAi animals. ptk7, RNAi of which can result in supernumerary pharynges, also showed normal expression in map3k1 RNAi animals. Finally, sp5, a Wnt-activated gene with expression in the tail, also showed normal expression in map3k1 RNAi animals. 

      Second, to further support the conclusion that cells are not suitably responding to positional information after map3k1 RNAi, which we argue normally dictates where differentiation should occur, we added examples of differentiated cell types that are ectopically positioned within an atypical PCG expression domain for that cell type (Figure 5C). This underscores that following map3k1 RNAi the PCG expression domains do not change, but the pattern of differentiated cell types relative to these domains does shift. We also added data showing that regenerating tails had a proper wntP-2 gradient, but an anterior regenerating pharynx appeared outside of this wntP-2+ zone and inside of an ndl-5+ zone (Figure 5- figure supplement 1E). We added some discussion of these new data in the Figure 5 results section. We also noted, regarding independent recent map3k1 work (Lo, 2025), some evidence exists that a minor posterior shift in ndl-5 expression can occur after map3k1 RNAi.

      Next, we added a new element to the model figure (Figure 8B) depicting that PCG expression domains remain normal after map3k1 RNAi, with ectopic differentiation occurring in an incorrect positional information environment. We refer to this new panel in the discussion: "We suggest that map3k1 is not required for the spatial distribution of progenitor-extrinsic differentiation-promoting cues themselves, but for progenitors to be restricted from differentiating until these cues are received (Figure 8B)."; we then follow this statement with a summary in the Discussion of six pieces of evidence that support this model.

      Finally, we added some additional text to the discussion section about candidate mechanisms by which extrinsic cues could potentially regulate Map3k1, pointing to potential future inquiry directions. We suggest that map3k1 is not involved in regulating PCG activity domains themselves, but instead acts as a brake on differentiation within migratory progenitors through active signaling. This brake is then lifted when the progenitors hit their correct PCG-based migratory target, or when they hit their target tissue. How that occurs mechanistically is unknown. One scenario is that each progenitor is tuned to respond to a particular PCG-regulated environment (such as a particular ECM or signaling environment) to generate a molecular change that inactivates Map3K1 signaling, such as by inactivating or disengaging an RTK signal. Alternatively, the migratory process in progenitors could engage the Map3K1 signal, enabling signal cessation with arrival at a target location. When Map3K1 is active it could result in a transcriptional state that prevents full expression of differentiated factors required for maturation, tissue incorporation, and cessation of migration. These considerations are now added to the discussion.

      The article can be improved by addressing inconsistencies and adding details to the results, including the main figures and supplements. This represents one of the most significant weaknesses of this otherwise intriguing manuscript. Below are some examples of a few figures, but the authors are expected to pay close attention to the remaining figures in the paper.

      Details associated with the number of animals per experiment, statistical methods used, and detailed descriptions of figures appear inconsistent or lacking in almost all figures. In some instances, the percentage of animals affected by the phenotype is shown without detailing the number of animals in the experiment or the number of repeats. Figures and their legends throughout the paper lack details on what is represented and sometimes are mislabeled or unrelated. 

      We endeavored to ensure that these noted details are present throughout the legends and figures for all figure panels.

      Specifically, the arrows in Figure 1A are different colors. Still, no reasoning is given for this, and in the exact figure, the top side (1A) shows the percentages and the number of animals below. 

      The only reason for the different colored arrows was for visibility purposes. To avoid confusion, we now use white arrows for all FISH images in Figure 1, and wherever else possible. We also replaced the percentages with n numbers in the bottom left corner of the live images in Figure 1A. 

      Conversely, in Figures 1B, C, and D, no details on the number of animals or percentages are shown, nor an explanation of why opsin was used in Figure 1A but not 1B. 

      The original Figure 1B represented a few different examples of ectopic eye/eye cell patterns in the map3k1 RNAi animals to demonstrate the variable and disorganized nature of the phenotype. To address this, we added further explanation in the legend. We also merged 1A and 1B for simplicity of interpretation. opsin was used in Figure 1A to label cell bodies of photoreceptors. anti-Arrestin was used in the example FISH images to see if these cells were interconnected via projections, which we now clarify in the legend and in the text. 

      Is Figure 1B missing an image for the respective control? Figure 1C needs details regarding what the two smaller boxes underneath are. 

      The control for Figure 1B was in Figure 1A; the merger of Figures 1A/B should address this. Boxes in Figure 1C were labelled with numbers corresponding to the image above them.

      Figure 1C could use an AP labeling map in 10 days (e.g., AP6 has one optic cup present). Figure 1C and F counts do not match. 

      We added a cartoon to 1C to show the region imaged. Note that the 36d trunk image (now Fig. 1B) has now been replaced with a full animal image and magnified boxes that show locations of example ectopic cells. That cell in 1C was categorized as in AP5. Note that additional animals were analyzed and data added to the distribution (now Fig. 1D). 

      In Figure 1C, we do not know the number of animals tested, controls used, the scale bar sizes in the first two images, nor the degree of magnification used despite the pharynx region appearing magnified in the second image.  Figure 1C is also shown out of chronological order; 36 days post RNAi is shown before 10 days post RNAi. Moreover, the legends for Figures 1C and 1D are swapped.

      We have endeavored to ensure sample numbers, control images, and appropriate scale bar notation in legends are present for all images. Figure 1C has now been split into two panels: Figure 1B and Figure 1C. It does not follow a chronological order in presentation for the following logic flow: we uncover and describe the phenotype; then, with knowledge of the defect, we walk back to see how early the phenotype starts after RNAi and what the pattern of ectopic cell distribution is when the phenotype starts to emerge (using the knowledge of which cells are affected from the overt phenotype described in 1A/B). 

      Additionally, Figure 1F and many other figures throughout the paper lack overall statistical considerations. Furthermore, Figure 1F has three components, but only one is labeled. Labeling each of them individually and describing them in the corresponding figure legend may be more appropriate.

      The main point of the graphs in 1F (now 1D) was the overt overall pattern difference with the wild-type, which never has ectopic eye cells in the midbody or tail, and that the ectopic eye cells progress throughout the entire AP axis. However, we concur that a statistical test is a reasonable thing to show here and that is now included in the legend. The 3 components (in Figure 1F, now Figure 1D) were kept together with one figure label (D) for simplicity, but were rearranged (top and bottom) with a cartoon to the side and with modified labeling for extra clarity. 

      Figure 2C shows images of gene expression for two genes, but the counts are shown for only one in Figure 2D. It is challenging to follow the author's conclusions without apparent reasoning and by only displaying quantitative considerations for one case but not the other. These inconsistencies are also observed in different figures. 

      In Figure 2C, FISH images of cintillo+ and dd_17258+ neurons are shown to display the specificity of this effect to some neurons and not others. Because cintillo+ cells did not expand at all (n=24/24 animals), the counts for them would all be zero values. We only counted data for dd_17258 cells because it was the neuron that expanded compared to the control animals. We have now added a note in the legend explaining this.

      In Figure 2D, 24/24 animals were reported to show the phenotype, but only eight were counted (is there a reason for this?).

      8 animals were used to quantitatively characterize the spread of cells along the AP axis, as it was deemed an adequate sample size to capture the change in distribution of 17258+ cells from being head restricted to being present throughout the body. Through multiple cohorts of animals in replicates, a total of 24/24 examined animals showed this expansion phenotype. Double FISH experiments were additionally carried out using dd_17258 and various PCGs; these data are now included in Figure 5C, and these animals were added to the total counts regarding quantitative analysis of the phenotype in Figure 2D. 

      In Figure 2E, the expression for three genes is shown, with some displaying anterior and posterior regions while others only show the anterior picture. Is there a particular reason for this? 

      The original first panel in Figure 2E showed an example of a non-expanding gland cell type, dd_9223, which is very restricted to the head in both control and map3k1 RNAi animals. Because we did not observe a phenotype for this cell type (no cells in all control and map3k1 RNAi animal tails), we only included tail images of cell types that showed an abnormal phenotype with clear expansion to the posterior (dd_8476 and dd_7131). However, we have now included tail images of dd_9223 cells and added data for dd_9223 to the graph in Figure 2E. 

      Also, in Figure 2F, the counts are shown for only the posterior region of two genes out of the three displayed in Figure 2E. It is unclear why the authors do not show counts for the anterior areas considered in Figure 2E. Furthermore, the legend for Figure 2D is missing, and the legend for 2F is mislabeled as a description for Figure 2D.

      We now include tail images for dd_9223 in Figure 2E to show that there are no ectopic cells in tails. We did not originally include counts of dd_9223 because there was no phenotype observed. dd_7131 and dd_8476 cell types appeared in the posterior of even control animals at a low frequency, unlike dd_9223 cells. However, we did now add counts for dd_9223 tail regions in the graph. We did not count the anterior regions of the animal because our goal was to show data for the visible phenotype (ectopic cells in the tail) not only with an example image, but also by showing the number of cells in the tail with a graph and statistical test. Legends have been updated with correct details.

      Supplement Figure 1 B reports data up to 6 weeks, but no text in the manuscript or supplement mentions any experiment going up to 6 weeks. There are no statistics for data in Supplement Figure 1E. Any significance between groups is unclear.

      More details about the RNAi feeding schedules have been added in the methods section. All RNAi timepoints are now stated explicitly in the legends. The Figure 1F and Figure 1- figure supplement 1E (additional data: ovo+; smedwi-1− cell counts) and legends now mention the statistical tests performed and annotations (not significant *ns) or p values have been added to the graphs. For simplicity, we decided to include all smedwi-1+ counts together rather than splitting them into low and high smedwi-1+ cells, because we weren't really making any claims about low and high cells. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It would be good to acknowledge in the discussion the recent paper from the Petersen lab on map3k1, published in PLoS Genet 2025, especially if the results differ between the two labs.

      We added reference/discussion regarding the recent PLoS Genetics Lo, 2025 map3k1 paper at several suitable points in the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Please pay close attention to the description of experimental details and the consistency throughout the paper. It seems like the reader has to assume or come across information that is not readily available from the text or the legends in the paper. This is an interesting paper with intriguing findings. However, the version presented here appears rushed or put together on the fly.

      Thank you for your thorough feedback. We have endeavored to ensure all appropriate details are present in figures and/or figure legends.

    1. eLife Assessment

      This important study employs a closed-loop, theta-phase-specific optogenetic manipulation of medial septal parvalbumin-expressing neurons in rats and reports that disrupting theta-timescale coordination impairs performance of challenging aspects of spatial behaviors, while sparing hippocampal replay and spatial coding in hippocampal place cells. The findings are expected to advance theoretical understanding of learning and memory operations and to provide practical implications for the application of similar optogenetic approaches. The experiments were viewed as technically rigorous, but the strength of evidence provided in the current version of the manuscript was viewed as incomplete, mostly due to limited analyses and the descriptions of some of the experimental protocols.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Joshi and colleagues demonstrates that the precise theta-phase timing of spikes is causal for CA1 hippocampal theta sequences during locomotion on a linear track and is necessary for learning the cognitively demanding outbound component of a hippocampus-dependent alternation task (W-maze), independently of replay during immobility. To reach these conclusions, the authors developed a theta-phase-specific, closed-loop manipulation that used optogenetic activation of medial septal parvalbumin (PV) interneurons at the ascending phase of theta during locomotion. This protocol preserved immobility periods, allowing a clean and elegant dissociation from SWR-associated replay.

      The manuscript is well written and was a pleasure to read. The work described is of high quality and introduces several notable advances to the field:

      (a) It extends prior studies that manipulated theta oscillations by examining precise temporal structure (specifically theta sequences) rather than only LFP features.

      (b) The closed-loop manipulation enabled dissociation between deficits in theta sequences during a behavioural task and SWR-associated replay activity.

      (c) As controls, the authors included rats with suboptimal viral transduction or optic-fibre placement, and, within subjects, both stimulation-on (stim-on) and stimulation-off (stim-off) trials. Notably, sequence disruption persisted into stim-off periods within the same session.

      Overall, this is a strong manuscript that will provide valuable insights to the field. I have only minor comments:

      (1) As the authors note, it is striking that both behavioural performance and spike patterns are altered during stim-off trials. They propose that "disruption of theta sequences during the initial experience in an environment is sufficient to have lasting effects," implying that rapid, experience-dependent plasticity is driven by sequential firing. Does this imply that if rats were previously trained on the task, subsequent stim-on and stim-off trials would yield different outcomes, with stim-off trials showing improved performance and intact theta sequences? For example, if the sequence of one-third stim-on, one-third stim-off, one-third stim-on were inverted to off-on-off, would theta sequences be expected to emerge, disappear, and potentially re-emerge? While I am not asking for additional experiments, I think the discussion could be extended in this aspect.

      Alternatively, could the number of stim-off trials (one third of the total) be insufficient to support learning/induce plasticity? In the controls, ~50-100 trials appear necessary to achieve high performance.

      (2) In line with the point above, the authors characterise the behavioural changes induced by MS optogenetic stimulation specifically as a "learning deficit," as rats failed to improve across 300 trials in an initially novel environment (W-maze). While they present this as complementary to prior demonstrations of impaired performance on previously learned tasks (Zutshi et al., 2018; Quirk et al., 2021; Etter et al., 2023; Petersen et al., 2020), an alternative interpretation is a working-memory deficit. This would produce the same behavioural pattern, with reference memory (the less cognitively demanding trials) remaining intact despite stimulation and concomitant changes in theta sequences. This interpretation would also be consistent with work in certain disease models, where reduced synaptic plasticity and working-memory deficits co-occur with preserved place coding despite impaired theta sequences (e.g., Viana da Silva et al., 2024; Donahue et al., 2025).

      (3) It was not immediately clear whether SWR-associated activity was derived from the interleaved ~15-min rest sessions in a rest box, or from periods of immobility or reward consumption in the maze (aSWR, as in Jadhav et al 2012). Regardless, it would be informative to compare aSWR events within the maze to rest-box SWRs that may occur during more prolonged slow-wave episodes (even if not full sleep). This contrasts with Liu et al. (2024), who analysed replay during ~1.5-h sleep sessions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this study developed a closed-loop optogenetic stimulation system with high temporal precision in rats to examine the effect of medial septum (MS) stimulation on the disruption of hippocampal activity at both behavioral and compressed time scales. They found that this manipulation preserved hippocampus single-cell-level spatial coding but affected theta sequences and performance during a spatial alternation task. The performance deficits were observed during the more cognitively demanding component of the task and even persisted after the stimulation was turned off. However, the effects of this disruption were confined to locomotor periods and did not impact waking rest replay, even during the early phase of stimulation-on. Their conclusion is consistent with previous findings from the Pastalkova lab, where MS disruption (using different methods) affected theta sequences and task performance but spared replay (Wang et al., 2015; Wang et al., 2016). However, it differs from a recent study in which optogenetic disruption of EC inputs during running affected both theta sequences and replay (Liu et al., 2023).

      Strengths:

      The experiments were well designed and controlled, and the results were generally well presented.

      Weaknesses:

      Major concerns are primarily technical but also conceptual. To further increase the impact of this study by contrasting findings from different disruptions, it is necessary to better align the analysis and detection methods.

      Major concerns:

      (1) To show that MS disruption does not affect spatial tuning, the authors computed the KL divergence of tuning curves between stimulation-on and stimulation-off conditions. I have two main questions about this analysis:

      (1.1) The authors seem to impose stringent inclusion criteria requiring a large number of spikes and a strong concentration of tuning curves. These criteria may have selected strongly spatially tuned cells, which are typically more stable and potentially less vulnerable to perturbations. Based on the Figure 2 caption, it seems that fewer than 10% of cells were included in the KL divergence analysis, which is lower than the usual proportion of place cells reported in the literature. What is the rationale for using such strict inclusion criteria? What happens to the cells that are not as strongly tuned but are still identified as significant place cells?

      (1.2) The KL divergence was computed between stimulation-on and stimulation-off conditions within the same animal group. However, the authors also showed that MS stimulation had lasting effects on theta sequences and performance even during stimulation-off periods. Would that lasting effect also influence spatial tuning? Based on these questions, the authors should perform additional analyses that directly measure spatial tuning quality and compare results across control and experimental groups - for example, spatial information of spikes (Skaggs et al., 1996), tuning stability, field length, and decoding error during running.

      (2) The authors compared their results with those from Liu et al. (2023) and proposed that the different outcomes could be explained by different sites of disruption. However, the detection and quantification methods for theta sequences and replay differ substantially between the two studies, emphasizing different aspects of the phenomenon. I am not suggesting that either method is superior, but providing additional analyses using aligned detection methods would better support the authors' interpretations and benefit the field by enabling clearer comparisons across studies. In the current analysis, the power spectrum of the decoded ahead/behind distance only indicates that there is a rhythmic pattern, without specifying the decoding features at different theta phases. Moreover, the continuous non-local representations during ripples could include stationary representations of a location or zigzag representations that do not exhibit a linear sequential trace. Given that, the authors should show averaged decoding results corrected by the animal's actual position within theta cycles and compute a quadrant ratio. For replay analysis, they could use a linear fit (as in Liu et al., 2023) and report the proportion of significant replay events.

      (3) The finding that theta sequences and performance were impaired even during stimulation-off periods is particularly interesting and warrants deeper exploration. In the Discussion, the authors claim that this may arise from "the rapid plasticity engaged during early learning." However, this explanation does not fully account for the observation. Previous studies have shown that theta sequences can develop very rapidly (Feng et al., Foster lab, 2015; Zhou et al., Dragoi lab, 2025). If the authors hypothesize that rapid plasticity during early stimulation-on disrupts the theta sequence, then the plasticity window must also be short and terminate during the subsequent stimulation-off period. Otherwise, why can't animals redevelop theta sequences during stimulation-off? The authors should conduct additional analyses during the stimulation-off periods of the W-maze task. For example:

      (3.1) What is the spike-theta phase relationship? Do the phases return to normal or remain altered as during stimulation-on?

      (3.2) Is there a significant place-field remapping from stimulation-on to stimulation-off? (Supplementary Figure 3F includes only a small subset of cells; what if population vector correlations are computed across all cells, or Bayesian decoding of stimulation-on spikes is performed using stimulation-off tuning curves?)

      (3.3) The authors should also discuss why the stimulation-off epochs were not sufficient to support learning, and if the stimulation-off place cell sequences could have supported replay.

      (4) Citations and/or discussion of key studies relevant to the current work are missing: the 2015-2016 studies by Wang et al. (Pastalkova lab), in which disruption of theta sequences (but not place cell sequences) disrupted learning but not replay; the 2018 study by Drieu et al. (Zugaro lab), in which disruption of theta sequences affected sleep replay; Farooq and Dragoi (2019), on the association between a lack of theta sequences and the presence of waking rest replay during postnatal development; etc. The authors should discuss what is conceptually new in the current study, given the findings of the literature above.

      (5) The assessment of theta sequence is not state-of-the-art:

      (5.1) Detecting the peak of the cross-correlogram (CCG) between neurons captures the behavioral-timescale CCG, not the theta-timescale one; for theta sequences, the local peak closest to zero lag should be used instead.

      (5.2) How did other methods of detecting theta sequences (Bayesian decoding, firing sequences) perform on the stimulation-on/stimulation-off data?

      (5.3) How was phase precession affected during stimulation-on/stimulation-off?

      (6) It would be important to calculate additional variables in the replay part of the study to compare the quality of replay across the 2 groups:

      (6.1) Proportion of significant replay events out of the detected multiunit events.

      (6.2) The average extent of the trajectory depicted by significant replay events in the targeted group compared to the control group, for stimulation-on/stimulation-off.
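
      Two of the metrics requested above, the spatial information of spikes (point 1.2) and a quadrant ratio for theta sequences (point 2), can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' pipeline or that of Liu et al.: the function names, the binning, and the convention that rows run from behind to ahead of the animal and columns from early to late theta phase are all hypothetical choices made here. The quadrant ratio shown is one common form (probability in the "behind-early" and "ahead-late" quadrants minus probability in the other two, normalized by their sum).

```python
import math


def spatial_information(rate_map, occupancy):
    """Spatial information in bits per spike (Skaggs-style formula).

    rate_map:  firing rate (Hz) in each position bin
    occupancy: time spent (s) in each position bin
    """
    total_time = sum(occupancy)
    p = [t / total_time for t in occupancy]  # occupancy probability per bin
    mean_rate = sum(pi * r for pi, r in zip(p, rate_map))
    if mean_rate <= 0:
        return 0.0
    info = 0.0
    for pi, r in zip(p, rate_map):
        if r > 0:  # bins with zero rate contribute nothing (0 * log 0 := 0)
            ratio = r / mean_rate
            info += pi * ratio * math.log2(ratio)
    return info


def quadrant_ratio(posterior):
    """Quadrant ratio of a decoded (relative position x theta phase) grid.

    posterior[i][j] is the decoded probability at relative-position bin i
    (assumed order: behind -> ahead of the animal) and theta-phase bin j
    (assumed order: early -> late). Returns a value in [-1, 1]; positive
    values indicate forward-sweeping (behind-to-ahead) structure.
    """
    n_pos, n_phase = len(posterior), len(posterior[0])
    forward = reverse = 0.0
    for i, row in enumerate(posterior):
        for j, prob in enumerate(row):
            # Compare bin centers with the grid midpoint; bins lying exactly
            # on a midline (odd grid sizes) are assigned to no quadrant.
            pos_side = (2 * i + 1 > n_pos) - (2 * i + 1 < n_pos)        # -1 behind, +1 ahead
            phase_side = (2 * j + 1 > n_phase) - (2 * j + 1 < n_phase)  # -1 early, +1 late
            if pos_side * phase_side > 0:
                forward += prob
            elif pos_side * phase_side < 0:
                reverse += prob
    total = forward + reverse
    return (forward - reverse) / total if total > 0 else 0.0
```

      Under these assumptions, a cell that fires in only one of four equally occupied bins carries 2 bits per spike, and a posterior concentrated on the behind-early and ahead-late quadrants yields a quadrant ratio of 1. Comparing the distributions of both quantities between control and targeted groups, separately for stimulation-on and stimulation-off periods, would be one way to address the points raised.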

    4. Reviewer #3 (Public review):

      Joshi et al. present an elegant and technically rigorous study examining how the temporal structure of hippocampal spiking during locomotion contributes to spatial learning. Using a closed-loop, theta phase-specific optogenetic manipulation of medial septal parvalbumin-expressing neurons in rats, the authors demonstrate that disrupting theta-timescale coordination impairs performance on the cognitively demanding outbound component of a spatial alternation task, while sparing hippocampal replay, place coding, and the simpler inbound learning. The work aims to dissociate the role of theta-associated temporal organization during navigation from sharp-wave ripple-associated replay during subsequent rest periods, providing a mechanistic link between theta sequences and learning. The findings have important implications for models of septo-hippocampal coordination and the functional segregation between online (theta) and offline (SWR) network states. That said, there are a few conceptual and methodological issues that need to be addressed.

      One concern is the overall novelty of this work; the dissociation between online temporal sequences and offline replay events following memory deficits has previously been shown by Wang et al., 2016 (eLife). While the authors discuss Liu et al., 2023, which demonstrates that MEC activation of inhibitory neurons at gamma frequencies during locomotion disrupts theta sequences, subsequent replay, and learning (lines 65-66), they do not reference Wang et al., 2016, who performed a very similar study with MS pharmacological inactivation and reported large decreases in theta power and attenuated theta frequencies together with behavioural deficits, while SWR replay persisted. Given the strong similarities in the manipulation and findings, this study should be discussed.

      Along the same lines, it should be noted that Brandon et al. (2014, Neuron) demonstrated that hippocampal place codes can still form in novel environments despite MS inactivation and loss of theta, indicating that spatial representations can emerge without intact septal drive. Referencing this study would strengthen the discussion of how temporal coordination, rather than spatial coding per se, underlies the learning deficits observed here.

      The conclusion that disrupting "theta microstructure" impairs learning relies on the assumption that the observed behavioral deficits arise from altered temporal coding from within hippocampal CA1 only. However, optogenetic modulation of medial septal PV neurons influences multiple downstream regions (entorhinal cortex, retrosplenial cortex) via widespread GABAergic projections. While the authors do touch on this, their discussion should expand to include the network-level consequences of entorhinal grid-cell disruption and how this could affect temporal coding both online and offline.

      The finding that replay content, rate, and duration are unchanged is critical to the paper's claim of dissociation. However, the analysis is restricted to immobility on the track. Given evidence for distinct awake vs. sleep replay, confirming that off-track rest and post-session sleep replays are similarly unaffected would confirm the conclusions of the paper. If these data are unavailable, the limitation should be acknowledged explicitly. Moreover, statistical power for detecting subtle differences in replay organization or spatial bias should be added to the supplement (n of events per animal, variability across sessions).

      The exact protocol for optogenetic stimulation is a bit confusing. For the task, the first and final thirds (66%) of trials were disrupted, and stimulation was delivered only when the animal was moving and away from the reward well. What proportion of time within "stimulated" trials remained unstimulated? Why were only 66% of trials stimulated?

    5. Author response:

      We thank all reviewers for their overall assessment, thoughtful comments, and suggestions. We are working to address each reviewer’s comment in detail. In this provisional response, we provide clarifications regarding our experimental approach and the novelty of our work, and include additional analyses that we have performed since the submission of the manuscript. We are also happy to report that we have now shared the raw data, intermediate analysis files, and the complete repository to facilitate replication of the analysis and figures.

      Code repo: github.com/LorenFrankLab/ms_stim_analysis

      Data repo: dandiarchive.org/dandiset/001634

      Docker containers (see GitHub repo for use instructions):

      Database: https://hub.docker.com/r/samuelbray32/spyglass-db-ms_stim_analysis

      Python notebooks: https://hub.docker.com/r/samuelbray32/spyglass-hub-ms_stim_analysis

      (1) Novelty and contrast with earlier manipulations:

      We thank the reviewers for suggesting that we explicitly contrast our results with prior pharmacological (Wang et al., 2016; Wang et al., 2015; Koenig et al., 2011; Brandon et al., 2014), systemic (Robbe and Buzsáki, 2009; Petersen and Buzsáki, 2020), and behavioral (Drieu et al., 2018) manipulations that also assessed some of the physiological features we evaluated. We will add a discussion of these studies, which will help us emphasize both the insights and discrepancies observed using these prior approaches. We will also more clearly explain the novelty and importance of our specific approach for temporally and physiologically precise manipulation. Specifically, our approach (closed-loop theta-phase stimulation during locomotion) provides a level of physiological specificity that made it possible to dissociate theta-state dynamics from other hippocampal processes. This in turn allowed us to address a question that has remained unresolved across prior studies: Are hippocampal spatial sequences during locomotion (i.e., theta sequences) necessary to learn a novel hippocampal-dependent task?

      (2) Additional analysis on SWRs during rest:

      Since submitting the manuscript, we have conducted additional analysis on the rate and length of SWRs in the rest box and found that both are also indistinguishable between targeted and control animals (effect of manipulation between control and targeted animals; rSWR rate: p=0.45; rSWR length: p=0.94, mixed effect model). We also find evidence for sequential neural representations in the rest box, when the encoding was performed in the behavioral arena. Example trajectories are shown below. These results are consistent with our observations on SWR rate, length, and content in the behavioral arena. Additionally, we are in the process of evaluating and quantifying the results of decoding the rSWRs and will include those in the next version of the manuscript.

      Author response image 1.

      Sequential replay events observed in the rest box

      (3) Theta sequence measurement in the absence of theta:

      In the next version of the manuscript, we will explicitly explain why our manipulation makes it more appropriate to measure sequential hippocampal representations during locomotion (i.e., theta sequences) without using the theta oscillation or a relatively large, epoch-averaged sliding window as a reference. The key insight here is that our manipulation suppresses theta and thus makes it difficult or impossible to accurately identify theta phase. We understand that theta-phase-based approaches were used in prior work; however, these prior analyses may have confounded the absence of hippocampal theta sequences during locomotion with the inability to detect theta oscillatory phase reliably. We will show that our method, clusterless Bayesian decoding in which we estimate the decoded position at every 2 ms timestep, is indeed able to capture endogenous hippocampal sequences without requiring any alignment to theta oscillations, thus providing an unbiased estimate of the rhythmicity of hippocampal spatial representations.

      (4) Additional analysis on place cell stability and tuning:

      We thank the reviewer for this question. For the KL divergence analysis, we imposed a spike-count criterion (100 spikes for each interval type: stimulation-off, stimulation-on, and the stimulus sub-interval) and a coverage criterion (the 50% HPD of each unit’s spatial firing distribution was contained within 40 cm on the linear track and 100 cm on the W-track). These criteria were chosen to ensure that spatial tuning curves were sufficiently well sampled and localized to allow reliable estimation of KL divergence, which is particularly sensitive to noise arising from low spike counts or diffuse firing. Based on the reviewer’s suggestion, we have relaxed both the spike-count and spatial coverage criteria to include more weakly tuned place cells and replicated our results (p=.146). Further, we have also evaluated the stability of place field order between stimulation-on and stimulation-off conditions using more standard methods (as in Wang et al., 2015; Spearman correlation of place field order, control vs targeted, p = .920, t-test). These results are consistent with our observations about place field stability during stimulation-off and stimulation-on conditions (Fig. 2F).

      Author response image 2.

      Spearman correlation of place field order during stimulation-on and stimulation-off conditions.
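
      For readers who want a concrete picture of the two stability measures discussed in point (4), the following is a minimal sketch, assuming each unit's tuning curve is a list of non-negative firing rates over position bins. It is not the code from the authors' repository: the function names, the small regularizer `eps`, and the use of the peak bin as the place field location are hypothetical choices made for illustration.

```python
import math


def kl_divergence(curve_p, curve_q, eps=1e-10):
    """KL(P || Q) in bits between two tuning curves, each normalized to sum to 1.

    eps is a small regularizer (an assumption of this sketch) that avoids
    log(0) in bins where one condition has no spikes.
    """
    p = [x + eps for x in curve_p]
    q = [x + eps for x in curve_q]
    sum_p, sum_q = sum(p), sum(q)
    return sum((a / sum_p) * math.log2((a / sum_p) / (b / sum_q))
               for a, b in zip(p, q))


def peak_bin(curve):
    """Index of the position bin with the highest firing rate."""
    return max(range(len(curve)), key=lambda i: curve[i])


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all peaks fall in the same bin
    return cov / math.sqrt(var_x * var_y)


def place_field_order_correlation(curves_off, curves_on):
    """Spearman correlation of place field peak locations across conditions."""
    peaks_off = [peak_bin(c) for c in curves_off]
    peaks_on = [peak_bin(c) for c in curves_on]
    return spearman(peaks_off, peaks_on)
```

      In this sketch, a per-unit `kl_divergence` near zero indicates similar tuning in the two conditions, and a `place_field_order_correlation` near 1 indicates that units keep their relative order along the track. How these per-unit or per-session values are then compared between control and targeted animals (for example with a t-test or mixed-effects model, as reported above) is outside the scope of the sketch.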

    1. eLife Assessment

      This is a useful study that investigates the role of the long non-coding RNA Dreg1 for the development, differentiation, or maintenance of group 2 ILC (ILC2). The authors generate Dreg1-/- mice and show a reduction of group 2 innate lymphoid cells (ILC2). However, the strength of evidence supporting the impact of Dreg1 on Gata3 expression, a transcription factor required for ILC2 cell fate decisions, and the cell-intrinsic requirement of Dreg1 for ILC2 remain incomplete. This study will be of interest to immunologists.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines the role of the long non-coding RNA Dreg1 in regulating Gata3 expression and ILC2 development. Using Dreg1-deficient mice, the authors show a selective loss of ILC2s but not T or NK cells, suggesting a lineage-specific requirement for Dreg1. By integrating public chromatin and TF-binding datasets, they propose a Tcf1-Dreg1-Gata3 regulatory axis. The topic is relevant for understanding epigenetic regulation of ILC differentiation.

      Strengths:

      (1) Clear in vivo evidence for a lineage-specific role of Dreg1.

      (2) Comprehensive integration of genomic datasets.

      (3) Cross-species comparison linking mouse and human regulatory regions.

      Weaknesses:

      (1) Mechanistic conclusions remain correlative, relying on public data.

      (2) Lack of direct chromatin or transcriptional validation of Tcf1-mediated regulation.

      (3) Human enhancer function is not experimentally confirmed.

      (4) Insufficient methodological detail and limited mechanistic discussion.

    3. Reviewer #2 (Public review):

      The authors investigate the role of the long non-coding RNA Dreg1 in the development, differentiation, or maintenance of group 2 ILC (ILC2). Dreg1 is encoded close to the locus of Gata3, a transcription factor implicated in the differentiation of T cells and ILC, and in particular of type 2 immune cells (i.e., Th2 cells and ILC2). The center of the paper is the generation of a Dreg1-deficient mouse. While Dreg1-/- mice did not show any profound αβ T or γδ T cell, ILC1, ILC3, or NK cell phenotypes, ILC2 frequencies were reduced in various organs tested (small intestine, lung, visceral adipose tissue). In the bone marrow, immature ILC2 or ILC2 progenitors were reduced, whereas a common ILC progenitor was overrepresented, suggesting a differentiation block. Using ATAC-seq, the authors find that the promoter of Dreg1 is open in early lymphoid progenitors, and the acquisition of chromatin accessibility downstream correlates with increased Dreg1 expression in ILC2 progenitors. Examining publicly available Tcf1 CUT&Run data, they find that Tcf1 was specifically bound to the accessible sites of the Dreg1 locus in early innate lymphoid progenitors. Finally, the syntenic region in the human genome contains two non-coding RNA genes with an expression pattern resembling mouse Dreg1.

      The topic of the manuscript is interesting. However, there are various limitations that are summarized below.

      (1) The authors generated a new mouse model. The strategy should be better described, including the genetic background of the initially microinjected material. How many generations was the targeted offspring backcrossed to C57BL/6J?

      (2) The data is obtained from mice in which the Dreg1 gene is deleted in all cells. A cell-intrinsic role of Dreg1 in ILC2 has not been demonstrated. It should be shown that Dreg1 is required in ILC2 and their progenitors.

      (3) How Dreg1 contributes to the differentiation and/or maintenance of ILC2 is not addressed definitively. Does Dreg1 affect Gata3 expression, mRNA stability, or turnover in ILC2? Previous work of the authors indicated that knockdown of Dreg1 does not affect Gata3 expression (PMID: 32970351).

      (4) How Dreg1 exactly affects ILC2 differentiation remains unclear.

    1. eLife Assessment

      This study presents a platform to implement closed-loop experiments in mice based on auditory feedback. The authors provide convincing evidence that their platform enables a variety of closed-loop experiments using neural or movement signals, indicating that it will be a valuable resource to the neuroscience community. The paper could be strengthened by additional tutorials, such as on how to run an experiment.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a resource to the systems neuroscience community by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively. The revised preprint has improved substantially upon the previous submission.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab, and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex, and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper, they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      Many of the original weaknesses have been addressed in the revised preprint.

      While the dataset contains an impressive number of animals and cortical regions for the neurofeedback experiment, my excitement for these experiments is tempered by the relative incompleteness of the dataset.

      Additionally, adoption of the platform may be hindered by the absence of a tutorial on how to run a session.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used or developed are described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates that body-movement tracking and calcium imaging can be carried out using readily available components, including imaging the brain with an incredibly cheap consumer-electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers who may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests a DeepLabCut-based alternative to motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study that is highly inspired by the Clancy et al. 2021 paper on the modulation of local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during forelimb movement learning. Sadly, the analyses of the physiological data appear incomplete, and more generally, the paper shows weaknesses regarding several points:

      The behavioral setups that are presented are representative of the state of the art in the field of mesoscale imaging/head fixed behavior community, rather than a highly innovative design. Still, they definitely have value as a starting point for laboratories interested in implementing such approaches.

      Throughout the paper, there are several statements that point out how important it is to carry out this work in a closed-loop setting with an auditory feedback, but sadly there is no "no feedback" control in cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that learning of the task can be achieved in the absence of feedback.

      The analysis of the closed-loop neuronal data behavior lacks controls. Increased performance can be achieved by actively modulating only one of the two ROIs; this is not really analyzed, although this finding, which does not match previous reports (Clancy et al. 2020), would be important to examine further.

    4. Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system, which the authors call CLoPy. The authors also claim to provide all software, hardware schematics, and protocols needed to adapt it to various experimental scenarios. The system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. This study also shows that, using the closed-loop system, mice performed better, learnt arbitrary tasks, and adapted to changes in the rules. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of feedback systems design. Simplicity of implementation and potential adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary: 

      The authors provide a resource to the systems neuroscience community, by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      There are several weaknesses in the paper that diminish the impact of its strengths. First, the value of the CLoPy platform is not clearly articulated to the systems neuroscience community. Similarly, the resource could be better positioned within the context of the broader open-source neuroscience community. For an example of how to better frame this resource in these contexts, I recommend consulting the pyControl paper. Improving this framing will likely increase the accessibility and interest of this paper to a less technical neuroscience audience, for instance by highlighting the types of experimental questions CLoPy can enable.

      We appreciate the editor’s feedback regarding the clarity of the CLoPy platform's value and its positioning within the broader neuroscience community. We agree and understand the importance of effectively communicating the utility of CLoPy to both the systems neuroscience field and the wider open-source neuroscience community.

      To address this, we have revised the introduction and discussion sections of the manuscript to more clearly articulate the unique contributions of the CLoPy platform. Specifically:

      (1) We have emphasized how CLoPy can address experimental questions in systems neuroscience by highlighting its ability to enable real-time closed-loop experiments, such as investigating neural dynamics during behavior or studying adaptive cortical reorganization after injury. These examples are aimed at demonstrating its practical utility to the neuroscience audience.

      (2) We have positioned CLoPy within the broader open-source neuroscience ecosystem, drawing comparisons to similar resources like pyControl. We describe how CLoPy complements existing tools by focusing on real-time optical feedback and integration with genetically encoded indicators, which are becoming increasingly popular in systems neuroscience. We also emphasize its modularity and ease of adoption in experimental settings with limited resources.

      (3) To make the manuscript more accessible to a less technically inclined audience, we have restructured certain sections to focus on the types of experiments CLoPy enables, rather than the technical details of the implementation.

      We have consulted the pyControl paper, as suggested, and have used it as a reference point to improve the framing of our resource. We believe these changes will increase the accessibility and appeal of the paper to a broader neuroscience audience.

      While the dataset contains an impressive number of animals and cortical regions for the neurofeedback experiment, and an analysis of the movement-feedback experiments, my excitement for these experiments is tempered by the relative incompleteness of the dataset, as well as its description and analysis in the text. For instance, in the neurofeedback experiment, many of these regions only have data from a single mouse, limiting the conclusions that can be drawn. Additionally, there is a lack of reporting of the quantitative results in the text of the document, which is needed to better understand the degree of the results. Finally, the writing of the results section could use some work, as it currently reads more like a methods section.

      Thank you for your thoughtful and constructive feedback on our manuscript. We appreciate the time and effort you took to review our work and provide detailed suggestions for improvement. Below, we address the key points raised in your review:

      (1) Dataset Completeness: We acknowledge that, for some cortical regions, the neurofeedback experiments include data from only a single mouse, whereas other regions are represented by several animals. This was due to practical constraints during the study, and we understand the limitations this poses for drawing broad conclusions. We felt it was still important to include these data sets with smaller sample sizes as they might be useful for others pursuing this direction in the future. To address this, we have revised the text to explicitly acknowledge these limitations and clarify that the results for some regions are exploratory in nature. We believe our flexible tool will provide a means for our lab and others to include more animals representing additional cortical regions in future studies. Importantly, we have included all raw and processed data as well as code for future analysis.

      (2) Quantitative Results: We recognize the importance of reporting quantitative results in the text for better clarity and interpretation. In response, we have added a more detailed description of the quantitative findings from both the neurofeedback and movement-feedback experiments. This includes effect sizes, statistical measures, and key numerical results to provide a clearer understanding of the degree and significance of the observed effects.

      (3) Results Section Writing: We appreciate your observation that parts of the results section read more like a methods section. To improve clarity and focus, we have restructured the results section to present the findings in a more concise and interpretative manner, while moving overly detailed descriptions of experimental procedures to the methods section.

      Suggestions for improved or additional experiments, data or analyses:

      Not necessary for this paper, but it would be interesting to see if the CLNF group could learn without auditory feedback.

      This is a great suggestion and certainly something that could be done in the future.

      There are no quantitative results in the results section. I would add important results to help the reader better interpret the data. For example, in: "Our results indicated that both training paradigms were able to lead mice to obtain a significantly larger number of rewards over time," You could show a number, with an appropriate comparison or statistical test, to demonstrate that learning was observed.

      Thank you for pointing this out. We now report the quantified values in the Results text as well as in the figure legends, and quote the relevant sentences here: “A ΔF/F0 threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, N=23, n=60 and CLNF Rule-change, N=17, n=60) were able to discover the task rule and perform above 80% over ten days of training (Figure 4A, RM ANOVA p=2.83e-5), and Rule-change mice even learned a change in ROIs or rule reversal (Figure 4A, RM ANOVA p=8.3e-10, Table 5 for different rule changes). There were no significant differences between male and female mice (Supplementary Figure 3A).”
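      As an illustration of the quoted threshold rule, the following is a minimal sketch of how a threshold allowing roughly 25% performance could be derived from a baseline session. It assumes, purely for illustration, that each baseline trial is summarized by its peak ΔF/F0 and that a trial counts as a success when that peak reaches the threshold; the response does not spell out this detail. The function names are hypothetical and are not taken from the CLoPy code base.

```python
import numpy as np


def threshold_for_target_performance(baseline_peaks, target_fraction=0.25):
    """Return the threshold that `target_fraction` of baseline trials would reach.

    baseline_peaks: per-trial peak dF/F0 values from the day-0 baseline session
    (assumed summary statistic; hypothetical, see lead-in).
    """
    peaks = np.asarray(baseline_peaks, dtype=float)
    # The (1 - target_fraction) quantile leaves target_fraction of trials above it.
    return float(np.percentile(peaks, 100.0 * (1.0 - target_fraction)))


def success_fraction(peaks, threshold):
    """Fraction of trials whose peak dF/F0 reaches the threshold."""
    peaks = np.asarray(peaks, dtype=float)
    return float(np.mean(peaks >= threshold))


# Synthetic baseline session: 100 trials with evenly spaced peak values.
baseline = np.arange(1, 101) / 100.0
threshold = threshold_for_target_performance(baseline, target_fraction=0.25)
print(threshold, success_fraction(baseline, threshold))
```

      With real data the achieved fraction will only approximate 25%, because ties and small trial counts make the quantile coarse.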

      For: "Performing this analysis indicated that the Raspberry Pi system could provide reliable graded feedback within ~63 ± 15 ms for CLNF experiments." The LED test shows the sending of the signal, but the actual delay for the audio generation might be longer. This is also longer than the 50 ms mentioned in the abstract.

      We appreciate the reviewer’s insightful comment. The latency reported (~63ms) was measured using the LED test, which captures the time from signal detection to output triggering on the Raspberry Pi GPIO. We agree that the total delay for auditory feedback generation could include an additional latency component related to the digital-to-analog conversion and speaker response. In our setup, we employ a fast Audiostream library written in C to generate the audio signal and expect the delay contribution to be negligible compared to the GPIO latency. Though we did not do this, it can be confirmed by an oscilloscope-based pilot measurement (for additional delay calculation). We have updated the manuscript to clarify that the 63 ± 15 ms value reflects the GPIO-triggered output latency, and we have revised the abstract to accurately state the delay as “~63 ms” rather than 50 ms. This ensures consistency and avoids underestimation of the latency. We have corrected the LED latency for CLNF and CLMF experiments in the abstract as well.

      It could be helpful to visualize an individual trial for each experiment type, for instance how the audio frequency changes as movement speed / calcium activity changes.

      We have added Supplementary Figure 8 that contains this data where you can see the target cortical activity trace, target paw speed, rewards, along with the audio frequency generated.

      The sample sizes are small (n=1) for a few groups. I am excited by the variety of regions recorded, so it could be beneficial for the authors to collect a few more animals to beef up the sample sizes.

      We've acknowledged that some of the sample sizes are small. Importantly, we have included raw and processed data as well as code for future analysis. We felt it was still important to include these data sets with smaller sample sizes as they might be useful for others pursuing this direction in the future.

      I am curious as to why 60 trials sessions were used. Was it mostly for the convenience of a 30 min session, or were the animals getting satiated? If the former, would learning have occurred more rapidly with longer sessions?

      This is a great observation, and the answer is that it was mostly due to logistical reasons. We tried not to keep animals head-fixed for more than 45 minutes in each session, as they become less engaged during long head-fixed sessions. After head-fixing them, it takes about 15 minutes to get the experiment going, and therefore recorded sessions of 30 to 40 minutes seemed appropriate before the animals stopped being engaged or became satiated in the task. We provided supplemental water after the sessions and observed that the animals consumed it, so they were not fully satiated during the sessions even when they performed well in the task and obtained the maximum number of rewards. We also had inter-trial rest periods of 10 s that lengthened the session duration. We think it would be interesting to explore the relationship between session duration (number of trials) and task learning progression over the days in a separate study.

      Figure 4E is interesting, it seems like the changes in the distribution of deltaF was in both positive and negative directions, instead of just positive. I'd be curious as to the author's thoughts as to why this is the case. Relatedly, I don't see Figure 4E, and a few other subplots, mentioned in the text. As a general comment, I would address each subplot in the text.

      We have split Figure 4 into two to keep the figures more readable. Previous Figure 4E-H are now Figure 5A-D in the revised manuscript. The online real-time CLNF sessions used a moving-window average to calculate ΔF/F<sub>0</sub>, whereas the figures were generated by averaging over the whole recorded sessions. We have added text in Methods under the “Online ΔF/F<sub>0</sub> calculation” and “Offline ΔF/F<sub>0</sub> calculation” sections making clear how we perform our ΔF/F<sub>0</sub> normalization based on average fluorescence over the entire session. Using this method of normalization does increase the baseline so that some peaks appear to be below zero. Additionally, it is unclear what strategy animals are employing to achieve the rule-specific target activity. The task did not constrain them to have a specific strategy for cortical activation; they were rewarded as long as they crossed the threshold in the target ROI(s). For example, in 2-ROI experiments, to increase ROI1-ROI2 target activity, they could increase the activity of ROI1 relative to ROI2 or decrease the activity of ROI2 relative to ROI1; both would have led to a reward as long as the result crossed the threshold.

      We have now addressed and added reference to the figures in the text in Results under “Mice can explore and learn an arbitrary task, rule, and target conditions” and “Mice can rapidly adapt to changes in the task rule” sections - thanks for pointing this out.

      For: "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time," I would provide a visual summary showing the learning curves for the different types of regions.

      We have rewritten this section to emphasize that these conclusions were based on pooled data from multiple regions of interest. The sample sizes for each type of region are different and some are missing. We believe it would be incomplete and not comparable to present this as a regular analysis since the sample sizes were not balanced. We would be happy to dive deeper into this and point to the raw and processed dataset if anyone would like to explore this further by GitHub or other queries.

      Relatedly, I would further explain the fast vs slow learners, and if they mapped onto certain regions.

      Mice were categorized into fast or slow learners based on the slope of learning over days (reward progression over the days) as shown in Supplementary Figure 3C,D. Our initial aim was not to probe cortical regions that led to fast vs slow learning but this was a grouping we did afterwards. Based on the analysis we did, the fast learners included the sensory (V1), somatosensory (BC, HL), and motor (M1, M2) areas, while the slow learners included the motor (M1, M2), and higher order (TR, RL) cortical areas. Testing all dorsal cortical areas would be prudent to establish their role in fast or slow learning and it is an interesting future direction.
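      The slope-based grouping described above can be sketched as follows. The response states only that the grouping was based on the slope of reward progression over days; the median split used here as the cutoff, the function names, and the synthetic reward counts are assumptions made for illustration and should not be read as the authors' actual criterion.

```python
import numpy as np


def learning_slope(rewards_per_day):
    """Least-squares slope of rewards against training day (rewards per day)."""
    rewards = np.asarray(rewards_per_day, dtype=float)
    days = np.arange(1, len(rewards) + 1)
    slope, _intercept = np.polyfit(days, rewards, 1)
    return float(slope)


def label_learners(rewards_by_mouse):
    """Label each mouse 'fast' or 'slow' by a median split of slopes.

    The median split is a hypothetical cutoff chosen for this sketch.
    """
    slopes = {m: learning_slope(r) for m, r in rewards_by_mouse.items()}
    cutoff = float(np.median(list(slopes.values())))
    return {m: ("fast" if s > cutoff else "slow") for m, s in slopes.items()}


# Synthetic reward counts over four training days for two hypothetical mice.
example = {
    "mouse_a": [10, 20, 30, 40],
    "mouse_b": [10, 12, 14, 16],
}
print(label_learners(example))
```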

      Also I would make the labels for these plots (e.g. Supp Fig3) more intuitive, versus the acronyms currently used.

      We have made more expressive labels and explained the acronyms below the Supplementary Figure 3.

      The CLMF animals showed a decrease in latency across learning, what about the CLNF animals? There is currently no mention in the text or figures.

      We have now incorporated the CLNF task latency data into both the Results text and Figure 4C. Briefly, task latency decreased as performance improved, increased following a rule change, and then decreased again as the animals relearned the task. The previous Figure 4C has been updated to Figure 4D, and the former Figure 4D has been moved to Supplementary Figure 4E.

      Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used/developed is described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates the fact that, thanks to readily available experimental building blocks, body movement tracking and calcium imaging can be carried out using readily available components, including imaging the brain using an incredibly cheap consumer electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers that may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of the calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests an alternative based on DeepLabCut to the motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study that is highly inspired by the Clancy et al. 2020 paper on the modulation of the local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during forelimb movement learning. Sadly, the analyses of the physiological data appear incomplete, and more generally the paper tends to offer overstatements regarding several points:

      In contrast to the introductory statements of the article, closed-loop physiology in rodents is a well-established research topic. Beyond auditory feedback, this includes optogenetic feedback (O'Connor et al. 2013, Abbasi et al. 2018, 2023), electrical feedback in hippocampus (Girardeau et al. 2009), and much more.

      We have included and referenced these papers in our introduction section (quoted below) and rephrased the part where our previous text indicated there are fewer studies involving closed-loop physiology.

      “Some related studies have demonstrated the feasibility of closed-loop feedback in rodents, including hippocampal electrical feedback to disrupt memory consolidation (Girardeau et al.2009), optogenetic perturbations of somatosensory circuits during behavior (O'Connor et al.2013), and more recent advances employing targeted optogenetic interventions to guide behavior (Abbasi et al. 2023).”

      The behavioral setups that are presented are representative of the state of the art in the field of mesoscale imaging/head fixed behavior community, rather than a highly innovative design. In particular, the closed-loop latency that they achieve (>60 ms) may be perceived by the mice. This is in contrast with other available closed-loop setups.

      We thank the reviewer for this thoughtful comment and fully agree that our closed-loop latency is larger than that achieved in some other contemporary setups. Our primary aim in presenting this work, however, is not to compete with the lowest possible latencies, but to provide an open-source, accessible, and flexible platform that can be readily adopted by a broad range of laboratories. By building on widely available and lower-cost components, our design lowers the barrier of entry for groups that wish to implement closed-loop imaging and behavioral experiments, while still achieving latencies well within the range that can support many biologically meaningful applications.

      For example, our latency (~60 ms) remains compatible with experimental paradigms such as:

      Motor learning and skill acquisition, where sensorimotor feedback on the scale of tens to hundreds of milliseconds is sufficient to modulate performance.

      Operant conditioning and reward-based learning, in which reinforcement timing windows are typically broader and not critically dependent on sub-20 ms latencies.

      Cortical state-dependent modulation, where feedback linked to slower fluctuations in brain activity (hundreds of milliseconds to seconds) can provide valuable insight.

      Studies of perception and decision-making, in which stimulus-response associations often unfold on behavioral timescales longer than tens of milliseconds.

      We believe that emphasizing openness, affordability, and flexibility will encourage widespread adoption and adaptation of our setup across laboratories with different research foci. In this way, our contribution complements rather than competes with ultra-low-latency closed-loop systems, providing a practical option for diverse experimental needs.

      Through the paper, there are several statements that point out how important it is to carry out this work in a closed-loop setting with an auditory feedback, but sadly there is no "no feedback" control in cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that learning of the task can be achieved in the absence of feedback.

      We fully agree that such a control would provide valuable insight into the contribution of feedback to learning in the CLNF paradigm. In designing our initial experiments, we envisioned multiple potential control conditions, including No-feedback and Random-feedback. However, our first and primary objective was to establish whether mice could indeed learn to modulate cortical ROI activation through auditory feedback, and to further investigate this across multiple cortical regions. For this reason, we focused on implementing the CLNF paradigm directly, without the inclusion of these additional control groups. To broaden the applicability of the system, we subsequently adapted the platform to the CLMF experiments, where we did incorporate a No-feedback group. These results, as the reviewer notes, strengthen the evidence for the role of feedback in shaping task performance. We agree that the inclusion of a No-feedback control group in the CLNF paradigm will be crucial in future studies to further dissect the specific contribution of feedback to cortical conditioning.

      The analysis of the closed-loop neuronal data behavior lacks controls. Increased performance can be achieved by actively modulating only one of the two ROIs; this is not clearly analyzed (for instance, by looking at the timing of the calcium signal modulation across the two ROIs). It seems that overall ROIs 1 and 2 covary, in contrast to Clancy et al. 2020. How can this be explained?

      We agree that the possibility of increased performance being driven by modulation of a single ROI is an important consideration. Our study indeed began with 1-ROI closed-loop experiments. In those early experiments, while we did observe animals improving performance across days, we realized that daily variability in ongoing cortical GCaMP activity could lead to fluctuations in threshold-crossing events. The 2-ROI design was subsequently introduced to reduce this variability, as the target activity was defined as the relative activity between the two ROIs (e.g., ROI1 – ROI2). This approach offered a more stable signal by normalizing ongoing fluctuations. In our analysis of the early 2-ROI experiments, we observed that animals adopted diverging strategies to achieve threshold crossings. Specifically, some animals increased activity in ROI1 relative to ROI2, while others decreased activity in ROI2 to accomplish the same effect. Once discovered, each animal consistently adhered to its chosen strategy throughout subsequent training sessions. This was an early and intriguing observation, but as the experiments were not originally designed to systematically test this effect, we limited our presentation to the analysis of a small number of animals (shown in Figure 11). We have added details about this observation in our Results section as well, quoted below:

      “In the 2-ROI experiment where the task rule required “ROI1 - ROI2” activity to cross a threshold for reward delivery, mice displayed divergent strategies. Some animals predominantly increased ROI1 activity, whereas others reduced ROI2 activity, both approaches leading to successful threshold crossing (Figure 11)”.

      We hope this clarifies how the use of two ROIs helps explain the apparent covariation of the signals, and why some divergence from the observations of Clancy et al. (2020) may be expected.
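      The 2-ROI rule discussed in this response can be sketched in a few lines. The rule itself (reward when ROI1 - ROI2 crosses a threshold) comes from the response; the function names, the threshold of 0.05, and the ΔF/F0 values are hypothetical and chosen only to show that both strategies reach the threshold while a shared, covarying increase cancels out.

```python
def target_activity(roi1_dff, roi2_dff):
    """Relative activity used by the 2-ROI rule: ROI1 minus ROI2."""
    return roi1_dff - roi2_dff


def is_rewarded(roi1_dff, roi2_dff, threshold):
    """A trial is rewarded when the relative activity reaches the threshold."""
    return target_activity(roi1_dff, roi2_dff) >= threshold


THRESHOLD = 0.05  # hypothetical dF/F0 threshold for illustration

# Strategy A: raise ROI1 while ROI2 stays at baseline.
print(is_rewarded(0.06, 0.00, THRESHOLD))   # True
# Strategy B: lower ROI2 while ROI1 stays at baseline.
print(is_rewarded(0.00, -0.06, THRESHOLD))  # True
# Shared fluctuation: both ROIs rise together, so the difference is zero.
print(is_rewarded(0.06, 0.06, THRESHOLD))   # False
```

      The third case shows why subtracting the two ROIs normalizes out fluctuations common to both regions.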

      Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors have tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) Closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) Closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system, and the authors call it CLoPy. The authors also claim to provide all software, hardware schematics, and protocols to adapt it to various experimental scenarios. This system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. This study also shows that, using the closed-loop system, mice performed better, learned an arbitrary task, and could adapt to a change in the rule as well. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of feedback systems designed. Simplicity of implementation and potential adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

      We appreciate the reviewer’s comment and agree that latency is an important factor in our setup. The latency arises partly from the inherent slow kinetics of calcium signaling and GCaMP6s, and partly from the imaging rate of 15 FPS (one frame approximately every 67 ms). These limitations can be addressed in several ways: for example, using faster calcium indicators such as GCaMP8f, or adapting the system to electrophysiological signals, which would require additional processing capacity. In our implementation, image acquisition was fixed at 15 FPS to enable real-time frame processing (256 × 256 resolution) on Raspberry Pi 4B devices. With newer hardware, such as the Raspberry Pi 5, substantially higher acquisition and processing rates are feasible (although we have not yet benchmarked this extensively). More powerful platforms such as Nvidia Jetson or conventional PCs would further support much faster data acquisition and processing.

      Major comments:

      (1) Page 5 paragraph 1: "We tested our CLNF system on Raspberry Pi for its compactness, general-purpose input/output (GPIO) programmability, and wide community support, while the CLMF system was tested on an Nvidia Jetson GPU device." Can these programs and hardware be integrated with a Windows-based system and a microcontroller (Arduino/Teensy)? For broad adaptability, that is what a lot of labs would already have (please comment/discuss).

      While we tested our CLNF system on a Raspberry Pi (chosen for its compactness, GPIO programmability, and large user community) and our CLMF system on an Nvidia Jetson GPU device (to leverage real-time GPU-based inference), the underlying software is fully written in Python. This design choice makes the system broadly adaptable: it can be run on any device capable of executing Python scripts, including Windows-based PCs, Linux machines, and macOS systems. For hardware integration, we have confirmed that the framework works seamlessly with microcontrollers such as Arduino or Teensy, requiring only minor modifications to the main script to enable sending and receiving of GPIO signals through those boards. In fact, we are already using the same system in an in-house project on a Linux-based PC where an Arduino is connected to the computer to provide GPIO functionality. Furthermore, the system is not limited to Raspberry Pi or Arduino boards; it can be interfaced with any GPIO-capable devices, including those from Adafruit and other microcontroller platforms, depending on what is readily available in individual labs. Since many neuroscience and engineering laboratories already possess such hardware, we believe this design ensures broad accessibility and ease of integration across diverse experimental setups.

      (2) Hardware Constraints: The reliance on Raspberry Pi and Nvidia Jetson (which is expensive) for real-time processing could introduce latency issues (~63 ms for CLNF and ~67 ms for CLMF). This latency might limit precision for faster or more complex behaviors, which the authors should address in the discussion section.

      In our system, we measured latencies of ~63 ms for CLNF and ~67 ms for CLMF. While such latencies indeed limit applications requiring millisecond precision, such as fast whisker movements, saccades, or fine-reaching kinematics, we emphasize that many relevant behaviors, including postural adjustments, limb movements, locomotion, and sustained cortical state changes, occur on timescales that are well within the capture range of our system. Thus, our platform is appropriate for a range of mesoscale behavioral studies, a point that merits further discussion. It is also important to note that these latencies are not solely dictated by hardware constraints. A significant component arises from the inherent biological dynamics of the calcium indicator (GCaMP6s) and calcium signaling itself, which introduce slower temporal kinetics independent of processing delays. Newer variants, such as GCaMP8f, offer faster response times and could further reduce effective biological latency in future implementations.

      With respect to hardware, we acknowledge that Raspberry Pi provides a low-cost solution but contributes to modest computational delays, while Nvidia Jetson offers faster inference at higher cost. Our choice reflects a balance between accessibility, cost-effectiveness, and performance, making the system deployable in many laboratories. Importantly, the modular and open-source design means the pipeline can readily be adapted to higher-performance GPUs or integrated with electrophysiological recordings, which provide higher temporal resolution. Finally, we agree with the reviewer that the issue of latency highlights deeper and interesting questions regarding the temporal requirements of behavior classification. Specifically, how much data (in time) is required to reliably identify a behavior, and what is the minimum feedback delay necessary to alter neural or behavioral trajectories? These are critical questions for the design of future closed-loop systems and ones that our work helps frame.

      We have added a slightly modified version of our response above in the discussion section under “Experimental applications and implications”.

      (3) Neurofeedback Specificity: The task focuses on mesoscale imaging and ignores finer spatiotemporal details. Sub-second events might be significant in more nuanced behaviors. Can this be discussed in the discussion section?

      This is a great point, and we have added the following to the discussion section: “In the case of CLNF we have focused on regional cortical GCaMP signals that are relatively slow in kinetics. While such changes are well suited for transcranial mesoscale imaging assessment, it is possible that cellular 2-photon imaging (Yu et al. 2021) or preparations that employ cleared crystal skulls (Kim et al. 2016) could resolve more localized and higher frequency kinetic signatures.”

      (4) The activity over 6s is being averaged to determine if the threshold is being crossed before the reward is delivered. This is a rather long duration of time during which the mice may be exhibiting stereotyped behaviors that may result in the changes in DFF that are being observed. It would be interesting for the authors to compare (if data is available) the behavior of the mice in trials where they successfully crossed the threshold for reward delivery and in those trials where the threshold was not breached. How is this different from spontaneous behavior and behaviors exhibited when they are performing the test with CLNF? 

      We would like to emphasize that we are not directly averaging activity over 6 s to compare against the reward threshold. Instead, the preceding 6 s of activity is used solely to compute a dynamic baseline for ΔF/F<sub>0</sub> (ΔF/F<sub>0</sub> = (F – F<sub>0</sub>)/F<sub>0</sub>). Here, F<sub>0</sub> is calculated as the mean fluorescence intensity over the prior 6 s window and is updated continuously throughout the session. This baseline is subtracted from the instantaneous fluorescence signal, and the difference is divided by the baseline, to detect relative changes in activity. The reward threshold is therefore evaluated against these baseline-corrected ΔF/F<sub>0</sub> values at the current time point, not against an average over 6 s. This moving-window baseline correction is a standard approach in calcium imaging analyses, as it helps control for slow drifts in signal intensity, bleaching effects, or ongoing fluctuations unrelated to the behavior of interest. Thus, the 6 s window does not introduce a temporal lag in reward assignment but instead provides a reference against which rapid increases in cortical activity are detected. We have added the term “dynamic baseline” to the Methods to clarify.
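      The dynamic-baseline computation described in this response can be sketched as below. The 6 s window, the 15 FPS frame rate, and the formula ΔF/F0 = (F - F0)/F0 come from the text; the class and function names, the behavior before any history exists, and the example values are assumptions for illustration and do not reproduce the CLoPy implementation.

```python
from collections import deque

FPS = 15                        # imaging rate stated in the response
WINDOW_S = 6                    # baseline window stated in the response
WINDOW_FRAMES = FPS * WINDOW_S  # 90 frames of history


class DynamicBaseline:
    """Running dF/F0 where F0 is the mean of the preceding window of frames."""

    def __init__(self, window_frames=WINDOW_FRAMES):
        # A bounded deque discards the oldest frame once the window is full.
        self.history = deque(maxlen=window_frames)

    def update(self, f):
        """Return dF/F0 for the new sample f, or None if no history exists yet."""
        if not self.history:
            self.history.append(f)
            return None
        f0 = sum(self.history) / len(self.history)  # baseline from prior frames only
        self.history.append(f)
        return (f - f0) / f0


def crossed_threshold(dff, threshold):
    """Check the current baseline-corrected value, not a 6 s average of it."""
    return dff is not None and dff >= threshold


# Example: 6 s of flat fluorescence followed by a 10% step.
tracker = DynamicBaseline()
for _ in range(WINDOW_FRAMES):
    tracker.update(100.0)
dff = tracker.update(110.0)
print(dff, crossed_threshold(dff, threshold=0.05))
```

      Because the baseline is taken from frames preceding the current sample, the threshold test acts on the instantaneous value, which is the distinction the response draws.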

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      Additional suggestions for improved or additional experiments, data or analyses.

      For: "Looking closely at their reward rate on day 5 (day of rule change), they had a higher reward rate in the second half of the session as compared to the first half, indicating they were adapting to the rule change within one session." It would be helpful to see this data, and it would be good to see within-session learning on the rule change day.

      Thank you for pointing this out. We had missed referencing the figure in the text, and have now added a citation to Supplementary Figure 4A, which shows the cumulative rewards for each day of training. As seen in the plot for day 5, the cumulative rewards are comparable to those on day 1, with most rewards occurring during the second half of the session.

      For: "These results suggest that motor learning led to less cortical activation across multiple regions, which may reflect more efficient processing of movement-related activity," it could also be the case that the behaviour became more stereotyped over learning, which would lead to more concentrated, correlated activity. To test this, it would be good to look at the limb variability across sessions. Similarly, if it is movement-related, there should be good decoding of limb kinematics.

      Indeed, we observed that behavior became more stereotyped over the course of learning, as shown in Supplementary Figure 4C and 4D. One plausible explanation for the reduction in cortical activation across multiple regions is that behavior itself became more stereotyped, a possibility we have explored in the manuscript. Specifically, forelimb movements during the trial became increasingly correlated as mice improved on the task, particularly in the groups that received auditory feedback (Rule-change and No-rule-change groups; Figure 8). As movements became more correlated, overall body movements during trials decreased and aligned more closely with the task rule (Figure 9D). This suggests that reduced cortical activity may in part reflect changes in behavior. Importantly, however, in the Rule-change group, we observed that on the day of the rule switch (day 5), when the target shifted from the left to the right forelimb, cortical activity increased bilaterally (Figure 9A–C). This finding highlights our central point: groups that received feedback (Rule-change and No-rule-change) were able to identify the task rule more effectively, and both their behavior and cortical activity became more specifically aligned with the rule compared to the No-feedback group. We agree with the reviewer that additional analyses along these lines would be valuable future directions. To facilitate this, we have included the movement data for readers who may wish to pursue further analyses; details can be found under “Data and code availability” in the Methods section. However, given the limited sample sizes in our dataset and the need to keep the manuscript focused on the central message, we felt that including these additional analyses here would risk obscuring the main findings.

      For: "We believe the decrease in ΔF/F0peak is unlikely to be driven by changes in movement, as movement amplitudes did not decrease significantly during these periods (Figure 7D CLMF Rule-change)." I would formally compare the two conditions. This is an important control. Also, another way to see if the change in deltaF is related to movement would be to see if you can predict movement from the deltaF.

      Figure 7D in the previous version is Figure 9D in the current revision of the manuscript. We have assessed this for the examples shown by graphing the movement data. Unfortunately, there is not enough of that data to do a group analysis of movement magnitude. We suggest that this would be an excellent future direction that would take advantage of the flexible, open-source nature of our tool.

      Recommendations for improving the writing and presentation.

      In the abstract there is no mention of the rationale for the project, or the resulting significance. I would modify this to increase readership by the behavioral neuroscience community. Similarly, the introduction also doesn't highlight the value of this resource for the field. Again, I think the pyControl paper does a good job of this. For readability, I would add more subheadings earlier in the results, to separate the different technical aspects of the system.

      We have revised the introduction to include the rationale for the project, its potential implications, and its relevance for translational research. We have also framed the work within the broader context of the behavioral and systems neuroscience community. We greatly appreciate this suggestion, as we believe it enhances the clarity and accessibility of the manuscript for the community.

      For: "While brain activity can be controlled through feedback, other variables such as movements have been less studied, in part because their analysis in real time is more challenging." I would highlight research that has studied the control of behavior through feedback, such as the Mathis paper where mice learn to pull a joystick to a virtual box, and adapt this motion to a force perturbation.

      We have added a citation to the Mathis paper and describe this as an additional form of feedback. The text is quoted below:

      “Opportunities also exist in extending real time pose classification (Forys et al. 2020; Kane et al. 2020) and movement perturbation (Mathis et al. 2017) to shape aspects of an animal’s motor repertoire.”

      Some of the results content would be better suited for the methods, one example: "A previous version of the CLNF system was found to have non-linear audio generation above 10 kHz, partly due to problems in the audio generation library and partly due to the consumer-grade speaker hardware we were employing. This was fixed by switching to the Audiostream (https://github.com/kivy/audiostream) library for audio generation and testing the speakers to make sure they could output the commanded frequencies"

      This is now moved to the Methods section.

      For: "There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19), supporting the idea that neural efficiency could improve with learning," not sure I agree with this, the studies on cortical plasticity are usually to show a neural basis for the learning observed, efficiency is separate from this.

      We have modified this statement to remove the concept of efficiency: “There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19).”

      The paragraph that opens "Distinct task- and reward-related cortical dynamics" that describes the experiment should appear in the previous section, as the data is introduced there.

      We have moved the paragraphs in question to the previous section, where we present the data and other experimental details. This makes the text more readable and gives the data better context.

      I would present the different ROI rules with better descriptors and visualization to improve the readability.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments.

      Minor corrections to the text and figures.

      Figure 1 is a little crowded, combining the CLNF and CLMF experiments, I would turn this into a 2 panel figure, one for each, similar to how you did figure 2.

      We have revised Figure 1 to include two panels, one for CLNF and one for CLMF. The colored components indicate elements specific to each setup, while the uncolored components represent elements shared between CLNF and CLMF. Relevant text in the manuscript is updated to refer to these figures.

      For Figure 2, the organization of the CLMF section is not intuitive for the reader. I would reorder it so it has a similar flow as the CLNF experiment.

      We have revised the figure by updating the layout of panel B (CLMF) to align with panel A (CLNF), thereby creating a more intuitive and consistent flow between the panels. We appreciate this helpful suggestion, which we believe has substantially improved the clarity of the figure. The corresponding text in the manuscript has also been updated to reflect these changes.

      For Figure 3, highlight that C and E are examples. They also seem a little out of place, so they could even be removed.

      We have now explicitly labeled Figures 3C and 3E as representative examples (figure legend and on figure itself). We believe including these panels provides helpful context for readers: Figure 3C illustrates how the ROIs align on the dorsal cortical brain map with segmented cortical regions, while Figure 3E shows example paw trajectories in three dimensions, allowing visualization of the movement patterns observed during the trials.

      In the plots, I would add sample sizes, for instance, in CLNF learning curve in Figure 4A, how many animals are in each group? 

      We have labeled Figure 4 with the number of animals used in CLNF (No-rule-change, N=23; Rule-change, N=17) and CLMF (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4).

      Also, Figure 7 for example, which figures are single-sessions, versus across animals? For Figure 7c, what time bin is the data taken from?

      We have now clarified this in all the figures. Figure 7 in the previous version is Figure 9 in the current updated manuscript. Figure 9A is from individual sessions on different days from the same mouse. Figure 9B is the group-average, reward-centered ΔF/F<sub>0</sub> activity in different cortical regions (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4). Figure 9C shows average ΔF/F<sub>0</sub> peak values obtained within -1 s to +1 s centered around the reward point (N=8).

      It says "punish" in Figure 3, but there is no punishment?

      Yes, the task did not involve punishment. Each trial resulted in either a success, which is followed by a reward, or a failure, which is followed by a buzzer sound. To better reflect these outcomes, we have updated Figure 3 and replaced the labels “Reward” with “Success” and “Punish” with “Failure.”

      The regression on 5c doesn't look quite right, also this panel is not mentioned in the text.

      The figure referred to by the reviewer as Figure 5 is now presented as Figure 6 in the revised manuscript. Regarding the reviewer’s observation about the regression line in the left panel of Figure 5C, the apparent misalignment arises because the majority of the data points are densely clustered at the center of the scatter plot, where they overlap substantially. The regression line accurately reflects this concentration of overlapping data. To improve clarity, we have updated the figure and ensured that it is now appropriately referenced in the Results section.

      Reviewer #2 (Recommendations for the authors):

      (1) There would be many interesting observations and links between the peripheral and cortical studies if there was a body video available during the cortical study. Is there any such data available?

      We agree that a detailed analysis of behavior during the CLNF task would be necessary to explore any behavioral correlates of success in the task. Unfortunately, we do not have sufficient whole-body video to perform such an analysis.

      (2) The text (p. 24) states: [intracortical GCAMP transients measured over days became more stereotyped in kinetics and were more correlated (to each other) as the task performance increased over the sessions (Figure 7E).] But I cannot find this quantification in the figures or text?

      Figure 7 in the previous version of the manuscript now appears as Figure 9. In this figure, we present cortical activity across selected regions during trials, and in Figure 9E we highlight that this activity becomes more correlated. Since we did not formally quantify variability, we have removed the previous claim that the activity became stereotyped and revised the text in the updated manuscript accordingly.

      Typos:

      10-serest c (page 13)

      Inverted color codes in figure 4E vs F

      Reviewer #3 (Recommendations for the authors):

      We have mostly attempted to limit the feedback to suggestions and posed a few questions that might be interesting to explore given the dataset the authors have collected.

      Comments:

      In closed-loop systems, latency is a primary concern, and the authors have successfully tested the latency of the system (Delay): the time from detection of an event to the reaction was less than 67 ms.

      We have commented on the issues and limitations caused by latency, and potential future directions to overcome these challenges in responses to some of the previous comments.

      Additional major comments:

      "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time (Figure 4A, Animation 1)." Fig 4A is merely showing change in task performance over time and does not have information regarding the changes observed specific to CLNF for each ROI.

      We acknowledge that the sample size for individual ROI rules was not sufficient for meaningful comparisons. To address this limitation, we pooled the data across all the rules tested. The manuscript includes a detailed list of the rules along with their corresponding sample sizes for transparency.

      A ΔF/F<sub>0</sub> threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, n=28 and CLNF Rule-change, n=13). It is unclear what the replicates here are. Trials or mice? The corresponding Figure legend has a much smaller n value.

      Thank you for pointing this out. We realized that we had not indicated the sample replicates in the figure, and the use of n instead of N for the number of animals may have been misleading. We have now corrected the notation and clarified this information in the figure to resolve the discrepancy.

      What were the replicates for each ROI pairs evaluated?

      Each ROI rule and number of mice and trials are listed in Table 5 and Table 6.

      Our analysis revealed that certain ROI rules (see description in methods) lead to a greater increase in success rate over time than others (Supplementary Figure 3D). The Supplementary figures 3C and 3D are blurry and could use higher resolution images. 

      We have increased the font size of the text that was previously difficult to read and re-exported the figure at a higher resolution (300 DPI). We believe these changes will resolve the issue.

      Also, it will help the reader if a visual representation of the ROI pairs is provided, instead of the text view. One interesting question is whether there are anatomical biases in fast- vs slow-learning pairs (directionality, i.e. anterior/posterior; distance between the selected ROIs; etc.). This could be interesting to tease apart.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments. While a detailed investigation of the anatomical basis of fast versus slow learning cortical ROIs is beyond the scope of the present study, we agree that this represents an exciting future direction for further research.

      How distant should the ROIs be to achieve increased task performance?

      We appreciate this insightful question. We did not specifically test this scenario. In our study, we selected 0.3 × 0.3 mm ROIs centered on the standard AIBS mouse brain atlas (CCF). At this resolution, ROIs do not overlap, regardless of their placement in a two-ROI experiment. Furthermore, because our threshold calculations are based on baseline recordings, we expect the system would function for any combination of ROI placements. Nonetheless, exploring this systematically would be an interesting avenue for future experiments.

      Figures:

      I would leave out some of the methodological details such as the protocol for water restriction (Fig. 3) out of the legend. This will help with readability.

      We have removed some of the methodological details, including those mentioned above, from the legend of Figure 3 in the updated manuscript.

      Fig 1 and Fig 2: In my opinion, it would be easier for the reader if the current Fig. 2, which provides a high-level description of CLNF and CLBF, were presented as Fig. 1. The current Fig. 1 goes into a lot of methodological implementation detail and includes a lot of programming jargon that is hard to digest so early in the paper's narrative.

      Thank you for the suggestion. In the new manuscript, Figure 1 and Figure 2 have been swapped.

      Higher-resolution images/plots are needed in many instances. I am unsure whether PDF compression by the manuscript portal is causing this.

      All figures were prepared in vector graphics format using the open-source software Inkscape. For this manuscript, we exported the images at 300 DPI, which is generally sufficient for publication-quality documents. The submission portal may apply additional processing, which could have resulted in a reduction in image quality. We will carefully review the final submission files and ensure that all figures are clear and of high quality.

      The authors repeatedly show ROI specific analysis M1_L, F1_R etc. It will be helpful to provide a key, even if redundant in all figures to help the reader.

      We have now included keys to all such abbreviations in all the figures.

      There are also instances of editorialization and interpretation e.g., "Surprisingly, the "Rule-change" mice were able to discover the change in rule and started performing above 70% within a day of the rule change, on day 6" that would be more appropriate in the main body of the paper.

      Thank you for pointing this out. We have removed this sentence from the figure legend, since the point is already discussed in the Results.

      Minor comments

      (1) The description of Figure 1 is hard to follow and could be improved by following how information is processed and executed in the system, from source to processing and back. Using separate colors (instead of shades of grey) for the neurofeedback and movement feedback would help as well. Common components could have a different color. Specifications such as the description of the config file should come later.

      Figure 1 in the previous version is Figure 2 in the updated version. Following suggestions from the other reviewers as well, we have made the figure easier to understand by splitting it into two panels with color coding: green for CLNF-specific parts and pink for CLMF-specific parts, while shared parts are left uncolored.

      (2) Page 20 last paragraph:

      The authors are neglecting that the rule change was made one day prior, so the results seen in the second half of the 6th day are not due only to the first half of the 6th day but to the combined training on the 5th day (rule change) and the first half of the 6th day. Rephrasing this observation is essential.

      We have revised the text for clarity to indicate that the performance increase observed on day 6 is not necessarily attributable to training on that day. In fact, we noted and mentioned that mice began to perform the task better during the second half of the session on day 5 itself.

      (3) The Methods section description of the CLMF setup (page 39, first paragraph) is very detailed; a diagram of this setup would make it easier to follow and a better read.

      We have made changes to the CLMF setup (Figure 1B) and CLMF schematic (Figure 2B) to make it easier to understand parts of the setup and flow of control.

    1. eLife Assessment

      This is a valuable study that integrates behavioral and molecular approaches to identify neuromodulators influencing blood-feeding behavior in the disease vector Anopheles stephensi. Through gene expression analyses across blood-seeking life stages and RNA interference experiments, the authors present solid evidence that co-knockdown of the neuromodulators short Neuropeptide F and RYamide affects blood-seeking states in A. stephensi. However, evidence demonstrating that these neuropeptides are sufficient to promote host-seeking is lacking.

    2. Reviewer #1 (Public review):

      Summary:

      Here, Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host-cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host-cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host-cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood feeding stages of the mosquito's life cycle to identify a list of 9 candidates which have a role in regulating the host-seeking status of A. stephensi. Then through investigations of gene knockdown of candidates they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article I continued to think how many crucial details I may have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al. examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the authors' labelling of the figure stats in panel 4D.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide, its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
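
For readers unfamiliar with the calculation under discussion, the delta-delta Ct method can be sketched as below. This is an illustrative sketch only, not the authors' analysis code: the function name and all Ct values are hypothetical, and it assumes 100% amplification efficiency (a doubling per cycle), which is the usual assumption behind the 2^-ΔΔCt formula.

```python
from statistics import mean

def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of one sample relative to the mean of the control group.

    ct_target, ct_ref           : Ct of the gene of interest and of the
                                  reference gene in the sample
    ct_target_ctrl, ct_ref_ctrl : lists of the same Ct values for each
                                  control replicate
    """
    d_ct = ct_target - ct_ref                                   # dCt of the sample
    d_ct_ctrl = mean(t - r for t, r in zip(ct_target_ctrl, ct_ref_ctrl))
    return 2.0 ** -(d_ct - d_ct_ctrl)                           # 2^-ddCt

# Hypothetical Ct values for three control replicates
ctrl_target = [24.0, 24.5, 23.5]
ctrl_ref = [18.0, 18.0, 18.0]

# A knockdown sample whose dCt is one cycle higher than the control mean
print(relative_expression(25.0, 18.0, ctrl_target, ctrl_ref))  # 0.5

# Each control replicate normalized to the control-group mean
controls = [relative_expression(t, r, ctrl_target, ctrl_ref)
            for t, r in zip(ctrl_target, ctrl_ref)]
print(controls)
```

In this sketch, a control group summarized against its own mean ΔCt has a fold change of exactly 1 by construction, whereas normalizing each control replicate to that mean keeps the replicate-to-replicate spread available for statistical comparison.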

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle to identify a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      We appreciate the reviewer’s detailed summary of our work. We thank them for their positive comments and agree with them on the shortcomings of our approach.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      We really appreciate that the reviewer has recognised the attention to detail we tried to bring to this work. Thank you!

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on a first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      We thank the reviewer for flagging the schematic in figure 1F. As suggested, we have removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose in the assay. For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data, as it does not capture the variability in the data.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      The tissue-specific knockdown (abdomen only or abdomen+head) emerged from initial standardisations where we were unable to achieve knockdown in the head unless we used higher concentrations of dsRNA and did the injections in older females. We realised that this gave us the opportunity to isolate the neuronal contribution of these neuropeptides to the phenotype produced. Further optimisations revealed that injecting dsRNA into 0-10h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 4-day-old females resulted in knockdowns in both tissues. Moreover, head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts.

      We had mentioned the knockdown conditions (time of injection and amount of dsRNA injected) for tissue-specific knockdowns in the methods, but realise now that this did not explain the approach well enough. We have now edited the section to state our methodology more clearly (see lines 932-948).

      I also found some interpretations of the transcriptomic data to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

      We thank the reviewer for pointing this out and agree with them. We have now edited our statement to read:

      “Instead, our data suggests altered carbohydrate metabolism after a blood meal, with the female brain potentially entering a state of metabolic 'sugar rest' while actively processing proteins (Figure S2B, S3). However, physiological measurements of carbohydrate and protein metabolism will be required to confirm whether glucose is indeed neither spent nor stored during this period.” See lines 271-277.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al. examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      Central brain, or mid brain, is a commonly used term to refer to brain structures/neuropils without the optic lobes (for example, https://www.nature.com/articles/s41586-024-07686-5). In this study we have focused our analysis on the central brain circuits involved in modulating blood-feeding behaviour and have therefore excluded the optic lobes. As optic lobes account for nearly half of all the neurons in the mosquito brain (https://pmc.ncbi.nlm.nih.gov/articles/PMC8121336/), including them would have disproportionately skewed our transcriptomic data toward visual processing pathways.

      We have indicated this in figure 3A and in the methods (see lines 800-801, 812). We have now also clarified it in the results section for neurotranscriptomics to avoid confusion (see lines 236-237).

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      We thank the reviewer for pointing this out. We have now added the statement “This occurs in the context of the action of RYa in the brain” to the end of the abstract, for a complete summary of our proposed model.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Our data suggest that mating can occur at any time between eclosion and oviposition in An. stephensi and between eclosion and blood feeding in Ae. aegypti. Adding these into (already busy) 1A, would cloud the purpose of the schematic, which is to indicate the time points used in the behavioural assays and transcriptomics.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      We apologise for the confusion. The experiment is indeed a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. The x-axis indicates the choice made by the mosquitoes, not the choice provided in the assay, and the y-axis indicates the percentage of males or females that made each particular choice. We have now removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      In this assay, we scored females only for the presence or absence of each meal type (blood or sugar) and are therefore unable to comment on whether sugar-starved females consumed more sugar than sugar-sated females. However, when sugar-starved, a higher proportion of females consumed both blood and sugar, while fewer fed on blood alone.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data as it does not capture the variability in the data.
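
      The scoring described above can be illustrated with a minimal sketch. It assumes that each mosquito is recorded as a pair of presence/absence flags (blood, sugar); the function names and the example replicate are hypothetical and are not taken from the manuscript.

```python
from collections import Counter

# The four possible outcomes on the "choice made" x-axis.
CHOICES = ("both", "blood only", "sugar only", "neither")


def classify(blood: bool, sugar: bool) -> str:
    """Map presence/absence of each meal type to one of the four choices."""
    if blood and sugar:
        return "both"
    if blood:
        return "blood only"
    if sugar:
        return "sugar only"
    return "neither"


def percent_by_choice(scores):
    """Return the percentage of individuals that made each choice.

    `scores` is a list of (blood, sugar) boolean pairs, one per mosquito.
    """
    if not scores:
        raise ValueError("at least one scored individual is required")
    counts = Counter(classify(blood, sugar) for blood, sugar in scores)
    total = len(scores)
    return {choice: 100.0 * counts[choice] / total for choice in CHOICES}


# Hypothetical replicate of four females: (blood fed?, sugar fed?)
replicate = [(True, True), (True, False), (True, False), (False, False)]
percentages = percent_by_choice(replicate)
```

      Because only presence or absence is recorded, the percentages describe which meals were taken, not how much of each was consumed.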

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      This is indeed correct. We reasoned that since blood feeding is exclusive to females, we should focus our analysis on genes that were specifically upregulated in them. As the reviewer points out, it is very likely that genes commonly upregulated in males and females may also promote blood feeding and we will miss out on any such candidates based on our selection criteria. 

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer’s point or there has been a misunderstanding. In figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF. 

      Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding?

      We realise this concern stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomens. We have now added a schematic in the plots to make this clearer.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,…

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
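
      As a minimal sketch of why the control value is fixed at 1 under the delta-delta Ct method: the example below assumes 100% amplification efficiency (a doubling per cycle) and a single reference gene, and all Ct values and function names are hypothetical rather than taken from the manuscript.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalise the target-gene Ct to the reference-gene Ct."""
    return ct_target - ct_reference


def relative_expression(sample_dct: float, control_dcts: list) -> float:
    """Fold change by delta-delta Ct: 2 ** -(dCt_sample - mean dCt_control).

    Assumes perfect doubling per PCR cycle (an assumption of this sketch).
    """
    calibrator = mean(control_dcts)
    ddct = sample_dct - calibrator
    return 2.0 ** (-ddct)


# Hypothetical Ct values: (target gene, reference gene) per control animal.
control_dcts = [
    delta_ct(24.0, 18.0),
    delta_ct(24.5, 18.5),
    delta_ct(23.5, 17.5),
]
# Hypothetical knockdown sample: target amplifies one cycle later.
knockdown_dct = delta_ct(25.0, 18.0)

# The control mean compared with itself gives ddCt = 0, hence 2 ** 0 = 1.
control_level = relative_expression(mean(control_dcts), control_dcts)
knockdown_level = relative_expression(knockdown_dct, control_dcts)
```

      In this sketch the control group is its own calibrator, so its level is 1 by construction, while a one-cycle delay in the knockdown sample corresponds to half the control expression.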

      …and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      The reviewer is correct in pointing out that we have not clarified this relationship in our current presentation. While we have not performed absolute mRNA quantifications, we extracted relative mRNA levels from qPCR data of 96h old unmanipulated control females. We observed that both sNPF and RYa transcripts are expressed at much lower levels in the abdomens, as compared to those in the heads, as shown in Author response Image 1 below. 

      Author response image 1.

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      We thank the reviewer for flagging this and have now edited the legends to remove redundancy.  

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      We agree with the reviewer that not all neuropeptides regulate feeding behaviours. Our statement refers to the screening approach we used: in our shortlist of candidates, we chose to validate all neuropeptides.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      Thank you for pointing this out. We were referring to an unchanged proportion of the blood fed females. We have now edited the text to the following: 

      “Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels in the heads but the proportion of females that took blood meals remained unchanged”. See lines 338-340.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand-promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      We agree with the reviewer and apologise for the mistake. We have now removed the statement.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      We have now edited the section to reflect the amount of dsRNA injected per target. Please see lines 921-931.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible?

      To achieve tissue-specific knockdowns of sNPF and RYa, we optimised both the time of injection as well as the dsRNA concentration to be injected. Injecting dsRNA into 0-10h females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 96h old females resulted in knockdowns in both tissues. Head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts, reflecting the lower baseline expression of sNPF in abdomens compared to heads and the age-dependent increase in head expression (as confirmed by qPCR). It is possible that the blood-brain barrier also limits the dsRNA entering the brain, thereby requiring higher amounts to be injected for head knockdowns. 

      We have now edited this section to state our methodology more clearly (see lines 932-948).

      For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, the data appear to show stronger evidence of abdominal knockdown, mostly for the RYa transcript (>90%) but still significant for the sNPF transcript (>60%).

      As we explained earlier, this concern likely stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens.  4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomen. We have now added a schematic in the plots to make this clearer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      Demonstrating sufficiency would require injecting sNPF peptide or its agonist. To date, no small-molecule agonists (or antagonists) that selectively mimic sNPF or RYa neuropeptides have been identified in insects. An NPY analogue, TM30335, has been reported to activate the Aedes aegypti NPY-like receptor 7 (NPYLR7; Duvall et al., 2019), which is also activated by sNPF peptides at higher doses (Liesch et al., 2013). Unfortunately, the compound is no longer available because its manufacturer, 7TM Pharma, has ceased operations. Synthesising the peptides is a possibility that we will explore in the future.

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      The best way to address this would be to conduct tissue-specific manipulations, the tools for which are not available in this species. Our approach to achieve head+abdomen and abdomen only knockdown was the closest we could get to achieving tissue specificity and allowed us to confirm that knockdown in the head was necessary for the phenotype. However, as the reviewer points out, this did not allow us to rule out any involvement of the abdomen. This point has been addressed in lines 364-371.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

      The most definitive way of demonstrating the necessity of sNPF and RYa in blood feeding would be to generate mutant lines. While we are pursuing this line of experiments, they lie beyond the scope of a revision. In the absence of mutants, we relied on knockdown of the genes using dsRNA. We would like to posit that, despite only partial knockdown, mosquitoes do display defects in blood-feeding behaviour, without any effect on sugar feeding. We think this reflects the importance of sNPF in promoting blood feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I found this manuscript to be well-prepared, visually the figures are great and clearly were carefully thought out and curated, and the research is impactful. It was a wonderful read from start to finish. I have the following recommendations:

      Thank you very much, we are very pleased to hear that you enjoyed reading our manuscript!

      (1) For future manuscripts, it would make things significantly easier on the reviewer side to submit a format that uses line numbers.

      We sincerely apologise for the oversight. We have now incorporated line numbers in the revised manuscript.

      (2) There are a few statements in the text that I think may need clarification or might be outside the bounds of what was actually studied here. For example, in the introduction "However, mating is dispensable in Anophelines even under conditions of nutritional satiety". I am uncertain what is meant by this statement - please clarify.

      We apologise for the lack of clarity in the statement and have now deleted it since we felt it was not necessary.

      (3) Typo/Grammatical minutiae:

      (a) A small idiosyncrasy of using hyphens in compound words should also be fixed throughout. Typically, you don't hyphenate if the words are being used as a noun, as in the case: e.g. "Age affects blood feeding.". However, you would hyphenate if the two words are used as a compound adjective "Age affects blood-feeding behavior". This may not be an all-inclusive list, but here are some examples where hyphens need to either be removed or added. Some examples:

      "Nutritional state also influences other internal state outputs on blood-feeding": blood-feeding -> blood feeding

      "... the modulation of blood-feeding": blood-feeding -> blood feeding

      "For example, whether virgin females take blood-meals...": blood-meals -> blood meals

      ".... how internal and external cues shape meal-choice"-> meal choice

      "blood-meal" is often used throughout the text, but is correctly "blood meal" in the figures.

      There are many more examples throughout.

      We apologise for these errors and appreciate the reviewer’s keen eye. We have now fixed them throughout the manuscript.  

      (b) Figure 1 Caption has a typo: "co-housed males were accessed for sugar-feeding" should be "co-housed males were assessed for sugar feeding"

      We apologise for the typo and thank the reviewer for spotting it. We have now corrected this.  

      (c) It would be helpful in some other figure captions to more clearly label which statement is relevant to which part of the text. For example, in Figure 4's caption.

      "C,D. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head (C). Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR (D)."

      I found re-referencing C and D at the end of their statements makes it look as though C precedes the "Relative mRNA expression" and on a first read through, I thought the figure captions were backwards. I'd recommend reformatting here and throughout consistently to only have the figure letter precede its relevant caption information, e.g.:

      "C. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head. D. Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected bloodfed and unfed females, as compared to that in uninjected females, analysed via qPCR."

      We have now edited the legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Separately from the clarifications and limitations listed above, the authors could strengthen their study and the conclusions drawn if they could rescue the behavioural phenotype observed following knockdown of sNPF and RYamide. This could be achieved by injection of either sNPF or RYa peptide independently or combined following knockdown to validate the role of these peptides in promoting blood-feeding in An. stephensi. Additionally, the apparent (but unclear) regionalized (or tissue-specific) knockdown of sNPF and RYamide transcripts could be visualized and verified by implementing HCR in situ hyb in knockdown animals (or immunohistochemistry using antibodies specific for these two neuropeptides). 

      In a follow up of this work, we are generating mutants and peptides for these candidates and are planning to conduct exactly the experiments the reviewer suggests.

      Reviewer #3 (Recommendations for the authors):

      The loss-of-function data suggest necessity but not sufficiency. Synthetic peptide injection in non-host-seeking (blood-fed mated or juvenile) mosquitoes would provide direct evidence for peptide-induced behavioral activation. The lack of these experiments weakens the central claim of the paper that these neuropeptides directly promote blood feeding.

      As noted above, we plan to synthesise the peptide to test rescue in a mutant background and sufficiency.  

      Some of the claims about knockdown efficiency and interpretation are conflicting; the authors dismiss Hairy and Prp as candidates due to 30-35% knockdown, yet base major conclusions on sNPF and RYamide knockdowns with comparable efficiencies (25-40%). This inconsistency should be addressed, or the justification for different thresholds should be clearly stated.

      We have not defined any specific knockdown efficacy thresholds in the manuscript, as these can vary considerably between genes, and in some cases, even modest reductions can be sufficient to produce detectable phenotypes. For example, knockdown efficiencies as low as about 25-40% gave us observable phenotypes for sNPF and RYa RNAi (Figure S9B-G).

      No such phenotypes were observed for Hairy (30%) or Prp (35%) knockdowns. Either these genes are not involved in blood feeding, or the knockdown was not sufficient for these specific genes to induce phenotypes. We cannot distinguish between these scenarios. 

      The observation that knockdown animals take smaller blood meals is interesting and could reflect a downstream effect of altered host-seeking or an independent physiological change. The relationship between meal size and host-seeking behavior should be clarified.

      We agree with the reviewer that the reduced meal size observed in sNPF and RYa knockdown animals could result either from their inability to seek a host or from an independent effect on blood meal intake. Unfortunately, we did not measure host-seeking in these animals. We plan to distinguish between these possibilities using mutants in future work.

      Several figures are difficult to interpret due to cluttered labeling and poorly distinguishable color schemes. Simplifying these and improving contrast (especially for co-housed vs. virgin conditions) would enhance readability. 

      We regret that the reviewer found the figures difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition). Wherever mated females were used, we have now appended “(m)” to the annotations and consistently depicted these females with striped abdomens in all the schematics. We believe these changes will improve clarity and readability.

      The manuscript does not clearly justify the use of whole-brain RNA sequencing to identify peptides involved in metabolic or peripheral processes. Given that anticipatory feeding signals are often peripheral, the logic for brain transcriptomics should be explained.

      The reviewer is correct in pointing out that feeding signals could also emerge from peripheral tissues. Signals from these tissues – in response to both changing nutritional and reproductive states – are then integrated by the central brain to modulate feeding choices. For example, in Drosophila, increased protein intake is mediated by central brain circuitry including those in the SEZ and central complex (Munch et al., 2022; Liu et al., 2017; Goldschmidt et al., 202ti). In the context of mating, male-derived sex peptide further increases protein feeding by acting on a dedicated central brain circuitry (Walker et al., 2015). We, therefore focused on the central brain for our studies.

      The proposed model suggests brain-derived peptides initiate feeding, while gut peptides provide feedback. However, gut-specific knockdowns had no effect, undermining this hypothesis. Conversely, the authors also suggest abdominal involvement based on RNAi results. These contradictions need to be resolved into a consistent model.

      We thank the reviewer for raising this point and recognise their concern. Our reasons for invoking an involvement of the gut were two-fold:

      (1) We find increased sNPF transcript expression in the entero-endocrine cells of the midgut in blood-hungry females, which returns to baseline after a blood-meal (Fig. 4L, M).

      (2) While the abdomen-only knockdowns did not affect blood feeding, every effective head knockdown that affected blood feeding also abolished abdominal transcript levels (Fig. S9C, F). (Achieving a head-only reduction proved impossible because (i) systemic dsRNA delivery inevitably reaches the abdomen and (ii) abdominal expression of both peptides is low, leaving little dynamic range for selective manipulation.) Consequently, we can only conclude that brain expression is required for the behaviour and that we cannot exclude a contributory role for gut-derived sNPF. We have discussed this in lines 364-371.

      The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      We agree that functional validation of the receptors would strengthen the evidence for sNPF and RYa-mediated control of blood feeding in An. stephensi. We selected these receptors based on sequence homology. A possibility remains that sNPF neuropeptides activate more than one receptor, each modulating a distinct circuit, as shown in the case of Drosophila Tachykinin (https://pmc.ncbi.nlm.nih.gov/articles/PMC10184743/). Confirming their roles will therefore require systematic characterisation and knockdown of each receptor. We are planning these experiments in the future.

      The authors compared the percentage changes in sugar-fed and blood-fed animals under sugar-sated or sugar-starved conditions. Figure 1F should reflect what was discussed in the results.

      Perhaps this concern stems from our representation of the data in figure 1F? We have now edited the x-axis and revised its label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data because it does not capture the variability in the data.

      Minor issues:

      (1) The authors used mosquitoes with belly stripes to indicate mated females. To be consistent, the post-oviposition females should also have belly stripes.

      We thank the reviewer for pointing this out. We have now edited all the figures as suggested.

      (2) In the first paragraph on the right column of the second page, the authors state, "Since females took blood-meals regardless of their prior sugar-feeding status and only sugar-feeding was selectively suppressed by prior sugar access." Just because the well-fed animals ate less than the starved animals does not mean their feeding behavior was suppressed.

      Perhaps there has been a misunderstanding in the experimental setup of figure 1F, probably stemming from our data representation. The experiment is a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. We scored females only for the presence or absence of each meal type (blood or sugar) and did not quantify the amount consumed.

      (3) The figure legend for Figure 1A and the naming convention for different experimental groups are difficult to follow. A simplified or consistently abbreviated scheme would help readers navigate the figures and text.

      We regret that the reviewer found the figure difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition).

      (4) In the last paragraph of the Y-maze olfactory assay for host-seeking behaviour in An. stephensi in Methods, the authors state, "When testing blood-fed females, aged-matched sugar-fed females (bloodhungry) were included as positive controls where ever possible, with satisfactory results." The authors should explicitly describe what the criteria are for "satisfactory results".

      We apologise for the lack of clarity. We have now edited the statement to read:

      “When testing blood-fed females, age-matched sugar-fed females (blood-hungry) were included wherever possible as positive controls. These females consistently showed attraction to host cues, as expected.” See lines 786-790.

      (5) In the first paragraph of the dsRNA-mediated gene knockdown section in Methods, dsRNA against GFP is used as a negative control for the injection itself, but not for the potential off-target effect.

      We agree with the reviewer that dsGFP injections act as controls only for injection-related behavioural changes, and not for off-target effects of RNAi. We have now corrected the statement. See lines 919-920.

      To control for off-target effects, we could have designed multiple dsRNAs targeting different parts of a given gene. We regret not including these controls for potential off-target effects of the injected dsRNAs.

      (6) References numbers 48, 89, and 90 are not complete citations.

      We thank the reviewer for spotting these. We have now corrected these citations.

    1. eLife Assessment

      This paper provides a useful new theory of the hallucinatory effects of 5-HT2A psychedelics. The authors present convincing evidence that a computational model trained with the Wake-Sleep algorithm can reproduce some features of hallucinations by varying the strength of top-down connections in the model, though it is not clear that this model applies to 5-HT2A hallucinogens in particular. The work will be of interest to researchers studying hallucinations or offline activity and plasticity more broadly.

    2. Reviewer #1 (Public review):

      Bredenberg et al. aim to model some of the visual and neural effects of psychedelics via the Wake-Sleep algorithm. This is an interesting study with findings that challenge certain mainstream ideas in psychedelic neuroscience.

      While some of my concerns have been addressed in revision, I am still not convinced that this model applies to 5-HT2A hallucinogens, as opposed to a pharmacologically distinct hallucinogen. I think it is important to justify which class of hallucinogens this model applies to and distinguish it from other hallucinogens. While some researchers tend to group several hallucinogens together (e.g., 5-HT2A agonists, NMDA antagonists, kappa-opioid agonists), I'm not convinced this is warranted, when they have distinct subjective and cognitive effects (including quite different visual distortions, and again I point out that the kappa-opioid agonist salvinorin A, which is referred to as an "oneirogen," has been described as particularly dream-like, perhaps more so than 5-HT2A hallucinogens), as well as some differences in therapeutic outcomes (ketamine seems not to have therapeutic effects that are as persistent, and kappa-opioid agonists have yet to be shown to be therapeutic). Their use patterns highlight this (e.g., 5-HT2A drugs are used less in non-festival/rave social settings compared to NMDA drugs like ketamine, which can be used frequently enough to the point of abuse; kappa-opioid agonists have quite mixed effects in terms of pleasurable outcomes, thereby rarely being used/abused and almost never to my knowledge being used recreationally).

      In sum, more is needed to justify the claim that this work applies to 5-HT2A drugs in particular.

    3. Reviewer #2 (Public review):

      This work is a nice contribution to the literature in articulating a specific, testable theory of how psychedelics act to generate hallucinations and plasticity.

      I believe my concerns from the first round of review have been addressed in this version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      First, we thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article has been considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we have made to the text.

      Common Concerns (R1 & R2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep, which is related to the mechanisms of action of classical psychedelic drugs.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrach, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), in which case a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022).

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical for synaptic consolidation of both monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in our previous text–we have added them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay. 

      Relevant modifications: Page 4, 1st paragraph; Page 11, 1st paragraph.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm,’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in-and-of themselves without subsequent re-learning and compensation.
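
      The annealing analogy above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not part of the Wake-Sleep model: the one-dimensional double-well `loss`, the learning rate, and the size of the random perturbations are all hypothetical. Gradient descent alone settles in the shallower minimum; a single random perturbation followed by renewed descent sometimes lands in the deeper one, and sometimes does not.

```python
import numpy as np

def loss(x):
    # Toy double-well landscape (hypothetical): shallow minimum near x = +0.96,
    # deeper minimum near x = -1.04.
    return x**4 - 2.0 * x**2 + 0.3 * x

def grad(x):
    # Analytical derivative of the toy loss.
    return 4.0 * x**3 - 4.0 * x + 0.3

def descend(x, lr=0.01, steps=2000):
    # Plain gradient descent, standing in for ordinary learning.
    x = np.array(x, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Ordinary learning from this starting point settles in the shallow minimum.
x_stuck = descend(1.5)

# One-off random perturbations (the 'noise'), each followed by re-learning.
rng = np.random.default_rng(0)
kicks = rng.normal(0.0, 1.0, size=500)
x_after = descend(x_stuck + kicks)

escaped = x_after < 0.0  # runs that ended in the deeper well
print(f"stuck at x = {float(x_stuck):.2f}, loss = {float(loss(x_stuck)):.2f}")
print(f"fraction reaching the deeper minimum: {escaped.mean():.2f}")
```

      In this toy setting most perturbed runs fall back into the original minimum, which matches the caution above that such plasticity carries no guarantee of improvement without subsequent re-learning.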

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We have provided a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Relevant modifications: Page 9, final paragraph; Page 12, final paragraph.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We have clarified that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
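
      A minimal sketch of the scalar-versus-heterogeneous distinction, assuming that ‘alpha’ acts as a simple convex weighting between basal and apical drive. The function name `mix_compartments`, the linear form, and the range of per-neuron values are hypothetical choices for this example and are not taken from the manuscript's equations.

```python
import numpy as np

def mix_compartments(basal, apical, alpha):
    """Blend bottom-up (basal) and top-down (apical) drive.

    alpha = 0 -> activity set entirely by basal input (Wake-like)
    alpha = 1 -> activity set entirely by apical input (Sleep-like)
    alpha may be a scalar or an array holding one value per neuron.
    """
    alpha = np.asarray(alpha, dtype=float)
    if np.any((alpha < 0) | (alpha > 1)):
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * np.asarray(basal) + alpha * np.asarray(apical)

rng = np.random.default_rng(0)
basal = rng.normal(size=5)
apical = rng.normal(size=5)

# Heterogeneous case: each neuron has its own (hypothetical) alpha.
alpha_per_neuron = rng.uniform(0.0, 0.6, size=5)
heterogeneous = mix_compartments(basal, apical, alpha_per_neuron)

# Scalar summary: a single mean alpha applied to every neuron.
homogeneous = mix_compartments(basal, apical, alpha_per_neuron.mean())

print("per-neuron alpha:", np.round(heterogeneous, 3))
print("mean alpha:      ", np.round(homogeneous, 3))
```

      The two outputs differ neuron by neuron, which is the heterogeneity that a single scalar value disregards.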

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Relevant modifications: Page 4, first paragraph; Page 13, first paragraph.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We have elaborated on this point, and moved the discussion earlier in the text.

      Relevant modifications: Page 1, 1st paragraph; Page 4, 2nd paragraph.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We ultimately decided to remove these discussions from the main text, as they had little bearing on the content of our work. Within the Ethics Declarations section we softened our claims from “millennia” to “centuries,” as indigenous psychedelic use over this latter period of time is well-substantiated.

      Relevant modifications: removed from introduction; modified Ethics Declarations

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. There are two possible additional factors that could contribute to this phenomenon: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We have provided an extended discussion of these nuances in our revision.

      Relevant modifications: Page 1, paragraph 2.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      Two experimental results that appear to contradict our hypothesis deserve special consideration. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action and the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), which we believe would be best captured within our framework by a topographically connected recurrent network architecture trained on video data; this is a potentially fruitful direction for future research.

      Relevant modifications: Page 9, paragraph 1; Page 10, final paragraph; Page 11, final paragraph.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our Wake-Sleep-trained models and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b).

      To demonstrate that our proposed hallucination mechanism is capable of producing more complex hallucinations in larger, more powerful models, we employed our same hallucination generation mechanism in a pretrained Very Deep Variational Autoencoder (VDVAE) (Child et al., 2021), which is a hierarchical variational autoencoder with a structure nearly identical to that of our Wake-Sleep-trained networks, with both a bottom-up inference pathway and a top-down generative pathway that maps cleanly onto our multicompartmental neuron model. VDVAEs are trained on the same objective function as our Wake-Sleep-trained networks, but using the backpropagation algorithm. The VDVAE models were able to generate much more complex hallucinations (emergence of complex geometric patterns, smooth deformations of objects and faces), whose complexity arguably exceeds that of the hallucinations produced by the DeepDream algorithm. Therefore, while the VDVAEs are less biologically realistic (they do not learn via local synaptic plasticity), they function as a valuable high-level model of hallucination generation that complements our Wake-Sleep-trained approach. As further validation, we were also able to replicate our key results and testable predictions with these models.

      Relevant modifications: Results section “Modeling hallucinations in large-scale pretrained networks”; Figure 6, S7, S8; Page 12, paragraph 3; Methods section “Generating hallucinations in hierarchical variational autoencoders.”

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We have added a discussion of this in our ‘Model Limitations’ section.

      Relevant modifications: Page 12, paragraph 4.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.
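
      The dependence on the measure can be made concrete with a toy simulation. This is a sketch under stated assumptions, not a reproduction of any cited analysis: the generative model (one shared signal plus independent unit-specific fluctuations), the gains, and the names `simulate`, `synchronous_variance`, and `residual_variance` are all hypothetical. It shows that the variance of the unit-averaged signal and the variance remaining after that average is subtracted can move in opposite directions.

```python
import numpy as np

def simulate(shared_gain, private_sd, n_units=100, n_time=5000, seed=0):
    # Each unit = shared (synchronous) signal + unit-specific fluctuations.
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=n_time)
    private = rng.normal(size=(n_units, n_time))
    return shared_gain * shared + private_sd * private

def synchronous_variance(x):
    # Variance over time of the signal averaged across units.
    return x.mean(axis=0).var()

def residual_variance(x):
    # Variance over time after subtracting the unit-averaged signal,
    # then averaged over units.
    return (x - x.mean(axis=0, keepdims=True)).var(axis=1).mean()

# Hypothetical conditions: less shared drive, more unit-specific variability.
baseline = simulate(shared_gain=1.0, private_sd=0.5)
altered = simulate(shared_gain=0.5, private_sd=1.0)

for name, x in (("baseline", baseline), ("altered", altered)):
    print(f"{name}: synchronous = {synchronous_variance(x):.2f}, "
          f"residual = {residual_variance(x):.2f}")
```

      In this toy case the ‘altered’ condition shows lower synchronous variance and higher residual variance at the same time, so a reported decrease or increase in ‘variance’ depends on which of the two quantities was computed.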

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, similar to sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our strongest explanation for why the variance calculation in (Carhart-Harris et al., 2016) produces a variance reduction is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
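
      For readers unfamiliar with the third measure, the following is a minimal sketch of Lempel-Ziv complexity applied to a binarized signal. It counts the phrases in the Lempel-Ziv (1976) parsing; thresholding at the signal mean and the two test signals are assumptions made for this example, and the cited study's preprocessing may differ.

```python
import numpy as np

def lz76_complexity(bits):
    """Number of phrases in the Lempel-Ziv (1976) parsing of a binary sequence."""
    s = "".join("1" if b else "0" for b in bits)
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier material.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def binarize(signal):
    # Threshold at the mean; the choice of threshold is an assumption here.
    signal = np.asarray(signal, dtype=float)
    return signal > signal.mean()

rng = np.random.default_rng(0)
t = np.arange(1000)
regular = np.sin(2 * np.pi * t / 50 + 0.3)  # highly predictable signal
irregular = rng.normal(size=1000)           # unpredictable signal

print("regular:  ", lz76_complexity(binarize(regular)))
print("irregular:", lz76_complexity(binarize(irregular)))
```

      Because the signal is reduced to a sequence of zeros and ones before parsing, amplitude information is discarded, which is one reason this measure is not directly comparable to a variance.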

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model–the oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      Relevant modifications: Page 10, paragraph 1.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5HT-2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled. 

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      In our revised submission, ‘ripple’ phenomena are now visible in two places: Fig 2c-d, and Fig 6 (rows 2 and 3). Because the VDVAE models used to generate Figure 6 produce higher quality generated images, the ripples appearing in these plots are likely more prototypical, but it is not easy to evaluate the quality of these visualizations relative to subjective hallucination phenomena.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our Wake-Sleep-trained model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results. In fact, the pretrained VDVAE models that we worked with do include top-down influence during the Wake-stage inference process, and these models recapitulated our key results and testable predictions (Fig. S8).

      Relevant modifications: Fig. S8; Page 12, paragraph 4.

    1. eLife Assessment

      This valuable study highlights the key role of NK cells and PD-L1+ neutrophils in worsening sepsis responses in the context of MASH (metabolic dysfunction-associated steatohepatitis). It focused on the role of neutrophils in mediating this effect, which is based on a choline-deficient high-fat diet model of various knockouts or selective ablation of immune cell types. While the data presented are of great interest, there are concerns around the reliability of the strength of the evidence provided, which is currently considered incomplete. The study may be of interest to researchers in immunopathological disease mechanisms once confirmatory studies have been completed.

      [Editors' note: the authors no longer have access to the original flow cytometry data and plan to compile new datasets for further consideration.]

    2. Reviewer #1 (Public review):

      Summary:

      By using an established NAFLD model, choline-deficient high-fat diet, Barros et al show that LPS challenge causes excessive IFN-γ production by hepatic NK cells which further induces recruitment and polarization of a PD-L1 positive neutrophil subset leading to massive TNFα production and increased host mortality. Genetic inhibition of IFN-γ or pharmacological blockade of PD-L1 decreases recruitment of these neutrophils and TNFα release, consequently preventing liver damage and decreasing host death.

      Since NAFLD is often accompanied by chronic, low-grade inflammation, it can lead to an overactive but dysfunctional immune response and increase the body's overall susceptibility to infections; this is therefore a very important research question.

      Strengths:

      The biggest strength of the manuscript is the vast number of mouse strains used.

      Weaknesses:

      After the review, there are still some open questions from my side:

      (1) I would like the authors to defend their choice of diet type since this has not been done in the review/response to authors. In case they cannot, we need additional proof (HFD or WD model).

      (2) Since the authors used the same control groups (chow and HFCD), as required by the animal ethics committee, they must have a power analysis to show that the number of controls (and of animals in the other groups) is enough to see the effect. Please provide it.

    3. Reviewer #2 (Public review):

      Summary:

      This is an extremely interesting mouse study, trying to understand how sepsis is tolerated during obesity/NAFLD. The researchers combine a well-established model of NASH (Choline-deficiency with High Fat Diet) with a sepsis model (IP injection of 10mg/kg LPS), leading to dramatic mortality in mice. Using this model, they characterize the complex contributions of immune cells. Specifically, they find that NK-cells and Neutrophils contribute the most to mortality in this model due to IFNG and PD-L1+ Neutrophils.

      Strengths:

      The biggest strength of the manuscript is how clear the primary phenotypes/endpoints of their model are. Within 6 hours of LPS injection, there is a stark elevation of liver inflammation and damage, which is exacerbated by a High Fat/Choline-Deficient diet (HFCD). And after 1 day, almost all of the mice die. Using these endpoints, the authors were able to identify which cells were critical for mortality in the model and the specific mediators involved.

      Comments on revisions:

      I have no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.

      Duplication of control groups across experiments

      We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified that independent groups of control mice were used for each experiment. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity and to reinforce the rigor of our experimental design (Page 15, Line 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”

      Validation of the MASLD model

      To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters, including liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”.

      Assessment of liver injury in RagKO and anti-NK1.1 mice

      We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed a HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D)”.

      Discussion of limitations

      We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Page 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing MAFLD patients and experimental MAFLD datasets, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 4 the authors are showing the number of IFN+ positive CD4, CD8, and NK 1.1+ cells. Could they show from total IFNg production, how much it goes specifically on NK cells and how much on other cell populations since NK1.1 is NK but also NKT and gamma delta T cell marker? Also, in Figure 2E the authors see a substantial increase in IFNg signal in T cells.

      While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that the NK1.1+CD3+ cells (NKT cells) cited in Page 7, Lines  188-192 were essentially absent in the liver tissue of LPS-challenged animals, as shown in Supplementary Figures 3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188-192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E)”.

      This absence was consistent across all experimental conditions, corroborating our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, data demonstrated in Figure 2E illustrate the presence of IFNγ primarily in NK cells. Therefore, the observed IFNγ signal, attributed to NK1.1+ cells, predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.

      (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?

      The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.  

      (3) Since the survival curve and rate are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H I would just like to double-check that the authors used different controls for each experiment.

      The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.

      (4) In Figure 5 the authors are saying that it is neutrophils but not monocytes that mediate susceptibility of animals with NAFLD to endotoxemia. However, CXCR2i depletion and CCR2 knock out mice affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti Ly6G, b) HFCD mice challenged with LPS and treated with anti-LY6G do not rescue survival to levels of CHOW LPS and c) anti Ly6G treatment helps less than CXCR2i. Therefore, from both knock out mice and depletion experiments the authors can conclude that most likely monocytes (but potentially also other cells) together with neutrophils are substantial for the development of endotoxemic shock in choline-deficient high-fat diet model.

      While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B (added to the manuscript, page 8, lines 213–217). The corresponding description has been added to the Results section (Page 8, Lines 213–217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. 5SA and Fig. 5SB)”. In contrast, CCR2 deficiency primarily affects monocyte recruitment, yet in our experimental conditions, monocyte depletion or CCR2 knockout did not significantly alter the severity of endotoxemic shock, indicating that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.

      To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.

      These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.

      (6) In Figure 6A (but also others with PD-L1) did the authors do isotype control? And can they show how much of PD1+ population goes on neutrophils, and how much on all the other populations?

      To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11B+ leukocytes. These new results, detailed on Page 9, lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.

      To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245-250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. 6SA, Fig. 6SB, and Fig. 6SD). Furthermore, PD-L1+ neutrophils showed an exacerbated migration of PD-L1+ neutrophils towards the liver (Fig. 6A and 6B)”.

      (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?

      The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. Although the absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments, the overall phenotype and trend were consistently maintained, namely that PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.

      (8) In Figure 7 do the authors have isotype control for TNFa because gating seems a bit random so an isotype control graph would help a lot as supplementary information, in order to make the figure more persuasive

      To address the concern regarding gating in Figure 7, we have included the FMO control showing TNFα as a histogram in Supplementary Figure 8G. These controls reaffirm the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272-274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig8SG).”

      (9) Figure 6C IFNg+/+ mice on CHOW +LPS is same as Figure 8E mice chow +LPS but just with different numbers. Can the authors explain this?

      Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.

      (10) Figure 1E chow B6+LPS is the same as Figure 5D B6+LPS but should they be different since those should be two different experiments?

      We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.

      Reviewer #2 (Recommendations for the authors):

      (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.

      We assessed kidney injury alongside ALT, a marker of liver damage, because both the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether our physiopathological condition was liver-specific or indicative of broader systemic injury.

      (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also bring into question some of your data if there are so few NK cells? And the IFNG expression (2E) looks to mostly come from T-cells (CD8?).

      The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.

      In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.

      (3) In your methods, I think you didn't explain something. You said LPS was administered to 5-6 week old mice, but that HFCD diet was started in 5-6 week old mice and lasted 2 weeks, then LPS was administered. So LPS administration happened when the mice were 7-8 weeks old, right?

      We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.

      We have revised the Methods section (pages 15-16, lines 474–480) to clarify this timeline and ensure it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436-442) as follows: “Lipopolysaccharide (LPS; Escherichia coli (O111:B4), L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2 -/-, IFN-/-, and TNFR1R2 -/- mice. The HFCD was initiated in 5–6 week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration”.

      (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.

      We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic-associated fatty liver disease (MAFLD, formerly NAFLD) stage rather than the more advanced metabolic-associated steatohepatitis (MASH, formerly NASH).

      Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.

      (6) I am concerned about over interpretation of the publicly available RNA-seq data in Figure 2. This data comes from human NAFLD patients with unknown endotoxemia and mouse models using a traditional high-fat diet model. So it is hard to compare these very disparate datasets to yours. Also, if these datasets have elevated IFNG, why does your model require LPS injection?

      We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.

      Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevations of IFNγ and TNF in NAFLD are primarily derived from NK cells, T cells, and myeloid cells, respectively.

      In our experimental model, LPS administration was used to evaluate whether these immune populations, particularly NK cells, are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.

      (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.

      We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.  

      (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?

      Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.

      (9) Regarding Methods for RNA-seq Analysis. Was the Mitochondrial percentage cutoff 0.8%, because that seems low. And was there not a Padj or FDR cutoff for the differential expression?

      The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.
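
      As an illustration of the quality-control metric described above, here is a minimal sketch that computes the percentage of counts coming from mitochondrial genes in each cell and keeps cells at or below a cutoff. The `mt-` gene-name prefix follows the usual mouse naming convention; the function names, the toy counts, and the choice of an inclusive cutoff are assumptions for illustration and are not taken from the authors' pipeline.

```python
def percent_mito(counts, prefix="mt-"):
    """Percentage of a cell's total counts that come from mitochondrial genes.

    `counts` maps gene name -> count for one cell. Genes are treated as
    mitochondrial if their name starts with `prefix` (case-insensitive).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.lower().startswith(prefix))
    return 100.0 * mito / total

def keep_cells(cells, max_percent_mito):
    """Return IDs of cells whose mitochondrial percentage is <= the cutoff."""
    return [cell_id for cell_id, counts in cells.items()
            if percent_mito(counts) <= max_percent_mito]

if __name__ == "__main__":
    # Toy counts, invented purely to exercise the functions.
    cells = {
        "cell_A": {"mt-Co1": 1, "Actb": 199},   # 0.5% mitochondrial
        "cell_B": {"mt-Co1": 10, "Actb": 90},   # 10% mitochondrial
    }
    print(keep_cells(cells, max_percent_mito=0.8))
```

      With a cutoff as strict as 0.8%, only cells with almost no mitochondrial signal are retained, which is why the choice of threshold determines how many cells survive filtering.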

      For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
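
      The thresholds stated above can be written as a simple decision rule. The sketch below is only an illustration of that rule in Python, not the authors' Seurat code; the function name and default arguments are hypothetical, and the input values in the usage example are invented.

```python
def classify_gene(log2_fc, padj, fc_cutoff=0.25, padj_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not significant).

    A gene is called only if its adjusted p-value is below `padj_cutoff`;
    direction is then decided by the sign and size of the log2 fold change.
    """
    if padj >= padj_cutoff:
        return "ns"
    if log2_fc > fc_cutoff:
        return "up"
    if log2_fc < -fc_cutoff:
        return "down"
    return "ns"

if __name__ == "__main__":
    # Invented (log2 fold change, adjusted p-value) pairs for illustration.
    examples = [(1.2, 0.001), (-0.8, 0.01), (0.1, 0.001), (2.0, 0.2)]
    for log2_fc, padj in examples:
        print(log2_fc, padj, classify_gene(log2_fc, padj))
```

      Under this rule a gene with a large fold change but an adjusted p-value of 0.05 or more is not reported, and a highly significant gene with a log2 fold change within ±0.25 is also excluded.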

      (10) Regarding Methods for Flow Cytometry. How were IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? TNF and IFNG antibodies have the same fluorophore (PE), so were these stainings and analyses performed separately?

      Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. Detection of IFNγ and TNF was performed via intracellular staining using the Foxp3 staining kit (eBioscience). Due to both antibodies being conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490-493) as follows: ``As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments``

      Reviewer #3 (Recommendations for the authors):

      (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?

      Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. They are shown in Supplementary Figure 1, and the following text was added to the manuscript (Page 5, Lines 116-117): “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”.

      (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when authors analyzed the publicly available RNAseq data but they are not the most significantly elevated genes. What is the reasoning behind this cherrypicking? Why are other high DEGs not analyzed but these two are analyzed?

      We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.

      (3) Figures 3C-3G: I understand that IFNg-/- and NFR1R2a-/- mice are not showing elevated liver damage but it may simply be because of the non-responsiveness to the LPS challenge. I suggest using a different challenge or recovery experiments with the cytokines to show that the challenge is successful and results are caused by NAFLD, truly. The same goes for Figure 6: Looking at Figure 6D one may think that IFNg deficiency alters the LPS response independent of the diet condition (or NAFLD condition).

      We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.

      Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.

      Due to current breeding limitations and a temporary issue in colony maintenance of TNF-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.

      These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.

      (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-derived death even after normal conditions. But If I compare the survival data between Figure 1 and Figure 4, Rag-/- HFCD diet mice seem to be doing better than wt mice after LPS treatment. (1 day survival vs 2 days survival). How do you explain these different outcomes?

      We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality, and variation between experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.

      (5) How do you explain Figure 4J in connection to the observation presented with Figure 7: TNFa tissue levels, even though significant, seem very similar between the conditions?

      We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.

      Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNFα levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.

    1. eLife Assessment

      This work describes the establishment of an image analysis pipeline for signal correction, segmentation and quantitative data analysis of multilayered organoid and tumoroid systems. The revised study is important for the field to address many practical challenges in deep-tissue visualization. The image analysis pipeline is well-designed and compelling.

    2. Reviewer #1 (Public review):

      Summary:

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses in toto properties of gastruloid nuclear density, patterns of cell division, morphology, deformation and gene expression.

      Strengths:

      The methods developed are sound, well described and well validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research and would be of interest to the wide scientific community.

      Comments on revisions:

      I am happy with the job the authors have done with the revision. No further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques, enabling improved deep-tissue visualization compared with conventional methods. This advanced approach allows comprehensive 3D imaging of whole-mount immunostained gastruloids, capturing both tissue-scale architecture and single-cell-level information.

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.
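
      To make the reported metric concrete, the sketch below computes an F1 score at a 50% IoU threshold for two sets of segmented objects. It is an illustrative, simplified version: objects are represented as sets of pixel coordinates, matching is greedy, and all names are assumptions for this example. The evaluation code used by the authors may match objects differently.

```python
def iou(a, b):
    """Intersection-over-union of two objects given as sets of pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(gt_objects, pred_objects, threshold=0.5):
    """F1 score where a prediction is a true positive if it overlaps an
    unmatched ground-truth object with IoU >= threshold (greedy matching)."""
    matched_pred = set()
    tp = 0
    for gt in gt_objects:
        best_j, best_iou = None, 0.0
        for j, pred in enumerate(pred_objects):
            if j in matched_pred:
                continue
            score = iou(gt, pred)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None and best_iou >= threshold:
            matched_pred.add(best_j)
            tp += 1
    fp = len(pred_objects) - tp  # predictions with no ground-truth match
    fn = len(gt_objects) - tp    # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    # F1 is undefined when both sets are empty; 0.0 is returned here by convention.
    return 2 * tp / denom if denom else 0.0
```

      With one correctly matched nucleus, one missed nucleus, and one spurious detection, this gives F1 = 2·1 / (2·1 + 1 + 1) = 0.5.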

      All computational tools developed in this study are released as open-source, Python-based software.

      Strengths:

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.

      The image analysis method for nuclei segmentation was thoroughly benchmarked against existing methods, demonstrating advantages over conventional approaches, and its applicability across diverse datasets was convincingly established. The authors also evaluated the state-of-the-art Cellpose-SAM framework, showing that it performs well on their data and that the authors' preprocessing strategy can further enhance Cellpose-SAM's segmentation performance in deep tissues.

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven napari platform, facilitating interactive exploration and analysis.

      Weaknesses:

      In my initial review, I noted that the developed image analysis pipeline lacked benchmarking against existing methods and provided only a limited demonstration of its applicability to other datasets. These points have been appropriately addressed in the revised manuscript, and I have no further weaknesses to note.

      Appraisal:

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim was compellingly achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.

      Impact and utility:

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community. Given that suitable datasets for developing advanced 3D cell segmentation methods remain scarce in biological image analysis, the public release of these data is significant and is expected to stimulate further advances in the development of sophisticated computational approaches.

      Comments on revisions:

      The authors have addressed the previous revision thoroughly and appropriately. I have no further suggestions or additional recommendations at this time.

    4. Reviewer #3 (Public review):

      Summary

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks and multiple napari plugins.

      Strengths

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference is very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot) and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      Comments on revisions:

      The minor issues that I originally raised in my first review have been fully resolved in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.  

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses in toto properties of gastruloid nuclear density, patterns of cell division, morphology, deformation, and gene expression.  

      Strengths:  

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      A recommendation should be added on when or under which conditions to use this pipeline. 

      We thank the reviewer for this valuable feedback. We added the following text to the revised version, lines 418 to 474. “In general, the pipeline is applicable to any tissue, but it is particularly useful for large and dense 3D samples (such as organoids, embryos, explants, spheroids, or tumors) that are typically composed of multiple cell layers and have a thickness greater than 50 µm”.

      “The processing and analysis pipeline are compatible with any type of 3D imaging data (e.g. confocal, 2 photon, light-sheet, live or fixed)”.

      “Spectral unmixing to remove signal cross-talk of multiple fluorescent targets is typically more relevant in two-photon imaging due to the broader excitation spectra of fluorophores compared to single-photon imaging. In confocal or light-sheet microscopy, alternating excitation wavelengths often circumvents the need for unmixing. Spectral decomposition performs even better with true spectral detectors; however, these are usually not non-descanned detectors, which are more appropriate for deep tissue imaging. Our approach demonstrates that simultaneous cross-talk-free four-color two-photon imaging can be achieved in dense 3D specimen with four non-descanned detectors and co-excitation by just two laser lines. Depending on the dispersion in optically dense samples, depth-dependent apparent emission spectra need to be considered”.

      “Nuclei segmentation using our trained StarDist3D model is applicable to any system under two conditions: (1) the nuclei exhibit a star-convex shape, as required by the StarDist architecture, and (2) the image resolution is sufficient in XYZ to allow resampling. The exact sampling required is object- and system-dependent, but the goal is to achieve nearly isotropic objects with diameters of approximately 15 pixels while maintaining image quality. In practice, images containing objects that are natively close to or larger than 15 pixels in diameter should segment well after resampling. Conversely, images with objects that are significantly smaller along one or more dimensions will require careful inspection of the segmentation results”.
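
      The resampling rule described above can be sketched as a small calculation. Given a voxel size and a typical nuclear diameter, the helper below returns per-axis zoom factors that bring nuclei to roughly 15 pixels across in every dimension. The function names and the example values in the usage note are hypothetical, and the sketch only assumes what the paragraph states: a target diameter of about 15 pixels and near-isotropic output.

```python
def native_diameter_px(voxel_size_um, nucleus_diameter_um):
    """Nuclear diameter in pixels along each axis before resampling."""
    return tuple(nucleus_diameter_um / v for v in voxel_size_um)


def isotropic_zoom_factors(voxel_size_um, nucleus_diameter_um, target_diameter_px=15):
    """Per-axis zoom factors (z, y, x) that make voxels isotropic and give
    nuclei a diameter of about `target_diameter_px` pixels.

    A factor > 1 means upsampling along that axis, < 1 means downsampling.
    """
    target_voxel_um = nucleus_diameter_um / target_diameter_px
    return tuple(v / target_voxel_um for v in voxel_size_um)
```

      For example, with an assumed voxel size of (2.0, 0.5, 0.5) µm in (z, y, x) and nuclei about 7.5 µm across, nuclei natively span 3.75 pixels in z and 15 pixels in xy, so the zoom factors are (4.0, 1.0, 1.0). The strong upsampling required in z is the kind of case where, as the text notes, segmentation results need careful inspection.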

      “Normalization is broadly applicable to multicolor data when at least one channel is expected to be ubiquitously expressed within its domain. Wavelength-dependent correction requires experimental calibration using a ubiquitous signal at each wavelength. Importantly, this calibration only needs to be performed once for a given set of experimental conditions (e.g., fluorophores, tissue type, mounting medium)”.

      “Multi-scale analysis of gene expression and morphometrics is applicable to any 3D multicolor image. This includes both the 3D visualization tools (Napari plugins) and the various analytical plots (e.g., correlation plots, radial analysis). Multi-scale analysis can be performed even with imperfect segmentation, as long as segmentation errors tend to cancel out when averaged locally at the relevant spatial scale. However, systematic errors—such as segmentation uncertainty along the Z-axis due to strong anisotropy—may accumulate and introduce bias in downstream analyses. Caution is advised when analyzing hollow structures (e.g., curved epithelial monolayers with large cavities), as the pipeline was developed primarily for 3D bulk tissues, and appropriate masking of cavities would be needed”.

      Reviewer #2 (Public review):  

      Summary:  

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.  

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.  

      All computational tools developed in this study are released as open-source, Python-based software.  

      Strengths:  

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.  

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability.

      We agree that applying our analysis pipeline to published datasets—particularly those acquired with different imaging systems—would be valuable. However, only a few high-resolution datasets of large organoid samples are publicly available, and most of these either lack multiple fluorescence channels or represent 3D hollow structures. Our computational pipeline consists of several independent modules: spectral filtering, dual-view registration, local contrast enhancement, 3D nuclei segmentation, image normalization based on a ubiquitous marker, and multiscale analysis of gene expression and morphometrics. We added the following sentences to the Discussion, lines 418 to 474, and completed the discussion on applicability with a table showing the purpose, requirements, applicability and limitations of each step of the processing and analysis pipeline.

      “Spectral filtering has already been applied in other systems (e.g. [7] and [8]), but is here extended to account for imaging depth-dependent apparent emission spectra of the different fluorophores. In our pipeline, we provide code to run spectral filtering on multichannel images, integrated in Python. In order to apply the spectral filtering algorithm utilized here, spectral patterns of each fluorophore need to be calibrated as a function of imaging depth, which depend on the specific emission windows and detector settings of the microscope”.
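
      The idea of depth-dependent unmixing can be illustrated with a deliberately reduced sketch: two fluorophores, two detectors, and a mixing matrix calibrated at two depths. The linear interpolation between calibration depths, the 2×2 restriction, and every name below are assumptions made for this example; the published pipeline handles four channels and may use a different calibration model and solver.

```python
def interpolate_matrix(m_shallow, m_deep, depth, depth_shallow, depth_deep):
    """Linearly interpolate a mixing matrix between two calibration depths.

    mixing[i][j] is the fraction of fluorophore j's signal seen by detector i.
    Depths outside the calibrated range are clamped to the nearest calibration.
    """
    t = (depth - depth_shallow) / (depth_deep - depth_shallow)
    t = min(1.0, max(0.0, t))
    return [[(1 - t) * a + t * b for a, b in zip(row_s, row_d)]
            for row_s, row_d in zip(m_shallow, m_deep)]


def unmix_2x2(mixing, detected):
    """Solve detected = mixing @ abundances for two fluorophores."""
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("spectral patterns are not separable")
    y0, y1 = detected
    return ((d * y0 - b * y1) / det, (a * y1 - c * y0) / det)
```

      In use, each pixel's detector readings would be unmixed with the matrix interpolated at that pixel's imaging depth, so that the same fluorophore is assigned consistently even as its apparent emission spectrum shifts with depth.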

      “Image normalization using a wavelength-dependent correction also requires calibration on a given imaging setup to measure the difference in signal decay among the different fluorophore species. To our knowledge, the calibration procedures for spectral-filtering and our image-normalization approach have not been performed previously in 3D samples, which is why validation on published datasets is not readily possible. Nevertheless, they are described in detail in the Methods section, and the code used, from the calibration measurements to the corrected images, is available open-source at the Zenodo link in the manuscript”.

      Dual-view registration, local contrast enhancement, and multiscale analysis of gene expression and morphometrics are not limited to organoid data or our specific imaging modalities. To evaluate our 3D nuclei segmentation model, we tested it on diverse systems, including gastruloids stained with the nuclear marker Draq5 from Moos et al. [1]; breast cancer spheroids; primary ductal adenocarcinoma organoids; human colon organoids and HCT116 monolayers from Ong et al. [2]; and zebrafish tissues imaged by confocal microscopy from Li et al. [3]. These datasets were acquired using either light-sheet or confocal microscopy, with varying imaging parameters (e.g., objective lens, pixel size, staining method). The results are added in the manuscript, Fig. S9b.

      Besides, the nuclei segmentation component lacks benchmarking against existing methods.  

      We agree with the reviewer that a benchmark against existing segmentation methods would be very useful. We tried different pre-trained models:

      CellPose, which we tested in a previous paper ([4]) and which performed poorly compared to our trained StarDist3D model.

      DeepStar3D ([2]) is only available in the software 3DCellScope. We could not benchmark the model on our data, because the free and accessible version of the software is limited to small datasets. An image of a single whole-mount gastruloid with one channel, having dimensions (347,467,477) was too large to be processed, see screenshot below. The segmentation model could not be extracted from the source code and tested externally because the trained DeepStar3D weights are encrypted.

      Author response image 1.

      Screenshot of the 3DCellScope software. We could not perform 3D nuclei segmentation of a whole-mount gastruloid because the image was too large to be processed.

      AnyStar ([5]), which is a model trained from the StarDist3D architecture, did not perform well on our data because of the heterogeneous stainings. Basic pre-processing such as median and Gaussian filtering did not improve the results and led to wrong segmentation of touching nuclei. AnyStar was demonstrated to segment colon organoids well in Ong et al., 2025 ([2]), but the nuclei were more homogeneously stained. Our Hoechst staining displays bright chromatin spots that are incorrectly labeled as individual nuclei.

      Cellos ([6]), another model trained from StarDist3D, also did not perform well. The objects used for training and to validate the results are sparse and not touching, so the predicted segmentation has many false negatives even when lowering the probability threshold to detect more objects. Additionally, the network was trained with an anisotropy of (9,1,1), based on images with low z resolution, so it performed poorly on almost isotropic images. Adapting our images to the network’s anisotropy results in an imprecise segmentation that cannot be used to measure 3D nuclei deformations.

      We tried both Cellos and AnyStar predictions on a gastruloid image from Fig. S2 of our main manuscript. The results are added in the manuscript, Fig. S9b. Author response image 2 compares these predictions qualitatively with those of our trained model, Stardist-tapenade.

      Author response image 2.

      Qualitative comparison of two published segmentation models versus our model. We show one slice from the XY plane for simplicity. Segmentations are displayed with their contours only. (Top left) Gastruloid stained with Hoechst, image extracted from Fig S2 of our manuscript. (Top right) Same image overlaid with the prediction from the Cellos model, showing many false negatives. (Bottom left) Same image overlaid with the prediction from our Stardist-tapenade model. (Bottom right) Same image overlaid with the prediction from the AnyStar model; false positives are indicated with a red arrow.

      CellPose-SAM, which is a recent model that builds on the CellPose framework. The pre-trained model performs well on gastruloids imaged using our pipeline, and performs better than StarDist3D at segmenting elongated objects such as deformed nuclei. The performances are qualitatively compared in Fig. S9a and S10. We also demonstrate how using local contrast enhancement improves the results of CellPose-SAM (Fig. S10a), showing the versatility of the Tapenade pre-processing module. Tissue-scale, packing-related metrics from CellPose-SAM labels qualitatively match those from stardist-tapenade, as shown in Fig. S10c and d.

      Appraisal:  

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.  

      Impact and utility:  

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.  

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary  

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.  

      Strengths  

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      We thank the reviewer for their positive feedback and appreciation of our work.

      Weaknesses  

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:  

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest including a (selective) demo dataset that can be used to run the notebooks (e.g. for spectral unmixing) and/or providing easily accessible demo input sample data for the napari plugins (I saw that there is some sample data for the processing plugin, so this maybe could already be used for the notebooks?).

      We thank the reviewer for this relevant suggestion. The 7 notebooks were updated to automatically download sample datasets. The different parts of the pipeline can now be run immediately:

      https://github.com/GuignardLab/tapenade/tree/chekcs_on_notebooks/src/tapenade/notebooks

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

      A morphometric analysis based on the axial views was added as Fig. S6a of the manuscript, complementary to the XY views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      In lines 64 and 65, it is mentioned that confocal and light-sheet microscopy remain limited to samples under 100μm in diameter. I would recommend revising this sentence. In the paper of Moos and colleagues (also cited in this manuscript; PMID: 38509326), gastruloid samples larger than 100μm are imaged in toto with an open-top dual-view and dual-illumination light-sheet microscope, and live cell behaviour is analysed. Another example, if considering also multi-angle systems, is the impressive work of McDole and colleagues (PMID: 30318151), in which one of the authors of this manuscript is a corresponding author. There, multi-angle light sheet microscopy is used for in toto imaging and reconstruction of post-implantation mouse development (samples much larger than 100μm). Some multi-sample imaging strategies have been developed for this type of imaging system, though not to the sample number extent allowed by the Viventis LS2 system or the Bruker TruLive3D imager, which have higher image quality limitations.

      We thank the reviewer for this remark. As reported in their paper, Moos et al. used dual-view light-sheet microscopy to image gastruloids, which are particularly dense and challenging tissues, with whole-mount samples of approximately 250 µm in diameter. Nevertheless, their image quality metric (DCT) shows a rapid twofold decrease within 50 µm depth (Extended Fig. 5h), whereas with two-photon microscopy, our image quality metric (FRC-QE) decreases by a factor of two over 150 µm in non-cleared samples (PBS) (see Fig. 2c). While these two measurements (FRC-QE versus DCT) are not directly comparable, the observed difference reflects the superior depth performance of two-photon microscopy, owing in part to the use of non-descanned detectors. In our case, imaging was performed with Hoechst, a blue fluorophore suboptimal for deep imaging, whereas in the Moos dataset (Draq5, far-red), the configuration was more favorable for imaging in depth, which further supports our conclusion.

      In McDole et al., tissues reaching 250 µm were imaged from 4 views but, to our knowledge, the deeper layers did not reach the cellular-scale resolution required for cell segmentation.

      We replaced the sentence ‘However, light-sheet and confocal imaging approaches remain limited to relatively small organoids typically under 100 micrometers in diameter’ with the following (line 64):

      “While advances in light-sheet microscopy have extended imaging depth in organoids, maintaining high image quality throughout thick samples remains challenging. In practice, quantitative analyses are still largely restricted to organoids under roughly 100 µm in diameter”.

      It is worth mentioning that two-photon microscopes are much more widely available than light sheet microscopes, and light sheet systems with 2-photon excitation are even less accessible, which makes the described workflow of Gros and colleagues have a wide community interest.  

      We thank the reviewer for this remark and added this suggestion at line 74:

      “Finally, two-photon microscopes are typically more accessible than light-sheet systems and allow for straightforward sample mounting, as they rely on procedures comparable to standard confocal imaging”.

      Reviewer #2 (Recommendations for the authors):  

      Suggestions:  

      A comparison with established pre-trained models for 3D organoid image segmentation (e.g., Cellos[1], AnyStar[2], and DeepStar3D[3], all based on StarDist3D) would help highlight the advantages of the authors' custom StarDist3D model, which has been specifically optimized for two-photon microscopy images.  

      (1)  Cellos: https://doi.org/10.1038/s41467-023-44162-6

      (2)  AnyStar: https://doi.org/10.1109/WACV57701.2024.00742

      (3)  DeepStar3D: https://doi.org/10.1038/s41592-025-02685-4

      We agree with the reviewer that a benchmark against existing segmentation methods is very useful. This is addressed in the revised version, as detailed above (Figure 3).

      Recommendations:  

      Please clarify the following point. In line 195, the authors state, "This allowed us to detect all mitotic nuclei in whole-mount samples for any stage and size." Does this mean that the custom-trained StarDist3D model can detect 100% of mitotic nuclei? It was not clear from the manuscript, figures, or videos how this was validated. Given the reported performance scores of the StarDist3D model for detecting all nuclei, claiming 100% detection of mitotic nuclei seems surprisingly high.

      We thank the reviewer for this comment. As detailed in the Methods section, the detection score reaches 82%, and only the complete pipeline (detection plus minimal manual curation) allows us to detect all mitotic nuclei. To make this clearer, the following clarification was added to the Results section:

      “To detect division events, we stained gastruloids with phosphohistone H3 (ph3) and trained a separate custom Stardist3D model using 3D annotations of nuclei expressing ph3 (see Methods III H). This model together allowed us to detect nearly all mitotic nuclei in whole-mount samples for any stage and size (Fig.3f and Suppl.Movie 4), and we used minimal manual curation to correct remaining errors.”

      Minor corrections:  

      It appears that Figures 4-6 are missing from the submitted version, but they can be found in the manuscript available on bioRxiv.

      We thank the reviewer for this remark; the submission was corrected immediately to add Figures 4 to 6.

      In line 185, is the intended phrase "by comparing the 2D predictions and the 2D sliced annotated segments..."? 

      For clarity, we replaced the initial sentence:

      “The f1 score obtained by comparing the 3D prediction and the 3D ground-truth is well approximated by the f1 score obtained by comparing the 2D annotations and the 2D sliced annotated segments, with at most a 5% difference between the two scores.” by

      “The f1 score obtained in 3D (3D prediction compared with the 3D ground-truth) is well approximated by the f1 score obtained in 2D (2D predictions compared with the 2D sliced annotated segments). The difference between the 2 scores was at most 5%.”
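      The comparison between 3D and 2D scores can be illustrated with a minimal sketch. It assumes segments are represented as sets of (z, y, x) coordinates and that a prediction counts as a true positive when its intersection over union (IoU) with an unmatched annotation reaches a threshold. The function names, the greedy matching, and the 0.5 threshold are illustrative assumptions, not the criterion used in the manuscript.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_score(predicted, ground_truth, iou_threshold=0.5):
    """F1 = 2TP / (2TP + FP + FN), using greedy one-to-one IoU matching.

    The threshold value is an assumption made for this example.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    false_positives = len(predicted) - true_positives
    false_negatives = len(ground_truth) - true_positives
    denominator = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denominator if denominator else 1.0


def slice_segments(segments, z):
    """Restrict 3D segments to plane z and drop segments absent from that plane."""
    sliced = [{(y, x) for (zz, y, x) in seg if zz == z} for seg in segments]
    return [seg for seg in sliced if seg]


# Toy data: one well-matched nucleus and one unrelated pair outside plane z=0.
pred_3d = [{(0, 0, 0), (0, 0, 1), (0, 1, 0)}, {(5, 5, 5)}]
gt_3d = [{(0, 0, 0), (0, 0, 1)}, {(9, 9, 9)}]

f1_3d = f1_score(pred_3d, gt_3d)
f1_2d = f1_score(slice_segments(pred_3d, 0), slice_segments(gt_3d, 0))
```

      In this toy case the two scores differ because the mismatched objects lie outside the annotated plane; the manuscript reports that on real data the difference stays within 5%.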

      Reviewer #3 (Recommendations for the authors):

      (1) How is the "local neighborhood volume" defined, and how was it computed?

      The reviewer is referring to this paragraph (the term is underscored):

      “To probe quantities related to the tissue structure at multiple scales, we smooth their signal with a Gaussian kernel of width σ, with σ defined as the spatial scale of interest. From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of nuclear volume to local neighborhood volume), and nuclear volume at multiple scales.”

      To improve clarity, the phrasing has been revised: the term local neighborhood volume has been replaced by local averaging volume, and a reference to the Methods section has been added.

      From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of space occupied by nuclear volume within the local averaging volume, as defined in the Methods III I), and nuclear volume at multiple scales.
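      The idea of a Gaussian "local averaging volume" can be sketched as follows. This is an illustration under stated assumptions, not the Tapenade implementation: each nucleus is reduced to its centroid and volume (reasonable only when σ is much larger than a nucleus), the kernel is a normalized isotropic 3D Gaussian, and the names `gaussian_weight` and `local_fields` are hypothetical.

```python
import math


def gaussian_weight(distance_sq, sigma):
    """Normalized isotropic 3D Gaussian kernel evaluated at a squared distance."""
    norm = (2.0 * math.pi * sigma ** 2) ** 1.5
    return math.exp(-distance_sq / (2.0 * sigma ** 2)) / norm


def local_fields(point, centroids, volumes, sigma):
    """Return (cell density, nuclear volume fraction) at `point` for scale sigma.

    Assumption: nuclei are treated as points carrying their volume.
    """
    density = 0.0
    volume_fraction = 0.0
    for centroid, volume in zip(centroids, volumes):
        d2 = sum((p - c) ** 2 for p, c in zip(point, centroid))
        w = gaussian_weight(d2, sigma)
        density += w                   # nuclei per unit volume
        volume_fraction += w * volume  # nuclear volume per unit volume
    return density, volume_fraction


# Two nuclei placed symmetrically around the evaluation point.
centroids = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
volumes = [100.0, 300.0]
density, fraction = local_fields((0.0, 0.0, 0.0), centroids, volumes, sigma=2.0)
mean_nuclear_volume = fraction / density
```

      Because the kernel integrates to one, the density integrates to the number of nuclei, and the ratio of the two fields gives a locally averaged nuclear volume at the same scale.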

      (2) In the definition of inertia tensor (18), isn't the inner part normally defined in the reversed way (delta_i,j - ...)?

      We thank the reviewer for noticing this error, which we fixed in the manuscript.
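      For reference, the textbook form of the inertia tensor that the reviewer alludes to is given below. Equation (18) of the manuscript is not reproduced in this response, so the notation is an assumption: $\mathbf{r}_k$ is the position of voxel $k$ relative to the centroid of the segmented nucleus and $m_k$ its weight (equal to 1 for a binary mask).

```latex
I_{ij} = \sum_{k} m_k \left( \lVert \mathbf{r}_k \rVert^{2} \, \delta_{ij} - r_{k,i} \, r_{k,j} \right),
\qquad i, j \in \{x, y, z\}
```

      With this sign convention the diagonal terms are non-negative, for example $I_{xx} = \sum_k m_k (r_{k,y}^2 + r_{k,z}^2)$.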

      (3) For intensity normalization, the paper uses the Hoechst signal density as a proxy for a ubiquitous nuclei signal. I would assume that this is problematic, for eg, dividing cells (which would overestimate it). Would using the average Hoechst signal per nucleus mask (as segmentation is available) be a better proxy?

      We agree that this idea is appealing if one assumes a clear relationship between nuclear volume and Hoechst intensity. However, since cell and nuclear volumes vary substantially with differentiation state (see Fig. 4), such a normalization approach would introduce additional biases at large spatial scales. We believe that the most robust improvement would instead consist in masking dividing cells during the normalization procedure, as these events could be detected and excluded from the computation.

      Nonetheless, we believe the method proposed by the reviewer could prove relevant for other types of data, so we will implement this recommendation in the code available in the Tapenade package.
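      The masking strategy discussed above can be sketched in a deliberately simplified form. The example assumes a per-plane normalization in which each voxel of a signal channel is divided by the mean reference (Hoechst) intensity of its plane, computed after excluding voxels flagged as belonging to dividing nuclei. The function name, the per-plane averaging, and the data layout are assumptions for illustration; the wavelength-dependent procedure of the manuscript is more elaborate.

```python
def normalize_by_reference(signal, reference, excluded, eps=1e-12):
    """Divide each voxel's signal by the mean reference intensity of its plane.

    signal, reference: lists of planes, each a list of voxel intensities.
    excluded: same shape, True where a voxel belongs to a dividing nucleus
    and must not contribute to the reference mean.
    """
    normalized = []
    for sig_plane, ref_plane, excl_plane in zip(signal, reference, excluded):
        kept = [r for r, e in zip(ref_plane, excl_plane) if not e]
        ref_mean = sum(kept) / len(kept) if kept else 0.0
        normalized.append(
            [s / ref_mean if ref_mean > eps else 0.0 for s in sig_plane]
        )
    return normalized


# Plane 0 is shallow and contains one bright mitotic voxel; plane 1 is deeper
# and attenuated by a factor of two in both channels.
reference = [[4.0, 4.0, 12.0], [2.0, 2.0, 2.0]]
signal = [[8.0, 8.0, 8.0], [4.0, 4.0, 4.0]]
mitotic = [[False, False, True], [False, False, False]]
no_mask = [[False, False, False], [False, False, False]]

corrected = normalize_by_reference(signal, reference, mitotic)
biased = normalize_by_reference(signal, reference, no_mask)
```

      With the mitotic voxel masked, both planes recover the same normalized value; without the mask, the bright voxel inflates the reference in plane 0 and the shallow signal is underestimated.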

      (4) Figures 4-6 were part of the Supplementary Material, but should be included in the main text?

      We thank the reviewer for this remark; the submission was corrected immediately to add Figures 4-6.

      We also noticed a missing reference to Fig. S3 in the main text, so we added lines 302 to 307 to comment on the wavelength-dependency of the normalization method. We improved the description of Fig. 6, which lacked clarity (lines 316 to 321 and line 327).

      (1) Moos, F., Suppinger, S., de Medeiros, G., Oost, K.C., Boni, A., Rémy, C., Weevers, S.L., Tsiairis, C., Strnad, P. and Liberali, P., 2024. Open-top multisample dual-view light-sheet microscope for live imaging of large multicellular systems. Nature Methods, 21(5), pp.798-803.

      (2) Ong, H. T.; Karatas, E.; Poquillon, T.; Grenci, G.; Furlan, A.; Dilasser, F.; Mohamad Raffi, S. B.; Blanc, D.; Drimaracci, E.; Mikec, D.; Galisot, G.; Johnson, B. A.; Liu, A. Z.; Thiel, C.; Ullrich, O.; OrgaRES Consortium; Racine, V.; Beghin, A. (2025). Digitalized organoids: integrated pipeline for high-speed 3D analysis of organoid structures using multilevel segmentation and cellular topology. Nature Methods, 22(6), pp.1343-1354.

      (3) Li, L., Wu, L., Chen, A., Delp, E.J. and Umulis, D.M., 2023. 3D nuclei segmentation for multi-cellular quantification of zebrafish embryos using NISNet3D. Electronic Imaging, 35, pp.1-9.

      (4) Vanaret, J., Dupuis, V., Lenne, P. F., Richard, F., Tlili, S., & Roudot, P. (2023). A detector-independent quality score for cell segmentation without ground truth in 3D live fluorescence microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 29(4:Biophotonics), 1-12.

      (5) Dey, N., Abulnaga, M., Billot, B., Turk, E. A., Grant, E., Dalca, A. V., & Golland, P. (2024). AnyStar: Domain randomized universal star-convex 3D instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 7593-7603).

      (6) Mukashyaka, P., Kumar, P., Mellert, D. J., Nicholas, S., Noorbakhsh, J., Brugiolo, M., ... & Chuang, J. H. (2023). High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology with Cellos. Nature Communications, 14(1), 8406.

      (7) Rakhymzhan, A., Leben, R., Zimmermann, H., Günther, R., Mex, P., Reismann, D., ... & Niesner, R. A. (2017). Synergistic strategy for multicolor two-photon microscopy: application to the analysis of germinal center reactions in vivo. Scientific Reports, 7(1), 7101.

      (8) Dunsing, V., Petrich, A., & Chiantia, S. (2021). Multicolor fluorescence fluctuation spectroscopy in living cells via spectral detection. eLife, 10, e69687.

    1. eLife Assessment

      This important work compares the size of two brain areas, the amygdala and the hippocampus, across 12 species belonging to the Macaca genus. The authors find, using a convincing methodological approach, that amygdala - but not hippocampal - volume varies with social tolerance grade, with high tolerance species showing larger amygdala than low tolerance species of macaques. Interestingly, their findings also suggest an inverted developmental effect, with intolerant species showing an increase in amygdala volume across the lifespan, compared to tolerant species exhibiting the opposite trend. Overall, this paper offers new insights into the neural basis of social and emotional processing.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new, important evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old.

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are nicely detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      (4) The following comments were brought up during the review. In their revision, the authors have sufficiently addressed all of these comments by providing detailed responses and updating their manuscript. First, the revision clarified how much one could draw conclusions about "nature vs. nurture" from this study. Second, the revision also clarified the contributions of very young and very old animals in their correlations. Third, in their revision, the authors expanded on how their results could be interpreted in the context of multiple behavioral traits by Thierry (2021) by providing more detailed descriptions. Finally, during the revision, the authors clarified that both intolerant and tolerant species experience complex socio-cognitive demands and highlighted that socio-cognitive challenges arise across the tolerance spectrum under different behavioral demands.

    3. Reviewer #2 (Public review):

      Summary:

      This comparative study of macaque species and type of social interaction is both ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power they have combined data from 4 centres - that have all used different scanning methods and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focussed on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: 1) that more intolerant species have relatively larger amygdalae, and 2) that with development there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case, otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, to which the authors indulge. I suspect I would end up doing the same but it feels a bit like 'heads I win, tails you lose'. In the case of Grade 1 species, the individuals have a lot to learn especially if they are not top of the hierarchy, but at the same time there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite to how I read them, in which case the Table and preceding text need to align.)

      Comments on revisions:

      I am happy with all of the revisions and the care shown by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus that remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The modifications brought up between the two versions of the article have answered my remarks regarding age/grade/brain area differences.

      As such, I think the results are holding strong, but maybe more work is needed with respect to interpretation.

      Classification of the social grade, as well as the issue of nature vs nurture have been addressed by the authors, I thank them for this.

      I still feel the integration of the amygdala as a common cognitive & emotional center could be possibly more pushed in the discussion, although I acknowledge that it would be complicated to do without knowing how the emotional and social lives of these animals impacted the growth of their amygdala...

      Strengths:

      Methods & breadth of species tested

      Weaknesses:

      Interpretations, which, although softened, could still be more integrated with the literature on emotion

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful and constructive feedback. We found the suggestions particularly helpful in refining the conceptual framework and clarifying key aspects of our interpretations.

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades, such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      Strengths:

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old - an age that is rare in the wild but more common in captive settings. 

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are well detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      Weaknesses:

      (1) The nature vs. nurture distinction is an important one, but it may be difficult to draw conclusions about "nature" in this case, given that only two data points (from grades 3 and 4) come from animals under one year of age (Method Figure 1D). Most brains were collected after substantial social exposure (typically after age 1 or 1.5), so the data may better reflect developmental changes due to early life experience rather than innate wiring. It might be helpful to frame the findings more clearly in terms of how early experiences shape development over time, rather than as a nature vs. nurture dichotomy.

      We agree with the reviewer that presenting our findings through a strict nature vs. nurture dichotomy was potentially misleading. We have revised the introduction and the discussion (e.g. lines 85-95 and 363-365) to clarify that we examined how neurodevelopmental trajectories differ across social grades, with the caveat that very young individuals are absent from our sample. We now explicitly mention that our results may reflect both early species-typical biases and experience-dependent maturation.

      We positioned our study on social tolerance in a comparative neuroscience framework and introduced a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates.

      Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organize these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024).

      “Cross-fostering experiments (De Waal and Johanowicz, 1993), along with our own results, suggest that social tolerance grades reflect both early, possibly innate predispositions and later environmental shaping”.

      (2) It would be valuable to clarify how the older individuals, especially those 20+ years old, may have influenced the observed age-related correlations (e.g., positive in grades 1-2, negative in grades 3-4). Since primates show well-documented signs of aging, some discussion of the potential contribution of advanced age to the results could strengthen the interpretation.

      We thank the reviewer for highlighting this important point. In our dataset, younger and older subjects are underrepresented, but they are distributed across all subgroups. Therefore, we do not think that they could drive the interaction effect we are reporting. In our sample, amygdala volume tended to increase with age in intolerant species and decrease in tolerant species. We included a new analysis (Figure 4) that provides a clearer assessment of when social grades 1 vs 4 differed in terms of amygdala and hippocampus volume. While our model accounts for age continuously, we agree that age-related variation deserves cautious interpretation and requires longitudinal designs in future studies.

      We also added the following statements in the discussion (lines 386-391):

      “Due to a limited sample size of our study, this crossing trend, already accounted for by our continuous age model, should be further investigated. These results call for cautious interpretation of age-related variation and further emphasize the importance of longitudinal studies integrating both behavioral, cognitive and anatomical data in non-human primates, which would help to better understand the link between social environment and brain development (Song et al., 2021)”.

      (3) The authors categorize the behavioral traits previously described in Thierry (2021) into 3 self-defined cognitive requirements, however, they do not discuss under what conditions specific traits were assigned to categories or justify why these cognitive requirements were chosen. It is not fully clear from Thierry (2021) alone how each trait would align with the authors' categories. Given that these traits/categories are drawn on for their neuroanatomical hypotheses, it is important that the authors clarify this. It would be helpful to include a table with all behavioral traits with their respective categories, and explain their reasoning for selecting each cognitive requirement category.

      Thank you for this important suggestion. We have extensively revised the introduction to explain how we derived the three cognitive dimensions (socio-cognitive demands, behavioral inhibition, and predictability of the social environment) from the scientific literature. We now provide a complete overview of the 18 behavioral traits described in Thierry’s framework and their cognitive classification in a dedicated table, along with hypothesized neural correlates. We have also mentioned traits that were not classified in our framework, along with a short justification of this classification. We believe this addition significantly improves the transparency and intelligibility of our conceptual approach.

      “The concept of social tolerance, central to this comparative approach, has sometimes been used in a vague or unidimensional way. As Bernard Thierry (2021) pointed out, the notion was initially constructed around variations in agonistic relationships – dominance, aggressiveness, appeasement or reconciliation behaviors – before being expanded to include affiliative behaviors, allomaternal care or male–male interactions (Thierry, 2021). These traits do not necessarily align along a single hierarchical axis but rather reflect a multidimensional complexity of social style, in which each trait may have co-evolved with others (Thierry, 2021, 2000; Thierry et al., 2004). Moreover, the lack of a standardized scientific definition has sometimes led to labeling species as “tolerant” or “intolerant” without explicit criteria (Gumert and Ho, 2008; Patzelt et al., 2014). These behavioral differences are characterized by different styles of dominance (Balasubramaniam et al., 2012), severity of agonistic interactions (Duboscq et al., 2014), nepotism (Berman and Thierry, 2010; Duboscq et al., 2013; Sueur et al., 2011) and submission signals (De Waal and Luttrell, 1985; Rincon et al., 2023), among the 18 covariant behavioral traits described in Thierry's classification of social tolerance (Thierry, 2021, 2017, 2000)”.

      “To ground the investigation of social tolerance in a comparative neuroanatomical framework, we introduce a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates. Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organized these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024; Testard 2022)”.

      (4) One of the main distinctions the authors make between high social tolerance species and low tolerance species is the level of complex socio-cognitive demands, with more tolerant species experiencing the highest demands. However, socio-cognitive demands can also be very complex for less tolerant species because they need to strategically balance behaviors in the presence of others. The relationships between socio-cognitive demands and social tolerance grades should be viewed in a more nuanced and context-specific manner. 

      We fully agree. We did not mean that intolerant species live in a ‘simple’ social environment, but rather that the social environment of more tolerant species is markedly more demanding. Evidence supporting this statement includes their more efficient social networks (Sueur et al., 2011) and more complex communicative skills (e.g., tolerant macaques displayed higher levels of vocal diversity and flexibility than intolerant macaques in social situations with high uncertainty; Rebout et al., 2020).

      In the revised version (lines 106-122), we now highlight that socio-cognitive challenges arise across the tolerance spectrum, including in less tolerant species where strategic navigation of rigid hierarchies and risk-prone interactions is required. We hope that this addition offers a more balanced and nuanced framing of socio-cognitive demands across macaque societies.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaque social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerant ones, which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      (5) While the limitations section touches on species-related considerations, the issue of individual variability within species remains important. Given that amygdala volume can be influenced by factors such as social rank and broader life experience, it might be useful to further emphasize that these factors could introduce meaningful variation across individuals. This doesn't detract from the current findings but highlights the importance of considering life history and context when interpreting subcortical volumes-particularly in future studies.

      We have now emphasized this point in the limitations section (lines 441-456). While our current dataset does not allow us to fully control for individual-level variables across all collection centers, we recognize that factors such as rank, social exposure, and individual life history may influence subcortical volumes.

      “Although we explained some interspecies variability, adding subjects to our database will increase statistical power and will help address potential confounding factors such as age or sex in future studies. One will benefit from additional information about each subject. While considered in our modelling, the social living and husbandry conditions of the individuals in our dataset remain poorly documented. The living environment has been considered, and the size of social groups for certain individuals, particularly for individuals from the CdP, have been recorded. However, these social characteristics have not been determined for all individuals in the dataset. As previously stated, the social environment has a significant impact on the volumetry of certain regions. Furthermore, there is a lack of data regarding the hierarchy of the subjects under study and the stress they experience in accordance with their hierarchical rank and predictability of social outcomes position (McCowan et al., 2022)”.

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their thoughtful remarks and for acknowledging the value of our comparative approach despite its inherent constraints.

      Summary:

      This comparative study of macaque species and the type of social interaction is both ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power, they have combined data from 4 centres, which have all used different scanning methods, and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focused on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: (1) that more intolerant species have relatively larger amygdalae, and (2) that with development, there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case, otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable. 

      We thank the reviewer for this important observation. In the original version, Table 1 presented simplified direct predictions linking social tolerance grades to amygdala and hippocampus volumes. We recognize that this formulation may have created confusion. In the revised manuscript, we have thoroughly restructured the table and its accompanying rationale. Table 1 now better reflects our conceptual framework grounded in three cognitive dimensions (sociocognitive demands, behavioral inhibition, and social predictability), each linked to behavioral traits and associated neural hypotheses based on published literature. This updated framework, detailed in lines 144-169 of the introduction, provides a more nuanced basis for interpreting our results and avoids the inconsistencies previously noted. The Discussion was also revised accordingly (lines 329-255) to clarify where our findings diverge from the original predictions and to explore alternative explanations based on social complexity. Rather than directly predicting amygdala size from social tolerance grades, we propose that variation in volume emerges from differing combinations of cognitive pressures across species.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, to which the authors indulge. In the case of Grade 1 species, the individuals have a lot to learn, especially if they are not top of the hierarchy, but at the same time, there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite of how I read them, in which case the Table and preceding text need to align.)

      To facilitate the interpretation of our Bayesian modelling, we have selected a more focused ROI in our automatic segmentation procedure of the hippocampus (from Hippocampal Formation to Hippocampus) and have added a new analysis (Figure 4) that helps to properly test whether the hippocampus significantly differs between species from social grade 1 vs 4. The present analysis found that this is the case in adult monkeys. This is therefore consistent with our hypothesis that amygdala volumes are principally explained by heightened sociocognitive demands in more tolerant species.

      We also acknowledge the reviewer’s concerns about the limited generalizability due to our sample. The challenges of comparative neuroimaging in non-human primates—especially when using post-mortem datasets—are substantial. Given the ethical constraints and the rarity of available specimens, increasing the number of individuals or species is not feasible in the short term. However, we have made all data and code publicly available and clearly stated the limitations of our sample in the manuscript. Despite these constraints, we believe our dataset offers an unprecedented comparative perspective, particularly due to the inclusion of rare and tolerant species such as M. tonkeana, M. nigra, and M. thibetana, which have never been included in structural MRI studies before. We hope this effort will serve as a foundation for future collaborative initiatives in primate comparative neuroscience.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for their thoughtful and detailed review. Their comments helped us refine both the conceptual and interpretative aspects of the manuscript. We respond point by point below.

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species. 

      Twenty-five brains were extracted by the authors themselves, who are highly experienced with this procedure. Overall, we believe that dissection protocols did not alter the total brain volume. Despite our expertise, we had some difficulty avoiding damage to the cerebellum. Therefore, this region was not included in our analysis. We also noted that this brain region was damaged or absent in the Prime-DE dataset.

      Several protocols were used to prepare and store tissue. It could have impacted the total brain volume.

      We agree that differences in tissue preparation and storage could potentially affect total brain volume. Therefore, we explicitly included the main sample preparation variable — whether brains had been previously frozen — as a covariate in our model. This factor did not explain our results. Moreover, Figures 1D and 1I display the frozen status and its correlation with the amygdala and hippocampus ratios, respectively. Figure 2 shows the parameters of the model and the posterior distributions for the frozen status and total brain volume effects.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus, which remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The results look quite strong, although the authors could bring up some more clarity in their replies regarding the data they are working with. From one figure to the other, we switch from model-calculated ratio to model-predicted volume. Note that if one was to sample a brain at age 20 in all the grades according to the model-predicted volumes, it would not seem that the difference for amygdala would differ much across grades, mostly driven with Grade 1 being smaller (in line with the main result), but then with Grade 2 bigger than Grade 3, and then Grade 4 bigger once again, but not that different from Grade 2.

      Overall, despite this, I think the results are pretty strong, the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be seen under 3 possible aspects:

      (1)  Classification of the social grade

      While it may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world, there is no list included of the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which of the different traits correspond to what, whether they overlap, and crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align to, but it would be a good addition to have them individually matched to, if not the 18 behavioral traits, at least the 3 different broad categories of cognitive requirements.

      We fully agree with this observation. In the revised version of the manuscript, we now include a detailed conceptual table listing all 18 behavioral traits from Thierry’s framework. For each trait, we provide its underlying social implications, its associated cognitive dimension (when applicable), and the hypothesized neural correlate. 

      While some traits could arguably have been classified in several cognitive dimensions (e.g. reconciliation rate), we preferred to assign each to a unique dimension for clarity. Additionally, the introduction (lines 95-169 + Table 1) now explains how each trait was evaluated based on existing literature and assigned to one of the three proposed cognitive categories: socio-cognitive demands, behavioral inhibition, or social unpredictability. This structure offers a clearer and more transparent basis for the neuroanatomical hypotheses tested in the study.

      “Navigating social life in primate societies requires substantial cognitive resources: individuals must not only track multiple relationships, but also regulate their own behavior, anticipate others’ reactions, and adapt flexibly to changing social contexts. Taking advantage of databases of magnetic resonance imaging (MRI) structural scans, we conducted the first comparative study integrating neuroanatomical data and social behavioral data from closely related primate species of the same genus to address the following questions: To what extent can differences in volumes of subcortical brain structures be correlated with varying degrees of social tolerance? Additionally, we explored whether these dispositions reflect primarily innate features, shaped by evolutionary processes, or acquired through socialization within more or less tolerant social environments”.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaque social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerant ones, which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      “The second category, inhibitory control, includes traits that involve regulating impulsivity, aggression, or inappropriate responses during social interactions. Tolerant macaques have been shown to perform better in tasks requiring behavioral inhibition and also express lower aggression and emotional reactivity in both experimental and natural contexts (Joly et al., 2017; Loyant et al., 2023). These features point to stronger self-regulation capacities in species with egalitarian or less rigid hierarchies. More broadly, inhibition – especially in its strategic form (self-control) – has been proposed to play a key role in the cohesion of stable social groups. Comparative analyses across mammals suggest that this capacity has evolved primarily in anthropoid primates, where social bonds require individuals to suppress immediate impulses in favour of longer-term group stability (Dunbar and Shultz, 2025). This view echoes the conjecture of Passingham and Wise (2012), who proposed that the emergence of prefrontal area BA10 in anthropoids enabled the kind of behavioural flexibility needed to navigate complex social environments (Passingham et al., 2012)”.

      “The third category, social environment predictability, reflects how structured and foreseeable social interactions are within a given society. In tolerant species, social interactions are more fluid and less kin-biased, leading to greater contextual variation and role flexibility, which likely imply a sustained level of social awareness. In fact, as suggested by recent research, such social uncertainty and prolonged incentives are reflected by stress-related physiology: tolerant macaques such as M. tonkeana display higher basal cortisol levels, which may be indicative of a chronic mobilization of attentional and regulatory resources to navigate less predictable social environments (Sadoughi et al., 2021)”.

      “Each behavioral trait was individually evaluated based on existing empirical literature regarding the types of cognitive operations it likely involves. When a primary cognitive dimension could be identified, the trait was assigned accordingly. However, some behaviors – such as maternal protection, allomaternal care, or delayed male dispersal – do not map neatly onto a single cognitive process. These traits likely emerge from complex configurations of affective and social-motivational systems, and may be better understood through frameworks such as attachment theory (Suomi, 2008), which emphasizes the integration of social bonding, emotional regulation, and contextual plasticity. While these dimensions fall beyond the scope of the present framework, they offer promising directions for future research, particularly in relation to the hypothalamic and limbic substrates of social and reproductive behavior”.

      “Rather than forcing these traits into potentially misleading categories, we chose to leave them unclassified within our current cognitive framework. This decision reflects both a commitment to conceptual clarity and the recognition that some behaviors emerge from a convergence of cognitive demands that cannot be neatly isolated. This tripartite framework, leaving aside reproductive-related traits, provides a structured lens through which to link behavioral diversity to specific cognitive processes and generate neuroanatomical predictions”.

      (2) Issue of nature vs nurture

      Another way to look at the debate between nature vs nurture is to look at phylogeny. For now, there is no phylogenetic tree that shows where the different grades are realized. For example, it would be illuminating to know whether more related species, independently of grades, have similar amygdala or hippocampus sizes. Then the question will go to the details, and whether the grades are realized in particular phylogenetic subdivisions. This would go in line with the general point of the authors that there could be general species differences.

      As pointed out by Thierry and collaborators, the social tolerance concept is already grounded in a phylogenetic framework, as social tolerance matches the phylogenetic tree of these macaque species, suggesting a biological basis for these behavioral observations. Given the modest sample size and uneven species representation, we opted not to adopt tools such as Phylogenetic Generalized Least Squares (PGLS) in our analysis. Our primary aim in this study was to explore neuroanatomical variation as a function of social traits, not to perform a phylogenetic comparative analysis per se. That said, we now explicitly acknowledge this limitation in the Discussion and indicate that future work using larger datasets and phylogenetic methods will be essential to disentangle social effects from evolutionary relatedness. We hope that making our dataset openly available will facilitate such future analyses.

      With respect to nurture, it is likely more complicated: one needs to take into account the idiosyncrasies of the life of the individual. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. Right, but this finding is at the individual level with a documented life history. Do we have any of this information for any of the individuals considered (this is likely out of the scope of this paper to look at this, especially for individuals that did not originate from CdP)?

      We appreciate this insightful observation. Indeed, findings from studies in humans and nonhuman primates showing associations between brain structure and social network size typically rely on detailed life history and behavioral data at the individual level. Unfortunately, such fine-grained information was not consistently available across our entire sample. While some individuals from the Centre de Primatologie (CdP) were housed in known group compositions and social settings, we did not have access to longitudinal social data (such as rank, grooming rates, or network centrality) that would allow for robust individual-level analyses. We now acknowledge this limitation more clearly in the Discussion (lines 436-443), and we fully agree that future work combining neuroimaging with systematic behavioral monitoring will be necessary to explore how species-level effects interact with individual social experience.

      (3) Issue of the discussion of the amygdala's function

      The entire discussion/goal of the paper, states that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state L333/34 that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, there is nowhere in the paper a demonstration that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested, say, if grade 4 species are more afraid of snakes than grade 1 species. If so, one could predict they would also have a bigger amygdala, and they would probably also find it in the model. My point is not that the authors should try to correlate any kind of potential aspect that has been connected to the amygdala in the literature with their data (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8), but they should refrain from saying they have challenged a particular aspect if they have not even tested it. I would rather engage the authors to try and discuss the amygdala as a multipurpose center, that includes social cognition and emotion.

      We thank the reviewer for this important and nuanced point. We have revised the manuscript to adopt a more cautious and integrative tone regarding the function of the amygdala. In the revised Discussion (lines 341-355), we now explicitly state that the amygdala is involved in a broad range of processes—emotional, social, and affective—and that these domains are deeply intertwined. Rather than proposing a strict dissociation, we now suggest that the amygdala supports integrated socio-emotional functions that are mobilized differently across social tolerance styles. We also cite recent relevant literature (e.g., Domínguez-Borràs & Vuilleumier, 2021) to support this view and have removed any claim suggesting we challenge the emotional function of the amygdala per se. Our aim is to contribute to a richer understanding of how affective and social processes co-construct structural variation in this region.

      Strengths:

      Methods & breadth of species tested.

      Weaknesses:

      Interpretation, which can be described as 'oriented' and should rather offer additional views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Private Comments:

      (1) Table 1 should be formatted for clarity i.e., bolded table headers, text realignment, and spacing. It was not clear at first glance how information was organized. It may also be helpful to place behavioral traits as the first column, seeing that these traits feed into the author's defined cognitive requirements.

      We have reformatted Table 1 to improve clarity and readability. Behavioral traits now appear in the first column, followed by cognitive dimensions and hypothesized neural correlates. Column headers have been bolded and alignment has been standardized.

      (2) Figures could include more detail to help with interpretations. For example, Figure 3 should define values included on the x-axis in the figure caption, and Figure 4 should explain the use of line, light color, and dark color. Figure 1 does not have a y-axis title.

      The figures have been revised and legends completed to ensure more clarity.

      (3) Please proofread for typos throughout.

      The manuscript has been carefully proofread, and all typographical and grammatical errors have been corrected. These changes are visible in the tracked version.

      Reviewer #2 (Recommendations for the authors):

      Specific comments:

      (1) Given all of the variability, would it not be a good idea to just compare (e.g., in the supplemental) the macaque data from just the Strasbourg centre for M. mulatta and M. tonkeana? I appreciate the ns will be lower, but other matters are more standardized.

      We fully understand the reviewer’s suggestion to restrict the comparison to data collected at a single site in order to minimize inter-site variability. However, as noted, such an analysis would come at the cost of statistical power, as the number of individuals per species within a single center is small. For example, while M. tonkeana is well represented at the Strasbourg centre, only one individual of M. mulatta is available from the same site. Thus, a restricted comparison would severely limit the interpretability of results, particularly for age-related trajectories. To address variability, we included acquisition site and brain preservation method as covariates or predictors where appropriate, and we have been cautious in our interpretations. We also now emphasize in the Methods and Discussion the value of future datasets with more standardized acquisition protocols across species and centers. We hope that by openly sharing our data and workflow, we can contribute to this broader goal.

      (2) I have various minor edits:

      (a) L 25 abstract - Specify what is meant by 'opposite trend'; the reader cannot infer what this is.

      Modified in lines 25-28: “Unexpectedly, tolerant species exhibited a decrease in relative amygdala volume across the lifespan, contrasting with the age-related increase observed in intolerant species, a developmental pattern previously undescribed in primates.”

      (b) L67 - The reference 'Manyprimates' needs fixing as it does in the references section.

      After double-checking, we confirmed that Manyprimates studies are international collaborative efforts that are supposed to be cited this way (https://manyprimates.github.io/#pubs).

      (c) L74 - Taking not Taken.

      This typo has been corrected.

      (d) L129 - It says 'total volume', but this is corrected total volume?

      We have clarified in the figure legends that the “total brain volume” used in our analyses excludes the cerebellum and the myelencephalon, as specified in our image preprocessing protocol. This ensures consistency across individuals and institutions.

      (e) L138 - Suddenly mentions 'frozen condition' without any prior explanation - this needs explaining in the legend - also L144.

      We have added an explanation of the ‘frozen condition’ variable in the relevant figure legend.

      (f) L166 - Results - it would be helpful to remind readers what Grade 1 signifies, ie intolerant species.

      We now include a brief reminder in the Results section that Grade 1 corresponds to socially intolerant species, to help readers unfamiliar with the classification (Lines 240-251).

      (g) Figure 4 - Provide the ns for each of the 4 grades to help appreciate the meaningfulness of the curves, etc.

      The number of subjects has been added to the figure, and a novel analysis in the revised manuscript helps to appreciate the meaningfulness of some of these curves.

      (h) L235 - 'we had assumed that species of high social tolerance grade would have presented a smaller amygdala in size compared to grade 1'. But surely this is the exact opposite of what is predicted in Table 1 - ie, the authors did not predict this as I read the paper (Unless Table 1 is misleading/ambiguous and needs clarification).

      As discussed in our response to Reviewers #2 and #3, we have restructured both Table 1 and the Discussion to ensure consistency. We now explicitly state that the findings diverge from our initial inhibitory-control-based prediction and propose alternative interpretations based on sociocognitive demands.

      (i) L270 - 'This observation' which?? Specify.

      We have replaced ‘this observation’ with a precise reference to the observed developmental decrease in amygdala volume in tolerant species.

      (j) L327 - 'groundbreaking' is just hype given that there are so many caveats - I personally do not like the word - novel is good enough.

      We have replaced the word ‘groundbreaking’ with ‘novel’ to adopt a more measured and appropriate tone in the discussion.

      (3) I might add that I am happy with the ethics regarding this study. 

      Thanks, we are also happy that we were able to study macaque brains from different species using opportunistic samplings along with already available data. We are collectively making progress on this!

      (4) Finally, I should commend the authors on all the additional information that they provide re gender/age/species. Given that there are twice as many females as males, it would be good to know if this affects the findings. I am not a primatologist, so I don't know, for example, if the females in Grade 1 monkeys are just as intolerant as the males?

      We thank the reviewer for this thoughtful comment. We now explicitly mention the female-biased sex ratio in the Methods section and report in the Results (Figure 2, Figure 3) that sex was included as a covariate in our Bayesian models. While a small effect of sex was found for hippocampal volume, no effect was observed for the amygdala. Given the strong imbalance in our dataset (2:1 female-to-male ratio), we refrained from drawing any conclusion about sex-specific patterns, as these would require larger and more balanced samples. Although we did not test for sex-by-grade interactions, we agree that this question—especially regarding whether females and males express social style differences similarly across grades—represents an important direction for future comparative work.

      Reviewer #3 (Recommendations for the authors):

      I found the article well-written, and very easy to follow, so I have little ways to propose improvements to the article to the authors, besides addressing the various major points when it comes to interpretation of the data.

      One list I found myself wanting was in fact the list of the social tolerance grades, and the process by which they got selected into 3 main bags of socio-cognitive skills. Then it would become interesting to see how each of the 12 species compares within both the 18 grades (maybe once again out of the scope of this paper, there are likely reviews out there that already do that, but then the authors should explicitly mention so in the paper: X, 19XX have compared 15 out of 18 traits in YY number of macaque species); and within the 3 major subcognitive requirements delineated by the authors, maybe as an annex?

      We thank the reviewer for this thoughtful suggestion. In the revised manuscript, we now include a detailed table (Table 1) that lists the 18 behavioral traits derived from Thierry’s framework, along with their associated cognitive dimension and hypothesized neuroanatomical correlate. While we did not create a matrix mapping each of the 12 species across all 18 traits due to space and data availability constraints, we agree this is an important direction that should be tackled by primatologists. We now include a sentence (lines 87-90) in the manuscript to guide readers to previous comparative reviews (e.g., Thierry, 2000; Thierry et al., 2004, 2021) that document the expression of these traits across macaque species. We also clarify that our three cognitive categories are conceptual tools intended to structure neuroanatomical predictions, and not formal clusters derived from quantitative analyses.

      In the annex, it would also be good to have a general summarizing excel/R file for the raw data, with important information like age, sex, and the relevant calculated volumes for each individual. The folders available following the links do not make it an easy task for a reader to find the raw data in one place.

      We fully agree with the reviewer on the importance of data accessibility. We have now uploaded an additional supplementary file in .csv format on our OSF repository, which includes individual-level metadata for all 42 macaques: species, sex, age, social grade, total brain volume, amygdala volume, and hippocampus volume. The link to this file is now explicitly mentioned in the Data Availability section. We hope this will facilitate comparisons with other datasets and improve usability for the community. In addition, we provide in a supplementary table the raw data that were used for our Bayesian modelling (see below).

      The availability of the raw data would also clear up one issue, which I believe results from the modelling process: it looks odd on Figure 2, that volume ratios, defined as the given brain area volume divided by the total brain volume, give values above 1 (especially for the hippocampus). As such, the authors should either modify the legend or the figure. In general, it would be nicer to have the "real values" somewhere easily accessible, so that they can be compared more broadly with: 1) other macaques species to address questions relevant to the species; 2) other primates to address other questions that are surely going to arise from this very interesting work!

      We thank the reviewer for pointing this out. The ratio values in Figure 1 correspond to the proportion of the regional volume (amygdala or hippocampus) relative to the total brain volume, excluding the cerebellum and myelencephalon. As such, values above 0.01 (i.e., above 1% of the brain volume) are expected for these structures and do not indicate an error. We have updated the figure legend to clarify this point explicitly. In addition, we have now made a cleaned .csv file available via OSF, containing all raw volumetric data and metadata in a format that facilitates cross-species or cross-study comparisons. This replaces the previous folder-based structure, which may have been less accessible.

      Typos:

      L233: delete 'in'

      L430: insert space in 'NMT template(Jung et al., 2021).'

    1. eLife Assessment

      The current work uses DNA-tethered motor trapping to reduce vertical forces and improve datasets for kinesin-1 motility under load. The evidence is compelling and the significance is important to the kinesin field. Kinesin-1 is more robust and less prone to premature detachment than previously indicated. This represents a significant advancement in the field and is generally applicable to work with optical tweezers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Hensley and Yildiz studies the mechanical behavior of kinesin under conditions where the z-component of the applied force is minimized. This is accomplished by tethering the kinesin to the trapped bead with a long double-stranded DNA segment as opposed to directly binding the kinesin to the large bead. It complements several recent studies that have used different approaches to looking at the mechanical properties of kinesin under low z-force loads. The study shows that much of the mechanical information gleaned from the traditional "one bead" with attached kinesin approach was probably profoundly influenced by the direction of the applied force. The authors speculate that when moving small vesicle cargos (particularly membrane-bound ones) the direction of resisting force on the motor has much less of a z-component than might be experienced if the motor were moving large organelles like mitochondria.

      Strengths:

      The approach is sound and provides an alternative method to examine the mechanics of kinesin under conditions where the z-component of the force is lessened. The data show that kinesin has very different mechanical properties compared to those extensively reported with using the "single-bead" assay where the molecule is directly coupled to a large bead which is then trapped.

      Weaknesses:

      The substoichiometric binding of kinesins to the multivalent DNA complicates the interpretation of the data.

      Comments on revisions:

      The authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      This short report by Hensley and Yildiz explores kinesin-1 motility under more physiological load geometries than previous studies. Large Z-direction (or radial) forces are a consequence of certain optical trap experimental geometries, and likely do not occur in the cell. Use of a long DNA tether between the motor and the bead can alleviate Z-component forces. The authors perform three experiments. In the first, they use two assay geometries - one with kinesin attached directly to a bead and the other with kinesin attached via a 2 kbp DNA tether - with a constant-position trap to determine that reducing the Z component of force leads to a difference in stall time but not stall force. In the second, they use the same two assay geometries with a constant-force trap to replicate the asymmetric slip bond of kinesin-1; reducing the Z component of force leads to a small but uniform change in the run lengths and detachment rates under hindering forces but not assisting forces. In the third, they connect two or three kinesin molecules to each DNA, and measure a stronger scaling in stall force and time when the Z component of force is reduced. They conclude that kinesin-1 is a more robust motor than previously envisaged, where much of its weakness came from the application of axial force. If forces are instead along the direction of transport, kinesin can hold on longer and work well in teams. The experiments are rigorous, and the data quality is very high. There is little to critique or discuss. The improved dataset will be useful for modeling and understanding multi-motor transport. The conclusions complement other recent works that used different approaches to low-Z component kinesin force spectroscopy, and provide strong value to the kinesin field.

      Comments on revisions:

      The authors have satisfied all of my comments. I commend them on an excellent paper.

    4. Reviewer #3 (Public review):

      Hensley et al. present an important study into the force-detachment behaviour of kinesin-1, using a newly adapted methodological approach. This new method of DNA-tethered motor trapping is effective in reducing vertical forces and can be easily optimised for other motors and protein characterisation. The major strength of the paper is characterising kinesin-1 under low z-forces, which is likely to reflect the physiological scenario. They find kinesin-1 is more robust and less prone to premature detachment. The motors exhibit higher stall rates and times. Under hindering and assisting loads, kinesin-1 detachment is more asymmetric and sensitive, and with low z-force shows that slip-behaviour kinetics prevail. Another achievement of this paper is the demonstration of the multi-motor kinesin-1 assay using their low-z force method, showing that multiple kinesin-1 motors are capable of generating higher forces (up to 15 pN, and nearly proportional to motor number), thus opening an avenue to study multiple motor coordination. Overall, the data have been collected in a rigorous manner, the new technique is sound and effective, and results presented are compelling.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) My primary concern is that in some of the studies, there are not enough data points to be totally convincing. This is particularly apparent in the low z-force condition of Figure 1C.

      We agree that adequate sampling is essential for drawing robust conclusions. To address this concern, we performed a post hoc sensitivity analysis to assess the statistical power of our dataset. Given our sample sizes (N = 85 and 45) and observed variability, the experiment had 80% power (α = 0.05) to detect a difference in stall force of approximately 0.36 pN (Cohen’s d ≈ 0.38). The actual difference observed between conditions was 0.25 pN (d ≈ 0.26), which lies below the minimum detectable effect size. Thus, the non-significant result (p = 0.16) likely reflects that any true difference, if present, is smaller than the experimental sensitivity, rather than a lack of sufficient sampling.

      Importantly, both measured stall forces fall within the reported range for kinesin-1 in the literature, supporting that the dataset is representative and the measurements are reliable.

      (2) I'm also concerned about Figure 2B. Does each data point in the three graphs represent only a single event? If so, this should probably be repeated several more times to ensure that the data are robust.

      Each data point shown corresponds to the average of many processive runs, ranging from 32 to 167. This has been updated in the figure caption accordingly.

      (3) Figure 3. I'm surprised that the authors could not obtain a higher occupancy of the multivalent DNA tether with kinesin motors. They were adding up to a 30X higher concentration of kinesin, but still did not achieve stoichiometric labeling. The reasons for this should be discussed. This makes interpretation of the mechanical data much tougher. For instance, only 6-7% of the beads would be driven by three kinesins. Unless the movement of hundreds of beads were studied, I think it would be difficult to draw any meaningful insight, since most of the events would be reflective of beads with only one or sometimes two kinesins bound. I think more discussion is required to describe how these data were treated.

      The mass-photometry data in Figure 3B were acquired in the presence of a 3-fold molar excess of kinesin (Supplemental Figure 4) relative to the DNA chassis. In comparison, optical trapping studies were performed at a 10-20-fold molar excess of kinesin, resulting in a substantially higher percentage of chassis with multiple motors. The reason why we had to perform mass photometry measurements at lower molar excess than the optical trap is that at higher kinesin concentrations, the “kinesin-only” peak dominated and obscured 2- or 3-kinesin-bound species, preventing reliable fitting of the mass photometry data. 

      We have now used the mass photometry measurements to extrapolate occupancies under trapping conditions. We estimate 76-93% of 2-motor chassis are bound to two kinesins and ~70% of 3-motor chassis are bound to three kinesins under our trapping conditions. Moreover, the mean forces in Figures 3C–D exceed those expected for a single kinesin, consistent with occupancy substantially greater than one motor per chassis.

      We wrote: “To estimate the percentage of chassis with two and three motors bound, we performed mass photometry measurements at a 3-fold molar excess of kinesin to the chassis, as higher ratios would obscure the distinction of complexes from the kinesin-only population. Assuming there is no cooperativity among the binding sites, we modeled motor occupancy using a Binomial distribution (Figure 3_figure supplement 2). We observed 17-29% of particles corresponded to the two-motor species on the 2-motor chassis in mass photometry, indicating that 45-78% of the 2-motor chassis was bound to two kinesins. Similarly, 15% and 40% of the 3-motor chassis were bound to two and three kinesins, respectively.

      In optical trapping assays, we used 10-fold and 20-fold molar excess of kinesin for 2-motor and 3-motor chassis, respectively, to substantially increase the percentage of the chassis carried by multiple kinesins. Under these conditions, we estimate 76-93% of the 2-motor chassis were bound to two kinesins, and 30% and 70% of 3-motor chassis were bound to two and three kinesins, respectively.”

      “Multi-motor trapping assays were performed similarly using 10x and 20x kinesin for 2- and 3-motor chassis, respectively. To estimate the percentage of chassis with multiple motors, we used the probability of kinesin binding to a site on a chassis from mass photometry in 3x excess condition to compute an effective dissociation constant where r is the molar ratio of kinesin to chassis. Single-site occupancy at higher molar excesses of kinesin was calculated using this parameter.”

      We also added Figure 3_figure supplement 2 to explain our Binomial model.
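
      The occupancy extrapolation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the expression for the effective dissociation constant is not reproduced in the text, so the sketch assumes a simple single-site isotherm, p = r / (r + Kd), with Kd expressed in units of molar ratio and binding sites treated as independent. All function names are hypothetical.

```python
from math import comb


def site_occupancy_from_full(frac_full: float, n_sites: int) -> float:
    """Single-site occupancy p, given the fraction of chassis with all
    n sites filled (p**n for independent, identical sites)."""
    return frac_full ** (1.0 / n_sites)


def effective_kd(p: float, r: float) -> float:
    """Effective Kd (in units of molar ratio) under the ASSUMED isotherm
    p = r / (r + Kd). This form is an assumption, not taken from the text."""
    return r * (1.0 - p) / p


def occupancy_at(r: float, kd: float) -> float:
    """Single-site occupancy at molar ratio r under the same assumed isotherm."""
    return r / (r + kd)


def occupancy_distribution(p: float, n_sites: int) -> list[float]:
    """Binomial probabilities of 0..n_sites motors bound per chassis."""
    return [comb(n_sites, k) * p**k * (1.0 - p) ** (n_sites - k)
            for k in range(n_sites + 1)]


# 2-motor chassis: 45-78% fully occupied at 3x excess (values quoted above),
# extrapolated to the 10x excess used in the trapping assays.
for frac_full_3x in (0.45, 0.78):
    p_3x = site_occupancy_from_full(frac_full_3x, n_sites=2)
    kd = effective_kd(p_3x, r=3.0)
    p_10x = occupancy_at(10.0, kd)
    two_bound = occupancy_distribution(p_10x, n_sites=2)[2]
    print(f"{frac_full_3x:.2f} at 3x -> {two_bound:.2f} at 10x")
```

      Under these assumptions the two-site case evaluates to roughly 0.76 and 0.93, in line with the 76-93% range quoted above. The same simple isotherm does not obviously reproduce the 3-motor figures, so the authors' actual parameterization may differ.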

      (4) Page 5, 1st paragraph. Here, the authors are comparing time constants from stall experiments to data obtained with dynein from Ezber et al. This study used the traditional "one bead" trapping approach with dynein bound directly to the bead under conditions where it would experience high z-forces. Thus, the comparison between the behavior of kinesin at low z-forces is not necessarily appropriate. Has anyone studied dynein's mechanics under low z-force regimes?

      We thank the reviewer for catching a citation error. The text has been corrected to reference Elshenawy et al. 2020, which reported stall time constants for mammalian dynein. 

      To our knowledge, dynein’s mechanics under explicitly low z-force conditions have not yet been reported; however, given the more robust stalling behavior of dynein and greater collective force generation, the cited paper was chosen to compare low z-force kinesin to a motor that appears comparatively unencumbered by z-forces. Our study adds to growing evidence that high z-forces disproportionately limit kinesin performance. 

      For clarification, we modified that sentence as follows: “These time constants are comparable to those reported for minus-end-directed dynein under high z-forces”.

      Reviewer #2 (Recommendations for the authors):

      (1) P3 pp2, a DNA tensiometer cannot control the force, but it can measure it; get the distance between the two ends of the tensiometer, and apply WLC.

      The text has been updated to more accurately reflect the differences between optical trapping and kinesin motility against a DNA tensiometer with a fixed lattice position.

      (2) Fig. 2b, SEM is a poor estimate of error for exponentially distributed run lengths. Other methods, like bootstrapping an exponential distribution fit, may provide a more realistic estimate.

      Run lengths were plotted as an inverse cumulative distribution function and fitted to a single exponential decay (Supplementary Figure S3). The plotted value represents the fitted decay constant (characteristic run length) ± SE (standard error of the fit), not the arithmetic mean ± SEM. Velocity values are reported as mean ± SEM. Detachment rate was computed as velocity divided by run length, except at 6 and 10 pN hindering loads, where minimal forward displacement necessitated fitting run-time decays directly. In those cases, the plotted detachment rate equals the inverse of the fitted time constant. The figure caption has been updated accordingly.
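
      For readers who want to see the quantities in this exchange side by side, the sketch below computes a detachment rate as velocity divided by characteristic run length (or as the inverse of a fitted time constant) and bootstraps the uncertainty of an exponential run-length estimate, as the reviewer suggests. It is a simplification: it uses the sample mean as the exponential estimate rather than the authors' fit to the inverse cumulative distribution, it ignores truncation of short runs, and the synthetic data and function names are hypothetical.

```python
import random
from statistics import mean, stdev


def detachment_rate(velocity_nm_per_s: float, run_length_nm: float) -> float:
    """Detachment rate (1/s) as velocity divided by characteristic run length."""
    return velocity_nm_per_s / run_length_nm


def detachment_rate_from_time_constant(tau_s: float) -> float:
    """Detachment rate (1/s) as the inverse of a fitted run-time constant."""
    return 1.0 / tau_s


def bootstrap_run_length(run_lengths_nm, n_resamples=2000, seed=0):
    """Return (estimate, bootstrap standard error) of the characteristic
    run length. The estimate is the sample mean, which is the maximum-
    likelihood value for an untruncated exponential distribution."""
    rng = random.Random(seed)
    n = len(run_lengths_nm)
    resampled = [mean(rng.choices(run_lengths_nm, k=n))
                 for _ in range(n_resamples)]
    return mean(run_lengths_nm), stdev(resampled)


# Synthetic, purely illustrative data: 100 runs with an 800 nm scale.
demo_rng = random.Random(1)
runs = [demo_rng.expovariate(1.0 / 800.0) for _ in range(100)]
length_nm, length_se_nm = bootstrap_run_length(runs)
print(f"run length: {length_nm:.0f} +/- {length_se_nm:.0f} nm")
print(f"detachment rate at 500 nm/s: {detachment_rate(500.0, length_nm):.2f} per s")
```

      A fit to the inverse cumulative distribution, as used by the authors, can be bootstrapped the same way by replacing the mean with the fitting routine inside the resampling loop.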

      (3) Kinesin-1 is covalently bound to a DNA oligo, which then attaches to the DNA chassis by hybridization. This oligo is 21 nt with a relatively low GC%. At what force does this oligo unhybridize? Can the authors verify that their stall force measurements are not cut short by the oligo detaching from the chassis?

      The 21-nt attachment oligo (38% GC) is predicted to have ΔG(37 °C) ≈ -25 kcal/mol, or approximately 42 kT. If we assume this is the approximate amount of work required to unhybridize the oligo, we would expect the rupture force to be >15 pN. This significantly exceeds the stall force of a single kinesin. Since the stalling events rarely exceed a few seconds, it is unlikely that our oligos quickly detach from the chassis under such low forces.

      Furthermore, optical trapping experiments are tuned such that no more than 30% of beads display motion within several minutes after they are brought near microtubules. After stalling events, the motor dissociates from the MT, and the bead snaps back to the trap center. Most beads robustly reengage with the microtubule, typically within 10 s, suggesting that the same motor chassis reengages with the microtubule after microtubule detachment. Successive runs of the same bead typically have similar stall forces, suggesting that the motors do not disengage from the chassis under resistive forces exerted by the trap.
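
      The unit conversion behind the oligo rupture estimate can be checked with a short sketch. It converts the quoted free energy into pN·nm and into units of kT, and then asks over what distance that work would have to be spent for the mean force to equal 15 pN. This is a crude equilibrium work-over-distance picture (real rupture forces depend on loading rate and pulling geometry), and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23           # 1/mol
BOLTZMANN_J_PER_K = 1.380649e-23   # J/K
J_PER_KCAL = 4184.0
PN_NM_PER_J = 1e21                 # 1 J = 1e12 pN * 1e9 nm


def kcal_per_mol_to_pn_nm(dg_kcal_per_mol: float) -> float:
    """Free energy per molecule in pN*nm."""
    return dg_kcal_per_mol * J_PER_KCAL / AVOGADRO * PN_NM_PER_J


def thermal_energy_pn_nm(temperature_k: float) -> float:
    """kT in pN*nm at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_k * PN_NM_PER_J


work = kcal_per_mol_to_pn_nm(25.0)  # magnitude of the quoted free energy
for temperature_k in (298.15, 310.15):
    print(f"{temperature_k} K: {work / thermal_energy_pn_nm(temperature_k):.1f} kT")

# Distance over which this work corresponds to a mean force of 15 pN.
print(f"distance for 15 pN: {work / 15.0:.1f} nm")
```

      The result is about 42 kT at 25 °C and about 41 kT at 37 °C, and the 15 pN threshold corresponds to a distance of roughly 12 nm. For comparison, a 21-bp B-form duplex is about 7 nm long (0.34 nm per base pair), so a shorter effective rupture distance would imply a mean force above 15 pN in this simplified picture.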

      (4) Figure 1, a justification or explanation should be provided for why events lower than 1.5 pN were excluded. It appears arbitrary.

      Single-motor stall-force measurements used a trap stiffness of 0.08–0.10 pN/nm. At this stiffness, a 1.5 pN force corresponds to 15–19 nm bead displacement, roughly two kinesin steps, and events below this threshold could not be reliably distinguished from Brownian noise. For this reason, forces < 1.5 pN were excluded.

      In Methods, we wrote “Only peak forces above 1.5 pN (corresponding to a 15-19 nm bead displacement) were analyzed to clearly distinguish runs from the tracking noise.”
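
      The threshold arithmetic is a direct application of Hooke's law for the trap (displacement = force / stiffness). A minimal sketch, assuming the commonly cited 8 nm kinesin step size, which is not stated in the text above:

```python
KINESIN_STEP_NM = 8.0  # assumed step size, not taken from the text above


def displacement_nm(force_pn: float, stiffness_pn_per_nm: float) -> float:
    """Bead displacement from the trap center for a given force (Hooke's law)."""
    return force_pn / stiffness_pn_per_nm


for stiffness in (0.08, 0.10):  # pN/nm, the range quoted above
    x = displacement_nm(1.5, stiffness)
    print(f"k = {stiffness:.2f} pN/nm: {x:.2f} nm, {x / KINESIN_STEP_NM:.1f} steps")
```

      At 0.10 and 0.08 pN/nm this gives 15 nm and 18.75 nm, respectively, or about two steps, matching the 15-19 nm range in the response.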

      (5) Figure 2b, is the difference in velocity statistically significant?

      The difference in velocity is statistically significant for most conditions. We did not compare velocities for -10 and -6 pN as these conditions resulted in little forward displacement. However, the p-values for all of the other conditions are -4 pN: 0.0026, -2 pN: 0.0001, -1 pN: 0.0446, +0.5 pN: 0.3148, +2 pN: 0.0001, +3 pN: 0.1191, +4 pN: 0.0004.

      (6) The number of measurements for each experimental datapoint in the corresponding figure caption should be provided. SEM is used, but N is not reported in the caption.

      Figure captions have now been updated to report the number of trajectories (N) for each data point.

      Reviewer #3 (Recommendations for the authors):  

      (1) The method of DNA-tethered motor trapping to enable low z-force is not entirely novel, but adapted from Urbanska (2021) for use in conventional optical trapping laboratories without reliance on microfluidics. However, I appreciate that they have fully established it here to share with the community. The authors could strengthen their methods section by being transparent about protein weight, protein labelling, and DNA ladders shown in the supplementary information. What organism is the protein from? Presumably human, but this should be specified in the methods. While the figures show beautiful data and exemplary traces, the total number of molecules analysed or events is not consistently reported. Overall, certain methodological details should be made sufficient for reproducibility.

      We appreciate the reviewer’s attention to methodological clarity. The constructs used are indeed human kinesin-1, KIF5B. The Methods now specify protein origin, molecular weights, and labeling details, and all figure captions report the number of trajectories analyzed to ensure reproducibility.

      (2) The major limitation the study presents is overarching generalisability, starting with the title. I recommend that the title be specific to kinesin-1. 

      The title has been revised to specify kinesin-1. 

      The study uses two constructs: a truncated K560 for conventional high-force assays, and full-length Kif5b for the low z-force method. However, for the multi-motor assay, the authors use K560 with the rationale of preventing autoinhibition due to binding with DNA, but that would also have limited characterisation in the single-molecule assay. Overall, the data generated are clear, high-quality, and exciting in the low z-force conditions. But why have they not compared or validated their findings with the truncated construct K560? This is especially important in the force-feedback experiments and in comparison with Andreasson et al. and Carter et al., who use Drosophila kinesin-1. Could kinesin-1 across organisms exhibit different force-detachment kinetics? It is quite possible. 

      Construct choice was guided by physiological relevance and considerations of autoinhibition: K560 was used for high z-force single-motor assays. The results of these assays are consistent with conventional bead assays performed by Andreasson et al. and Carter et al. using kinesin from a different organism. Therefore, we do not believe there are major differences between force properties of Drosophila and human kinesin-1.

      For low z-force assays, we used full-length KIF5B, which has nearly identical velocity and stall force to K560 in standard bead assays. We used this construct for low z-force assays because it has a longer and more flexible stalk than K560 and better represents the force behavior of kinesin under physiological conditions. We then used constitutively active K560 motors for multi-motor experiments to avoid potential complications from autoinhibition of full-length kinesin.

      Similarly, the authors test backward slipping of Kif5b and K560 and measure dwell times in multi-motor assays. Why not detail the backward slippage kinetics of Kif5b and any step-size impact under low z-forces? For instance, with the traces they already have, the authors could determine slip times, distances, and frequency in horizontal force experiments. Overall, the manuscript could be strengthened by analysing both constructs more fully.

      Slip or backstep analyses were not performed on single-motor data because such events were rare; kinesin typically detached rather than slipped. In contrast, multi-motor assays exhibited frequent slip events corresponding to the detachment of individual motors, which were analyzed in detail.

      We wrote “In comparison, slipping events were rarely observed in beads driven by a single motor, suggesting that kinesin typically detaches rather than slipping back on the microtubule under hindering loads.”

      Appraisal and impact:

      This study contributes to important and debated evidence on kinesin-1 force-detachment kinetics. The authors conclude that kinesin-1 exhibits a slip-bond interaction with the microtubule under increasing forces, while other recent studies (Noell et al. and Kuo et al.), which also use low z-force setups, conclude catch-bond behaviour under hindering loads. I find the results not fully aligned with their interpretation. The first comparison of low z-forces in their setup with Noell et al. (2024), based on stall times, does not hold, because it is an apples-to-oranges comparison. Their data show a stall time constant of 2.52 s, which is comparable to the 3 s reported by Noell et al., but the comparison is made with a weighted average of 1.49 s. The authors do report that detachment rates are lower in low z-force conditions under unloaded scenarios. So, to completely rule out catch-bond-like behaviour is unfair. That said, their data quality is good and does show that higher hindering forces lead to higher detachment rates. However, on closer inspection, the range of 0-5 pN shows either a decrease or no change in detachment rate, which suggests that under a hindering force threshold, catch-bond-like or ideal-bond-like behaviour is possible, followed by slip-bond behaviour, which is amazing resolution. Under assisting loads, the slip-bond character is consistent, as expected. Overall, the study contributes to an important discussion in the biophysical community and is needed, but requires cautious framing, particularly without evidence of motor trapping in a high microtubule-affinity state rather than genuine bond strengthening.

      We are not completely ruling out the catch bond behavior in our manuscript. As the reviewer pointed out, our results are consistent with the asymmetric slip bond model, whereas DNA tensiometer assays are more consistent with the catch bond behavior. The advantage of our approach is the capability to directly control the magnitude and direction of load exerted on the motor in the horizontal axis and measure the rate at which the motor detaches from the microtubule as it walks under constant load. In comparison, DNA tensiometer assays cannot control the force, but measure the time it takes the motor to fall off from the microtubule after a brief stall. The extension of the DNA tether is used to estimate the force exerted on the motor during a stall in those assays. The slight disadvantage of our method is the presence of low z-forces, whereas DNA tensiometer assays are expected to have little to no z-force. We wrote that the discrepancy between our results can be attributed to the presence of low z-forces in our DNA-tethered trapping assembly, which may result in a higher-than-normal detachment rate under high hindering loads, thereby resulting in less asymmetry in the force-detachment kinetics. We also added that this discrepancy can be addressed by future studies that directly control and measure horizontal force and measure the motor detachment rate in the absence of z-forces. Optical trapping assays with small nanoparticles (Sudhakar et al. Science 2021) may be well suited to conclusively reveal the bond characteristics of kinesin under hindering loads.

      Reviewing Editor Comments:

      The reviewers are in agreement with the importance of the findings and the quality of the results. The use of the DNA tether reduces the z-force on the motor and provides biologically relevant insight into the behavior of the motor under load. The reviewers' suggestions are constructive and focus on bolstering some of the data points and clarifying some of the methodological approaches. My major suggestion would be to clarify the rationale for concluding that kinesin-1 exhibits slip-bond behavior with increasing force in light of the work of Noell (10.1101/2024.12.03.626575) and Kuo et al (2022 10.1038/s41467-022-31069-x), both of which take advantage of DNA tethers.

      Please see our response to the previous comment. In the revised manuscript, we first clarified that our results are in agreement with previous theoretical (Khataee & Howard, 2019) and experimental studies (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020) that kinesin exhibits slower detachment under hindering load. This asymmetry became clear when the z-force was reduced or eliminated. 

      We clarified the differences between our results and DNA tensiometer assays and provided a potential explanation for these discrepancies. We also proposed that future studies might be required to fully distinguish between asymmetric slip, ideal, or catch bonding of kinesin under hindering loads.

      We wrote:

      “Our results agree with the theoretical prediction that kinesin exhibits higher asymmetry in force-detachment kinetics without z-forces (Khataee & Howard, 2019), and are consistent with optical trapping and DNA tensiometer assays that reported more persistent stalling of kinesin in the absence of z-forces (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020).

      Force-detachment kinetics of protein-protein interactions have been modeled as either a slip, ideal, or catch bond, which exhibit an increase, no change, or a decrease in detachment rate, respectively, under increasing force (Thomas et al., 2008). Slip bonds are most commonly observed in biomolecules, but studies on cell adhesion proteins reported a catch bond behavior (Marshall et al., 2003). Although previous trapping studies of kinesin reported a slip bond behavior (Andreasson et al., 2015; Carter & Cross, 2005), recent DNA tensiometer studies that eliminated the z-force showed that the detachment rate of the motor under hindering forces is lower than that of an unloaded motor walking on the microtubule (Kuo et al., 2022; Noell et al., 2024), consistent with the catch bond behavior. Unlike these reports, we observed that the stall duration of kinesin is shorter than the motor run time under unloaded conditions, and the detachment rate of kinesin increases with the magnitude of the hindering force. Therefore, our results are more consistent with the asymmetric slip bond behavior. The difference between our results and the DNA tensiometer assays (Kuo et al., 2022; Noell et al., 2024) can be attributed to the presence of low z-forces in our DNA-tethered optical trapping assays, which may increase the detachment rate under high hindering forces. Future studies that could directly control hindering forces and measure the motor detachment rate in the absence of z-forces would be required to conclusively reveal the bond characteristics of kinesin under hindering loads.”

    1. eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. The experimental dataset is unique, the coupled experimental and computational analyses comprehensive, and the effect is strong. However, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

    2. Reviewer #1 (Public review):

      The authors have conducted substantial additional analyses to address the reviewers' comments. However, several key points still require attention. I was unable to see the correspondence between the model predictions and the data in the added quantitative analysis. In the rebuttal letter, the delta peak speed time displays values in the range of [20, 30] ms, whereas the data were negative for the 45° direction. Should the reader directly compare panel B of Figure 6 with Figure 1E? The correspondence between the model and the data should be made more apparent in Figure 6. Furthermore, the rebuttal states that a quantitative prediction was not expected, yet it subsequently argues that there was a quantitative match. Overall, this response remains unclear.

      A follow-up question concerns the argument about strategic slowing. The authors argue that this explanation can be rejected because the timing of peak speed should be delayed, contrary to the data. However, there appears to be a sign difference between the model and the data for the 45° direction, which means that it was delayed in this case. Did I understand correctly? In that regard, I believe that the hypothesis of strategic slowing cannot yet be firmly rejected and the discussion should more clearly indicate that this argument is based on some, but not all, directions. I agree with the authors on the importance of the mass underestimation hypothesis, and I am not particularly committed to the strategic slowing explanation, but I do not see a strong argument against it. If the conclusion relies on the sign of the delta peak speed, then the authors' claims are not valid across all directions, and greater caution in the interpretation and discussion is warranted. Regarding the peak acceleration time, I would be hesitant to draw firm conclusions based on differences smaller than 10 ms (Figures R3 and 6D).

      The authors state in the rebuttal that the two hypotheses are competing. This is not accurate, as they are not mutually exclusive and could even vary as a function of movement direction. The abstract also claims that the data "refutes" strategic slowing, which I believe is too strong. The main issue is that, based on the authors' revised manuscript, the lack of quantitative agreement between the model and the data for the mass underestimation hypothesis is considered acceptable because a precise quantitative match is not expected, and the predictions overall agree for some (though not all) directions and phases (excluding post-in). That is reasonable, but by the same logic, the small differences between the model prediction and the strategic slowing hypothesis should not be taken as firm evidence against it, as the authors seem to suggest. In practice, I recommend a more transparent and cautious interpretation to avoid giving readers the false impression that the evidence is decisive. The mass underestimation hypothesis is clearly supported, but the remaining aspects are less clear, and several features of the data remain unexplained.

    3. Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model adds confidence to the proposed conclusions.

      Compared to the previous version, the authors have thoroughly addressed my concerns. The model is now clear and well-articulated, and alternative hypotheses have been ruled out convincingly. The paper is improved and suitable for publication in my opinion, making a significant contribution to the field.

      Strengths:

      - Comprehensive analysis of a unique data set of reaching movements in microgravity
      - Use of a sensible and well-thought-out experimental approach
      - State-of-the-art analyses of the main kinematic parameters
      - Computational model simulations of arm reaching to test alternative hypotheses and support the mass underestimation one

      This work has no major weakness as it stands, and the discussion provides a fair evaluation of the findings and conclusions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude in favor of the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited and the manuscript is well written.

      Weaknesses:

      I nevertheless am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      To strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treating the arm as a second-order low pass filter (Eq. 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limb's damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for un-adapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      The authors attempt to differentiate their study from previous studies where limb neuromechanical impedance was shown to be modified in weightlessness by emphasizing that in the current study the movements were rapid and the initial movement is "feedforward". But this incorrectly implies that the limb's mechanical response to the motor command is determined only by active feedback mechanisms. In fact:

      (a) All commands to the muscle pass through the motor neurons. These neurons receive descending activations related not only to the volitional movement, but also to the dynamic state of the body and the influence of other sensory inputs, including the vestibular system. A decrease in descending influences from the vestibular organs will lower the background sensitivity to all other neural influences on the motor neuron. Thus, the motor neuron may be less sensitive to the other volitional and reflexive synaptic inputs that it may receive.

      (b) Muscle tone plays a significant role in determining the force and the time course of the muscle contraction. In a weightless environment, where tonic muscle activity is likely to be reduced, there is the distinct possibility that muscles will react more slowly and with lower amplitude to an otherwise equivalent descending motor command, particularly in the initial moments before spinal reflexes come into play. These, and other neuronal mechanisms could lead to the "under-actuation" effect observed in the current study, without necessarily being reflective of an underestimation of mass per se.

      (2) The subject's body in weightlessness is much more sensitive to reaction forces in interactions with the environment in the absence of the anchoring effect of gravity pushing the body into the floor and in the absence of anticipatory postural adjustments that typically accompany upper-limb motions in Earth gravity in order to maintain an upright posture. The authors dismiss this possibility because the taikonauts were asked to stabilize their bodies with the contralateral hand. But the authors present no evidence that this was sufficient to maintain the shoulder and trunk at a strictly constant position, as is supposed by the simplified biomechanical model used in their optimal control framework. Indeed, a small backward motion of the shoulder would result in a smaller acceleration of the fingertip and a smaller extent of the initial ballistic motion of the hand with respect to the measurement device (the tablet), consistent with the observations reported in the study. Note that stability of the base might explain why 45° movements were apparently less affected in weightlessness, according to many of the reported analyses, including those related to corrective movements (Fig. 5 B, C, F; Fig. 6D), than the other two directions. If the trunk is being stabilized by the left arm, the same reaction forces on the trunk due to the acceleration of the hand will result in less effective torque on the trunk, given that the reaction forces act with a much smaller moment arm with respect to the left shoulder (the hand movement axis passes approximately through the left shoulder for the 45° target) compared to either the forward or rightward motions of the hand.

      (3) The above is exacerbated by potential changes in the frictional forces between the fingertip and the tablet. The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact can be expected to be quite different than on the ground. While these forces may be low on Earth, the fact is that we do not know what forces the taikonauts used on orbit. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. Indeed, given the increased instability of the body and the increased uncertainty of movement direction of the hand, taikonauts may have been induced to apply greater forces against the tablet in order to maintain contact in weightlessness, which would in turn slow the motion of the finger on the tablet and increase the reaction forces acting on the trunk. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors.

      I feel that the authors have done an admirable job of exploring how to explain the modifications to movement kinematics that they observed on orbit within the constraints of the optimal control theory applied to a simplified model of the human motor system. While I fully appreciate the value of such models to provide insights into questions of human sensorimotor behaviour, to draw firm conclusions on what humans are actually experiencing based only on manipulations of the computational model, without testing the model's implicit assumptions and without considering the actual neurophysiological and biomechanical mechanisms, can be misleading. One way to do this could be to examine these questions through extensions to the model used in the simulations (changing activation dynamics of the torque generators, allowing for potential backward motion of the shoulder and trunk, etc.). A better solution would be to emulate the physiological and biomechanical conditions on Earth (supporting the arm against gravity to reduce muscle tone, placing the subject on a moveable base that requires that the body be stabilized with the other hand) in order to distinguish the hypothesis of an underestimation of mass vs. other potential sources of under-actuation and other potential effects of weightlessness on the body.

      In sum, my opinion is that the authors are relying too much on a theoretical model as a ground truth and thus overstate their conclusions. But to provide a convincing argument that humans truly underestimate mass in weightlessness, they should consider more judiciously the neurophysiology and biomechanics that fall outside the purview of the simplified model that they have chosen. If a more thorough assessment of this nature is not possible, then I would argue that a more measured conclusion of the paper should be 1) that the authors observed modifications to movement kinematics in weightlessness consistent with an under-actuation for the intended motion, 2) that a simplified model of human physiology and biomechanics that incorporates principles of optimal control suggests that the source of this under-actuation might be an underestimation of mass in the computation of an appropriate feedforward motor command, and 3) that other potential neurophysiological or biomechanical effects cannot be excluded due to limitations of the computational model.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. While the experimental dataset is unique and the coupled experimental and computational analyses comprehensive, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

      We thank the editor and reviewers for the thoughtful and constructive comments, which helped us substantially improve the manuscript. In this revised version, we have made the following key changes:

      - Directly presented the differential effect of microgravity in different movement directions, showing its quantitative match with model predictions.

      - Showed that changing cost function with the idea of conservative strategy is not a viable alternative.

      - Showed our model predictions remain largely the same after adding Coriolis and centripetal torques.

      - Discussed alternative explanations including neuromuscular deconditioning, friction, body stability, etc.

      - Detailed the model description and moved it to the main text, as suggested.

      Our point-to-point response is numbered to facilitate cross-referencing.

      We believe the revisions and the responses adequately address the reviewers’ concerns, and the new analysis results strengthen our conclusion that mass underestimation is the major contributor to movement slowing in microgravity.

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slow down implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      Response (1): Thank you for raising this point. The basic premise of this concern is that changing the cost function for implementing strategic slowing can reproduce our empirical findings, thus the alternative hypothesis that we aimed to refute in the paper remains possible. At least, it could co-exist with our hypothesis of mass underestimation. In the revision, we show that changing the cost function only, as suggested here, cannot produce the behavioral patterns observed in microgravity.

      As suggested, we modified the relative weighting of the state and control cost matrices (i.e., Q and R in the cost function Eq 15) without considering mass underestimation. While this cost function scaling can decrease peak velocity – a hallmark of strategic slowing – it also inevitably leads to later peak timings. This is opposite to our robust findings: the taikonauts consistently “advanced” their peak velocity and peak acceleration in time. Note, these model simulation patterns have also been shown in Crevecoeur et al. (2010), the paper mentioned by the reviewer (see their Figure 7B).

      We systematically changed the ratio between the state and control weight matrices in the simulation, as suggested. We divided Q and multiplied R by the same factor α, the cost function scaling parameter defined in Crevecoeur et al. (2010). This adjustment models a shift in movement strategy in microgravity, and we tested a wide range of α to cover a reasonable parameter space. Simulation results for α = 3 and α = 0.3 are shown in Figure 1—figure supplement 2 and Figure 1—figure supplement 3 respectively. As expected, with α = 3 (higher control effort penalty), peak velocities and accelerations are reduced, but their timing is delayed. Conversely, with α = 0.3, both peak amplitude and timing increase. Hence, changing the cost function to implement a conservative strategy cannot produce the kinematic pattern observed in microgravity, which is a combination of movement slowing and peak timing advance.
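The direction of the α = 3 effect can be illustrated with a toy example. The sketch below is not the authors' two-link arm model: it assumes a 1-D point mass, an infinite-horizon quadratic cost on position error and force, and hypothetical values for `q`, `r`, the mass and the reach distance. It only shows that dividing the state weight and multiplying the control weight by the same factor lowers peak speed and delays its timing.

```python
import math


def lqr_gains(mass, q, r):
    """Closed-form infinite-horizon LQR gains for a point mass (m * a = u)
    with running cost q * error**2 + r * u**2."""
    k_pos = math.sqrt(q / r)
    k_vel = math.sqrt(2.0 * mass * k_pos)
    return k_pos, k_vel


def simulate_reach(alpha, mass=1.0, q=1.0e4, r=1.0,
                   distance=0.1, dt=1e-3, duration=2.0):
    """Simulate a reach with state weight q / alpha and control weight r * alpha.

    All numeric defaults are illustrative assumptions, not values from the paper.
    Returns (peak speed in m/s, time of peak speed in s).
    """
    k_pos, k_vel = lqr_gains(mass, q / alpha, r * alpha)
    err, vel = -distance, 0.0          # start `distance` short of the target
    peak_speed, peak_time = 0.0, 0.0
    for step in range(int(round(duration / dt))):
        force = -k_pos * err - k_vel * vel   # optimal state feedback
        vel += (force / mass) * dt           # semi-implicit Euler integration
        err += vel * dt
        if abs(vel) > peak_speed:
            peak_speed, peak_time = abs(vel), (step + 1) * dt
    return peak_speed, peak_time


for alpha in (1.0, 3.0):
    speed, t_peak = simulate_reach(alpha)
    print(f"alpha={alpha}: peak speed={speed:.3f} m/s at t={t_peak:.3f} s")
```

In this toy setting the closed-loop natural frequency scales as 1/√α, so a larger α gives a slower movement whose velocity peak occurs later rather than earlier.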

      Therefore, we conclude that a change in optimal control strategy alone is insufficient to explain our empirical findings. Logically speaking, we cannot refute the possibility of strategic slowing, which can still exist on top of the mass underestimation we proposed here. However, our data does not support its role in explaining the slowing of goal-directed hand reaching in microgravity. We have added these analyses to the Supplementary Materials and expanded the Discussion to address this point.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Response (2): First, we have to clarify that our study does not aim to quantitatively fit the observed hand trajectories. The two-link arm model simulates an ideal case of moving a point mass (effective mass) on a horizontal plane without friction (Todorov, 2004; 2005). In contrast, in the experiment, participants moved their hand on a tabletop without vertical arm support, so the movement was not strictly planar and was affected by friction. Thus, this kind of model can only illustrate qualitative differences between conditions, as in the majority of similar modeling studies (e.g., Shadmehr et al., 2016). In our study, qualitative simulation means the model is intended to reproduce the directional differences between conditions, not exact numeric values, in key kinematic measures. Specifically, it should capture how the peak velocity and acceleration amplitudes and their timings differ between normal gravity and microgravity (particularly under the mass-underestimation assumption).

      Second, the reviewer rightfully pointed out that the directional effect is essential for our theorization of the importance of mass underestimation. However, the directional effect has two aspects, which were not clearly presented in our original manuscript. We now clarify both here and in the revision. The first aspect is that key kinematic variables (peak velocity/acceleration and their timing) are affected by movement direction, even before any potential microgravity effect. This is shown by the ranking order of directions for these variables (Figure 1C-H). The direction-dependent ranking, confirmed by pre-flight data, indicates that effective mass is a determining factor for reaching kinematics, which motivated us to study its role in eliciting movement slowing in space. This was what our original manuscript emphasized and clearly presented.
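How effective mass depends on movement direction can be sketched with the standard operational-space relation m(u) = 1 / (uᵀ J M⁻¹ Jᵀ u) for a planar two-link arm. The segment parameters and the posture below (shoulder 45°, elbow 90°) are assumptions chosen for illustration; they are not taken from the manuscript, and the function name `effective_mass` is hypothetical.

```python
import math


def effective_mass(direction_deg, q1_deg=45.0, q2_deg=90.0):
    """Effective endpoint mass (kg) of a planar two-link arm along a direction.

    Segment masses, lengths, centre-of-mass distances and inertias are
    assumed, illustrative values (upper arm = 1, forearm = 2).
    """
    m1, m2 = 1.4, 1.0            # segment masses (kg)
    l1, l2 = 0.30, 0.33          # segment lengths (m)
    r1, r2 = 0.11, 0.16          # joint-to-centre-of-mass distances (m)
    i1, i2 = 0.025, 0.045        # segment inertias about their centres (kg m^2)

    q1, q2 = math.radians(q1_deg), math.radians(q2_deg)
    c2 = math.cos(q2)

    # Joint-space inertia matrix M(q)
    m11 = i1 + i2 + m1 * r1**2 + m2 * (l1**2 + r2**2 + 2 * l1 * r2 * c2)
    m12 = i2 + m2 * (r2**2 + l1 * r2 * c2)
    m22 = i2 + m2 * r2**2
    det = m11 * m22 - m12**2
    inv = ((m22 / det, -m12 / det), (-m12 / det, m11 / det))

    # Endpoint Jacobian J(q)
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    jac = ((-l1 * s1 - l2 * s12, -l2 * s12),
           (l1 * c1 + l2 * c12, l2 * c12))

    # w = J^T u, then mobility = w^T M^-1 w
    theta = math.radians(direction_deg)
    u = (math.cos(theta), math.sin(theta))
    w = (jac[0][0] * u[0] + jac[1][0] * u[1],
         jac[0][1] * u[0] + jac[1][1] * u[1])
    mobility = (w[0] * (inv[0][0] * w[0] + inv[0][1] * w[1])
                + w[1] * (inv[1][0] * w[0] + inv[1][1] * w[1]))
    return 1.0 / mobility


for direction in (45, 90, 135):
    print(f"{direction} deg: effective mass = {effective_mass(direction):.2f} kg")
```

With these assumed parameters the effective mass is smallest at 45° and largest at 135°, which matches the direction of the ranking described in the text; the actual values depend on the posture and anthropometry used.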

      The second aspect is that the hypothetical mass underestimation might also differentially affect movements in different directions. This was not clearly presented in the original manuscript. However, we would not expect a quantitative match between model predictions and empirical data, for the reasons mentioned above. We now show this directional ranking in microgravity-elicited kinematic changes in both model simulations and empirical data. The overall trend is that the microgravity effect indeed differs between directions, and the model predictions and the data showed a reasonable qualitative match (Author response image 1 below).

      As shown in Author response image 1, we found that for amplitude changes (Δ peak speed, Δ peak acceleration) both the model and the mean of empirical data show the same directional ordering (45° > 90° > 135°) in pre-in and post-in comparisons. For timing (Δ peak-speed time, Δ peak-acceleration time), which we consider the most diagnostic, the same directional ranking was observed. We only found one deviation, i.e., the predicted sign (earlier peaks) was confirmed at 90° and 135°, but not at 45°. As discussed in Response (6), the absence of timing advance at 45° may reflect limitations of our simplified model, which did not consider that the 45° direction is essentially a single-joint reach. Taken together, the directional pattern is largely consistent with the model predictions based on mass underestimation. The model successfully reproduces the directional ordering of amplitude measures -- peak velocity and peak acceleration. It also captures the sign of the timing changes in two out of the three directions. We added these new analysis results in the revision and expanded the Discussion accordingly.

      The details of our analysis on directional effects: We compared the model predictions (Author response image 1, left) with the experimental data (Author response image 1, right) across the three tested directions (45°, 90°, 135°). In the experimental data panels, both Δ(pre-in) (solid bars) and Δ(post-in) (semi-transparent bars) with standard error are shown. The directional trends are remarkably similar between model prediction and actual data. The post-in comparison is less aligned with model prediction; we postulate that the incomplete after-flight recovery (i.e., post data had not returned to pre-flight baselines) might obscure the microgravity effect. Incomplete recovery has also been shown in our original manuscript: peak speed and peak acceleration did not fully recover in post-flight sessions when compared to pre-flight sessions. To further quantify the correspondence between model and data, we performed repeated-measures correlation (rm-corr) analyses. We found significant within-subject correlations for three of the four metrics. For pre–in, Δ peak speed time (r<sub>rm</sub> = 0.627, t(23) = 3.858, p < 0.001), Δ peak acceleration time (r<sub>rm</sub> = 0.591, t(23) = 3.513, p = 0.002), and Δ peak acceleration (r<sub>rm</sub> = 0.573, t(23) = 3.351, p = 0.003) were significant, whereas Δ peak speed was not (r<sub>rm</sub> = 0.334, t(23) = 1.696, p = 0.103). These results thus show that the directional effect, as predicted by our model, is observed both before spaceflight and in spaceflight (the pre-in comparison).
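The reported t statistics can be cross-checked against the correlation coefficients with the standard relation t = r·√(df / (1 − r²)). The sketch below takes df = 23 and the r and t values from the paragraph above; the helper name `t_from_r` is illustrative, and small discrepancies are expected because the coefficients are rounded to three decimals.

```python
import math


def t_from_r(r, df):
    """t statistic implied by a correlation coefficient r with df degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))


DF = 23  # degrees of freedom as reported in the text

# metric: (reported r_rm, reported t)
REPORTED = {
    "delta peak speed time": (0.627, 3.858),
    "delta peak acceleration time": (0.591, 3.513),
    "delta peak acceleration": (0.573, 3.351),
    "delta peak speed": (0.334, 1.696),
}

for metric, (r, t_reported) in REPORTED.items():
    t_implied = t_from_r(r, DF)
    print(f"{metric}: implied t = {t_implied:.3f}, reported t = {t_reported:.3f}")
```

The implied and reported values agree to within rounding for all four metrics, so the statistics in the paragraph are internally consistent.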

      Author response image 1.

      Directional comparison between model predictions and experimental data across the three reach directions (45°, 90°, 135°). Left: model outputs. Right: experimental data shown as Δ relative to the in-flight session; solid bars = Δ(in − pre) and semi-transparent bars = Δ(in − post). Colors encode direction consistently across panels (e.g., 45° = darker hue, 90° = medium, 135° = lighter/orange). Panels (clockwise from top-left): Δ peak speed (cm/s), Δ peak speed time (ms), Δ peak acceleration time (ms), and Δ peak acceleration (cm/s²). Bars are group means; error bars denote standard error across participants.

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology: CB, 26(14), 1929–1934.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      We agree that both hypotheses have been put forward before; however, they are competing hypotheses that have not been resolved. Furthermore, the mass underestimation hypothesis is a conjecture without any solid evidence; previous reports on mass underestimation of objects cannot directly translate to underestimation of the body. As detailed in our responses above, we have shown that a conservative strategy implemented via a different cost function cannot reproduce the key findings in our dataset, thereby supporting the alternative hypothesis of mass underestimation. Moreover, we found qualitative agreement between the model predictions and the experimental data in terms of directional effects, which further strengthens our interpretation.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      Response (3): We are happy to include exemplary speed and acceleration trajectories. Kinematic profiles from one example participant are shown in Figure 2—figure supplement 6.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response (4): Great suggestion. In the revision, we have moved the model into the main text and added further justification for using this simple model.

      We initially omitted the nonlinear Coriolis and centripetal terms in order to start with a minimal model. Importantly, excluding these terms does not affect the model’s main conclusions. In the revision we added simulations that explicitly include these terms. The full explanation and simulations are provided in the Supplementary Notes 2 (placed in the Supplementary Materials this time to limit the amount of main text devoted to the model). More explanations can also be found in our response to Reviewer 2 (response (6)). The results indicate that, although these velocity-dependent forces show some directional anisotropy, their contribution is substantially smaller than that of the included inertial component; specifically, they have only a negligible impact on the predicted peak amplitudes and peak times.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response (5): Thank you for your thoughtful comment. You are correct that the increase in the percentage of trials with submovements is modest, but a more critical change was observed in the timing between submovement peaks—specifically, the inter-peak interval (IPI). These intervals became longer during flight. Taken together with the percentage increase, the submovement changes significantly predicted the increase in movement duration, as shown by our linear mixed-effects model, which indicated that IPI increased.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally shows the expected changes but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the author's view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      Response (6): Thank you for raising these important questions. We unpacked the paragraph into two concerns: 1) the possibility that misestimation of Coriolis and centripetal torques might lead to corrective submovements, and 2) the under-exploited discrepancy between the 45° direction and the other directions. Both concerns are valid but addressable, and they do not change our general conclusions based on our empirical findings (see Supplementary Note 2: Coriolis and centripetal torques have minimal impact).

      Possible explanation for the 45° discrepancy

      We agree with the reviewer that the 45° direction likely involves more single-joint (elbow-dominant) movement, whereas the 90° and 135° directions require greater multi-joint (elbow + shoulder) coordination. This is particularly relevant when the workspace is near the body midline (e.g., Haggard et al., 1995), as is the case in our experimental setup. To demonstrate this, we examined the curvature of the hand trajectories across directions. Using cumulative curvature (positive = counterclockwise), we obtained average values of 6.484° ± 0.841°, 1.539° ± 0.462°, and 2.819° ± 0.538° for the 45°, 90°, and 135° directions, respectively. The significantly larger curvature in the 45° condition suggests that these movements deviate more from a straight-line path, a hallmark of more elbow-dominant movements.

      Importantly, this curvature pattern was present in both the pre-flight and in-flight phases, indicating that it is a general movement characteristic rather than a microgravity-induced effect. Thus, the 45° reaches are less suitable for modeling with a simplified two-link arm model compared to the other two directions. We believe this is the main reason why the model predictions based on effective mass become less consistent with the empirical data for the 45° direction.

      We have now incorporated this new analysis in the Results and discussed it in the revised Discussion.

      Citation: Haggard, P., Hutchinson, K., & Stein, J. (1995). Patterns of coordinated multi-joint movement. Experimental Brain Research, 107(2), 254-266.
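The cumulative-curvature measure mentioned in Response (6) is not defined in this response. As a minimal sketch, assuming it is the sum of signed turning angles between successive displacement vectors of the hand path (counterclockwise positive), it could be computed as follows. The function name and the zero-length threshold are illustrative choices, not the authors' analysis code.

```python
import numpy as np

def cumulative_curvature_deg(xy):
    """Sum of signed turning angles (degrees) along a 2-D path.

    Assumed definition: counterclockwise turns count as positive.
    `xy` is an (N, 2) array of hand positions sampled over time.
    """
    xy = np.asarray(xy, dtype=float)
    seg = np.diff(xy, axis=0)                        # successive displacement vectors
    seg = seg[np.linalg.norm(seg, axis=1) > 1e-12]   # drop zero-length steps
    if len(seg) < 2:
        return 0.0
    a, b = seg[:-1], seg[1:]
    cross = a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0]    # sine-like term, sign gives turn direction
    dot = np.einsum("ij,ij->i", a, b)                # cosine-like term
    return float(np.degrees(np.arctan2(cross, dot).sum()))
```

Under this definition a perfectly straight reach yields a value near zero, and a path that bows to one side accumulates a signed angle whose sign indicates the direction of the bend.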

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      Response (7): Neuromuscular deconditioning is indeed an effect of spaceflight; thank you for bringing this up, as we omitted discussion of this confound in our original manuscript. Prolonged stay in microgravity can lead to a reduction of muscle strength, but this is mostly limited to the lower limbs. For example, a recent well-designed large-sample study has shown that while lower-leg muscles showed significant strength reductions, no change in mean upper-body strength was found (Scott et al., 2023), consistent with previous propositions that muscle weakness is less pronounced for upper-limb muscles than for postural and lower-limb muscles (Tesch et al., 2005). Furthermore, muscle weakness is unlikely to play a major role here, since our reaching task involves small movements (~12 cm) with joint torques on the order of ~2 N·m. Of course, we cannot completely rule out a contribution of muscle weakness; we can only postulate, based on the task itself (12 cm reaching) and the systematic microgravity effects (the increase in submovements, the increase in inter-submovement intervals, and their significant prediction of movement slowing), that muscle weakness is unlikely to be a major contributor to the movement slowing.

      The reviewer suggests that poor coordination in microgravity might contribute to slowing down + more submovements. This is also a possibility, but we did not find evidence to support it. First, there is no clear evidence or reports about poor coordination for simple upper-limb movements like reaching investigated here. Note that reaching or aiming movement is one of the most studied tasks among astronauts. Second, we further analyzed our reaching trajectories and found no sign of curvature increase, a hallmark of poor coordination of Coriolis/centripetal torques, in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far, given that we had 12 taikonauts and each of them performed about 480 to 840 reaching trials during their spaceflight. We believe the probability of Type II error is quite low here.

      Citation: Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European journal of applied physiology, 93(4), 463-468.

      Scott, J., Feiveson, A., English, K., et al. (2023). Effects of exercise countermeasures on multisystem function in long duration spaceflight astronauts. npj Microgravity, 9(11).

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      Response (8): We thank the reviewer for raising these important and technically insightful points regarding our modeling framework. We first clarify the structure of the model and key assumptions, and then address the specific questions in points (a)–(c) below.

      We used Todorov’s (2005) stochastic optimal control method to compute a finite-horizon LQG policy under sensory noise and signal-dependent motor noise (state noise set to zero). The cost function is: (see details in updated Methods). The resulting time-varying gains {L<sub>k</sub>, K<sub>k</sub>} correspond to the feedforward mapping and the feedback correction gain, respectively. The control law can be expressed as:

      where u<sub>k</sub> is the control input, is the nominal planned state, is the estimated state, L<sub>k</sub> is the feedforward (nominal) control associated with the planned trajectory, and K<sub>k</sub> is the time-varying feedback gain that corrects deviations from the plan.

      To define the motor plan for comparison with behavior, we simulate the deterministic open-loop trajectory by turning off noise and disabling feedback corrections. In this framework, “feedforward” refers to this nominal motor plan. Thus, sensory and signal-dependent noise influence the computed policy (via the gains), but are not injected when generating the nominal trajectory. This mirrors the minimum-jerk practice used to obtain nominal kinematics in prior utility-based work (Shadmehr, 2016), while optimal control provides a more physiologically grounded nominal plan. In the revision, we have updated the equations, provided more modeling details, and moved the model description to the main text to reduce possible confusion.

      In the implementation of the “mass underestimation” condition, the mass used to compute the policy is the underestimated mass, whereas the actual mass is used when simulating the feedforward trajectories. Corrective submovements are analyzed separately and are not required for the planning-deficit findings reported here.

      Answers to the three specific questions:

      a) We mistakenly wrote a continuous-time infinite-horizon cost function in our original manuscript, whereas our controller is actually implemented as a discrete-time finite-horizon LQG with a terminal cost, over a horizon set by the utility-based optimal movement duration T<sub>opt</sub>. The underestimated mass is used in both the utility model (to determine T<sub>opt</sub>) and in the control computation (i.e., internal model), while the true mass is used when simulating the movement. This mismatch captures the central idea of feedforward planning based on an incorrect internal model.

      b) As described, our model includes signal-dependent motor noise and sensory noise, following Todorov (2005). We also evaluated whether increased noise levels in microgravity could account for the observed behavioral changes. Simulation results showed that increasing either source of noise did not alter the main conclusions or reverse the trends in our key metrics. Moreover, our experimental data showed no significant increase in endpoint variability in microgravity (see analyses and results in Figure 2—figure supplement 3 & 4), making it unlikely that increased sensorimotor noise alone accounts for the observed slowing and submovement changes.

      c) In our framework, the time-varying gains {L<sub>k</sub>, K<sub>k</sub>} define the feedforward and feedback components of the control policy. While both gains are computed based on a stochastic optimal control formulation (including noise), for comparison with behavior we simulate only the nominal feedforward plan, by turning off both noise and feedback. This defines a deterministic open-loop trajectory, which we use to capture planning-level effects such as peak timing shifts under mass underestimation. Feedback corrections via gains exist in the full model but are not involved in these specific analyses. We clarified this modeling choice and its behavioral relevance in the revised text.

      We have updated the equations and moved the model description into the main text in the revised manuscript to avoid confusion.
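The mismatch described in answer (a), a plan computed with an underestimated mass but executed on the true mass, can be illustrated with a deliberately reduced sketch. It assumes a 1-D point mass, a noise-free discrete-time finite-horizon LQ regulator with a terminal cost only, and a fixed horizon; the utility-based choice of duration and the noise terms of the full model are omitted. All numerical values (masses, weights, step size, horizon) are hypothetical and are not the authors' parameters.

```python
import numpy as np

def lq_gains(m_hat, dt, n_steps, r=1e-4, w_pos=1e4, w_vel=1e2):
    """Backward Riccati recursion for a point mass with believed mass m_hat."""
    A = np.array([[1.0, dt], [0.0, 1.0]])      # state: [position error, velocity]
    B = np.array([[0.0], [dt / m_hat]])        # force input scaled by believed mass
    S = np.diag([w_pos, w_vel])                # terminal cost on error and velocity
    R = np.array([[r]])                        # effort cost
    gains = []
    for _ in range(n_steps):
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S = A.T @ S @ (A - B @ K)              # no running state cost in this sketch
        gains.append(K)
    return A, B, gains[::-1]                   # gains ordered from first to last step

def planned_commands(m_hat, dt, n_steps, target):
    """Force sequence the controller expects to use under its internal model."""
    A, B, gains = lq_gains(m_hat, dt, n_steps)
    e = np.array([[-target], [0.0]])           # start at rest, `target` metres away
    u = []
    for K in gains:
        u_k = (-K @ e).item()
        u.append(u_k)
        e = A @ e + B * u_k                    # internal-model prediction
    return np.array(u)

def rollout(u, m_true, dt):
    """Apply the force sequence open loop to a point mass of mass m_true."""
    v = np.concatenate([[0.0], np.cumsum(u) * dt / m_true])
    p = np.concatenate([[0.0], np.cumsum(v[:-1]) * dt])
    return p, v

dt, n_steps, target = 0.005, 70, 0.12          # hypothetical: 350 ms, 12 cm reach
u = planned_commands(m_hat=0.7, dt=dt, n_steps=n_steps, target=target)
p_plan, v_plan = rollout(u, m_true=0.7, dt=dt)  # what the plan predicts
p_real, v_real = rollout(u, m_true=1.0, dt=dt)  # what the true mass produces
```

Because these dynamics are linear and the commands are applied open loop, the executed velocities and positions are scaled by the ratio of believed to true mass, so the hand falls short of the planned endpoint and a correction is needed. The sketch does not reproduce the peak-timing shifts discussed in the response, which depend on the planned duration and on the full model.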

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may however, wonder if the same results would hold for slower self-paced movements (are they also with reduced speed compared to Earth performance?). Moreover, a few other important questions might need to be addressed for completeness: how to ensure that astronauts did remember this instruction during the flight? (could the control group move faster because they better remembered the instruction?). Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Response (9): Thanks for highlighting the brevity of movements in our experiment. Our intention in emphasizing fast movements is to rigorously test whether movement is indeed slowed down in microgravity. The observed prolonged movement duration clearly shows that microgravity affects people’s movement duration, even when they are pushed to move fast. The second reason for using fast movement is to highlight that feedforward control is affected in microgravity. Mass underestimation specifically affects feedforward control in the first place, shown by the microgravity-related changes in peak velocity/acceleration. Slow movement would inevitably have online corrections that might obscure the effect of mass underestimation. Note that movement slowing is not only observed in our speed-emphasized reaching task, but also in whole-arm pointing in other astronauts’ studies (Berger, 1997; Sangals, 1999), which have been quoted in our paper. We thus believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments conducted in the Tiangong space station were monitored in real time by experimenters in the control center located in Beijing. The task instructions were presented on the initial display of the data acquisition application, and ample reading time was allowed. All the pre-, in-, and post-flight test sessions were administered by the same group of personnel with the same instructions. It is common for astronauts to serve as both participants and experimenters at the same time, and they were well trained for this role on the ground. Note that we had multiple pre-flight test sessions to familiarize them with the task. All these rigorous measures were in place to obtain high-quality data. In the revision, we included these experimental details for readers who are not familiar with space studies, and provided the rationale for emphasizing fast movements.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      Response (10): We believe that the presence or absence of adaptation between our study and Gaveau et al.’s study cannot be simply attributed to single-joint versus multi-joint movements. Their adaptation concerned incorporating microgravity into movement control to minimize effort, whereas ours concerned accurately perceiving body mass. Gaveau et al.’s task involved large-amplitude vertical reaching, a scenario in which gravity strongly affects joint torques and movement execution. Thus, adaptation to microgravity can lead to better execution, providing a strong incentive for learning. By contrast, our task consisted of small-amplitude horizontal movements, where the gravitational influence on biomechanics is minimal.

      More importantly, we believe the lack of adaptation for mass underestimation is not totally surprising. When an inertial change is perceived (such as an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching within tens of trials. In that case, sensory cues are veridical, as they correctly signal the inertial perturbation. However, in microgravity, reduced gravitational pull and proprioceptive inputs constantly inform the controller that the body mass is less than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass. The resulting sensory bias prevents the sensorimotor system from adapting. Our initial explanation on this matter was too brief; we expanded it in the revised Discussion.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90º midway between predictions for 45º and 135º. The effective mass at 90º appears to be much closer to that of 45º than to that of 135º (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90º and 135º are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45º.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90º than for 135º, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90º and 135º as between 90º and 45º? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Response (11): Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard the divergence as evidence undermining our main conclusion since 1) the model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques. 2) Our study does not make quantitative predictions of all the key kinematic measures; that will require model fitting, parameter estimation, and posture-constrained reaching experiments; instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: though we did not find equal spacing between direction conditions, we do confirm that the key kinematic measures (Figure 2 and Figure 3 as questioned) show consistent directional trends between model predictions and empirical data. We added new analysis results on this matter: the directional effect we observed (how the key measures changed in microgravity across direction condition) is significantly correlated with our model predictions in most cases. Please check our detailed response (2) above. These results are also added in the revision.

      We also highlight in the revision that our modeling is not intended to quantitatively predict reaching behaviors in space, but to show qualitatively how mass underestimation and the conservative control strategy lead to divergent predictions about key kinematic measures of fast reaching.
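For readers unfamiliar with the directional effective mass discussed in this exchange, the standard definition for a planar two-link arm is m(u) = 1 / (uᵀ J M⁻¹ Jᵀ u), where M is the joint-space inertia matrix, J the hand Jacobian, and u a unit vector along the movement direction. The sketch below evaluates it for three directions. The segment lengths, masses, inertias, posture, and the convention that directions are measured counterclockwise from the x-axis are all hypothetical placeholders, not the study's values, so the numbers it produces should not be compared with Figure S1.

```python
import numpy as np

def hand_mobility(q1, q2, l1=0.30, l2=0.33, m1=1.9, m2=1.5,
                  lc1=0.13, lc2=0.16, i1=0.014, i2=0.019):
    """Hand-space mobility J M^-1 J^T for a planar two-link arm (hypothetical parameters)."""
    c2 = np.cos(q2)
    m12 = i2 + m2 * (lc2**2 + l1 * lc2 * c2)
    M = np.array([
        [i1 + i2 + m1 * lc1**2 + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * c2), m12],
        [m12, i2 + m2 * lc2**2],
    ])                                              # joint-space inertia matrix
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    J = np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                  [ l1 * c1 + l2 * c12,  l2 * c12]])  # hand Jacobian
    return J @ np.linalg.inv(M) @ J.T

def effective_mass(mobility, direction_deg):
    """Effective mass (kg) along a unit vector at `direction_deg` from the x-axis."""
    th = np.radians(direction_deg)
    u = np.array([np.cos(th), np.sin(th)])
    return 1.0 / float(u @ mobility @ u)

W = hand_mobility(np.radians(45.0), np.radians(90.0))   # hypothetical posture
masses = {d: effective_mass(W, d) for d in (45, 90, 135)}
```

Because the mobility matrix is anisotropic, the effective mass varies smoothly with direction between the reciprocals of its largest and smallest eigenvalues, which is why equal angular steps need not produce equal steps in effective mass.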

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al. showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      Response (12): We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. Fisk et al.’s (1993) study indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.’s study, participants performed continuous arm movements with eyes closed; thus their movements largely relied on proprioceptive control. Our major findings concern feedforward control, i.e., the reduced and “advanced” peak velocity/acceleration in discrete and ballistic reaching movements. Note that the peak acceleration happens as early as approximately 90-100 ms into the movements, clearly showing that feedforward control is affected, a different effect from Fisk et al.’s findings. It is unlikely that people “advanced” their peak velocity/acceleration because they felt the need for more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      Response (13): We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction between 0.1 N and 0.5 N (e.g., Ayyildiz et al., 2018). We believe that the directional variation of the friction is even smaller than 0.1 N. This is very small compared to the force used to accelerate the arm for the reaching movement (10-15 N). Thus, friction anisotropy is unlikely to explain our data. Because readers might have the same concern, we added some discussion of the possible effect of friction.

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.
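The force comparison in Response (13) can be made explicit with a short calculation. It uses only the ranges quoted in the response (friction of 0.1 to 0.5 N, accelerating force of 10 to 15 N); the variable names are illustrative.

```python
friction_n = (0.1, 0.5)        # fingertip-touchscreen friction range quoted above (N)
accel_force_n = (10.0, 15.0)   # force used to accelerate the arm quoted above (N)

# Most pessimistic case: largest friction against smallest accelerating force.
largest_ratio = max(friction_n) / min(accel_force_n)
# Most favourable case: smallest friction against largest accelerating force.
smallest_ratio = min(friction_n) / max(accel_force_n)

print(f"friction / accelerating force: {smallest_ratio:.1%} to {largest_ratio:.1%}")
```

Even in the most pessimistic pairing, total friction is about one twentieth of the accelerating force, and a directional variation below 0.1 N would be at most about one hundredth of it.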

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Response (14): Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We think shoulder instability is an unlikely explanation because unexpected shoulder instability should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect is too early to be explained by an unexpected stability issue. This argument is now mentioned in the revised Discussion.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

      Recommendations for the authors:

      Reviewing Editor Comments:

      General recommendation

      Overall, the reviewers agreed this is an interesting study with an original and strong approach. Nonetheless, there were significant weaknesses identified. The main criticism is that there is insufficient evidence for the claim that the movement slowing is due to mass underestimation, rather than other explanations for the increased feedback corrections. To bolster this claim, the reviewers have requested a deeper quantitative analysis of the directional effect and comparison to model predictions. They have also suggested that a 2-dof arm model could be used to predict how mass underestimation would influence multi-joint kinematics, and this should be compared to the data. Alternatively, or additionally, a control experiment could be performed (described in the reviews). We do realize that some of these options may not be feasible or practical. Ultimately, we leave it to you to determine how best to strengthen and solidify the argument for mass underestimation, rather than other causes.

      As an alternative approach, you could consider tempering the claim regarding mass underestimation and focus more on the result that slower movements in microgravity are not simply a feedforward, rescaling of the movement trajectories, but rather, have greater feedback corrections. In this case, the reviewers feel it would still be critical to explain and discuss potential reasons for the corrections beyond mass underestimation.

      We hope that these points are addressable, either with new analyses, experiments, or with a tempering of the claims. Addressing these points would help improve the eLife assessment.

      Reviewer #1 (Recommendations for the authors):

      (1) Move model descriptions to the main text to present modelling choices in more detail

      Response (15): Thank you for the suggestion. We have moved the model descriptions to the main text to present the modeling choices in more detail and to allow readers to better cross-reference the analyses.

      (2) Perform quantitative comparisons of the directional effect with the model's predictions, and add raw kinematic traces to illustrate the effect in more detail.

      Response (16): Thank you for the suggestion. We have added a raw kinematics figure from a representative participant; for the comparison of directional effects, please refer to Response (2) above.

      (3) Explore the effect of varying cost parameters in addition to mass estimation error to estimate the proportion of data explained by the underestimation hypothesis.

      Response (17): Thank you for the suggestion. This has already been done—please see Response (1) above.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) It must be justified early on why reaction times are being analyzed in this work. I understood later that it is to rule out any global slowing down of behavioral responses in microgravity.

      Response (18): Exactly. The RT results are informative about the absence of a global slowing down. Contrary to the conservative-strategy hypothesis, taikonauts did not show generalized slowing; they actually had faster reaction times during spaceflight, which is incompatible with a generalized slowing strategy. Thanks for pointing this out; we now justify the RT analysis early in the text.

      (2) Since the results are presented before the methods, I suggest stressing from the beginning that the reaching task is performed on a tablet and mentioning the instructions given to the participants, to improve the reading experience. The "beep" and "no beep" conditions also arise without obvious justification while reading the paper.

      Response (19): Great suggestions. We now provide some experimental details and their rationale at the beginning of the Results.

      (3) Figure 1C: The vel profiles are not returning to 0 at the end, why? Is it because the feedback gain is computed based on the underestimated mass or because a feedforward controller is applied here? Is it compatible with the experimental velocity traces?

      Response (20): Figure 1C shows the forward simulation under the optimal control policy. In our LQG formulation, the terminal velocity is softly penalized (finite weight) rather than hard-constrained to zero; with a fixed horizon, the optimal solution can therefore end with a small residual velocity.

      In the behavioral data, the hand does come to rest: this is achieved by corrective submovements during the homing phase.
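      The effect of a soft terminal-velocity penalty can be illustrated with a minimal sketch. It assumes a noise-free point mass and a plain finite-horizon LQR, used here as a stand-in for the authors' LQG model (with additive noise the feedback gains would be the same by certainty equivalence). The function name and all parameter values (mass, horizon, weights) are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def terminal_velocity(w_vel, mass=1.0, dt=0.01, n_steps=40,
                      w_pos=1e4, r=1e-4, start_error=-0.10):
    """Finite-horizon LQR reach for a point mass; returns velocity at the final step.

    All parameter values are illustrative assumptions, not the authors' settings.
    """
    A = np.array([[1.0, dt], [0.0, 1.0]])   # state: [position error (m), velocity (m/s)]
    B = np.array([[0.0], [dt / mass]])      # control input: force (N)
    R = np.array([[r]])                     # effort cost
    S = np.diag([w_pos, w_vel])             # terminal cost only (soft end-point constraints)

    gains = []
    for _ in range(n_steps):                # backward Riccati recursion, no running state cost
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S = A.T @ S @ (A - B @ K)
        gains.append(K)
    gains.reverse()                         # gains[k] is now the gain for time step k

    x = np.array([[start_error], [0.0]])
    for K in gains:                         # forward pass under the feedback law u = -K x
        x = (A - B @ K) @ x
    return float(x[1, 0])

soft = terminal_velocity(w_vel=1.0)         # finite (soft) penalty on terminal velocity
stiff = terminal_velocity(w_vel=1e6)        # very large penalty, close to a hard constraint
print(f"soft penalty:  terminal velocity = {soft:.4f} m/s")
print(f"stiff penalty: terminal velocity = {stiff:.2e} m/s")
```

      Under these assumptions, the residual velocity shrinks as the terminal weight grows, which is the trade-off described in the response: a finite weight leaves a small non-zero velocity at the end of the fixed horizon.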

      (4) Left-skewed -> I believe this is right-skewed since the peak velocity is earlier.

      Response (21): Yes, it should be right-skewed; thanks for pointing that out.

      (5) What was the acquisition frequency of the positional data points? (on the tablet).

      Response (22): The sampling frequency is 100 Hz. Thanks for pointing that out; we’ve added this information to the Methods.

      (6) Figure S1. The planned duration seems to be longer than in the experiment (it is more around 500 ms for the 135-degree direction in simulation versus less than 400 ms in the experiment). Why?

      Response (23): We apologize for a coding error that inadvertently multiplied the body-mass parameter by an extra factor, making the simulated mass too high. We have corrected the code, rerun the simulations, and updated Figures 1 and S1; all qualitative trends remain unchanged, and the revised movement durations (≈300–400 ms) are closer to the experimental values.

      (7) After Equation 13: "The control law is given by". This is not the control law, which should have a feedback form u=K*x in the LQ framework. This is just the dynamic equations for the auxiliary state and the force. Please double-check the model description.

      Response (24): Thank you for pointing this out. We have updated and refined all model equations and descriptions, and moved the model description from the Supplementary Materials to the main text; please see the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) I have a concern about the interpretation of the anisotropic "equivalent mass". From my understanding, the equivalent mass would be what an external actor would feel as an equivalent inertia if pushing on the end effector from the outside. But the CNS does not push on the arm with a pure force generator acting at the hand to effectuate movement. It applies torque around the joints by applying forces across joints with muscles, causing the links of the arm to rotate around the joints. If the analysis is carried out in joint space, is the effective rotational inertia of the arm also anisotropic with respect to the direction of the movement of the hand? In other words, can the authors reassure me that the simulations are equivalent to an underestimation of the rotational inertia of the links when applied to the joints of the limb? It could be that these are mathematically the same; I have not delved into the mathematics to convince myself either way. But I would appreciate it if the authors could reassure me on this point.

      Response (25): Thank you for raising this point. In our work, “equivalent mass” denotes the operational-space inertia projected along the hand-movement direction u, computed as:

      This formulation describes the effective mass perceived at the end effector along a given direction, and is standard in operational-space control.

      Although the motor command can be coded in the CNS as either torque or force, the executed movement is equivalent whether it is specified as endpoint forces or joint torques, since force and torque are related through the arm Jacobian. For small excursions as investigated here, this makes the directional anisotropy in endpoint inertia consistent with the anisotropy of the effective joint-space inertia required to produce the same endpoint motion. Conceptually, therefore, our “mass underestimation” manipulation in operational space corresponds to underestimating the required joint-space inertia mapped through the Jacobian. Since our behavioral data are hand positions, using the operational-space representation is the most direct and appropriate way for modeling.
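      The directional "equivalent mass" discussed in this response can be sketched numerically. The example below uses the textbook operational-space expression, m_u = 1 / (uᵀ J M⁻¹ Jᵀ u), for a planar two-link arm; this is the standard form and is assumed here, since the authors' own equation is not reproduced in this text. The link lengths, masses, inertias, and posture are hypothetical values chosen only for illustration.

```python
import numpy as np

def equivalent_mass(q1, q2, direction_deg,
                    l1=0.30, l2=0.33, m1=2.0, m2=1.5,
                    lc1=0.15, lc2=0.16, i1=0.02, i2=0.02):
    """Effective endpoint mass (kg) of a planar two-link arm along one hand direction.

    q1, q2: shoulder and elbow angles in radians. All limb parameters are
    illustrative assumptions, not values from the manuscript.
    """
    c2 = np.cos(q2)
    # Joint-space inertia matrix M(q) of a planar two-link arm.
    m11 = i1 + i2 + m1 * lc1**2 + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * c2)
    m12 = i2 + m2 * (lc2**2 + l1 * lc2 * c2)
    m22 = i2 + m2 * lc2**2
    M = np.array([[m11, m12], [m12, m22]])

    # Jacobian mapping joint velocities to hand velocity.
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    J = np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                  [ l1 * c1 + l2 * c12,  l2 * c12]])

    # Inverse operational-space inertia, projected on the unit direction u.
    lambda_inv = J @ np.linalg.solve(M, J.T)
    theta = np.radians(direction_deg)
    u = np.array([np.cos(theta), np.sin(theta)])
    return float(1.0 / (u @ lambda_inv @ u))

posture = (np.radians(45.0), np.radians(90.0))   # assumed shoulder and elbow angles
for direction in (45.0, 90.0, 135.0):
    print(direction, round(equivalent_mass(*posture, direction), 3))
```

      Because the same joint-space inertia M and Jacobian J enter the expression, scaling M (an underestimated limb inertia) scales the projected endpoint mass by the same factor in every direction. Which direction carries the largest equivalent mass depends on the assumed posture and coordinate frame, so the ordering printed here should not be read as the authors' result.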

      (2) I would also like to suggest one more level of analysis to test their hypothesis. The authors decomposed the movements into submovements and measured the prevalence of corrective submovements in weightlessness vs. normal gravity. The increase in corrective submovements is consistent with the hypothesis of a misestimation of limb mass, leading to an unexpectedly smaller displacement due to the initial feedforward command, leading to the need for corrections, leading to an increased overall movement duration. According to this hypothesis, however, the initial submovement, while resulting in a smaller than expected displacement, should have the same duration as the analogous movements performed on Earth. The authors could check this by analyzing the duration of the extracted initial submovements.

      Response (26): We appreciate the reviewer’s suggestion regarding the analysis of the initial submovement duration. In our decomposition framework, each submovement is modeled as a symmetric log-normal (bell-shaped) component, such that the time to peak speed is always half of the component duration. Thus, the initial submovement duration is directly reflected in the initial submovement peak-speed time already reported in our original manuscript (Figure 5F).

      However, we respectfully disagree with the assumption that mass underestimation would necessarily yield the same submovement duration as on Earth. Under mass underestimation, the movement is effectively under-actuated, and the initial submovement can terminate prematurely, leading to a shorter duration. This is indeed what we observed in the data. Therefore, our reported metrics already address the reviewer’s proposal and support the conclusion that mass underestimation reduces the initial submovement duration in microgravity. Following your suggestion, we have now added a sentence explaining to the reader that the initial submovement peak-speed time reflects the duration of the initial submovement.
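      The link between peak-speed time and duration for a symmetric component can be checked with a short sketch. It assumes a minimum-jerk bell-shaped speed profile as a stand-in for the symmetric component used in the authors' decomposition; the function names, amplitude, and durations are hypothetical.

```python
import numpy as np

def bell_speed(t, onset, duration, amplitude):
    """Speed of one symmetric, bell-shaped (minimum-jerk) submovement.

    amplitude is the distance covered (m); speed is zero outside the component.
    """
    s = np.clip((t - onset) / duration, 0.0, 1.0)
    return (amplitude / duration) * 30.0 * s**2 * (1.0 - s)**2

def peak_speed_time(onset, duration, amplitude, n_samples=1001):
    """Numerically locate the time of peak speed within the component."""
    t = np.linspace(onset, onset + duration, n_samples)
    return float(t[np.argmax(bell_speed(t, onset, duration, amplitude))])

# For a symmetric component, the peak falls halfway through its duration,
# so a shorter component has a proportionally earlier peak.
for duration in (0.30, 0.20):
    print(duration, peak_speed_time(onset=0.0, duration=duration, amplitude=0.10))
```

      Under this assumption, an earlier peak-speed time for the initial submovement is equivalent to a shorter initial submovement, which is the reading the response relies on.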

      Some additional minor suggestions:

      (1) I believe that it is important to include the data from the control subjects, in some form, in the main article. Perhaps shading behind the main data from the taikonauts to show similarities or differences between groups. It is inconvenient to have to go to the supplementary material to compare the two groups, which is the main test of the experiment.

      Response (27): Thank you for the suggestion. For all the core performance variables, the control group showed flat patterns, with no changes across test sessions at all. Thus, including these figures (together with null statistical results) in the main text would obscure our central message, especially given the expanded length of the revised manuscript (we added model details and new analysis results). Instead, following eLife’s format, we have reorganized the Supplementary Material so that each experimental figure has a corresponding supplementary figure showing the control data. This way, readers can quickly locate the control results and directly compare them with the experimental data, while keeping the main text focused.

      (2) "Importantly, sensory estimate of bodily property in microgravity is biased but evaded from sensorimotor adaptation, calling for an extension of existing theories of motor learning." Perhaps "immune from" would be a better choice of words.

      Response (28): Thanks for the suggestion; we have edited the text accordingly.

      (3) "First, typical reaching movement exhibits a symmetrical bell-shaped speed profile, which minimizes energy expenditure while maximizing accuracy according to optimal control principles (Todorov, 2004)." While Todorov's analysis is interesting and well accepted, it might be worthwhile citing the original source on the phenomenon of bell-shaped velocity profiles that minimize jerk (derivative of acceleration) and therefore, in some sense, maximize smoothness. Flash and Hogan, 1985.

      Response (29): Thanks for the suggestion; we have added the citation for the minimum-jerk model.

      (4) "Post-hoc analyses revealed slower reaction times for the 45° direction compared to both 90° (p < 0.001, d = 0.293) and 135° (p = 0.003, d = 0.284). Notably, reactions were faster during the in-flight phase compared to pre-flight (p = 0.037, d = 0.333), with no significant difference between in-flight and post-flight phases (p = 0.127)." What can one conclude from this?

      Response (30): Although these decreases reached statistical significance, their magnitudes were small. The parallel pattern across groups suggests the effect is not driven by microgravity, but is more plausibly a mild learning/practice effect. We now mention this in the Discussion.

      (5) "In line with predictions, peak acceleration appeared significantly earlier in the 45° direction than other directions (45° vs. 90°, p < 0.001, d = 0.304; 45° vs. 135°, p < 0.001, d = 0.271)." Which predictions? Because the effective mass is greater at 45º? Could you clarify the prediction?

      Response (31): We should be more specific here; thank you for raising this. The predictions concern the timing of peak acceleration (shown in Figure 1H). We have now modified this sentence to read:

      “In line with model predictions (Figure 1H), ….”.

      (6) Figure 2: Why do 45º movements have longer reaction times but shorter movement durations?

      Response (32): We appreciate your careful reading of the results. We believe this likely reflects flexible motor control across conditions and trials: people tend to move faster when their reaction time is longer. This is reflected in the across-direction comparison noted by the reviewer, and it also holds within and across participants: for both groups, we found a significant negative correlation between movement duration (MD) and reaction time (RT), both across and within individuals (Figure 2—figure supplement 5). This finding indicates that participants moved faster when their RT was longer, and vice versa. This flexible motor adjustment, likely due to the task requirement for rapid movements, remained consistent during spaceflight.

    1. eLife Assessment

      In this useful study, the authors conducted an impressive amount of atomistic simulations with a realistic asymmetric lipid bilayer to probe how the HIV-1 envelope glycoprotein (Env) transmembrane domain, cytoplasmic tail, and membrane environment influence ectodomain orientation and antibody epitope exposure. The simulations convincingly show that ectodomain motion is dominated by tilting relative to the membrane and explicitly demonstrate the role of membrane asymmetry in modulating the protein conformation and orientation. However, due to the qualitative nature of the conducted analyses, the evidence for the coupling between membrane-proximal regions and the antigenic surface is considered incomplete. With stronger integration of prior experimental and computational literature, this work has the potential to serve as a reference for how Env behaves in a realistic, glycosylated, membrane-embedded context.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Conformational Variability of HIV-1 Env Trimer and Viral Vulnerability", the authors study the fully glycosylated HIV-1 Env protein using an all-atom forcefield. It combines long all-atom simulations of Env in a realistic asymmetric bilayer with careful data analysis. This work clarifies how the CT domain modulates the overall conformation of the Env ectodomain and characterizes different MPER-TMD conformations. The authors also carefully analyze the accessibility of different antibodies to the Env protein.

      Strengths:

      This paper is state-of-the-art, given the scale of the system and the sophistication of the methods. The biological question is important, the methodology is rigorous, and the results will interest a broad audience.

      Weaknesses:

      The manuscript lacks a discussion of previous studies. The authors should consider addressing or comparing their work with the following points:

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work:

      https://doi.org/10.1101/2025.03.26.645577

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain:

      https://doi.org/10.1021/jacs.5c15421

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state:

      https://doi.org/10.1073/pnas.1807259115

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study:

      https://doi.org/10.1126/science.aaa9804

    3. Reviewer #2 (Public review):

      (1) Summary

      In this work, the authors aim to elucidate how a viral surface protein behaves in a membrane environment and how its large-scale motions influence the exposure of antibody-binding sites. Using long-timescale, all-atom molecular dynamics simulations of a fully glycosylated, full-length protein embedded in a virus-like membrane, the study systematically examines the coupling between ectodomain motion, transmembrane orientation, membrane interactions, and epitope accessibility. By comparing multiple model variants that differ in cleavage state, initial transmembrane configuration, and presence of the cytoplasmic tail, the authors aim to identify general features of protein-membrane dynamics relevant to antibody recognition.

      (2) Strengths

      A major strength of this study is the scope and ambition of the simulations. The authors perform multiple microsecond-scale simulations of a highly complex, biologically realistic system that includes the full ectodomain, transmembrane region, cytoplasmic tail, glycans, and a heterogeneous membrane. Such simulations remain technically challenging, and the work represents a substantial computational and methodological effort.

      The analysis provides a clear and intuitive description of large-scale protein motions relative to the membrane, including ectodomain tilting and transmembrane orientation. The finding that the ectodomain explores a wide range of tilt angles while the transmembrane region remains more constrained, with limited correlation between the two, offers useful conceptual insight into how global motions may be accommodated without large rearrangements at the membrane anchor.

      Another strength is the explicit consideration of membrane and glycan steric effects on antibody accessibility. By evaluating multiple classes of antibodies targeting distinct regions of the protein, the study highlights how membrane proximity and glycan dynamics can differentially influence access to different epitopes. This comparative approach helps place the results in a broader immunological context and may be useful for readers interested in antibody recognition or vaccine design.

      Overall, the results are internally consistent across multiple simulations and model variants, and the conclusions are generally well aligned with the data presented.

      (3) Weaknesses

      The main limitations of the study relate to sampling and model dependence, which are inherent challenges for simulations of this size and complexity. Although the simulations are long by current standards, individual trajectories explore only portions of the available conformational space, and several conclusions rely on pooling data across a limited number of replicas. This makes it difficult to fully assess the robustness of some quantitative trends, particularly for rare events such as specific epitope accessibility states.

      In addition, several aspects of the model construction, including the treatment of missing regions, loop rebuilding, and initial configuration choices, are necessarily approximate. While these approaches are reasonable and well motivated, the extent to which some conclusions depend on these modeling choices is not always fully clear from the current presentation.

      Finally, the analysis of antibody accessibility is based on geometric and steric criteria, which provide a useful first-order approximation but do not capture potential conformational adaptations of antibodies or membrane remodeling during binding. As a result, the accessibility results should be interpreted primarily as model-based predictions rather than definitive statements about binding competence.

      Despite these limitations, the study provides a valuable and carefully executed contribution, and its datasets and analytical framework are likely to be useful to others interested in protein-membrane interactions and antibody recognition.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses large-scale all-atom molecular dynamics simulations to examine the conformational plasticity of the HIV-1 envelope glycoprotein (Env) in a membrane context, with particular emphasis on how the transmembrane domain (TMD), cytoplasmic tail (CT), and membrane environment influence ectodomain orientation and antibody epitope exposure. By comparing Env constructs with and without the CT, explicitly modeling glycosylation, and embedding Env in an asymmetric lipid bilayer, the authors aim to provide an integrated view of how membrane-proximal regions and lipid interactions shape Env antigenicity, including epitopes targeted by MPER-directed antibodies.

      Strengths:

      A key strength of this work is the scope and realism of the simulation systems. The authors construct a very large, nearly complete Env-scale model that includes a glycosylated Env trimer embedded in an asymmetric bilayer, enabling analysis of membrane-protein interactions that are difficult to capture experimentally. The inclusion of specific glycans at reported sites, and the focus on constructs with and without the CT, are well motivated by existing biological and structural data.

      The simulations reveal substantial tilting motions of the ectodomain relative to the membrane, with angles spanning roughly 0-30° (and up to ~50° in some analyses), while the ectodomain itself remains relatively rigid. This framing, that much of Env's conformational variability arises from rigid-body tilting rather than large internal rearrangements, is an important conceptual contribution. The authors also provide interesting observations regarding asymmetric bilayer deformations, including localized thinning and altered lipid headgroup interactions near the TMD and CT, which suggest a reciprocal coupling between Env and the surrounding membrane.
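      A tilt angle of this kind is commonly computed as the angle between a protein axis and the membrane normal. The sketch below shows one way to do that for a single frame; it assumes the membrane normal is the z axis and that an ectodomain axis vector has already been extracted (for example, from centers of mass of two domains). This is an illustrative definition, not necessarily the one used in the manuscript under review.

```python
import numpy as np

def tilt_angle_deg(axis, membrane_normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between a protein axis vector and the membrane normal."""
    a = np.asarray(axis, dtype=float)
    n = np.asarray(membrane_normal, dtype=float)
    cos_tilt = np.dot(a, n) / (np.linalg.norm(a) * np.linalg.norm(n))
    # Clip to guard against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_tilt, -1.0, 1.0))))

print(tilt_angle_deg([0.0, 0.0, 1.0]))   # axis aligned with the normal: no tilt
print(tilt_angle_deg([1.0, 0.0, 1.0]))   # axis halfway between normal and membrane plane
```

      Applying the same function to the transmembrane-domain axis in each frame would give the paired time series needed to examine how ectodomain and TMD tilts co-vary.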

      The analysis of antibody-relevant epitopes across the prefusion state, including the V1/V2 and V3 loops, the CD4 binding site, and the MPER, is another strength. The study makes effective use of existing experimental knowledge in this context, for example, by focusing on specific glycans known to occlude antibody binding, to motivate and interpret the simulations.

      Weaknesses:

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      With stronger integration of prior experimental and computational literature, this work has the potential to serve as a valuable reference for how Env behaves in a realistic, glycosylated, membrane-embedded context. The simulation framework itself is well-suited for future studies incorporating mutations, strain variation, antibodies, inhibitors, or receptor and co-receptor engagement. In its current form, the primary contribution of the study is to consolidate and extend existing observations within a single, large-scale model, providing a useful platform for future mechanistic investigations.

    5. Author response:

      In response to the comments raised, we outline below the revisions we plan to make to strengthen the manuscript.

      First, we will expand the Introduction and Discussion sections to provide clearer comparison with prior experimental and computational studies of ectodomain tilting, MPER–TMD conformational heterogeneity, and membrane deformation, and to discuss how our simulations reproduce and extend these earlier observations.

      Second, we plan to add analyses that more directly assess the coupling between ectodomain and TMD motions. We will also revise the text to emphasize the limits imposed by sampling and model dependence and to discuss the potential benefits of enhanced sampling methods.

      Third, we will clarify the rationale for the chosen membrane composition and discuss how differences in lipid content between host plasma membranes and HIV virions may influence bilayer properties and Env dynamics.

      Fourth, we will supplement the Methods section to improve clarity and address issues of citation throughout the manuscript.

      Finally, we intend to deposit MD trajectories to a public research data repository to the extent permitted by available storage capacity.

    1. eLife Assessment

      This valuable study uses NAD(P)H fluorescence lifetime imaging (FLIM) to map metabolic states in the Drosophila brain. The authors reveal subtype-specific metabolic profiles in Kenyon cells and report learning-related changes, supported by solid evidence and careful methodology. However, the FLIM shifts observed after memory formation in α/β neurons are small and only weakly significant, so the ability of FLIM to detect subtle physiological changes still requires further validation. Nevertheless, this work provides a strong starting point and demonstrates the promising potential of FLIM for probing neural metabolism in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a novel usage of fluorescence life-time imaging microscopy (FLIM) to measure NAD(P)H autofluorescence in the Drosophila brain, as a proxy for cellular metabolic/redox states. This new method relies on the fact that both NADH and NADPH are autofluorescent, with a different excitation lifetime depending on whether they are free (indicating glycolysis) or protein-bound (indicating oxidative phosphorylation). The authors successfully use this method in Drosophila to measure changes in metabolic activity across different areas of the fly brain, with a particular focus on the main center for associative memory: the mushroom body.

      Strengths:

      The authors have made a commendable effort to explain the technical aspects of the method in accessible language. This clarity will benefit both non-experts seeking to understand the methodology and researchers interested in applying FLIM to Drosophila in other contexts.

      Weaknesses:

      Despite being statistically significant, the learning-induced change in f-free in α/β Kenyon cells is minimal (a decrease from 0.76 to 0.73, with a high variability). It is unclear whether this small effect represents a meaningful shift in neuronal metabolic state.

      Whether this method can be valuable to examine the effects of long-term memory (after spaced or massed conditioning) remains to be established.

    3. Reviewer #2 (Public review):

      This revised manuscript presents a valuable application of NAD(P)H fluorescence lifetime imaging (FLIM) to study metabolic activity in the Drosophila brain. The authors reveal regional differences in oxidative and glycolytic metabolism, with particular emphasis on the mushroom body, a key center for associative learning and memory. They also report metabolic shifts in α/β Kenyon cells following classical conditioning, in line with their known role in energy-demanding memory processes.

      The study is well-executed and the authors have added more detailed methodological descriptions in this version, which strengthen the technical contribution. The analysis pipeline is rigorous, with careful curve fitting and appropriate controls. However, the metabolic shifts observed after conditioning are small and only weakly significant, raising questions about the sensitivity of FLIM for detecting subtle physiological changes. The authors acknowledge these limitations in the revised discussion, which helps place the findings in proper context.

      Despite this, the work provides a solid foundation for future applications of label-free FLIM in vivo and serves as a valuable technical resource for researchers interested in neural metabolism. Overall, this study represents a meaningful step toward integrating metabolic imaging with the study of neural activity and cognitive function.

    4. Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps identify. Past work has shown that the autofluorescence decay time course reflects the balance of the free redox cofactor NAD(P)H vs. its protein-bound form. The ratio of free to bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering brains of different flies to a common template and masking out anatomical regions of interest using fluorescent proteins.

      The analysis relies on fitting a FL decay model with two free parameters, f_free and T_bound. F_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant of 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. T_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The T_bound fit is thought to represent the decay time constant of protein-bound NADPH, but can differ depending on the protein. The study shows that across the brain, T_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). The paper beautifully lays out the analysis pipeline, providing a valuable resource. The full range of fits is reported, including maximum likelihood quality parameters, and can serve as benchmarks for future studies.
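      The two-parameter decay model described here can be written down and fitted in a few lines. The sketch below is a simplified illustration: it assumes a normalized decay, ignores the instrument response function and photon-counting noise, and uses a least-squares grid search rather than the maximum-likelihood fitting mentioned in the review. The function names and the time grid are hypothetical.

```python
import numpy as np

TAU_FREE_NS = 0.4  # fixed lifetime attributed to free NAD(P)H, as stated in the review

def biexp(t_ns, f_free, tau_bound_ns):
    """Normalized bi-exponential decay with a fixed short component."""
    return (f_free * np.exp(-t_ns / TAU_FREE_NS)
            + (1.0 - f_free) * np.exp(-t_ns / tau_bound_ns))

def fit_biexp(t_ns, decay, tau_grid_ns=None):
    """Estimate (f_free, tau_bound) by scanning tau_bound and solving f_free in closed form."""
    if tau_grid_ns is None:
        tau_grid_ns = np.linspace(0.5, 6.0, 551)   # assumed search range, 0.01 ns steps
    e_free = np.exp(-t_ns / TAU_FREE_NS)
    best = (np.inf, None, None)
    for tau in tau_grid_ns:
        e_bound = np.exp(-t_ns / tau)
        d = e_free - e_bound
        # For fixed tau the model is linear in f_free: decay = e_bound + f_free * d.
        f = float(np.dot(d, decay - e_bound) / np.dot(d, d))
        f = min(max(f, 0.0), 1.0)                   # keep the fraction in [0, 1]
        sse = float(np.sum((biexp(t_ns, f, tau) - decay) ** 2))
        if sse < best[0]:
            best = (sse, f, float(tau))
    return best[1], best[2]

t = np.linspace(0.0, 12.5, 256)                     # time axis in ns (assumed)
synthetic = biexp(t, f_free=0.75, tau_bound_ns=3.0) # noise-free test curve
print(fit_biexp(t, synthetic))
```

      Note that f_free is a dimensionless fraction while T_bound carries units of time, which is why only the latter is reported in nanoseconds.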

      The authors measure properties of NADPH related autofluorescence of Kenyon Cells (KCs) of the fly mushroom body. The somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e); the f_free fit is higher for the calyx (input synapses) region than for KC somata; and the average across flies of average f_free fits in alpha/beta KC somata decreases slightly following paired presentation of odor and shock, compared to unpaired presentation of the same stimuli. Though the change is slight, no comparable change is detected in gamma KCs, suggesting that distributions of f_free derived from FL may be sensitive enough to measure changes in metabolic pathways following conditioning.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel usage of fluorescence lifetime imaging microscopy (FLIM) to measure NAD(P)H autofluorescence in the Drosophila brain, as a proxy for cellular metabolic/redox states. This new method relies on the fact that both NADH and NADPH are autofluorescent, with a different excitation lifetime depending on whether they are free (indicating glycolysis) or protein-bound (indicating oxidative phosphorylation). The authors successfully use this method in Drosophila to measure changes in metabolic activity across different areas of the fly brain, with a particular focus on the main center for associative memory: the mushroom body.

      Strengths:

      The authors have made a commendable effort to explain the technical aspects of the method in accessible language. This clarity will benefit both non-experts seeking to understand the methodology and researchers interested in applying FLIM to Drosophila in other contexts.

      Weaknesses:

      (1) Despite being statistically significant, the learning-induced change in f-free in α/β Kenyon cells is minimal (a decrease from 0.76 to 0.73, with a high variability). The authors should provide justification for why they believe this small effect represents a meaningful shift in neuronal metabolic state.

      We agree with the reviewer that the observed f_free shift averaged per individual, while statistically significant, is small. However, to our knowledge, this is the first study to investigate a physiological (i.e., not pharmacologically induced) variation in neuronal metabolism using FLIM. As such, there are no established expectations regarding the amplitude of the effect. In the revised manuscript, we have included an additional experiment involving the knockdown of ALAT in α/β Kenyon cells, which further supports our findings. We have also expanded the discussion to set out two potential reasons why this effect may appear modest.

      (2) The lack of experiments examining the effects of long-term memory (after spaced or massed conditioning) seems like a missed opportunity. Such experiments could likely reveal more drastic changes in the metabolic profiles of KCs, as a consequence of memory consolidation processes.

      We agree with the reviewer that investigating the effects of long-term memory on metabolism represents a valuable future path of investigation. An intrinsic caveat of autofluorescence measurement, however, is the difficulty of identifying the cellular origin of the observed changes. In this respect, long-term memory formation is not an ideal case study, as its essential feature is expected to be a metabolic activation localized to Kenyon cells’ axons in the mushroom body vertical lobes (as shown in Comyn et al., 2024), where many different neuron subtypes send intricate processes. This is why we chose to first focus on middle-term memory, where changes at the level of the cell bodies could be expected from our previous work (Rabah et al., 2022). Our pioneering exploration of the applicability of NAD(P)H FLIM to brain metabolism monitoring in vivo now paves the way to extending it to the effect of other forms of memory.

      (3) The discussion is mostly just a summary of the findings. It would be useful if the authors could discuss potential future applications of their method and new research questions that it could help address.

      The discussion has been expanded by adding interpretations of the findings and remaining challenges.

      Reviewer #2 (Public review):

      This manuscript presents a compelling application of NAD(P)H fluorescence lifetime imaging (FLIM) to study metabolic activity in the Drosophila brain. The authors reveal regional differences in oxidative and glycolytic metabolism, with a particular focus on the mushroom body, a key structure involved in associative learning and memory. In particular, they identify metabolic shifts in α/β Kenyon cells following classical conditioning, consistent with their established role in energy-demanding middle- and long-term memories.

      These results highlight the potential of label-free FLIM for in-vivo neural circuit studies, providing a powerful complement to genetically encoded sensors. This study is well-conducted and employs rigorous analysis, including careful curve fitting and well-designed controls, to ensure the robustness of its findings. It should serve as a valuable technical reference for researchers interested in using FLIM to study neural metabolism in vivo. Overall, this work represents an important step in the application of FLIM to study the interactions between metabolic processes, neural activity, and cognitive function.

      Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps to identify. Past work has shown that the autofluorescence decay time course reflects the balance of the redox enzyme NAD(P)H vs. its protein-bound form. The ratio of free-to-bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering the brains of different flies to a common template and masking out anatomical regions of interest using fluorescence proteins.

      The analysis relies on fitting an FL decay model with two free parameters, f_free and T_bound. F_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant of 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. T_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The T_bound fit is thought to represent the decay time constant of protein-bound NADPH but can differ depending on the protein. The study shows that across the brain, T_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). These methods appear to be solid. The full range of fits is reported, including maximum likelihood quality parameters, and can serve as a benchmark for future studies.

      The authors measure the properties of NADPH-related autofluorescence of Kenyon Cells (KCs) of the fly mushroom body. The results from the three main figures are:

      (1) Somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e);

      (2) The f_free fit is higher for the calyx (input synapses) region than for KC somata (Figure 2b);

      (3) The average across flies of average f_free fits in alpha/beta KC somata decreases from 0.734 to 0.718. Based on the first two findings, an accurate title would be "Autofluorescence lifetime imaging reveals regional differences in NADPH state in Drosophila mushroom bodies."

      The third finding is the basis for the title of the paper and the support for this claim is unconvincing. First, the difference in alpha/beta f_free (p-value of 4.98E-2) is small compared to the measured difference in f_free between somas and calyces. It's smaller even than the difference in average soma f_free across datasets (Figure 2b vs c). The metric is also quite derived; first, the model is fit to each (binned) voxel, then the distribution across voxels is averaged and then averaged across flies. If the voxel distributions of f_free are similar to those shown in Supplementary Figure 2, then the actual f_free fits could range between 0.6-0.8. A more convincing statistical test might be to compare the distributions across voxels between alpha/beta vs alpha'/beta' vs. gamma KCs, perhaps with bootstrapping and including appropriate controls for multiple comparisons.

      The difference observed is indeed modest relative to the variability of f_free measurements in other contexts. The fact that the difference observed between the somata region and the calyx is larger is not necessarily surprising. Indeed, these areas have different anatomical compositions that may result in different basal metabolic profiles. This is suggested by Figure 1b, which shows that the cortex and neuropil have different metabolic signatures. Differences in average f_free values in the somata region can indeed be observed between naive and conditioned flies. However, all comparisons in the article were performed between groups of flies imaged within the same experimental batches, ensuring that external factors were largely controlled for. Naive and conditioned flies were not imaged within the same batches, and this absence of control makes it difficult to extract meaningful information from the comparison between them.

      We agree with the reviewer that the choice of the metric was indeed not well justified in the first manuscript. In the new manuscript, we have tried to illustrate the reasons for this choice with the example of the comparison of f_free in alpha/beta neurons between unpaired and paired conditioning (Dataset 8). First, the idea of averaging across voxels is supported by the fact that the distributions of decay parameters within a single image are predominantly unimodal. Examples for Dataset 8 are now provided in the new Sup. Figure 14. Second, an interpretable comparison between multiple groups of distributions is, to our knowledge, not straightforward to implement. It is now discussed in Supplementary information. To measure interpretable differences in the shapes of the distributions, we computed the first three moments of the distributions of f_free for Dataset 8 and compared the values obtained between conditions (see Supplementary information and new Sup. Figure 15). Third, averaging across individuals gives each experimental subject the same weight in the comparisons.
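
      The metric discussed in this response (summarize each fly's voxel distribution, then average across flies so that every fly has equal weight) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names are hypothetical, the "first three moments" are assumed here to be the mean, variance and skewness, and the voxel values are invented for the example.

```python
import numpy as np


def fly_moments(f_free_voxels):
    """Mean, variance and skewness of one fly's voxel-wise f_free values."""
    x = np.asarray(f_free_voxels, dtype=float)
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var)
    # Skewness is the mean cubed z-score; defined as 0 for a constant sample.
    skew = float(np.mean(((x - mean) / std) ** 3)) if std > 0 else 0.0
    return float(mean), float(var), skew


def group_mean(flies):
    """Average the per-fly means so that each fly carries equal weight."""
    return float(np.mean([fly_moments(v)[0] for v in flies]))


# Hypothetical voxel values, for illustration only (not data from the study).
flies = [[0.70, 0.72, 0.74], [0.76, 0.78], [0.73, 0.75, 0.77, 0.79]]
per_fly_average = group_mean(flies)
pooled_average = float(np.mean(np.concatenate(flies)))
```

      When flies contribute different numbers of voxels, the per-fly average and the pooled voxel average differ, which is the weighting issue the third point of the response refers to.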

      I recommend the authors address two concerns. First, what degree of fluctuation in autofluorescence decay can we expect over time, e.g. over circadian cycles? That would be helpful in evaluating the magnitude of changes following conditioning. And second, if the authors think that metabolism shifts to OXPHOS over glycolysis, are there further genetic manipulations they could make? They test LDH knockdown in gamma KCs, why not knock it down in alpha/beta neurons? The prediction might be that if it prevents the shift to OXPHOS, the shift in f_free distribution in alpha/beta KCs would be attenuated. The extensive library of genetic reagents is an advantage of working with flies, but it comes with a higher standard for corroborating claims.

      In the present study, we used control groups to account for broad fluctuations induced by external factors such as the circadian cycle. We agree with the reviewer that a detailed characterization of circadian variations in the decay parameters would be valuable for assessing the magnitude of conditioning-induced shifts. We have integrated this relevant suggestion in the Discussion. Conducting such an investigation lies unfortunately beyond the scope and means of the current project.

      In line with the suggestion of the reviewer, we have included a new experiment to test the influence of the knockdown of ALAT on the conditioning-induced shift measured in alpha/beta neurons. This choice is motivated in the new manuscript. The obtained result shows that no shift is detected in the mutant flies, in accordance with our hypothesis.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      (1) Y axes in Figures 1e, 2c, 3b,c are misleading. They must start at 0.

      Although we agree that making the Y axes start at 0 is preferable, in our case it makes it difficult to observe the dispersion of the data at the same time (your next suggestion). To make it clearer to the reader that the axes do not start at 0, a broken Y-axis is now displayed in every concerned figure.

      (2) These same plots should have individual data points represented, for increased clarity and transparency.

      Individual data points were added on all boxplots.

      Reviewer #2 (Recommendations for the authors):

      I am evaluating this paper as a fly neuroscientist with experience in neurophysiology, including calcium imaging. I have little experience with FLIM but anticipate its use growing as more microscopes and killer apps are developed. From this perspective, I value the opportunity to dig into FLIM and try to understand this autofluorescence signal. I think the effort to show each piece of the analysis pipeline is valuable. The figures are quite beautiful and easy to follow. My main suggestion is to consider moving some of the supplemental data to the main figures. eLife allows unlimited figures, moving key pieces of the pipeline to the main figures would make for smoother reading and emphasize the technical care taken in this study.

      We thank the reviewer for their feedback. Following their advice we have moved panels from the supplementary figures to the main text (see new Figure 2).

      Unfortunately, the scientific questions and biological data do not rise to the typical standard in the field to support the claims in the title, "In vivo autofluorescence lifetime imaging of the Drosophila brain captures metabolic shifts associated with memory formation". The authors also clearly state what the next steps are: "hypothesis-driven approaches that rely on metabolite-specific sensors" (Intro). The advantage of fly neuroscience is the extensive library of genetic reagents that enable perturbations. The key manipulation in this study is the electric shock conditioning paradigm that subtly shifts the distribution of a parameter fit to an exponential decay in the somas of alpha/beta KCs vs others. This feels like an initial finding that deserves follow-up; but is it a large enough result to motivate a future student to pick this project up? The larger effect appears to be the gradients in f_free across KCs overall (Figure 2b). How does this change with conditioning?

      We acknowledge that the observed metabolic shift is modest relative to the variability of f_free and agree that additional corroborating experiments would further strengthen this result. Nevertheless, we believe it remains a valid and valuable finding that will be of interest to researchers in the field. The reviewer is right in pointing out that the gradient across KCs is higher in magnitude, however, the fact that this technique can also report experience-dependent changes, in addition to innate heterogeneities across different cell types, is a major incentive for people who could be interested in applying NAD(P)H FLIM in the future. For this reason, we consider it appropriate to retain mention of the memory-induced shift in the title, while making it less assertive and adding a reference to the structural heterogeneities of f_free revealed in the study. We have also rephrased the abstract to adopt a more cautious tone and expanded the discussion to clarify why a low-magnitude shift in f_free can still carry biological significance in this context. Finally, we have added the results of a new set of data involving the knockdown of ALAT in Kenyon cells, to further support the relevance of our observation relative to memory formation, despite its small magnitude. We believe that these elements together form a good basis for future investigations and that the manuscript merits publication in its present form.

      Together, I would recommend reshaping the paper as a methods paper that asks the question, what are the spatial properties of NADPH FL across the brain? The importance of this question is clear in the context of other work on energy metabolism in the MBs. 2P FLIM will likely always have to account for autofluorescence, so this will be of interest. The careful technical work that is the strength of the manuscript could be featured, and whether conditioning shifts f_free could be a curio that might entice future work.

      By transferring panels of the supplementary figures to the main text (see new Figure 2) as suggested by Reviewer 2, we have reinforced the methodological part of the manuscript. For the reasons explained above, we however still mention the ‘biological’ findings in the title and abstract.

      Minor recommendations on science:

      Figure 2C. Plotting either individual data points or distributions would be more convincing.

      Individual data points were added on all boxplots.

      There are a few mentions of glia. What are the authors' expectations for metabolic pathways in glia vs. neurons? Are glia expected to use one more than the other? The work by Rabah suggests it should be different and perhaps complementary to neurons. Can a glial marker be used in addition to KC markers? This seems crucial to being able to distinguish metabolic changes in KC somata from those in glia.

      Drosophila cortex glia are thought to play a similar role as astrocytes in vertebrates (see Introduction). In that perspective, we expect cortex glia to display a higher level of glycolysis than neurons. The work by Rabah et al. is coherent with this hypothesis. Reviewer 2 is right in pointing out that using a glial marker would be interesting. However, current technical limitations make such experiments challenging. These limitations are now exposed in the discussion.

      The question of whether KC somata positions are stereotyped can probably be answered in other ways as well. For example, the KCs are in the FAFB connectomic data set and the hemibrain. How do the somata positions compare?

      The reviewer’s suggestion is indeed interesting. However, the FAFB and hemibrain connectomic datasets are based on only two individual flies, which probably limits their suitability for assessing the stereotypy of KC subtype distributions. In addition, aligning our data with the FAFB dataset would represent substantial additional work.

      The free parameter tau_bound is mysterious if it can be influenced by the identity of the protein. Are there candidate NADPH binding partners that have a spatial distribution in confocal images that could explain the difference between somas and calyx?

      There are indeed dozens of NADH- or NADPH-binding proteins. For this reason, in all studies implementing exponential fitting of metabolic FLIM data, tau_bound is considered a complex combination of the contributions from many different proteins. In addition, one should keep in mind that the number of cell types contributing to the autofluorescence signal in the mushroom body calyx (Kenyon cells, astrocyte-like and ensheathing glia, APL neurons, olfactory projection neurons, dopamine neurons) is much higher than in the somas (only Kenyon cells and cortex glia). This could also participate in the observed difference. Hence, focusing on intracellular heterogeneities of potential NAD(P)H binding partners seems premature at that stage.

      The phrase "noticeable but not statistically significant" is misleading.

      We agree with the reviewer and have removed “noticeable but” from the sentence in the new version of the manuscript.

      Minor recommendations on presentation:

      The Introduction can be streamlined.

      We agree that some parts of the Introduction can seem a bit long for experts of a particular field. However, we think that this level of detail makes the article easily accessible for neuroscientists working on Drosophila and other animal models but not necessarily with FLIM, as well as for experts in energy metabolism that may be familiar with FLIM but not with Drosophila neuroscience.

    1. eLife Assessment

      This study provides a useful application of computational modelling to examine how people with chronic pain learn under uncertainty, contributing to efforts to link pain with motivational processes. However, the evidence supporting the main claims is incomplete, as the modelling differences are not reflected in observable behaviour or pain measures, and the interpretation extends beyond what the data can substantiate. The conclusions would benefit from a clearer explanation of the behavioural differences that underlie the computational findings.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how individuals with chronic temporomandibular disorder (TMD) learn from uncertain rewards, using a probabilistic three-armed bandit task and computational modelling. The authors aim to identify whether people living with chronic pain show altered learning under uncertainty and how such differences might relate to psychological symptoms.

      Strengths:

      The work addresses an important question about how chronic pain may influence cognition and motivation. The task design is appropriate for probing adaptive learning, and the modelling approach is novel. The findings of altered uncertainty updating in the TMD group are interesting.

      Weaknesses:

      Several aspects of the paper limit the strength of the conclusions. The group differences appear only in model-derived parameters, with no corresponding behavioural differences in task performance. Model parameters do not correlate with pain severity, making the proposed mechanistic link between pain and learning speculative. Some of the interpretations extend beyond what the data can directly support.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on a case-control study in which participants with chronic pain (TMD) were compared to controls on performance of a three-option learning task. The authors find no difference in task behavior, but fit a model to this behavior and suggest that differences in the model-derived metrics (specifically, change in learning rate/estimated volatility/model estimated uncertainty) reveal a relevant between-group effect. They report a mediation effect suggesting that group differences on self-report apathy may be partially mediated by this uncertainty adaptation result.

      Strengths:

      The role of sensitivity to uncertainty in pathological states is an interesting question and is the focus of a reasonable amount of research at present. This paper provides a useful assessment of these processes in people with chronic pain.

      Weaknesses:

      (1) The interpretation of the model in the absence of any apparent behavioral effect is not convincing. The model is quite complex with a number of free parameters (what these parameters are is not well explained in the methods, although they seem to be presented in the supplement). These parameters are fitted to participant choice behavior - that is, they explain some sort of group difference in this choice behavior. The authors haven't been able to demonstrate what this difference is. The graphs of learning rate per group (Figure 2) suggest that the control group has a higher initial learning rate and a lower later learning rate. If this were actually the case, you would expect to see it reflected in the choice data (the control group should show higher lose-shift behavior earlier on, with this then declining over time, and the TMD group should show no change). This behavior is not apparent. The absence of a clear effect on behavior suggests that the model results are more likely to be spurious.

      (2) As far as I could see, the actual parameters of the model are not reported. The results (Figure 2) illustrate the trial-level model estimated uncertainty/learning rate, etc, but these differ because the fitted model parameters differ. The graphs look like there are substantial differences in v0 (which was not well recovered), but presumably lambda, at least, also differs. The mean(SD) group values for these parameters should be reported, as should the correlations between them (it looks very much like they will be correlated).

      (3) The task used seems ill-suited to measuring the reported process. The authors report the performance of a restless bandit task and find an effect on uncertainty adaptation. The task does not manipulate uncertainty (there are no periods of high/low uncertainty) and so the only adaptation that occurs in the task is the change from what appears to be the participants' prior beliefs about uncertainty (which appear to be very different between groups - i.e. the lines in Figure 2a,b,c are very different at trial 0). If the authors are interested in measuring adaptation to uncertainty, it would clearly be more useful to present participants with periods of higher or lower uncertainty.

      (4) The main factor driving the better fit of the authors' preferred model over listed alternatives seems to be the inclusion of an additive uncertainty term in the softmax; this differentiates the chosen model from the other two Kalman filter-based models that perform less well. But a similar term is not included in the RW models. Given that the uncertainty of a binary outcome can be estimated as p(1-p), and the RW models are estimating p, this would seem relatively straightforward to do. It would be useful to know if the factor that actually drives better model fit is indeed in the decision stage (rather than the learning stage).
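
      The control model suggested in point (4) can be sketched as a Rescorla-Wagner learner whose softmax includes an additive uncertainty term p(1-p). This is a minimal illustration under stated assumptions, not the model used in the paper: the parameter names alpha (learning rate), beta (value weight) and phi (uncertainty weight) are hypothetical.

```python
import numpy as np


def rw_update(p, choice, reward, alpha):
    """Rescorla-Wagner update of the chosen option's reward probability."""
    p = np.array(p, dtype=float)  # copy so the caller's estimate is untouched
    p[choice] += alpha * (reward - p[choice])
    return p


def choice_probs(p, beta, phi):
    """Softmax over value plus an additive uncertainty term p * (1 - p)."""
    p = np.asarray(p, dtype=float)
    logits = beta * p + phi * p * (1.0 - p)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# One trial of a three-armed bandit with hypothetical parameter values.
p_hat = rw_update([0.5, 0.5, 0.5], choice=0, reward=1, alpha=0.2)
probs = choice_probs(p_hat, beta=3.0, phi=1.0)
```

      Setting phi to zero recovers a standard RW softmax, so comparing fits with and without phi would indicate whether the improvement comes from the decision stage rather than from the learning stage.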

    4. Reviewer #3 (Public review):

      This paper applies a computational model to behavior in a probabilistic operant reward learning task (a 3-armed bandit) to uncover differences between individuals with temporomandibular disorder (TMD) compared with healthy controls. Integrating computational principles and models into pain research is an important direction, and the findings here suggest that TMD is associated with subtle changes in how uncertainty is represented over time as individuals learn to make choices that maximize reward. There are a number of strengths, including the comparison of a volatile Kalman filter (vKF) model to some standard base models (Rescorla Wagner with 1 or 2 learning rates) and parameter recovery analyses suggesting that the combination of task and vKF model may be able to capture some properties of learning and decision-making under uncertainty that may be altered in those suffering from chronic pain-related conditions.

      I've focused my comments in four areas: (1) Questions about the patient population, (2) Questions about what the findings here mean in terms of underlying cognitive/motivational processes, (3) Questions about the broader implications for understanding individuals with TMD and other chronic pain-related disorders, and (4) Technical questions about the models and results.

      (1) Patient population

      This is a computational modelling study, so it is light on characterization of the population, but the patient characteristics could matter. The paper suggests they were hospitalized, but this is not a condition that requires hospitalization per se. It would be helpful to connect and compare the patient characteristics with large-scale studies of TMD, such as the OPPERA study led by Maixner, Fillingim, and Slade.

      (2) What cognitive/motivational processes are altered in TMD

      The study finds a pattern of alterations in TMD patients that seems clear in Figure 2. Healthy controls (HC) start the task with high estimates of volatility, uncertainty, and learning rate, which drop over the course of the task session. This is consistent with a learner that is initially uncertain about the structure of the environment (i.e., which options are rewarded and how the contingencies change over time) but learns that there is a fixed or slowly changing mean and stationary variance. The TMD patients start off with much lower volatility, uncertainty, and learning rate - which are actually all near 0 - and they remain stable over the course of learning. This is consistent with a learner who believes they know the structure of the environment and ignores new information.

      What is surprising is that this pattern of changes over time was found in spite of null group differences in a number of aspects of performance: (1) stay rate, (2) switch rate, (3) win-stay/lose-switch behaviors, (4) overall performance (corrected for chance level), (5) response times, (6) autocorrelation, (7) correlations between participants' choice probability and each option's average reward rate, (8) choice consistency (though how this was operationalized is not described), (9) win-stay-lose-shift patterns over time. I'm curious about how the patterns in Figure 2 would emerge if standard aspects of performance are essentially similar across groups (though the study cannot provide evidence in favor of the null). It will be important to replicate these patterns in larger, independent samples with preregistered analyses.

      The authors believe that this pattern of findings reveals that TMD patients "maintain a chronically heightened sensitivity to environmental changes" and relate the findings to predictive processing, a hallmark of which (in its simplest form) is precision-weighted updating of priors. They also state that the findings are not related to reduced overall attentiveness or failure to understand the task, but describe them as deficits or impairments in calibrating uncertainty.

      The pattern of differences could, in fact, result from differences in prior beliefs, conceptualization of the task, or learning. Unpacking these will be important steps for future work, along with direct measures of priors, cognitive processes during learning, and precision-weighted updating.

      (3) Implications for understanding chronic pain

      If the findings and conclusions of the paper are correct, individuals with TMD and perhaps other pain-related disorders may have fundamental alterations in the ways in which they make decisions about even simple monetary rewards. The broader questions for the field concern (1) how generalizable such alterations are across tasks, (2) how generalizable they are across patient groups and, conversely, how specific they are to TMD or chronic pain, (3) whether they are the result of neurological dysfunction, as opposed to (e.g.) adaptive strategies or assumptions about the environment/task structure.

      It will be important to understand which features of patients' and/or controls' cognition are driving the changes. For example, could the performance differences observed here be attributable to a reduced or altered understanding of the task instructions, more uncertainty about the rules of the game, different assumptions about environments (i.e., that they are more volatile/uncertain or less so), or reduced attention or interest in optimizing performance? Are the controls OVERconfident in their understanding of the environment?

      This set of questions will not be easy to answer and will be the work of many groups for many years to come. It is a judgment call how far any one paper must go to address them, but my view is that it is a collaborative effort. Start with a finding, replicate it across labs, take the replicable phenomena and work to unpack the underlying questions. The field must determine whether it is this particular task with this model that produces case-control differences (and why), or whether the findings generalize broadly. Would we see the same findings for monetary losses, sounds, and social rewards? Tasks with painful stimuli instead of rewards?

      Another set of questions concerns the space of computational models tested, and whether their parameters are identifiable. An alteration in estimated volatility or learning rate, for example, can come from multiple sources. In one model, it might appear as a learning rate change and in another as a confirmation bias. It would be interesting in this regard to compare the "mechanisms" (parameters) of other models used in pain neuroscience, e.g., models by Seymour, Mancini, Jepma, Petzschner, Smith, Chen, and others (just to name a few).

      One immediate next step here could be to formally compare the performance of both patients and controls to normatively optimal models of performance (e.g., Bayes optimal models under different assumptions). This could also help us understand whether the differences in patients reflect deficits and what further experiments we would need to pin that down.

      In addition, the volatility parameter in the computational model correlated with apathy. This is interesting. Is there a way to distinguish apathy as a particular clinical characteristic and feature of TMD from apathy in the sense of general disinterest in optimal performance that may characterize many groups?

      If we know this, what actionable steps does it lead us to take? Could we take steps to reduce apathy and thus help TMD patients better calibrate to environmental uncertainty in their lives? Or take steps to recalibrate uncertainty (i.e., increase uncertainty adaptation), with benefits on apathy? A hallmark of a finding that the field can build off of is the questions it raises.

      (4) Technical questions about the models and results

      Clarification of some technical points would help interpret the paper and findings further:

      (a) Was the reward probability truly random? Was the random walk different for each person, or constrained?

      (b) When were self-report measures administered, and how?

      (c) Pain assessments: What types of pain? Was a body map assessed? Widespreadness? Pain at the time of the test, or pain in general?

      (d) Parameter recovery: As you point out, r = 0.47 seems very low for recovery of the true quantity, but this depends on noise levels and on how the parameter space is sampled. Is this noise-free recovery, and is it robust to noise? Are the examples of true parameters drawn from the space of participants, or do they otherwise systematically sample the space of true parameters?

      (e) What are the covariances across parameter estimates and resultant confusability of parameter estimates (e.g., confusion matrix)?

      (f) It would be helpful to have a direct statistical comparison of controls and TMD on model parameter estimates.

      (g) Null statistical findings on differences in correlations should not be interpreted as a lack of a true effect. Bayes Factors could help, but an analysis of them will show that hundreds of people are needed before it is possible to say there are no differences with reasonable certainty. Some journals enforce rules around the kinds of language used to describe null statistical findings, and I think it would be helpful to adopt them more broadly.

      (h) What is normatively optimal in this task? Are TMD patients less so, or not? The paper states "aberrant precision (uncertainty) weighting and misestimation of environmental volatility". But: are they misestimates?

      (i) It's not clear how well the choice of prior variance for all parameters (6.25) is informed by previous research, as sensible values may be task- and context-dependent. Are the main findings robust to how priors are specified in the HBI model?
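
      The recovery check asked for in (d) and (e) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: it assumes a hypothetical two-armed bandit with drifting reward probabilities, a delta-rule learner with learning rate `alpha` and softmax inverse temperature `beta`, and grid-search maximum likelihood in place of the hierarchical fitting used in the paper. All function names, parameter ranges, and the drift schedule are invented for illustration.

```python
import math
import random


def simulate(alpha, beta, n_trials=200, seed=0):
    """Simulate a delta-rule + softmax learner on a two-armed bandit.

    Reward probabilities follow a bounded Gaussian random walk. This
    schedule is an illustrative assumption, not the paper's task.
    """
    rng = random.Random(seed)
    p = [0.5, 0.5]  # drifting reward probabilities
    q = [0.5, 0.5]  # learner's value estimates
    choices, rewards = [], []
    for _ in range(n_trials):
        p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        c = 1 if rng.random() < p_choose_1 else 0
        r = 1 if rng.random() < p[c] else 0
        q[c] += alpha * (r - q[c])  # delta-rule update of the chosen arm
        choices.append(c)
        rewards.append(r)
        p = [min(0.9, max(0.1, x + rng.gauss(0.0, 0.05))) for x in p]
    return choices, rewards


def neg_log_lik(alpha, beta, choices, rewards):
    """Negative log-likelihood of observed choices under (alpha, beta)."""
    q = [0.5, 0.5]
    nll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        p_c = p1 if c == 1 else 1.0 - p1
        nll -= math.log(max(p_c, 1e-12))
        q[c] += alpha * (r - q[c])
    return nll


def fit(choices, rewards):
    """Grid-search maximum likelihood (a stand-in for the real fitting)."""
    alphas = [i / 20 for i in range(1, 20)]   # 0.05 .. 0.95
    betas = [0.5 * i for i in range(1, 21)]   # 0.5 .. 10.0
    grid = ((a, b) for a in alphas for b in betas)
    return min(grid, key=lambda ab: neg_log_lik(ab[0], ab[1], choices, rewards))


def pearson(x, y):
    """Pearson correlation; returns nan if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def recovery(n_agents=20, seed=1):
    """Simulate agents with known parameters, refit, and correlate."""
    rng = random.Random(seed)
    true_a, true_b, est_a, est_b = [], [], [], []
    for i in range(n_agents):
        a, b = rng.uniform(0.1, 0.9), rng.uniform(1.0, 8.0)
        choices, rewards = simulate(a, b, seed=1000 + i)
        fa, fb = fit(choices, rewards)
        true_a.append(a)
        true_b.append(b)
        est_a.append(fa)
        est_b.append(fb)
    return {
        "alpha_recovery_r": pearson(true_a, est_a),
        "beta_recovery_r": pearson(true_b, est_b),
        # Correlation between estimates: a crude index of parameter trade-off.
        "estimate_tradeoff_r": pearson(est_a, est_b),
    }
```

      Calling `recovery()` returns the true-versus-recovered correlation for each parameter, plus the correlation between the two sets of estimates. A strong correlation between estimates would point to the kind of trade-off that a confusion matrix is meant to expose. Whether the true parameters are sampled uniformly (as assumed here) or drawn from the fitted participants changes what a value such as r = 0.47 means, which is why the sampling scheme matters for question (d).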

    1. eLife Assessment

      This manuscript proposes a lateralized, lobe-specific brain-liver sympathetic neurocircuit regulating hepatic glucose metabolism and presents anatomical evidence for sympathetic crossover at the porta hepatis using viral tracing and neuromodulation approaches. While the topic is of important significance and the methodologies are, in principle, state-of-the-art, significant concerns regarding experimental design, incomplete methodological reporting, sparse and ambiguous labeling, and overinterpretation of the data substantially weaken support for the study's central conclusions, thereby limiting the study's completeness. The work will be of interest to biologists, clinicians, and physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. reports the potential involvement of an asymmetric neurocircuit in the sympathetic control of liver glucose metabolism.

      Strengths:

      The concept that the contralateral brain-liver neurocircuit preferentially regulates each liver lobe may be interesting.

      Weaknesses:

      However, the experimental evidence presented did not support the study's central conclusion.

      (1) Pseudorabies virus (PRV) tracing experiment:

      The liver not only possesses sympathetic innervations but also vagal sensory innervations. The experimental setup failed to distinguish whether the PRV-labeling of LPGi (Lateral Paragigantocellular Nucleus) is derived from sympathetic or vagal sensory inputs to the liver.

      (2) Impact on pancreas:

      The celiac ganglia not only provide sympathetic innervations to the liver but also to the pancreas, the central endocrine organ for glucose metabolism. The chemogenetic manipulation of LPGi failed to consider a direct impact on the secretion of insulin and glucagon from the pancreas.

      (3) Neuroanatomy of the brain-liver neurocircuit:

      The current study and its conclusion are based on a speculative brain-liver sympathetic circuit without the necessary anatomical information downstream of LPGi.

      (4) Local manipulation of the celiac ganglia:

      The left and right ganglia of mice are not separate from each other but rather anatomically connected. The claim that local injection of AAV into the left or right ganglion does not affect the other side is at odds with this basic anatomical feature.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Wang and colleagues aims to determine whether the left and right LPGi differentially regulate hepatic glucose metabolism and to reveal decussation of hepatic sympathetic nerves.

      The authors used tissue clearing to identify sympathetic fibers in the liver lobes, then injected PRV into the hepatic lobes. Five days post-injection, PRV-labeled neurons in the LPGi were identified. The results indicated contralateral dominance of premotor neurons and partial innervation of more than one lobe. Then the authors activated each side of the LPGi, resulting in a greater increase in blood glucose levels after right-sided activation than after left-sided activation, as well as changes in protein expression in the liver lobes. These data suggested modulation of HGP (hepatic glucose production) in a lobe-specific manner. Chemical denervation of a particular lobe did not affect glucose levels due to compensation by the other lobes. In addition, nerve bundles decussate in the hepatic portal region.

      Strengths:

      The manuscript is timely and relevant. It is important to understand the sympathetic regulation of the liver and the contribution of each lobe to hepatic glucose production. The authors use state-of-the-art methodology.

      Weaknesses:

      (1) The wording/terminology used in the manuscript is misleading, and it is not used in the proper context. For instance, the goal of the study is "to investigate whether cerebral hemispheres differentially regulate hepatic glucose metabolism..." (see abstract); however, the authors focus on the brainstem (a single structure without hemispheres). Similarly, symmetric is not the best word for the projections.

      (2) Sparse labeling of liver-related neurons was shown in the LPGi (Figure 1). It would be ideal to have lower magnification images to show the area. Higher quality images would be necessary, as it is difficult to identify brainstem areas. The low number of labeled neurons in the LPGi after five days of inoculation is surprising. Previous findings showed extensive labeling in the ventral brainstem at four days post-inoculation (Desmoulins et al., 2025). Unfortunately, it is not possible to compare the injection paradigm/methods because the PRV inoculation is missing from the methods section. If the PRV is different from the previously published viral tracers, time-dependent studies to determine the order of neurons and the time course of infection would be necessary.

      (3) Not all LPGi cells are liver-related. Was the entire LPGi population stimulated, or was it done in a cell-type-specific manner? What was the strain, sex, and age of the mice? What was the rationale for using the particular viral constructs?

      (4) The authors should consider the effect of stimulation of double-labeled neurons (innervating more than one lobe) and potential confounding effects regarding other physiological functions.

      (5) The authors state that "central projections directly descend along the sympathetic chain to the celiac-superior mesenteric ganglia". What they mean is unclear. Do the authors refer to pre-ganglionic neurons or premotor neurons? How does it fit with the previous literature?

      (6) How was the chemical denervation completed for the individual lobes?

      (7) The Western blot images look like they are from different blots, but there are no details provided regarding protein amount (loading) or housekeeping. What was the reason to switch beta-actin and alpha-tubulin? In Figures 3F-G, the GS expression is not a good representative image. Were chemiluminescence or fluorescence antibodies used? Were the membranes reused?

      (8) Key references using PRV for liver innervation studies are missing (Stanley et al, 2010 [PMID: 20351287]; Torres et al., 2021 [PMID: 34231420]; Desmoulins et al., 2025 [PMID: 39647176]).

    4. Reviewer #3 (Public review):

      Summary:

      This study found a lobe-specific, lateralized control of hepatic glucose metabolism by the brain and provides anatomical evidence for sympathetic crossover at the porta hepatis. The findings are particularly insightful to the researchers in the field of liver metabolism, regeneration, and tumors.

      Strengths:

      Increasing evidence suggests spatial heterogeneity of the liver across many aspects of metabolism and regenerative capacity. The current study has provided interesting findings: neuronal innervation of the liver also shows anatomical differences across lobes. The findings could be particularly useful for understanding liver pathophysiology and treatment, such as metabolic interventions or transplantation.

      Weaknesses:

      Inclusion of detailed method and Discussion:

      (1) Quantitative results for PRV-labeled neurons are presented; please include the specific quantification methods.

      (2) The Discussion can be expanded to include potential biological advantages of this complex lateralized innervation pattern.

    5. Reviewer #4 (Public review):

      Summary:

      The studies here are highly informative in terms of anatomical tracing and sympathetic nerve function in the liver related to glucose levels, but given that they are performed in a single species, it is challenging to translate them to humans, or to determine whether these neural circuits are evolutionarily conserved. Dual-labeling anatomical studies are elegant, and the addition of chemogenetic and optogenetic studies is mechanistically informative. Denervation studies lack appropriate controls, and the role of sensory innervation in the liver is overlooked.

      Specific Weaknesses - Major:

      (1) The species name should be included in the title.

      (2) Tyrosine hydroxylase was used to mark sympathetic fibers in the liver, but this marker also labels a portion of sensory fibers, which need to be ruled out in the whole-mount imaging data.

      (3) Chemogenetic and optogenetic data demonstrating hyperglycemia should be described in the context of prior work demonstrating liver nerve involvement in these processes. There is only a brief mention in the Discussion currently, but comparing methods and observations would be helpful.

      (4) Sympathetic denervation with 6-OHDA can drive compensatory increases to tissue sensory innervation, and this should be measured in the liver denervation studies to implicate potential crosstalk, especially given the increase in LPGi cFOS that may be due to afferent nerve activity. Compensatory sympathetic drive may not be the only culprit, though it is clearly assumed to be. The sensory or parasympathetic/vagal innervation of the liver is altogether ignored in this paper and could be better described in general.

    6. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. reports the potential involvement of an asymmetric neurocircuit in the sympathetic control of liver glucose metabolism.

      Strengths:

      The concept that the contralateral brain-liver neurocircuit preferentially regulates each liver lobe may be interesting.

      Weaknesses:

      However, the experimental evidence presented did not support the study's central conclusion.

      We sincerely thank the reviewer for recognizing the conceptual novelty of our work and for constructive comments aimed at enhancing its rigor and clarity. In response, we will carry out targeted experiments to address the points raised, including: (i) further characterization of LPGi projections to vagal and sympathetic circuits; (ii) evaluation of potential pancreatic involvement; and (iii) validation of the specificity of chemogenetic activation within the proposed circuit. We anticipate completing the revised version within 8 weeks.

      (1) Pseudorabies virus (PRV) tracing experiment:

      The liver not only possesses sympathetic innervations but also vagal sensory innervations. The experimental setup failed to distinguish whether the PRV-labeling of LPGi (Lateral Paragigantocellular Nucleus) is derived from sympathetic or vagal sensory inputs to the liver.

      Thank you for raising this important point. We fully agree that the liver receives both sympathetic and vagal sensory innervation, and we acknowledge that PRV-based tracing alone does not definitively distinguish between these two pathways. This represents a limitation of the original experimental design.

      Based on established anatomical literature as well as our experimental observations, vagal sensory neuron cell bodies reside in the nodose ganglion (NG), and their central projections terminate predominantly in the nucleus of the solitary tract (NTS) (Nature. 2023;623(7986):387-396; Curr Biol. 2020;30(20):3986-3998.e5.), which is located in the dorsomedial medulla. In contrast, the LPGi, together with other sympathetic-related nuclei, is predominantly distributed in the ventral medulla (Cell Metab. 2025;37(11):2264-2279.e10; Nat Commun. 2022;13(1):5079.).

      To directly assess the contribution of vagal sensory pathways, we will perform an additional PRV tracing experiment using two groups of mice: one with bilateral nodose ganglion (NG) removal and a sham-operated control group. Identical PRV injections will be delivered to the liver in both groups, and PRV labeling in the LPGi will be quantitatively compared. Preservation of LPGi labeling following NG ablation would indicate that PRV transmission occurs primarily via sympathetic, rather than vagal sensory, pathways. These data will be incorporated into the revised manuscript and are expected to be completed within 3 weeks.

      (2) Impact on pancreas:

      The celiac ganglia not only provide sympathetic innervations to the liver but also to the pancreas, the central endocrine organ for glucose metabolism. The chemogenetic manipulation of LPGi failed to consider a direct impact on the secretion of insulin and glucagon from the pancreas.

      Thank you for this important comment. We agree that the celiac ganglia (CG) provide sympathetic innervation not only to the liver but also to the pancreas, which plays a central role in glucose homeostasis through the secretion of both insulin and glucagon. Therefore, the potential pancreatic implications of LPGi chemogenetic manipulation warrant careful consideration.

      To address this concern, we examined circulating glucagon levels following chemogenetic manipulation of the LPGi. As shown in the Supplementary Figure below, plasma glucagon (GCG) concentrations were not significantly altered at 30, 60, 90, or 120 minutes compared with control mice (n = 6), indicating that LPGi manipulation does not measurably affect glucagon secretion under our experimental conditions.

      We acknowledge that insulin secretion was not assessed in the study, which represents an important limitation given the pancreatic innervation of the CG. To further strengthen our interpretation, we are performing additional experiments in newly prepared mice to measure circulating insulin levels following LPGi manipulation. These data together with Author response image 1 below will be included in the revised manuscript upon completion.

      Author response image 1.

      Plasma concentrations of GCG in mice following activation of LPGi GABAergic neurons.

      (3) Neuroanatomy of the brain-liver neurocircuit:

      The current study and its conclusion are based on a speculative brain-liver sympathetic circuit without the necessary anatomical information downstream of LPGi.

      Thank you for raising this important point. A clear anatomical definition of the downstream pathways linking the brain to the liver is essential for interpreting the proposed brain-liver sympathetic circuit.

      The present study (Figure 4A) provides direct anatomical evidence supporting the organization of the brain–liver sympathetic neurocircuit. These observations are consistent with our recent detailed characterization of the brain-liver sympathetic circuit published in Cell Metabolism (Cell Metab. 2025;37(11):2264–2279). In that circuit, LPGi GABAergic neurons inhibit GABAergic neurons in the caudal ventrolateral medulla (CVLM). Inhibition of these CVLM neurons reduces GABAergic suppression of rostral ventrolateral medulla (RVLM) neurons, which are key excitatory drivers of sympathetic tone. RVLM neurons project to sympathetic preganglionic neurons in the sympathetic chain (Syc). These neurons synapse with postganglionic sympathetic neurons in ganglia such as the celiac-superior mesenteric ganglion (CG-SMG). Postganglionic sympathetic fibers then innervate the liver, releasing NE to activate hepatic β<sub>2</sub>-adrenergic receptors and stimulate HGP.

      Together, these data establish a coherent anatomical basis for the proposed brain-liver sympathetic pathway and clarify the downstream organization relevant to the functional experiments presented here.

      Author response image 2.

      Tracing scheme (Left) and whole-mount imaging (Right) of PRV-labeled brain-liver neurocircuit. Scale bars, 3,000 (whole mount) or 1,000 (optical sections) μm.

      (4) Local manipulation of the celiac ganglia:

      The left and right ganglia of mice are not separate from each other but rather anatomically connected. The claim that local injection of AAV into the left or right ganglion does not affect the other side is at odds with this basic anatomical feature.

      Thank you for raising this important anatomical point. We fully acknowledge that the left and right celiac ganglia (CG) in mice are interconnected, and that unilateral viral injection could theoretically affect the contralateral side. The celiac–superior mesenteric ganglion (CG-SMG) complex serves as a major sympathetic hub that regulates visceral organ functions. Recent transcriptomic, anatomical, and functional studies have revealed that the CG-SMG is not a homogeneous structure but is composed of molecularly and functionally distinct neuronal populations. These populations exhibit specialized projection patterns and regulate different aspects of gastrointestinal physiology, supporting a model of modular sympathetic control (Nature. 2025;637(8047):895-902). We were therefore aware of this anatomical feature from the initial stages of these experiments.

      To minimize unintended spread to the contralateral CG, we took two complementary approaches.

      First, we optimized the injection strategy by using an extremely small injection volume (100 nL per site), with a very slow infusion rate (50 nL/min), and fine glass micropipettes. With these refinements, contralateral viral spread was rarely observed.

      Second, and importantly, all animals included in the final analyses were subjected to post hoc anatomical verification. After completion of the experiments, CG were collected, sectioned, and examined for viral expression. As shown in Supplementary Figure 5F, only mice in which viral expression was strictly confined to the targeted CG, with no detectable infection in the contralateral ganglion, were included in the presented data.

      Together, these measures ensure that the reported effects are attributable to local manipulation of the intended CG. We will ensure that the Methods section more explicitly details these technical precautions and that the legend for Figure S5F clearly states its role in validating injection specificity.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Wang and colleagues aims to determine whether the left and right LPGi differentially regulate hepatic glucose metabolism and to reveal decussation of hepatic sympathetic nerves.

      The authors used tissue clearing to identify sympathetic fibers in the liver lobes, then injected PRV into the hepatic lobes. Five days post-injection, PRV-labeled neurons in the LPGi were identified. The results indicated contralateral dominance of premotor neurons and partial innervation of more than one lobe. Then the authors activated each side of the LPGi, resulting in a greater increase in blood glucose levels after right-sided activation than after left-sided activation, as well as changes in protein expression in the liver lobes. These data suggested modulation of HGP (hepatic glucose production) in a lobe-specific manner. Chemical denervation of a particular lobe did not affect glucose levels due to compensation by the other lobes. In addition, nerve bundles decussate in the hepatic portal region.

      We thank the reviewer for the thorough and constructive evaluation of our manuscript. In direct response, we will undertake comprehensive revisions to enhance the rigor and clarity of the study, including: (i) correcting ambiguous or misleading terminology pertaining to anatomical resolution and sympathetic circuit organization; (ii) expanding the Methods section with complete experimental details, improved image presentation, and explicit justification of our viral and genetic approaches; and (iii) strengthening data interpretation by addressing issues related to sparse PRV labeling, projection heterogeneity, and the functional implications of double-labeled neurons. All revisions are expected to be completed within 8 weeks.

      Strengths:

      The manuscript is timely and relevant. It is important to understand the sympathetic regulation of the liver and the contribution of each lobe to hepatic glucose production. The authors use state-of-the-art methodology.

      Weaknesses:

      (1) The wording/terminology used in the manuscript is misleading, and it is not used in the proper context. For instance, the goal of the study is "to investigate whether cerebral hemispheres differentially regulate hepatic glucose metabolism..." (see abstract); however, the authors focus on the brainstem (a single structure without hemispheres). Similarly, symmetric is not the best word for the projections.

      We thank the reviewer for raising these critical points regarding terminology and conceptual framing. We acknowledge that certain phrases in our original manuscript may have been overly broad or ambiguous, particularly in describing the scope of sympathetic heterogeneity and the specificity of neural projections. Due to practical constraints and the scope of our study, our investigation is focused on the brainstem, which represents the final common pathway for these lateralized commands. We acknowledge that terms referring to the cerebral hemispheres do not accurately describe our study.

      We are revising the manuscript to ensure accurate and consistent terminology and will submit the revised version with these corrections.

      (2) Sparse labeling of liver-related neurons was shown in the LPGi (Figure 1). It would be ideal to have lower magnification images to show the area. Higher quality images would be necessary, as it is difficult to identify brainstem areas. The low number of labeled neurons in the LPGi after five days of inoculation is surprising. Previous findings showed extensive labeling in the ventral brainstem at four days post-inoculation (Desmoulins et al., 2025). Unfortunately, it is not possible to compare the injection paradigm/methods because the PRV inoculation is missing from the methods section. If the PRV is different from the previously published viral tracers, time-dependent studies to determine the order of neurons and the time course of infection would be necessary.

      We sincerely thank the reviewer for these detailed and constructive comments regarding the PRV tracing experiments. We fully agree that careful presentation and interpretation of the anatomical data are essential for ensuring rigor and transparency. We address each point in detail below.

      (1) Image magnification and anatomical context of LPGi labeling

      We agree that the original images did not sufficiently convey the broader anatomical context of the LPGi. In the revised manuscript, we will replace the original panels in Figure 1 with new images that include lower-magnification overviews of the brainstem, alongside higher-magnification views of the LPGi. These images will clearly delineate the LPGi with respect to established anatomical landmarks and atlas boundaries. Image contrast and resolution will also be optimized to allow unambiguous identification of PRV-labeled neurons and surrounding structures.

      (2) Sparse LPGi labeling at 5 days post-injection and methodological details

      We apologize for the omission of the detailed PRV injection protocol in the original Methods section. We deliberately used small-volume, focal injections (1 µL per liver lobe) to minimize viral spread and to restrict labeling to circuits specifically connected to the targeted hepatic region. Under these conditions, early-stage or intermediate-order upstream nuclei such as the LPGi are expected to exhibit relatively sparse labeling compared to more proximal autonomic nuclei. The full protocol will be added to the Methods, including the PRV strain, viral titer, injection volume, precise injection coordinates, and surgical procedures.

      (3) Not all LPGi cells are liver-related. Was the entire LPGi population stimulated, or was it done in a cell-type-specific manner? What was the strain, sex, and age of the mice? What was the rationale for using the particular viral constructs?

      We thank the reviewer for this insightful and important question. We agree that not all neurons within the LPGi are liver-related, and we apologize that our rationale was not clearly articulated in the original manuscript.

      (1) Our decision to target GABAergic neurons in the LPGi using Gad1-Cre mice was based on prior experimental evidence rather than an assumption about the entire LPGi population. In our previous study (Cell Metab. 2025;37(11):2264-2279.e10), we performed single-cell RNA sequencing on retrogradely labeled LPGi neurons following liver tracing. These analyses revealed that the majority of liver-projecting LPGi neurons are GABAergic in nature. Based on these findings, we chose to selectively manipulate GABAergic neurons in the LPGi rather than the entire LPGi neuronal population, in order to achieve greater cellular specificity and to minimize potential confounding effects arising from heterogeneous neuron types within this region. We regret that this rationale was not clearly described in the original submission and have now revised the manuscript to explicitly state this reasoning.

      (2) In addition, we apologize for the omission of mouse strain, sex, and age information in the Methods section. These details will be fully added.

      (3) We selected AAV-based viral vectors, specifically the AAV9 serotype, due to their well-established efficiency in transducing neurons in the brainstem, relatively low toxicity, and widespread use in circuit-level chemogenetic and optogenetic studies. When combined with Cre-dependent viral constructs in Gad1-Cre mice, this approach enabled selective and reliable manipulation of LPGi GABAergic neurons.

      (4) The authors should consider the effect of stimulation of double-labeled neurons (innervating more than one lobe) and potential confounding effects regarding other physiological functions.

      We thank the reviewer for raising this important point. We agree that neurons innervating more than one liver lobe could, in principle, introduce potential confounding effects and may reflect higher-order integrative autonomic neurons.

      This consideration is consistent with a key finding of recent work: the celiac-superior mesenteric ganglion (CG-SMG) contains molecularly distinct sympathetic neuron populations (e.g., RXFP1<sup>+</sup> vs. SHOX2<sup>+</sup>) that exhibit complementary organ projections and separate, non-overlapping functions. Specifically, RXFP1<sup>+</sup> neurons innervate secretory organs (pancreas, bile duct) to regulate secretion, while SHOX2<sup>+</sup> neurons innervate the gastrointestinal tract to control motility. This functional segregation supports the concept of specialized autonomic modules rather than a uniform "fight or flight" response, reinforcing the need for careful interpretation of circuit-specific manipulations (Nature. 2025;637(8047):895-902; Neuron. Published online December 10, 2025).

      In our PRV tracing experiments, the proportion of double-labeled neurons was relatively small, suggesting that the majority of labeled LPGi neurons preferentially associate with individual hepatic lobes. Nevertheless, we recognize that activation of this minority population could contribute to broader physiological effects beyond strictly lobe-specific regulation. We also acknowledge that the absence of single-cell-level resolution in the current study limits our ability to further dissect the functional heterogeneity of these projection-defined neurons. We will explicitly state both points as limitations in the revised manuscript. We thank the reviewer for highlighting this important conceptual consideration.

      (5) The authors state that "central projections directly descend along the sympathetic chain to the celiac-superior mesenteric ganglia". What they mean is unclear. Do the authors refer to pre-ganglionic neurons or premotor neurons? How does it fit with the previous literature?

      We thank the reviewer for pointing out this imprecise wording. We agree that the original phrasing was anatomically inaccurate and potentially confusing. The pathways we intended to describe involve brainstem premotor neurons that project to sympathetic preganglionic neurons in the spinal cord. These preganglionic neurons then innervate neurons in the celiac–superior mesenteric ganglia, which in turn provide postganglionic input to the liver.

      We are revising the manuscript to clearly distinguish premotor from preganglionic neurons and to describe this pathway in a manner consistent with the established organization of sympathetic autonomic circuits reported in the previous literature. The revised wording will explicitly reflect this hierarchical relay structure.

      (6) How was the chemical denervation completed for the individual lobes?

      We thank the reviewer for raising this important methodological concern. We agree that potential diffusion of 6-OHDA is a critical issue when performing lobe-specific chemical denervation, and we apologize that our original description did not sufficiently clarify how this was controlled.

      In the revised Methods section, we will provide a detailed description of the denervation procedure, including the injection volume and concentration of 6-OHDA, as well as the physical separation and isolation of individual hepatic lobes during application to minimize diffusion to adjacent tissue.

      To directly assess the specificity of the chemical denervation, we included immunofluorescence and Western blot analyses demonstrating a selective reduction of sympathetic markers in the targeted lobe, with minimal effects on non-targeted lobes. These results support the effectiveness and relative spatial confinement of the 6-OHDA treatment under our experimental conditions.

      We thank the reviewer for highlighting this point, which has helped us improve both the clarity and rigor of the manuscript.

      (7) The Western Blot images look like they are from different blots, but there are no details provided regarding protein amount (loading) or housekeeping. What was the reason to switch beta-actin and alpha-tubulin? In Figures 3F -G, the GS expression is not a good representative image. Were chemiluminescence or fluorescence antibodies used? Were the membranes reused?

      We thank the reviewer for this careful and detailed evaluation of the Western blot data. We apologize that insufficient methodological detail was provided in the original submission.

      (1) We would like to clarify that the protein bands shown within each panel were derived from the same membrane. To improve transparency, we will provide full, uncropped images of the corresponding membranes in the supplementary materials. In addition, detailed information regarding protein loading amounts, gel conditions, and housekeeping controls will be added to the Methods section.

      (2) The use of different loading controls (β-actin or α-tubulin) reflects a technical consideration rather than an experimental inconsistency. In our experiments, the molecular weight of TH (62 kDa) was too close to that of α-tubulin (55 kDa), and β-actin (42 kDa) was therefore used to avoid band overlap and to ensure accurate quantification.

      (3) Regarding the GS signal shown in Figures 3F–G, we agree that the original representative image was suboptimal. This appears to be related to antibody performance rather than sample quality. To address this, we are repeating the GS Western blot using a newly validated antibody. The original tissue samples had been aliquoted and stored at −80 °C, allowing reliable re-analysis. We expect to complete this work within 8 weeks.

      (4) All Western blot experiments were detected using chemiluminescence, and membrane stripping and reprobing procedures are now explicitly described in the Methods section.

      We thank the reviewer for highlighting these issues, which significantly improve the rigor and clarity of our data presentation.

      (8) Key references using PRV for liver innervation studies are missing (Stanley et al, 2010 [PMID: 20351287]; Torres et al., 2021 [PMID: 34231420]; Desmoulins et al., 2025 [PMID: 39647176]).

      We thank the reviewer for pointing out these important and highly relevant references that were inadvertently omitted in our initial submission. The studies by Stanley et al. (Proc Natl Acad Sci U S A, 2010), Torres et al. (Am J Physiol Regul Integr Comp Physiol, 2021), and Desmoulins et al. (Auton Neurosci, 2025) represent key PRV-based retrograde tracing work that has mapped central neural circuits innervating the liver and thus provide essential context for our anatomical analyses.

      We agree that inclusion of these studies is necessary to properly situate our findings within the existing literature. Accordingly, we will incorporate citations to these references in the revised manuscript and discuss their relationship to our results.

      Reviewer #3 (Public review):

      Summary:

      This study found a lobe-specific, lateralized control of hepatic glucose metabolism by the brain and provides anatomical evidence for sympathetic crossover at the porta hepatis. The findings are particularly insightful to the researchers in the field of liver metabolism, regeneration, and tumors.

      Strengths:

      Increasing evidence suggests spatial heterogeneity of the liver across many aspects of metabolism and regenerative capacity. The current study has provided interesting findings: neuronal innervation of the liver also shows anatomical differences across lobes. The findings could be particularly useful for understanding liver pathophysiology and treatment, such as metabolic interventions or transplantation.

      Weaknesses:

      Inclusion of detailed method and Discussion:

      We sincerely thank the reviewer for the positive and constructive feedback, which will significantly enhance both the methodological rigor and the broader biological interpretation of our study. In direct response, we will revise the Discussion to elaborate on the potential physiological advantages of a lateralized and lobe-specific pattern of liver innervation. Furthermore, we will expand the Methods section to include a comprehensive description of the quantitative analysis applied to PRV-labeled neurons. Together, these revisions will strengthen the manuscript’s clarity, depth, and relevance to researchers in hepatic metabolism, regeneration, and disease. We expect to complete all updates within 8 weeks.

      (1) Quantitative results for PRV-labeled neurons are presented; please include the specific quantification methods.

      We thank the reviewer for this helpful suggestion. We will add a detailed description of the quantitative methods used to analyze PRV-labeled neurons in the revised Methods section. This includes information on the counting criteria, the brain regions analyzed, how the regions of interest were delineated, and the normalization procedures applied to obtain the reported neuron counts.

      (2) The Discussion can be expanded to include potential biological advantages of this complex lateralized innervation pattern.

      We appreciate the reviewer’s suggestion. We will expand the Discussion to include a paragraph addressing the potential biological significance of lateralized liver innervation. We will highlight that this asymmetric organization could allow for more precise, lobe-specific regulation of hepatic metabolism, enable integration of distinct physiological signals, and potentially provide robustness against perturbations. These points will be discussed in the revised manuscript.

      Reviewer #4 (Public review):

      Summary:

      The studies here are highly informative in terms of anatomical tracing and sympathetic nerve function in the liver related to glucose levels, but given that they are performed in a single species, it is challenging to translate them to humans, or to determine whether these neural circuits are evolutionarily conserved. Dual-labeling anatomical studies are elegant, and the addition of chemogenetic and optogenetic studies is mechanistically informative. Denervation studies lack appropriate controls, and the role of sensory innervation in the liver is overlooked.

      We sincerely appreciate the reviewer's thoughtful evaluation and fully agree that findings derived from a single-species model must be interpreted with caution in relation to human physiology. In direct response, we will revise the manuscript to explicitly clarify that all experimental data were obtained in mice and to provide a discussion of the limitations regarding direct extrapolation to humans. Concurrently, we will expand the Discussion section by integrating our findings with recent human and translational studies, including a multicenter clinical trial demonstrating that catheter-based endovascular denervation of the celiac and hepatic arteries significantly improved glycemic control in patients with poorly controlled type 2 diabetes, without major adverse events (Signal Transduct Target Ther. 2025;10(1):371). While our current work focuses on defining the anatomical organization and functional asymmetry of this circuit in mice, the clinical findings suggest that the core principles, sympathetic control of hepatic glucose metabolism via CG-liver pathways, may be conserved and of translational relevance. Additionally, we will clarify the interpretation of tyrosine hydroxylase labeling and expand the discussion of hepatic sensory and parasympathetic innervation, acknowledging their important roles in liver–brain communication and identifying them as key directions for future research. Collectively, these revisions will provide a more balanced, clinically informed, and rigorous framework for interpreting our findings, and we aim to complete all updates within 8 weeks.

      Specific Weaknesses - Major:

      (1) The species name should be included in the title.

      We thank the reviewer for this suggestion. We agree that the species should be clearly indicated. The findings presented in this study were obtained in mice using tissue clearing and whole-organ imaging approaches. Due to technical limitations, these observations are currently limited to the mouse strain. We will update the title and clarify the species used throughout the manuscript.

      (2) Tyrosine hydroxylase was used to mark sympathetic fibers in the liver, but this marker also hits a portion of sensory fibers that need to be ruled out in whole-mount imaging data.

      We thank the reviewer for pointing this out. We acknowledge that tyrosine hydroxylase (TH) labels not only sympathetic fibers but also a subset of sensory fibers. We will note this as a limitation in the revised manuscript. In addition, ongoing experiments using retrograde PRV labeling from the liver, combined with sectioning, are being used to distinguish sympathetic fibers from vagal and dorsal root ganglion–derived sensory fibers. These data will be included in a forthcoming update of the manuscript and are expected to be completed in approximately 6 weeks.

      (3) Chemogenetic and optogenetic data demonstrating hyperglycemia should be described in the context of prior work demonstrating liver nerve involvement in these processes. There is only a brief mention in the Discussion currently, but comparing methods and observations would be helpful.

      We thank the reviewer for this suggestion. Previous studies largely relied on electrical stimulation to modulate liver innervation, which provides relatively coarse control of neural activity (Eur J Biochem. 1992;207(2):399-411). By contrast, our use of chemogenetic and optogenetic approaches allows selective, cell-type–specific manipulation of LPGi neurons. We will revise the Discussion to place our functional data in the context of prior work, highlighting how these more precise approaches improve understanding of the contribution of liver-innervating neurons to hyperglycemia.

      (4) Sympathetic denervation with 6-OHDA can drive compensatory increases to tissue sensory innervation, and this should be measured in the liver denervation studies to implicate potential crosstalk, especially given the increase in LPGi cFOS that may be due to afferent nerve activity. Compensatory sympathetic drive may not be the only culprit, though it is clearly assumed to be. The sensory or parasympathetic/vagal innervation of the liver is altogether ignored in this paper and could be better described in general.

      We thank the reviewer for this insightful comment and agree that chemical sympathetic denervation with 6-OHDA may induce compensatory changes in non-sympathetic hepatic inputs, including sensory and parasympathetic (vagal) innervation. As the reviewer correctly points out, increased LPGi cFOS activity may reflect afferent nerve engagement rather than solely compensatory sympathetic drive.

      More broadly, we agree that the central nervous system functions as an integrated homeostatic network that continuously processes diverse afferent signals, including hepatic sensory and vagal inputs, as well as other interoceptive cues. From this perspective, the LPGi cFOS changes observed in our study likely represent one component of a complex integrative response rather than evidence for a single dominant pathway.

      We acknowledge that the present study did not directly assess hepatic sensory or parasympathetic innervation, which represents a limitation in scope. In the revised manuscript, we will expand the Discussion to explicitly note this limitation and provide a more balanced consideration of potential crosstalk among sympathetic, sensory, and parasympathetic pathways in shaping LPGi activity following hepatic denervation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Although the findings are interesting, this reviewer has major concerns about the experimental design, methodology, results, and interpretation of the data. Experimental details are lacking, including basic information (age, sex, strain of mice, procedures, magnification, etc.).

      We thank the reviewer for this important recommendation. We agree that comprehensive reporting of experimental details is essential for rigor and reproducibility.

      In the revised manuscript, we will add complete information regarding mouse strain, sex, age, and sample size for each experiment. In addition, detailed descriptions of surgical procedures, viral constructs, injection parameters, imaging magnification, and analysis methods have been incorporated into the Methods section.

      These revisions ensure that all experiments are described with sufficient technical detail and clarity to allow accurate interpretation and replication of our findings.

      Reviewer #3 (Recommendations for the authors):

      Addressing a few questions might help:

      (1) The study found that liver-associated LPGi neurons are predominantly GABAergic. It would be informative to molecularly characterize the PRV-traced, liver-projecting LPGi neurons to determine their neurochemical phenotypes.

      We thank the reviewer for this insightful suggestion. We agree that molecular characterization of liver-projecting LPGi neurons is important for understanding their functional identity.

      This issue has been addressed in detail in our recent study (Cell Metab. 2025;37(11):2264-2279.e10), in which we performed single-cell RNA sequencing on retrogradely traced LPGi neurons connected to the liver. These analyses demonstrated that the majority of liver-projecting LPGi neurons are GABAergic, with a defined transcriptional profile distinct from neighboring non–liver-related populations.

      Based on these findings, the current study selectively targets GABAergic LPGi neurons using Gad1-Cre mice. We are now explicitly referencing and summarizing these molecular results in the revised manuscript to clarify the neurochemical identity of the PRV-traced LPGi neurons.

      (2) Is it possible to do a local microinjection of a sodium channel blocker (e.g., lidocaine) or an adrenergic receptor antagonist into the porta hepatis? That would potentially provide additional evidence for the porta hepatis as the functional crossover point.

      We appreciate the reviewer’s thoughtful suggestion. While pharmacological blockade at the porta hepatis could modulate local neural activity, the proposed approach may not fully capture the distinction between ipsilateral and contralateral inputs, and may not conclusively establish neural crossover at this particular site.

      In our view, the anatomical evidence provided by whole-mount tissue clearing, dual-labeled tracing, and direct visualization of decussating nerve bundles at the porta hepatis offers a more definitive demonstration of sympathetic crossover. Pharmacological blockade would affect both crossed and uncrossed fibers simultaneously and therefore would not specifically resolve the anatomical organization of this decussation.

      Nevertheless, we agree that functional interrogation of the porta hepatis represents an interesting direction for future work, and we will now acknowledge this possibility in the Discussion.

      (3) Is it possible to investigate the effects of unilateral LPGi manipulation or ablation of one side of CG/SMG on liver metabolism, such as hyperglycemia?

      We thank the reviewer for this important suggestion. We agree that unilateral ablation or silencing of the CG-SMG could provide additional insight into lateralized sympathetic control of liver metabolism.

      However, precise and selective ablation of one side of the CG-SMG using 6-OHDA without affecting the contralateral ganglion or adjacent autonomic structures remains technically challenging, particularly given the anatomical connectivity between the two sides. We are currently optimizing approaches to achieve reliable unilateral manipulation.

      If successful within the revision timeframe, we will include these experiments and corresponding metabolic analyses in the revised manuscript. If not, we will explicitly discuss this experimental limitation and the predicted metabolic consequences of unilateral CG-SMG ablation as an important direction for future studies. We expect to complete this work within 6 weeks.

      Reviewer #4 (Recommendations for the authors):

      In the abstract and elsewhere, the use of the term 'sympathetic release' is unclear - do you mean release of nerve products, such as the neurotransmitter norepinephrine? This should be more clearly defined.

      We thank the reviewer for pointing out this ambiguity. We agree that the term “sympathetic release” was imprecise. In the revised manuscript, we will explicitly refer to the release of sympathetic neurotransmitters, primarily norepinephrine, from postganglionic sympathetic fibers.

      We will revise the wording throughout the manuscript to ensure accurate and consistent terminology and to avoid potential confusion regarding the underlying neurobiological mechanisms.

    1. eLife Assessment

      The findings are important, as they identify MIRO1 as a central regulator linking mitochondrial positioning and respiratory chain function to VSMC proliferation, neointima formation, and human vasoproliferative disease. Overall, the strength of evidence is convincing, with comprehensive in vivo and in vitro data, including human cells and added bioenergetic analyses, that broadly support the main claims despite some remaining limitations in mechanistic and mitochondrial assays.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; however, characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex.

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been evaluated to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and, importantly, in human cells. The study provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulty of such an experiment is appreciated.

    4. Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays are marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima, and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied, and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      Weaknesses:

      (1) Figure 3:

      I appreciate the system used to assess mitochondrial distribution; however, I believe that time-lapse microscopy to evaluate mitochondrial movements in real time should be mandatory. The experimental timing is compatible with time-lapse imaging, and these experiments will provide a quantitative estimation of the distance travelled by mitochondria and the fraction of mitochondria that change position over time. I also suggest evaluating mitochondrial shape in control and MIRO1-/- VSMC to assess whether MIRO1 absence could impact mitochondrial morphology, altering fission/fusion machinery, since mitochondrial shape could differently influence the mobility.

      Mitochondrial motility experiments. WT and Miro1-/- VSMCs were transiently transfected with mito-ds-red and untargeted GFP adenoviruses to fluorescently label mitochondria and cytosol, respectively. Live-cell fluorescence confocal microscopy was used to acquire mitochondrial images at one-minute intervals over a 25-30-minute period. WT cells exhibited dynamic reorganization of the mitochondrial network, whereas Miro1-/- VSMCs displayed minimal mitochondrial movement, characterized only by limited oscillatory behavior without network remodeling (Supplemental Video 1).

      Mitochondrial shape (form factor) was assessed by confocal microscopy in WT and Miro1-/- VSMCs. Analysis of the mitochondrial form factor (defined as the ratio of mitochondrial length to width) during cell cycle progression revealed morphological changes in wild type (WT) cells, characterized by an increase in form factor. In contrast, Miro1-/- cells exhibited no significant alterations in mitochondrial morphology (Figure 3- Figure supplement 1B).

      (2) Figure 6:

      The evidence of MIRO1 ablation on cristae remodeling is solid; however, considering that the mechanism proposed to explain the finding is the modulation of MICOS/MIB complex, as shown in Figure 6D, I suggest performing EM analysis in each condition. In my mind, Miro1 KK and Miro1 TM should lead to different cristae phenotypes according to the different impact on MICOS/MIB complex assembly. Especially, Miro1 TM should mimic Miro1 -/- condition, while Miro1 KK should drive a less severe phenotype. This would supply a good correlation between Miro1, MICOS/MIB complex formation and cristae folding.

      I also suggest performing supercomplex assembly and complex I activity with each plasmid to correlate MICOS/MIB complex assembly with the respiratory chain efficiency.

      Complex I activity assays revealed that overexpression of MIRO1-WT fully restored enzymatic activity in MIRO1-/- cells, whereas MIRO1-KK provided partial rescue. In contrast, a MIRO1 mutant lacking the transmembrane domain failed to restore activity and resembled the Miro1-/- phenotype (Figure 6- Figure supplement 2).

      The Complex I activity in each Miro1 mutant correlated with the degree of MICOS/MIB complex assembly in pulldown assays, implying a functional link between Miro1 and mitochondrial cristae organization.

      Moreover, an in-gel Complex V activity assay was performed to evaluate the enzymatic activity of mitochondrial ATP synthase in a native gel following electrophoresis. To normalize the activity signal, a Blue Native PAGE of the same samples was probed for the ATP5F1 subunit. A modest, yet statistically significant reduction in Complex V activity was observed in Miro1-/- cells (Figure 6- Figure supplement 1).

      (3) I noticed that none of the in vitro findings have been validated in an in vivo model. I believe this represents a significant gap that would be valuable to address. In your animal model, it should not be too complex to analyze mitochondria by electron microscopy to assess cristae morphology. Additionally, supercomplex assembly and complex I activity could be evaluated in tissue homogenates to corroborate the in vitro observations.

      We appreciate the reviewer’s comment. However, our currently available samples have been processed for light microscopy and are therefore not suitable for embedding for electron microscopy.

      (4) I find the results presented in Figure S7 somewhat unclear. The authors employ a pharmacological strategy to reduce Miro1 and validate the findings previously obtained with the genetic knockout model. They report increased mitophagy and a reduction in mitochondrial mass. However, in my opinion, these changes alone could significantly impact cellular metabolism. A lower number of mitochondria would naturally result in decreased ATP production and reduced mitochondrial respiration. This, in turn, weakens the proposed direct link between Miro1 deletion and impaired metabolic function or altered electron transport chain (ETC) activity. I believe this section would benefit from additional experiments and a more in-depth discussion.

      We initially conducted experiments using the MIRO1 reducer to explore the translational potential of our findings. These experiments aimed to provide a foundation for in vivo studies. However, despite multiple attempts, we were unable to demonstrate a significant effect of the MIRO1 reducer, delivered via a Pluronic gel, on the mitochondria of the vascular wall. Of note, the role of MIRO1 in mitophagy has been well established in several studies (for example, PMID: 34152608), which show that genetic deletion of Miro1 delays the translocation of the E3 ubiquitin ligase Parkin onto damaged mitochondria, thereby reducing mitochondrial clearance in fibroblasts and cultured neurons. Furthermore, loss of Miro1 in the hippocampus and cortex increases mitofusin levels with the appearance of hyperfused mitochondria and activation of the integrated stress response. Thus, MIRO1 deletion in genetic models does not result in a substantial reduction of mitochondria but causes hyperfused mitochondria. The rationale for developing the MIRO1 reducer stems from genetic forms of Parkinson’s disease, where Miro1 is retained in PD cells but degraded in healthy cells following mitochondrial depolarization (PMID: 31564441). Consequently, the degradation of mutant MIRO1 by the reducer does not phenocopy the effects of genetic MIRO1 depletion. We therefore believe the data with the reducer demonstrate that MIRO1 can be acutely targeted in vitro, but that the mechanism of action (as the reviewer points out, the reduction of mitochondrial mass may lead to decreased ATP levels, potentially reducing cell proliferation) differs from that of chronic genetic deletion. In fact, we observe somewhat increased mitochondrial length in MIRO1-/- cells. We acknowledge that this is complex and have revised the paragraph to clarify the use of the MIRO1 reducer.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture, and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodeling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach, assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo, and importantly in human cells. The subject provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small molecule MIRO1.

      Weaknesses:

      (1) There is a consistent lack of reporting across figure legends, including group sizes, n numbers, how many independent experiments were performed, or whether the data is mean +/- SD or SEM, etc. This needs to be corrected.

      These data were added in the revised manuscript.

      (2) The in vivo carotid injury experiments are in male mice fed a high-fat diet; this should be explicitly stated in the abstract, as it's unclear if there are any sex- or diet-dependent differences. Is VSMC proliferation/neointima formation different in chow-fed mice after carotid injury?

      This is an important point, and we appreciate the feedback. In this model, the transgene is located on the Y chromosome. As a result, only male mice can be studied. However, in our previous experiments, we have not observed any sex-dependent changes in neointimal formation. Additionally, please note that smooth muscle cell proliferation in neointimal formation is enhanced in models of cholesterol-fed mice on a high-fat diet.

      (3) The main body of the methods section is thin, and it's unclear why the majority of the methods are in the supplemental file. The authors should consider moving these to the main article, especially in an online-only journal.

      We thank the reviewer for this suggestion. We moved the methods to the main manuscript.

      (4) Certain metabolic analyses are suboptimal, including ATP concentration and Complex I activity measurements. The measurement of ATP/ADP and ATP/AMP ratios for energy charge status (luminometer or mass spectrometry), while high-resolution respirometry (Oroboros) to determine mitochondrial complex I activity in permeabilized VSMCs would be more informative.

      ATP/ADP and ATP/AMP ratios were assessed in samples from WT and Miro1-/- VSMCs using an ATP/ADP/AMP Assay Kit (Cat#: A-125; Biomedical Research Service, University at Buffalo, New York). Miro1-/- samples exhibited reduced ATP levels accompanied by elevated concentrations of ADP and AMP. As a result, both ATP/ADP and ATP/AMP ratios were significantly lower in MIRO1-/- cells compared to WT, indicating impaired cellular energy homeostasis (Figure 5B, C).
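
      The ratios discussed here, together with the adenylate energy charge that the reviewer alludes to, can be sketched as below. This is an illustrative calculation only: the function name and the input concentrations are hypothetical, and the energy charge formula, (ATP + 0.5·ADP) / (ATP + ADP + AMP), is the standard textbook definition rather than something reported in the response.

```python
def energy_status(atp: float, adp: float, amp: float) -> dict:
    """Return ATP/ADP, ATP/AMP and adenylate energy charge.

    Concentrations must share the same unit; only ratios are returned.
    """
    if min(atp, adp, amp) <= 0:
        raise ValueError("concentrations must be positive")
    total = atp + adp + amp
    return {
        "atp_adp": atp / adp,
        "atp_amp": atp / amp,
        # Standard adenylate energy charge, ranging from 0 to 1.
        "energy_charge": (atp + 0.5 * adp) / total,
    }


# Hypothetical values, chosen only to show the direction of the effect.
wt_like = energy_status(atp=8.0, adp=1.0, amp=1.0)
ko_like = energy_status(atp=4.0, adp=2.0, amp=2.0)
```

      Lower ATP with higher ADP and AMP reduces all three quantities at once, which is the pattern the response describes for Miro1-/- cells.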

      (5) The statement that 'mitochondrial mobility is not required for optimal ATP production' is poorly supported. XF Seahorse analysis should be performed with nocodazole and also following MIRO1 reconstitution +/- EF hands.

      To evaluate the metabolic effects of Nocodazole, we conducted Seahorse metabolic assays on vascular smooth muscle cells (VSMCs) under various conditions. We used WT VSMCs, Miro1-/- VSMCs, and Miro1-/- VSMCs that expressed either MIRO1-WT, KK, or ΔTM mutants. Our results demonstrate that Nocodazole exposure did not compromise mitochondrial respiratory activity. However, Miro1-/- VSMCs displayed a trend toward reduced basal and maximal mitochondrial respiration when compared to WT cells. This deficit was only partially corrected by the expression of the MIRO1-KK mutant. In contrast, reintroducing MIRO1-WT through adenoviral delivery fully restored mitochondrial respiration to normal levels (Figure 5- Figure supplement 1).

      (6) The authors should consider moving MIRO1 small molecule data into the main figures. A lot of value would be added to the study if the authors could demonstrate that therapeutic targeting of MIRO1 could prevent neointima formation in vivo.

      We appreciate the reviewer's comment and attempted the suggested in vivo experiments using the commercially available Miro1 reducer. For these experiments, we used a pluronic gel to deliver the reducer to the adventitial area surrounding the carotid artery. Despite numerous attempts to optimize the experimental conditions, we were unable to reliably detect a significant effect of the reducer on mitochondria in the vascular wall.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are potentially useful for understanding the importance of mitochondrial positioning and function in this specific cell type within health and disease contexts, the evidence presented appears incomplete, with key bioenergetic and mechanistic claims lacking adequate support.

      Strengths:

      (1) The study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      (2) It explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a potentially significant area for both basic and translational biology.

      (3) The use of both in vivo and in vitro systems provides a potentially useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      (1) The central claim that MIRO1 loss impairs mitochondrial bioenergetics is not convincingly demonstrated, with only modest changes in respiratory parameters and no direct evidence of functional respiratory chain deficiency.

      (2) The proposed link between MIRO1 and respiratory supercomplex assembly or function is speculative, lacking mechanistic detail and supported by incomplete or inconsistent biochemical data.

      (3) Key mitochondrial assays are either insufficiently controlled or poorly interpreted, undermining the strength of the conclusions regarding oxidative phosphorylation.

      (4) The study does not adequately assess mitochondrial content or biogenesis, which could confound interpretations of changes in respiratory activity.

      (5) Overall, the evidence for a direct impact of MIRO1 on mitochondrial respiratory function in the experimental setting is weak, and the conclusions overreach the data.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1)  Throughout the manuscript, the authors incorrectly use "mobility" to describe the active transport of mitochondria. The appropriate term is "mitochondrial motility," which refers to active, motor-driven movement. "Mobility" implies passive diffusion and is not scientifically accurate in this context.

      (2) "Super complex" should be consistently written as "supercomplex," in line with accepted mitochondrial biology terminology.

      We thank the reviewer for this comment and revised the text accordingly.

      (3) A significant limitation of the in vivo model is the mild phenotype observed, which is expected from an inducible knockout system. The authors should clarify whether a constitutive, tissue-specific knockout was considered and, if not, whether embryonic lethality or another limitation prevented its generation.

      This genetic model was originally developed by Dr. Janet Shaw at the University of Utah. In the original publication, Miro1 was constitutively knocked out in neurons. Germline inactivation of Miro1 was achieved by crossing mice harboring the Miro1F allele with a mouse line expressing Cre recombinase under the control of the hypoxanthine-guanine phosphoribosyltransferase (HPRT) promoter. Mating Miro1+/− mice resulted in Miro1−/− animals, which were cyanotic and died shortly after birth. Due to this outcome, we opted to develop an inducible, smooth muscle-specific model. Additionally, we considered testing whether the acute use of an inhibitor or a knockdown system targeting Miro1 could be evaluated as a potential therapeutic approach.

      (4) In Figure 1A and S1A, the authors use Western blotting to validate the knockout in the aorta and IHC in carotid arteries. The choice of different methods does not seem justified, and qPCR data are shown only for the aorta. IHC appears to be suboptimal for assessing MIRO1 levels in vascular tissue due to high autofluorescence, and IHC in Figure S1A is merely qualitative, with no quantification provided.

      We present complementary approaches to validate the deletion of Miro1. For Western blot analysis, we used the aorta because it provides more material for analysis. The autofluorescence observed via immunofluorescence is characteristic of elastin fibers within the media layer, making our results typical for this technique. As shown in Figure 1- Figure supplement 1, our data demonstrate a significant decrease, if not a complete knockout, of the target protein specifically in smooth muscle cells.

      (5) In Figure 1G, the bottom left panel (magnification) shows a lower green signal than the top left panel, suggesting these may have been collected with different signal intensity. This raises concerns about image consistency and representation.

      The top images in Figure 1G were acquired at 63x magnification and the bottom images at 20x magnification. The signal intensity differs between the two magnifications but is similar between genotypes.

      (6) In Figure S3, the sampling is uncontrolled: the healthy subject and the patient differ markedly in age. The claim of colocalization is not substantiated with any quantitative analysis.

      As outlined in the Methods section, our heart samples were obtained from LVAD patients or explanted hearts from transplant recipients. Due to the limited availability of such samples, there is indeed a difference in age between the healthy subject and the patient. While we acknowledge this limitation, the scarcity of samples made it challenging to control for age. Additionally, we determined that performing a quantitative analysis of colocalization would not yield robust or meaningful data given the constraints of our sample size and variability. 

      (7) Figure S4A lacks statistical analysis, which is necessary for interpreting the data shown.

      This appears to be a misunderstanding. In this manuscript, we do present statistically significant differences and focus on those that are biologically meaningful. Specifically, we highlight differences between PDGF treatment versus no treatment within the same genotype, as well as differences between the two genotypes under the same treatment condition (control or PDGF treatment). In this particular case, there is only a statistical difference between WT+PDGF and SM-Miro1-/-, but since this is not a meaningful comparison, it is not shown. Please note that this approach applies to all figures in the manuscript. Including all comparisons, whether statistically significant or not and whether biologically meaningful or not, may appear rigorous but, in our opinion, ultimately detracts from the main message of this paper.

      (8) The authors state, "given the generally poor proliferation of VSMCs from SM-MIRO1-/- mice, in later experiments we used VSMCs from MIRO1fl/fl mice and infected them with adenovirus expressing cre." This is not convincing, especially since in vivo cre efficiency is generally lower than in vitro. Moreover, the methods indicate that "VSMCs from littermate controls were subjected to the same procedure with empty vector control adenovirus," yet in Figure 2A, the control appears to be MIRO1fl/fl VSMCs transduced with Ad-EV. The logic and consistency of the controls used need clarification.

      For the initial experiments, cells were explanted from SM-MIRO1-/- mice (Figure 2- Figure supplement 1). In these mice, Cre recombination had occurred in vivo, and the cells exhibited very poor growth. In fact, their growth was so limited that we decided not to pursue this experimental approach after three independent experiments.

      For subsequent experiments, cells were explanted from Miro1fl/fl mice and passaged several times, which allowed us to generate the number of cells required for the experiments (Figure 2B). Once sufficient Miro1fl/fl cells were obtained, they were treated with adenovirus expressing Cre, as described in the Methods section. Control cells were treated with an empty vector adenovirus. To clarify, the control cells are Miro1fl/fl cells infected with an empty vector adenovirus, while the MIRO1-/- cells are Miro1fl/fl cells infected with adenovirus expressing Cre. The statement that “littermate controls were used” is incorrect as in fact, Miro1fl/fl cells from the same preparation were either infected with an empty vector adenovirus, or with adenovirus expressing Cre. As mentioned, the knockdown was confirmed by Western blotting.

      (9) Figure 2C shows a growth delay in MIRO1-/- cells. Have the authors performed additional time points to determine when these cells return to G1 and quantify the duration of the lag?

      This is an excellent suggestion. So far, we have not performed this experiment.

      (10) In the 24 h time point of Figure 2C, MIRO1-/- cells appear to be cycling, yet no cyclin E signal is detected. How do the authors explain this inconsistency? Additionally, in Figure 2H, the quantification of cyclin E is unreliable, given that lanes 3 and 4 show no detectable signal.

      We agree with the reviewer—the inconsistency is driven by the exposure of the immunoblot presented. We revisited the data, reviewed the quantification, and performed an additional experiment. We are now presenting an exposure that demonstrates levels of cyclin E (Figure 2G).

      (11) In Figure 3D, the authors present mitochondrial probability map vs. distance from center curves. How was the "center" defined in this analysis? Were radial distances normalized across cells (e.g., to the cell radius or maximum extent)? If not, variation in cell and/or nucleus size or shape could significantly affect the resulting profiles. No statistical analysis is provided for this assessment, which undermines its quantitative value. Furthermore, the rationale behind the use of mito95 values is not clearly explained.

      The center refers to the center of the microchip's Y-shaped pattern, to which each cell is attached. Since all Y-shapes on the chip are identical in size, normalization is not required. The size of the optimal Y-shapes was tested as recommended by CYTOO. For further context, please refer to the papers by the Kittler group.

      Additionally, a graph demonstrating the percentage of mitochondria localized at specific distances can be produced for any given distance. Notably, the further from the center of the chip, the more pronounced the differences become.
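
      For readers unfamiliar with this type of readout, the sketch below shows one way a distance enclosing a given fraction of mitochondrial signal could be computed from an image of a cell on a micropattern. It assumes that "mito95" denotes the distance from the pattern center within which 95% of the mitochondrial signal lies; that interpretation, the function name, and the toy image are assumptions and may differ from the authors' implementation.

```python
import numpy as np


def radial_quantile_distance(image, center, fraction=0.95, pixel_size=1.0):
    """Distance from `center` enclosing `fraction` of the total signal.

    image      : 2D array of mitochondrial signal intensity
    center     : (row, col) of the pattern center, in pixels
    pixel_size : physical size of one pixel (e.g. micrometres)
    """
    image = np.asarray(image, dtype=float)
    total = image.sum()
    if total <= 0:
        raise ValueError("image contains no signal")
    rows, cols = np.indices(image.shape)
    dist = np.hypot(rows - center[0], cols - center[1]).ravel()
    order = np.argsort(dist, kind="stable")  # pixels from near to far
    cumulative = np.cumsum(image.ravel()[order])
    idx = np.searchsorted(cumulative, fraction * total)
    idx = min(idx, cumulative.size - 1)  # guard against rounding at the edge
    return float(dist[order][idx] * pixel_size)


# Toy image: most signal at the center, a small amount three pixels away.
toy = np.zeros((11, 11))
toy[5, 5] = 90.0
toy[5, 8] = 10.0
toy_mito95 = radial_quantile_distance(toy, center=(5, 5))
```

      Because every pattern has the same size, distances computed this way are directly comparable between cells without further normalization, which is the point made in the response.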

      (12) The authors apply a 72 h oligomycin treatment to assess proliferation and a 16 h treatment to measure ATP levels. This discrepancy in experimental design is not justified in the manuscript. The length of treatment directly impacts the interpretation of the data in Figures 4C, 4D, and 4E, and needs to be addressed.

      Thank you for this comment. We have performed additional experiments to align these time points. In the revised manuscript, we now present proliferation and ATP production measured at the same time point (Figure 4A, B for proliferation and ATP levels).

      (13) The manuscript repeatedly suggests that MIRO1 loss causes a defect in mitochondrial ATP production, yet no direct demonstration of a bioenergetic defect is provided. The claim relies on a modest decrease in supercomplex species (of undefined composition) and a mild reduction in complex I activity that does not support a substantial OXPHOS defect. Notably, the respirometry data in Figure 5I do not align with the BN-PAGE results in Figure 6I. There is increasing evidence that respiratory chain supercomplexes do not confer a catalytic advantage. The authors should directly assess the enzymatic activities of all respiratory complexes. Reported complex I activity in MIRO1-/- cells appears rotenone-like (virtually zero, figure 3K) or ~30% residual (Figure 3L), suggesting a near-total loss of functional complex I, which is not reflected in the BN-PAGE. Additionally, complex I activity has not been normalized to a mitochondrial reference, such as citrate synthase.

      Given that we work in primary cells and are limited by the number of cells we can generate, we concentrated on ETC complexes I and V and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants (Figure 6- Figure supplement 1). Please note that the addition of Rotenone abolishes the slope of NADH consumption (Figure 6- Figure supplement 2F).

      While the ETC complex I activity is measured in Fig. 6K, the blue native gel shown in Figure 6I is performed without substrate and is thus indicative of protein complex abundance rather than complex activity.

      In additional experiments, we normalized the activity to citrate synthase as requested.

      (14) In the methods section, the complex I activity assay is incorrectly described: complex I is a NADH dehydrogenase, so the assay measures NADH oxidation, not NADPH.

      We thank the reviewer for this comment and revised the manuscript accordingly.

      (15) The authors have not assessed mitochondrial mass, which is a critical omission. Differences in mitochondrial biogenesis or content could underlie several observed phenotypes and should be controlled for.

      A qPCR assay was used to assess mitochondrial DNA copy number in WT and Miro1-/- VSMCs. We determined the abundance of COX1 and MT-RNR1 DNA as mitochondrial gene targets and NDUFV DNA as the nuclear reference gene. While the results in Miro1-/- cells were highly variable, no statistically significant reduction of copy numbers was detected (Figure 3- Figure supplement 1B).
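
      A common way to turn such qPCR data into a relative mtDNA copy number is the delta-Ct calculation sketched below. It assumes roughly 100% amplification efficiency for both primer pairs and a diploid nuclear reference gene; the function name and Ct values are hypothetical, and the response does not state which formula the authors used.

```python
def relative_mtdna_copy_number(ct_mito: float, ct_nuclear: float,
                               nuclear_copies_per_cell: int = 2) -> float:
    """Estimate mtDNA copies per cell from Ct values.

    Each cycle of difference corresponds to a two-fold difference in
    template, so copies = nuclear_copies * 2 ** (Ct_nuclear - Ct_mito).
    """
    return nuclear_copies_per_cell * 2.0 ** (ct_nuclear - ct_mito)


# Hypothetical Ct values for one sample.
example_copies = relative_mtdna_copy_number(ct_mito=15.0, ct_nuclear=25.0)
```

      Comparing this estimate between WT and Miro1-/- samples, ideally for more than one mitochondrial target, is what allows a change in mitochondrial content to be ruled in or out.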

      (16) Complex IV signal is missing in Figure 6I. Its omission is not acknowledged or explained.

      Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV.

      (17) Figure 6D does not appear representative of the quantifications shown. C-MYC signal is visibly reduced in the mutant, consistent with the lower levels of interactors such as Sam50 and NDUFA9. Additionally, the SDHA band is aligned at the bottom of the blot box. The list of antibodies used, and their catalog number is missing, or it was not provided to the reviewers. It seems plausible that the authors used a cocktail antibody set (e.g., Abcam ab110412), which includes anti-NDUFA9. This would contradict the claim of reduced complex I and SC levels, as the steady-state levels of NDUFA9 appear unchanged.

      We acknowledge that the expression of the myc-MIRO1 mutant is lower compared to myc-MIRO1 WT or myc-MIRO1 KK. Achieving identical expression levels when overexpressing multiple MIRO1 constructs is challenging. We agree that the lower expression of this mutant contributes to a reduced pull-down. Our quantification shows a reduction in association, although it is not statistically significant.

      A list of the antibodies was provided in the Methods section.

      We would like to clarify that we did not use an antibody cocktail in our experiments.

      (18) The title of Figure 6, "Loss of Miro1 leads to dysregulation of ETC activity under growth conditions," is vague. The term "dysregulation" should be replaced with a more specific mechanistic descriptor-what specific regulatory defect is meant?

      We thank the reviewer for this suggestion and rephrased the title.

      (19) In the results text for Figure 6, the authors state: "These data demonstrate that MIRO1 associates with MIB/MICOS and that this interaction promotes the formation of mitochondrial super complexes and the activity of ETC complex I." This conclusion is speculative and not mechanistically supported by the data presented.

      We appreciate the reviewer's feedback. We have revised the text to clarify the relationship between MIRO1, MIB/MICOS, supercomplex formation, and ETC activity. The updated text now states: "These data demonstrate that MIRO1 associates with MIB/MICOS. Additionally, MIRO1 promotes the formation of mitochondrial supercomplexes and enhances the activity of ETC complex I."

      (20) In Figure 7A, it is unclear what the 3x siControl/siMiro1 pairs represent-are these different cell lines or technical replicates of the same line? No loading control is shown. If changes in mitochondrial protein abundance are being evaluated, using COX4 as a loading control is inappropriate. The uneven COX4 signal across samples further complicates interpretation.

      Please note that we used primary cells, not cell lines. The three siControl/siMiro1 pairs represent independent cell isolations, each transfected with either siControl or siMIRO1. While the possibility of a difference in mitochondrial mass is an interesting question, the primary objective of this experiment is to demonstrate that the technique effectively results in the knockdown of Miro1, which is exclusively localized to mitochondria and not present in the cytosol. As such, we believe that Cox4 serves as a reasonable loading control. Although Miro1 knockdown may lead to a reduction in mitochondrial mass, the focus of this experiment is not to assess mitochondrial mass but to confirm the reduction in Miro1 protein levels on mitochondria. We also performed anti-VDAC immunoblots on the same membranes as an alternative loading control (Author response image 1).

      Author response image 1.

      (21) Figure 7G is difficult to interpret. Why did the authors choose to use a sensor-based method instead of the chemiluminescent assay to measure ATP in these samples?

      Both methods were employed to assess ATP levels in human samples. ATP measurements obtained with the luminescent assay are provided.

    1. eLife Assessment

      This manuscript provides useful insights into how the brain can simultaneously represent events and the times when they occurred. The results include a comparison between two different basis functions for temporal selectivity and how these generate different predictions for the dynamics of neural populations. The conclusions are partly incomplete because of questions such as the impact of the linear separability assumption and whether joint encodings of event type and time can be made without it.

    2. Joint Public Review:

      Quite obviously, the brain encodes "time", as we are able to tell if something happened before or after something else. How this is done, however, remains essentially not understood. In the context of Working Memory tasks, many experiments have shown that the neural activity during the retention period "encodes" time, besides the stimulus to be remembered; that is, the time elapsed from stimulus presentation can be reliably inferred from the recordings, even if time per se is not important for the task. This implies 'mixed selectivity', in the weak sense of neural activity varying with both stimulus identity and time elapsed (since presentation).

      In this paper, the authors investigate the implications of a specific form of such mixed selectivity, that is, conjunctive coding of what (stimulus) and when (time) at the single-neuron level, on the resulting dynamics of the population activity when 'viewed' through linear dimensionality-reduction techniques, essentially Principal Component Analysis (PCA). The theoretical/modeling results presented provide a useful guide to the interpretation of the experimental results; in particular, with respect to what can, or cannot, be rightfully inferred from those experimental results (using PCA-like techniques). The results are essentially theoretical in nature; there are, however, some conclusions that require a more precise justification, in my opinion. More generally, as the authors themselves discuss in the paper, it is not clear how to generalize this coding scheme to more complicated, but behaviorally and cognitively relevant, situations, such as multi-item WM or WM for sequences.

      (1) It is unclear to me how the conjunctive code that the authors use (i.e., Equation (3)) is constrained by the theoretical desiderata (i.e., compositionality) they list, or whether it is simply an ansatz, partly motivated by theoretical considerations and experimental observations.

      The "what" part: What the authors mean by "relationships" between stimuli is never clearly defined. From their argument (and from Figure 1b), it would seem that what they mean is "angles" between population vectors for all pairs of stimuli. If this is so, then the effect of the passing time can only amount to a uniform rescaling of the components of the population vector (i.e., it must be a similarity transformation; rotations are excluded, if the linear-decoder vectors are to be time-independent); the scaling factor, then, must be a strictly monotonic function of time (increasing or decreasing), if one is to decode time. In other words, the "when" receptive fields must be the same for all neurons.

      The "when" part: The condition, \tau_3=\tau_1+\tau_2, does not appear to be used at all. In fact, it is unclear (to me at least) whether the model, as it is formulated, is able to represent time intervals between stimuli.
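
      The "what" argument in point (1) can be illustrated numerically. In the sketch below, every neuron shares one temporal profile, so elapsed time only rescales the population vector and the angle between the representations of two stimuli stays fixed. The exponential gain, the random stimulus patterns, and all names are assumptions chosen for illustration; they are not the authors' Equation (3).

```python
import numpy as np


def population_vector(stimulus_pattern, gain):
    """Conjunctive response with one shared temporal gain for all neurons."""
    return gain * np.asarray(stimulus_pattern, dtype=float)


def cosine(u, v):
    """Cosine of the angle between two population vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
patterns = rng.normal(size=(2, 50))  # two stimuli, 50 hypothetical neurons
times = np.linspace(0.0, 5.0, 6)
gains = np.exp(-times / 2.0)         # hypothetical strictly decreasing profile

# Angle between the two stimulus representations at each elapsed time.
cosines = [
    cosine(population_vector(patterns[0], g), population_vector(patterns[1], g))
    for g in gains
]
```

      The cosine is the same at every time point, while the vector norm decreases strictly with time, so a time-independent linear decoder can read out stimulus identity and the norm can be used to read out elapsed time.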

      (2) For the specific case considered, i.e., conjunctive coding, it would seem that one should be able to analytically work out the demixed PCA (see Kobak et al., 2016). More generally, it seems interesting to compare the results of the PCA and the demixed PCA in this specific case, even just using synthetic data.

      (3) In the Section "Dimensionality of neural trajectories...", there is some claim about how the dimensionality of the population activity goes up with the observation window T, backed up by numerical results that somehow mimic the results of Cueva et al. (2020) on experimental data. Is this a result that can be formally derived? Related to this point, it would be useful to provide a little more justification for Equation (17). Naively, one would think that the correlation matrix of the temporal component is always full-rank nominally, but that one can get excellent low-rank approximations (depending on T, following your argument).
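
      The question in point (3) can be explored numerically with a sketch like the one below, which measures the effective dimensionality (participation ratio) of a set of temporal basis functions observed over windows of different length T. The exponentially decaying basis, the range of time constants, and the function name are assumptions for illustration; they are not taken from the manuscript and do not reproduce its Equation (17).

```python
import numpy as np


def participation_ratio(window, time_constants, n_samples=200):
    """Effective dimensionality of temporal activity observed over [0, window].

    Each column is one hypothetical basis function exp(-t / tau). The
    participation ratio is (sum of variances)^2 / (sum of squared variances),
    computed from the singular values of the time-centered activity.
    """
    t = np.linspace(0.0, window, n_samples)
    taus = np.asarray(time_constants, dtype=float)
    activity = np.exp(-t[:, None] / taus[None, :])  # shape: time x units
    centered = activity - activity.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return float(variances.sum() ** 2 / (variances ** 2).sum())


time_constants = np.logspace(-1, 1, 20)  # hypothetical, 0.1 to 10 time units
pr_short = participation_ratio(0.01, time_constants)
pr_long = participation_ratio(10.0, time_constants)
```

      With a very short window every basis function is nearly linear in time, so the activity is close to one-dimensional; a longer window exposes the different decay rates and raises the participation ratio. This is consistent with the reviewer's intuition that the matrix is nominally full-rank but admits a good low-rank approximation whose rank depends on T.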

    1. eLife Assessment

      The authors provide a scholarly review of intracranial research into the neural correlates of consciousness (NCCs). To our knowledge, this is the first such review, and it therefore may become a must-read for anyone working in the field of consciousness research. It is not so persuasive that intracranial recordings are better suited to identifying pure NCCs than other methods, which appears a problem instead solved through novel paradigms and better-developed theories - but this no doubt reflects an in-depth, timely, and insightful contribution to the literature.

    2. Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al. results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli'). Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to draw from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al. results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This included population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential of becoming a must-read for anyone working in the field of consciousness research.

    4. Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradictory evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

    1. eLife Assessment

      This important study establishes a workflow based on environmental sampling for the discovery of bacteriophages capable of infecting antibiotic-resistant pathogens. The experimental design, analysis, and results demonstrating the effectiveness of the workflow are convincing, although a broader sampling scheme and more careful framing of the data within the current limitations of viral taxonomy could strengthen the work. This study will interest researchers working on bacterial infections, environmental microbiology, and phage-based alternatives for addressing antimicrobial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Pathogen-Phage Geomapping to Overcome Resistance," Do et al. present an impressive demonstration of using geographical sampling and metagenomics to guide sample choice for enrichment in human-associated microbes and the pathogen of interest to increase the chances of success for isolating phages active against highly resistant bacterial strains. The authors document many notable successes (17!) with highly resistant bacterial isolates and share a thoughtfully structured phage discovery effort, potentially opening the door to similar geomapping efforts across the field. While the work is methodologically strong and valuable for the community, there are a few areas where additional clarification and analysis could better align the claims with the data presented.

      Strengths:

      (1) The manuscript describes a well-executed and transparent example of overcoming a major obstacle in therapeutic virus identification, providing a practical success story that will resonate with researchers in microbiology and medicine.

      (2) Many phage researchers have anecdotally experienced a similar phenomenon, that a particular wastewater treatment plant always seems to have the pathogens you need. Quantifying this with metagenomics modernizes and adds evidence to this phenomenon in a way that could help researchers reproduce this success in a methodical way.

      (3) The methodology of combining environmental sampling, viral screening, and host-range analysis is clearly articulated and reproducible, offering a valuable blueprint for others in the field.

      (4) The data are presented with appropriate analytical rigor, and the results include robust sequencing and metagenomic profiling that deepen understanding of local viral communities.

      (5) The 17 successes yielding 35 phages have a lot of phylogenetic novelty beyond what the Tailor labs have typically found with previous methods.

      (6) The work highlights a practical and innovative solution to an increasingly important clinical problem, supporting the development of personalized antiviral strategies.

      Weaknesses:

      (1) The central concept of geomapping as a broadly applicable strategy is wonderfully supported by the 17 successes documented in the paper. While this is actually, of course, a strength, the study does not include a comparative analysis across multiple sites with varying sampling outcomes for different bacterial types, which would be necessary to validate this claim more generally.

      (2) Some elements, such as beta diversity comparisons and the metagenomics analysis of viral dark matter, would benefit from additional statistical analysis and clearer context.

      (3) Claims about therapeutic cocktails would be better framed as speculative and/or moved to the discussion section.

      (4) The manuscript could be strengthened by elaborating on the scope and composition of the phage and bacterial isolate collections, which are important for interpreting the broader significance of the findings.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Do and colleagues aims to develop a workflow for isolating and identifying bacteriophages with potential applications in phage therapy against antibiotic-resistant pathogens. The workflow integrates geΦmapping as a strategy to identify potential phage sources, ΦHD as a device for phage concentration, and RΦ as a phage library constructed from the initial sampling, resulting in the discovery of 36 new phages. The paper is overall interesting, and the proposed method appears robust and effective.

      Strengths:

      The proposed methods combine state-of-the-art strategies to address the ever-increasing problem of antibiotic resistance. The methods are robust, and the controls are appropriate. The integration of environmental sampling, concentration strategies, and downstream genomic characterization is a clear strength and provides a potentially scalable framework for identifying candidate therapeutic phages. The manuscript is clearly written overall, and the results support the main conclusions.

      Weaknesses:


      While the authors acknowledge several limitations, some aspects require clearer framing or additional clarification. The proposed workflow focuses exclusively on aquatic environments as sources of phages, which may limit the diversity of hosts and phage types recoverable using this approach. Some interpretations, particularly regarding taxonomic classification and sampling saturation, would benefit from more cautious wording given current limitations in viral taxonomy and the observed data.

    1. eLife Assessment

      This important work shows that a history of cocaine self-administration disrupts the orbitofrontal cortex's ability to encode similarities between distinct sensory stimuli that possess identical task information - hidden states. The evidence supporting these conclusions is compelling, with methods and analyses spanning self-administration, a novel 'figure 8' sequential odor task, recordings from 3,881 single units, and sophisticated firing analyses revealing complex orbitofrontal representations of task structure. These results will be of broad interest to psychologists, neuroscientists, and clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5⁺-0⁻-1⁻-2⁺; Sequence #2: 3⁺-0⁻-1⁻-4⁺), forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4), indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.
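      The task structure summarised above can be written down as a small data structure. This is an illustrative sketch only, not the authors' analysis code: odor identities and reward contingencies follow the superscripts in the sequence notation, while the names `SEQUENCES`, `trial_stream` and `hidden_state` and the "P1" to "P4" labels are hypothetical. It shows which positions share an odor across the two sequences and which differ in odor yet are functionally equivalent.

```python
# (odor, rewarded) for each of the four positions, following the notation
# 5+-0--1--2+ (Sequence #1) and 3+-0--1--4+ (Sequence #2).
SEQUENCES = {
    1: [(5, True), (0, False), (1, False), (2, True)],
    2: [(3, True), (0, False), (1, False), (4, True)],
}


def trial_stream(n_laps):
    """Yield (sequence, position, odor, rewarded) with the two sequences alternating."""
    for lap in range(n_laps):
        seq = 1 + lap % 2
        for position, (odor, rewarded) in enumerate(SEQUENCES[seq], start=1):
            yield seq, position, odor, rewarded


def hidden_state(position):
    """Hypothetical hidden-state label: position in the sequence, ignoring sequence identity."""
    return f"P{position}"


# Positions where both sequences present the same odor.
shared_odor = {
    pos: SEQUENCES[1][pos - 1][0] == SEQUENCES[2][pos - 1][0] for pos in range(1, 5)
}
# Positions where both sequences carry the same reward contingency.
same_outcome = {
    pos: SEQUENCES[1][pos - 1][1] == SEQUENCES[2][pos - 1][1] for pos in range(1, 5)
}
# Reward pattern over one full figure-8 loop (both sequences once).
reward_pattern = [rewarded for _, _, _, rewarded in trial_stream(2)]

for pos in range(1, 5):
    print(hidden_state(pos), "shared odor:", shared_odor[pos],
          "same outcome:", same_outcome[pos])
print("reward pattern over one loop:", reward_pattern)
```

      In this encoding, positions 2 and 3 share both odor and outcome, whereas positions 1 and 4 differ in odor but not in outcome, which is the sense in which they can be treated as the same hidden state.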

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small, so idiosyncratic, animal-specific effects cannot be fully ruled out.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      (4) Description of the TCA component

      Line 220: the authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

    3. Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%. Can the y axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3; is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

    1. eLife Assessment

      This important study provides the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention underlying inhibition of return, using an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to demonstrate that attentional orienting modulates semantic- and response-level conflict processing. Although the empirical evidence is compelling, clearer justification of the experimental logic, more cautious framing of behavioral and regional interpretations, and greater transparency in reporting and presentation are needed to strengthen the conclusions. The work will be of broad interest to researchers investigating visual attention, perception, cognitive control, and conflict processing.

    2. Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding sheds new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Weaknesses:

      While this study addresses an important theoretical question and presents compelling neuroimaging findings, a few additional details would help improve clarity and interpretation. Specifically, more information could be provided regarding the experimental conditions (SI and RI), the justification for the criteria used for excluding behavioral trials, and how the null condition was incorporated into the analyses. In addition, given the non-significant interaction effect in the behavioral results, the claim that the behavioral data "clearly isolated" distinct semantic and response conflict effects should be phrased more cautiously.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides evidence for the integration-segregation theory of an attentional effect, widely cited as inhibition of return (IOR), from a neuroimaging perspective, and explores neural interactions between IOR and cognitive conflict, showing that conflict processing is potentially modulated by attentional orienting.

      Strengths:

      The integration-segregation theory was examined in a sophisticated experimental task that also accounted for cognitive conflict processing, which is phenomenologically related to IOR but "non-spatial" by nature. This study was carefully designed and executed. The behavioral and neuroimaging data were carefully analyzed and largely well presented.

      Weaknesses:

      The rationale for the experimental design was not clearly explained in the manuscript; more specifically, why the current ER-fMRI study would disentangle integration and segregation processes was not explained. The introduction of "cognitive conflict" into the present study was not well reasoned for a non-expert reader to follow.

      The presentation of the results can be further improved, especially the neuroimaging results. For instance, Figure 4 is challenging to interpret. If "deactivation" (or a reduction in activation) is regarded as a neural signature of IOR, this should be clearly stated in the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to provide the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention - a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic-algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is sound, with rigorous preprocessing, appropriate modeling, and analyses that converge across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (PHG, STG). The findings advance the field by supplying much-needed neural evidence for the integration-segregation framework and by clarifying how IOR modulates conflict processing.

      Weaknesses:

      Some interpretive aspects would benefit from clarification, particularly regarding the dual roles ascribed to dACC activation and the circumstances under which PHG and STG are treated as a single versus separate functional clusters. Reporting conventions are occasionally inconsistent (e.g., statistical formatting, abbreviation definitions), which may hinder readability. More detailed reporting of sample characteristics, exclusion criteria, and data-quality metrics-especially regarding the global-variance threshold-would improve transparency and reproducibility. Finally, some limitations of the study, including potential constraints on generalization, are not explicitly acknowledged and should be articulated to provide a more balanced interpretation.

    1. eLife Assessment

      This important work contributes a transcriptional dataset that identifies potential genes involved in axon initial growth and axon regrowth, followed by a characterization of axon phenotypes after knockdown of a subset of these genes. Focused experiments on a single gene, Pmvk, highlight the potential role of the mevalonate pathway in axon regrowth. The methods are convincing, though partially incomplete. The data establish a basis for further studies on axonal development and will be of interest to both developmental neurobiologists and those seeking to develop molecular tools to target, monitor, and manipulate axon morphology and function.

    2. Reviewer #1 (Public review):

      Summary:

      Fahdan et al. present a study investigating the molecular programs underlying axon initial growth and regrowth in Drosophila mushroom body (MB) neurons. The authors leverage the fact that different Kenyon cell (KC) subtypes undergo distinct axonal events on the same developmental timeline: γ KCs prune and then regrow their axons during early pupation, whereas α/β KCs extend their axons for the first time during the same pupal period. Using bulk Smart-seq2 RNA sequencing across six developmental time points, the authors identify genes enriched during γ KC regrowth and α/β KC initial outgrowth, and subsequently perform an RNAi screen to determine which candidates are functionally required for these processes.

      Among these, they focus on Pmvk, a key enzyme in the mevalonate pathway. Both RNAi knockdown and a CRISPR-generated mutant produce strong γ KC regrowth defects. Knockdown of other mevalonate pathway components (Hmgcr, Mvk) partially recapitulates this phenotype. The authors propose that Pmvk promotes axonal regrowth through effects on the TOR pathway.

      Overall, this work identifies new molecular players in developmental axon remodeling and provides intriguing evidence connecting Pmvk to γ KC regrowth.

      While the Pmvk knockdown and loss-of-function data are compelling, the evidence that the mevalonate pathway broadly regulates γ KC axon regrowth is less clear. RNAi knockdown of enzymes upstream of Pmvk (Hmgcr, Mvk) produces only mild phenotypes, and knockdown of several downstream enzymes produces no phenotype. The authors attribute this discrepancy to the possibility of weak RNAi constructs, which is plausible but not fully demonstrated. It would be helpful for the authors to discuss alternative explanations, including non-canonical roles for Pmvk that may not require the full pathway, and clarify the extent to which the current data support the conclusion that the mevalonate pathway, rather than Pmvk specifically, is a core regulator of regrowth.

      It is not clear from the Methods whether γ KCs and α/β KCs were sorted from the same brains using orthogonal binary expression systems (e.g., Gal4 > reporter 1 and LexA > reporter 2), or isolated separately from different fly lines. If the latter, differences in genetic background, staging, or batch effects could influence transcriptional comparisons. This should be explicitly clarified in the Methods, and any associated limitations discussed in the manuscript.

      The authors have made important findings that contribute to our understanding of axon growth and regrowth. As written, some major claims are only partially supported, but these issues can be addressed through reframing and clarification. In particular, the manuscript would benefit from (1) a more cautious interpretation of the mevalonate pathway's role, potentially considering Pmvk non-canonical functions, and (2) addressing methodological ambiguities in the transcriptomic analysis.

    3. Reviewer #2 (Public review):

      Fahdan et al. set out to build upon their previous work outlining the genes involved in axon growth, targeting two axon growth states: initial growth and regrowth. They outline a debate in the field that axon regrowth (for instance, after injury or in the peripheral nervous system) is different from initial axon growth, for which the authors have previously demonstrated distinct mechanisms. The authors set out to directly compare the transcriptomes of initial axon growth and regrowth, specifically within the same neuronal environment and developmental time point. To this end, the authors used the well-characterized genetic tools available in Drosophila melanogaster (the fruit fly) to build a valuable dataset of genes involved at different time points in axon growth (alpha/beta Mushroom Body Kenyon cells) and regrowth (gamma Mushroom Body Kenyon cells). The authors then focus on genes that are upregulated during both initial axon growth and axon regrowth. Then, using this subset of genes, they screen for axonal growth and regrowth deficits by knocking down 300 of these genes. 12 genes are found to be phenotypically involved in both axon growth and regrowth based on RNAi gene-targeted knockdown in the Mushroom Body. Of these 12 genes, the authors focus on one gene, Pmvk, which is part of the mevalonate pathway. They then highlight other genes in this pathway. But these genes primarily affect axon regrowth, not initial axon growth, implicating metabolic pathways in axon regrowth. This comprehensive RNA-seq dataset will be a valuable resource for the field of axon growth and regrowth, as well as for other researchers studying the Mushroom Body.

      Strengths:

      This paper contains many strengths, including the in-depth sequencing of overlapping developmental time points during the alpha/beta KCs' initial axon growth and gamma KCs' regrowth. This produces a rich dataset of differentially expressed genes across different time points in either cell population during development. In addition, the authors characterized expression patterns at developmental time points for 30 Gal4 lines previously identified as alpha/beta KC-expressing. This is very helpful for Drosophila Mushroom Body researchers because the authors not only characterized alpha/beta expression but also alpha'/beta' expression, gamma expression, and non-MB expression. The authors comprehensively walked through identifying differentially expressed genes during alpha/beta axon growth, identifying a subset of overlapping upregulated genes between cell types, then systematically characterized whether knockdown of a subset of these genes produced an axonal growth defect, and finally selected 1 of 3 cell-autonomous genes important for gamma KCs regrowth to further study.

      The authors utilized the developing Mushroom Body in Drosophila melanogaster, which happens to have new neurons developing axons and neurons that have undergone pruning and are regrowing neurons at the same developmental time. They are also in the same part of the brain (the Mushroom Body) and, in theory, since the authors implicate a metabolic pathway, they will have similar metabolic growth conditions.

      Identifying Pmvk and two other components of the mevalonate pathway in axon regrowth opens up novel avenues for future studies on the role this metabolic pathway may have in axon growth. The authors of this paper are also very upfront about their negative results, allowing researchers to avoid running redundant experiments and truly build on this work.

      Weaknesses:

      While the dataset produced in this study is a strength, certain aspects make it more challenging to interpret. For instance, the authors state that roughly equal numbers of males and females are used for sequencing, and this vagueness, coupled with only taking a subset of the GFP-labeled neurons during FACS sorting, can introduce confounds into the dataset. This may hold true in imaging studies as well, in which males and females were used interchangeably.

      Additionally, a rationale is needed to explain why random numbers of 1-7 were assigned to zero-expressing genes in the DESeq analysis. This does not seem to conform to the way this analysis is normally performed. This can alter how genes across the dataset are normalized and requires further explanation.
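
      To make the normalization concern concrete, the following is a minimal sketch, assuming the median-of-ratios size-factor scheme described for DESeq/DESeq2. It is not the authors' pipeline; the function name `size_factors` and the toy counts are hypothetical. In this scheme, a gene with a zero count in any sample has a geometric mean of zero and is left out of the size-factor estimate, so replacing zeros with random values of 1-7 would bring such genes into the calculation.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch only).

    counts[g][s] is the raw count for gene g in sample s.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # A zero count makes the geometric mean zero, so the gene is skipped.
        if any(c == 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for s, c in enumerate(gene):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in at least one sample")
    # Size factor = median ratio of a sample's counts to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 is sequenced twice as deeply; the third gene has a zero.
toy_counts = [[10, 20], [100, 200], [0, 7], [5, 10]]
toy_factors = size_factors(toy_counts)
```

      In this toy case the gene with the zero count does not influence the estimate, and the second size factor comes out at twice the first.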

      The display and discussion of the data set do not always align with the authors' stated goal of having a comprehensive description of the genes that dynamically change during axon growth and regrowth. Displaying more information about genes differentially expressed in the alpha/beta KCs, or any information about the genes differentially expressed in the gamma KCs when using the same criteria as the alpha/beta KCs, or the 676 overlapping upregulated genes, would significantly add to this paper. The authors previously performed a similar study across developmental time points for gamma KCs, and it is not clear whether any overlapping genes were identified. Also, more information on the genes consisting of PC1 and PC3 when showing the PCA analysis would be helpful. Within the text, there is a discussion of why certain genes or gene groups were omitted or selected, such as clusters 1 and 2, and then some of their subgroups based on expected genes. There is also some discussion of omitted gene groups, but this is not complete across the different clusters, nor is there a discussion of why PC2 was not selected or of which genes might exhibit greater variability than cell type. The authors would make a stronger case for the genes they pursued if they showed that groups of genes already known to be involved in axon growth clustered within the selected groups. Since we do not see the gene lists, this is unclear and adds to the sometimes arbitrary nature of the authors' choices about what to pursue in this paper. A larger set of descriptors, such as gene lists and Gene Ontology analysis beyond what is shown, would be very helpful in putting the results in context and determining whether this is a resource beneficial to others.

      While the Pmvk story is interesting, the authors appear to make some arbitrary decisions in what is shown or pursued in this paper. Visually, CadN and Twr appear to be more severe axon regrowth phenotypes, where the peduncle appears intact, and axons are not regrowing in Figures 3 N and O. In contrast, Pmvk visually appears to lose neurons in Figure 3 M. With a change of the Gal4 driver (Figure 4), Pmvk now produces a gamma axon regrowth phenotype similar to CadN and Twr in Figure 3. This difference in the use of Gal4 for characterizing axonal phenotypes is not discussed, making some interpretations more challenging due to differences in Gal4 expression strength. For instance, the sequencing work was done with a different Gal4 MB expressing line than the characterization of gene knockdowns. Further characterization of the Pmvk was performed in the same Gal4 lines as the sequencing (Figure 4), suggesting a potential difference in Gal4 strength that may play a role in their rescue experiments if they are using a slightly weaker Gal4 for gamma lobe expression. A broader discussion of this may make the selection of Pmvk less arbitrary if the phenotype is similar to those of CadN and Twr. Along the lines of the sometimes arbitrary nature of the genes chosen to pursue further, the authors state that they selected genes that showed differential expression at any time point. As they refine their list of genes to pursue further, they seem to prioritize genes that change at 18-21 APF. This appears to be the early period for axon growth in alpha/beta KCs and gamma KCs, based on Figure 1. A stronger case might be made at longer time points when the axon is growing or regrowing.

      The paper would benefit from scaling back the claim that the mevalonate pathway is involved. The authors identified only a subset of genes from the mevalonate pathway, all immediately upstream of Pmvk, with no effect on downstream genes. Along these lines, the paper would benefit from a discussion of non-canonical Pmvk signaling.

      While the ability to take neurons at the same developmental time and from the same brain region is a strength, they are still 2 different types of neurons. Although gamma neuron axon growth occurs very early in development, it would be interesting to know whether the same genes are involved in their initial growth. A caveat to the authors' conclusion is that these are 2 different cell types, and they might use different genetic programs or use overlapping ones at other times. The authors did not show that gamma KCs use these genes in their initial axon growth.

    1. eLife Assessment

      This valuable study characterises the activity of motor units from two of the three anatomical subdivisions ("heads") of the triceps muscle while mice walked on a treadmill at various speeds. Altogether, this is the most thorough characterisation of motor unit activity in walking mice to date, providing solid evidence for probabilistic recruitment of motor units that differed between the two heads.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observe differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools, and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While the findings are important in their own right, the lack of confirmation from analysis of other muscles acting at other joints leaves the generalization of these findings unclear.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads (e.g. Figure 2C), but the manuscript falls short of providing a statistical basis for the existence of distinct subpopulations.

    3. Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to characterise the firing activity of individual motor units in mice during locomotion. To achieve this, the team implanted small arrays of eight electrodes into two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Concurrently, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice across five speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors demonstrate that:

      - Their recording method and adapted spike-sorting algorithm enable robust decoding of motor unit activity during rapid movements.
      - Identified motor units tend to be recruited during a subset of strides, with recruitment probability increasing with speed.
      - Motor units within individual heads of the triceps likely receive common synaptic inputs that correlate their activity, whereas motor units from different heads exhibit distinct behaviour.

      The authors conclude that these differences arise from the distinct functional roles of the muscles and the task constraints (i.e., speed).

      Strengths:

      - The novel combination of electrode arrays for recording intramuscular electromyographic signals from a larger muscle volume, paired with an advanced spike-sorting pipeline capable of identifying motor unit populations.
      - The robustness of motor unit decoding during fast movements.

      Weaknesses:

      - The data do not clearly indicate which motor units were sampled from each pool, leaving uncertainty as to whether the sample is biased towards high-threshold motor units or representative of the entire pool.
      - The results largely confirm the classic physiological framework of motor unit recruitment and rate coding, offering limited new insights into motor unit physiology.

      I would like to thank the authors for their thorough and insightful revisions. I am particularly pleased with the inclusion of the new analyses demonstrating the robustness of motor unit decoding, as well as the improved transparency regarding spike-sorting yield for each muscle and animal. Additionally, the new analyses illustrating that recruitment within muscle heads is consistent with the presence of common synaptic inputs and orderly recruitment significantly strengthen the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Using Myomatrix recordings, the authors report that (1) motor units are recruited differently in the two types of muscles, (2) individual units are probabilistically recruited during locomotor strides, whereas the population bulk EMG provides a more reliable representation of the muscle, and (3) the recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique dataset, and the data analysis is convincing and well-executed.

      Weaknesses:

      After the revision, I no longer see any apparent weaknesses in the study.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distribution of recruitment probabilities was not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). Nevertheless, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”
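
      For readers unfamiliar with the measure, the sketch below shows one way a per-unit recruitment probability could be computed, assuming it is defined as the fraction of strides in which the unit fires at least one spike. That definition, the function name `recruitment_probability`, and the toy values are illustrative assumptions rather than the authors' code.

```python
from bisect import bisect_left


def recruitment_probability(spike_times, stride_onsets):
    """Fraction of strides that contain at least one spike from a unit.

    stride_onsets must be sorted; stride k spans the half-open interval
    [stride_onsets[k], stride_onsets[k + 1]).
    """
    spikes = sorted(spike_times)
    n_strides = len(stride_onsets) - 1
    if n_strides < 1:
        raise ValueError("need at least two stride onsets")
    recruited = 0
    for start, end in zip(stride_onsets[:-1], stride_onsets[1:]):
        # Index of the first spike at or after the stride onset.
        i = bisect_left(spikes, start)
        if i < len(spikes) and spikes[i] < end:
            recruited += 1
    return recruited / n_strides


# Toy example: four strides, with spikes falling in the first and third.
toy_probability = recruitment_probability([0.2, 0.3, 2.5], [0.0, 1.0, 2.0, 3.0, 4.0])
```

      Under this definition, a unit that is silent on half of the strides would sit at the 50% boundary mentioned in the quoted text.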

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the complex interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
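The refractory-period criterion quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the spike train, and the handling of units with fewer than two spikes are assumptions made for the example; only the 1 ms limit and the 2% threshold come from the text.

```python
import numpy as np

def isi_violation_fraction(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals shorter than the refractory limit."""
    spike_times_s = np.sort(np.asarray(spike_times_s, dtype=float))
    isis = np.diff(spike_times_s)
    if isis.size == 0:
        # Assumption: a unit with fewer than two spikes has no violations.
        return 0.0
    return float(np.mean(isis < refractory_s))

# Hypothetical spike times in seconds; the first pair is only 0.5 ms apart.
spikes = [0.0100, 0.0105, 0.0300, 0.0550, 0.0800, 0.1100]
frac = isi_violation_fraction(spikes)

# Criterion from the text: no more than 2% of ISIs below 1 ms.
passes = frac <= 0.02
```

In this toy train one of five intervals violates the limit, so the unit would fail the 2% criterion.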

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then calculated the correlation between a unit’s waveform over either trials or bins in which at least 30 spikes were present. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”
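A rough sketch of the waveform-stability measure described above, assuming spike waveforms have already been extracted on each unit's largest-amplitude channel. The function name, array layout, and synthetic data are hypothetical; the 30-spike minimum follows the text, and the same function would apply to labels defined by trial or by 50 ms stride bins.

```python
import numpy as np

def median_waveform_correlations(waveforms, bin_ids, min_spikes=30):
    """Pairwise Pearson correlations between per-bin median waveforms.

    waveforms: (n_spikes, n_samples) array, one row per spike.
    bin_ids:   (n_spikes,) label per spike (a trial index or a stride-time bin).
    Bins with fewer than min_spikes spikes are skipped.
    """
    waveforms = np.asarray(waveforms, dtype=float)
    bin_ids = np.asarray(bin_ids)
    medians = [np.median(waveforms[bin_ids == b], axis=0)
               for b in np.unique(bin_ids)
               if np.sum(bin_ids == b) >= min_spikes]
    if len(medians) < 2:
        return np.array([])
    corr = np.corrcoef(np.vstack(medians))
    upper = np.triu_indices(len(medians), k=1)
    return corr[upper]

# Synthetic example: one stable template plus noise, split into three bins.
rng = np.random.default_rng(0)
template = np.sin(np.linspace(0, 2 * np.pi, 32))
bin_ids = np.repeat([0, 1, 2], [40, 40, 10])   # the last bin has too few spikes
waveforms = template + 0.1 * rng.standard_normal((90, 32))
corrs = median_waveform_correlations(waveforms, bin_ids)
```

Only the two bins with at least 30 spikes contribute, so a single correlation is returned here.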

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while signal-to-residual is often an excellent metric, it is not ideal for our high-resolution EMG signals since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting (only) the largest motor unit, since subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1 that demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. Such common inputs may be reflected in sparsely recruited motor units being active in overlapping rather than different strides. Indeed, in the following passage added to the Results, we show that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units is correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary across locomotor strides.”
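The comparison of marginal and conditional recruitment probabilities can be illustrated as follows. This is a sketch under stated assumptions: units are taken to be recorded simultaneously, recruitment is encoded as a boolean units-by-strides matrix, and the function name and toy data are invented for the example. The paired values it returns could then be passed to a paired test such as `scipy.stats.wilcoxon`, which is the kind of test the passage names.

```python
import numpy as np

def conditional_recruitment(recruited):
    """Marginal vs. conditional recruitment probability for every ordered unit pair.

    recruited: (n_units, n_strides) boolean matrix; True if the unit fired
    at least one spike on that stride.
    Returns two arrays with one entry per ordered pair (i, j), i != j:
    p(unit i recruited) and p(unit i recruited | unit j recruited).
    """
    recruited = np.asarray(recruited, dtype=bool)
    marginal, conditional = [], []
    for i in range(recruited.shape[0]):
        for j in range(recruited.shape[0]):
            if i == j or not recruited[j].any():
                continue  # skip self-pairs and conditioning units that never fire
            marginal.append(recruited[i].mean())
            conditional.append(recruited[i, recruited[j]].mean())
    return np.array(marginal), np.array(conditional)

# Toy data: two units over eight strides, co-active on strides 0 and 1.
recruited = np.array([
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
])
marginal, conditional = conditional_recruitment(recruited)
```

Here each unit is recruited on 3 of 8 strides overall but on 2 of the 3 strides in which the other unit fires, the pattern expected under shared input.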

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated the below figures to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e. that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure 5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4) Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
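The difference between the two rate measures can be made concrete with a short sketch. The kernel width (`sigma_s`), time step, and function names below are assumptions for illustration; the text does not state the kernel parameters used in the manuscript.

```python
import numpy as np

def peak_isi_rate(spike_times_s):
    """Peak instantaneous rate (Hz) from the shortest inter-spike interval.

    Undefined (NaN) for strides with fewer than two spikes.
    """
    spike_times_s = np.sort(np.asarray(spike_times_s, dtype=float))
    if spike_times_s.size < 2:
        return float("nan")
    return float(1.0 / np.diff(spike_times_s).min())

def peak_smoothed_rate(spike_times_s, t_start, t_stop, sigma_s=0.010, dt=0.001):
    """Peak of the spike train convolved with a unit-area Gaussian kernel (Hz).

    Defined for any spike count, including zero (returns 0.0).
    sigma_s is a hypothetical kernel width.
    """
    t = np.arange(t_start, t_stop, dt)
    rate = np.zeros_like(t)
    for s in np.asarray(spike_times_s, dtype=float):
        rate += np.exp(-0.5 * ((t - s) / sigma_s) ** 2) / (sigma_s * np.sqrt(2 * np.pi))
    return float(rate.max())

# A stride with no spikes and a stride with one spike both get a smoothed value,
# whereas the ISI-based measure is undefined for them.
no_spike_peak = peak_smoothed_rate([], 0.0, 0.3)
one_spike_peak = peak_smoothed_rate([0.1], 0.0, 0.3)
one_spike_isi = peak_isi_rate([0.1])
burst_isi = peak_isi_rate([0.10, 0.11, 0.15])
```

This shows why the smoothed measure can include every stride in the analysis while the ISI-based measure must drop strides with zero or one spike.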

      (5) Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show relationships in unit activation with both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample size than the main analysis) do not reach statistical significance in all cases.

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistical recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons’ operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically” - as defined in our paper - refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically founded to compare the results of separate statistical tests based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follow is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate<sub>fast</sub>/rate<sub>slow</sub> of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)<sub>fast</sub>/p(recruitment)<sub>slow</sub> of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds is extremely hard to understand - after reading many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. Particularly, this allowed for multiple solutions to explain why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to have no recruitment of a unit. However, to create a model of motor unit activity that includes both recruitment and rate, it must be possible that a recruited unit can have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate along the following piecewise function:

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.
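The piecewise function itself is not reproduced in the text above. One form consistent with the stated definitions is a zero-inflated Poisson, with P(y = 0) = (1 − p) + p·exp(−λ) and P(y = k) = p·λ^k·exp(−λ)/k! for k > 0; this reading is an assumption, not a quotation of the authors' Eq. 1 and 2. The sketch below fits p and λ by expectation-maximization, which lowers the negative log-likelihood at each step. The optimizer, the function names, and the treatment of y as a non-negative integer (for instance a rounded peak rate) are all choices made for this example.

```python
import numpy as np
from math import lgamma

def zip_negative_log_likelihood(y, p, lam):
    """Negative log-likelihood of a zero-inflated Poisson (assumed model form)."""
    y = np.asarray(y)
    zeros = y == 0
    log_fact = np.array([lgamma(k + 1) for k in y[~zeros]])
    ll_zero = zeros.sum() * np.log((1 - p) + p * np.exp(-lam))
    ll_pos = np.sum(np.log(p) - lam + y[~zeros] * np.log(lam) - log_fact)
    return -(ll_zero + ll_pos)

def fit_zip(y, n_iter=200):
    """Estimate recruitment probability p and Poisson rate lam by EM."""
    y = np.asarray(y, dtype=float)
    zeros = y == 0
    p, lam = 0.5, max(y.mean(), 1e-6)
    for _ in range(n_iter):
        # E-step: probability that a zero-valued stride was nonetheless recruited.
        w_zero = p * np.exp(-lam) / ((1 - p) + p * np.exp(-lam))
        w = np.where(zeros, w_zero, 1.0)
        # M-step: update p and lam from the expected recruitment weights.
        p = w.mean()
        lam = y.sum() / w.sum()
    return float(p), float(lam)

# Simulated strides: recruited with probability 0.6, Poisson rate 3 when recruited.
rng = np.random.default_rng(0)
was_recruited = rng.random(5000) < 0.6
y = np.where(was_recruited, rng.poisson(3.0, 5000), 0)
p_hat, lam_hat = fit_zip(y)
```

Because a recruited unit can still produce zero, the fitted p is somewhat higher than the raw fraction of strides with non-zero values, which is the distinction the passage draws between non-recruitment and a stochastically zero rate.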

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (𝜆, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p<sub>fast</sub>-p<sub>slow</sub> and 𝜆<sub>fast</sub>-𝜆<sub>slow</sub>). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the null model parameters for each new group and found the difference between like terms. To estimate the distribution of possible differences, we bootstrapped this result using 1000 random redistributions of the pooled set of strides. Following the permutation test, the 95% confidence interval of this final distribution reflects the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the true difference in rate or recruitment terms exceeds this confidence interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
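The pooled-and-reshuffled null described above can be sketched generically. The statistic used here (fraction of strides with a non-zero value) is a simple stand-in for the fitted model parameters, and the function name, seed, and toy data are hypothetical; the 1000 redistributions and the 95% interval follow the text.

```python
import numpy as np

def permutation_null_interval(slow, fast, statistic, n_perm=1000, seed=0):
    """Permutation test for a difference in a per-group statistic.

    Returns the observed difference (fast minus slow), the central 95%
    interval of the null distribution, and whether the observed value
    falls outside that interval.
    """
    rng = np.random.default_rng(seed)
    slow, fast = np.asarray(slow), np.asarray(fast)
    observed = statistic(fast) - statistic(slow)
    pooled = np.concatenate([slow, fast])
    null = np.empty(n_perm)
    for k in range(n_perm):
        # Redistribute pooled strides into groups matching the original sizes.
        shuffled = rng.permutation(pooled)
        null[k] = statistic(shuffled[slow.size:]) - statistic(shuffled[:slow.size])
    lo, hi = np.percentile(null, [2.5, 97.5])
    return observed, (lo, hi), bool(observed < lo or observed > hi)

# Toy per-stride values for the slowest and fastest speed quartiles.
slow = np.array([0] * 30 + [1] * 10)
fast = np.array([0] * 5 + [2] * 35)
recruit_fraction = lambda x: np.mean(x > 0)
observed, (lo, hi), reject = permutation_null_interval(slow, fast, recruit_fraction)
```

The same function could be reused for the cross-muscle slope comparison by passing a statistic that computes a regression slope on each group.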

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point and we agree that the claim made about the nervous system is too strong given the results. Even with Author response image 2 below that the Reviewer suggested, there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim on the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the dataset. As the Reviewer points out, stride frequency increases with increased speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), dissociating the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force is not feasible. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence, of works that showed that recruitment speed is the main determinant of the rate of force development (see for example Dideriksen et al. (2020) J Neurophysiol; J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion on the synaptic inputs from the data presented by the authors. The differences in firing rates may solely arise from other factors than distinct synaptic inputs, such as the different intrinsic properties of the motoneurons or the reception of distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To lower the strength of our claim, we replaced usage of the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.
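The comparison described in this exchange (average elbow angle in strides where a unit is active versus strides where it is silent) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the choice of one summary angle per stride, and the input arrays are all assumptions made for the example.

```python
import numpy as np

def mean_angle_by_recruitment(spike_counts, elbow_angles):
    """Compare a per-stride elbow-angle summary between strides in which a
    motor unit fired at least once and strides in which it was silent.

    spike_counts : per-stride spike counts for one unit (hypothetical input)
    elbow_angles : one elbow-angle value per stride, in degrees (hypothetical)
    """
    spike_counts = np.asarray(spike_counts)
    elbow_angles = np.asarray(elbow_angles, dtype=float)
    if spike_counts.shape != elbow_angles.shape:
        raise ValueError("need one spike count and one angle per stride")

    active = spike_counts > 0  # unit recruited on this stride
    active_mean = float(elbow_angles[active].mean()) if active.any() else float("nan")
    inactive_mean = float(elbow_angles[~active].mean()) if (~active).any() else float("nan")
    return {
        "active_mean": active_mean,
        "inactive_mean": inactive_mean,
        "difference": active_mean - inactive_mean,
    }
```

A difference between the two means describes a correlation between recruitment and kinematics; as the Reviewer notes, it is not a prediction of elbow angle from spiking activity.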

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring units had >98% of spikes outside of 1 ms of each other. Moreover, as described above we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms across both the duration of individual recording sessions (roughly 30 minutes) and across the rapid changes in limb position within individual stride cycles (roughly 250 msec).
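One way to read the isolation criterion above is as a check on inter-spike intervals (ISIs): a well-isolated unit should have almost no ISIs shorter than 1 ms. The sketch below assumes that reading; the function names and the decision to compute the fraction over ISIs rather than over spikes are assumptions, and the revised Methods should be consulted for the exact definition.

```python
import numpy as np

def fraction_isi_above(spike_times_s, threshold_s=0.001):
    """Fraction of inter-spike intervals longer than threshold_s (seconds)."""
    spike_times_s = np.sort(np.asarray(spike_times_s, dtype=float))
    isis = np.diff(spike_times_s)
    if isis.size == 0:
        return float("nan")  # fewer than two spikes: criterion undefined
    return float(np.mean(isis > threshold_s))

def passes_isolation(spike_times_s, threshold_s=0.001, min_fraction=0.98):
    """True if more than min_fraction of ISIs exceed threshold_s."""
    return fraction_isi_above(spike_times_s, threshold_s) > min_fraction
```

For example, a unit firing regularly every 10 ms passes, whereas a train in which a quarter of the intervals are 0.5 ms does not.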

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis below, which shows bell-shaped distributions for some (but not all) distributions. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper, and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.
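The replotting requested by the Reviewer can be approximated with logarithmically spaced histogram bins. The sketch below is an assumption-laden illustration rather than the code behind the author response image: the function name and bin count are hypothetical, and strides with zero spikes are dropped because they cannot be placed on a logarithmic axis.

```python
import numpy as np

def log_binned_histogram(spike_counts, n_bins=10):
    """Histogram of non-zero per-stride spike counts on log-spaced bins."""
    counts = np.asarray(spike_counts, dtype=float)
    nonzero = counts[counts > 0]  # zero-spike strides have no log value
    if nonzero.size == 0:
        raise ValueError("no non-zero spike counts to histogram")

    lo, hi = nonzero.min(), nonzero.max()
    edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    # Pin the outer edges so rounding cannot exclude the extreme values.
    edges[0], edges[-1] = lo, hi
    hist, _ = np.histogram(nonzero, bins=edges)
    return hist, edges
```

A roughly symmetric, bell-shaped histogram on these bins would be consistent with an approximately lognormal spike count distribution.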

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions especially Figure 6: The text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). FUNCTIONAL SIGNIFICANCE OF CELL SIZE IN SPINAL MOTONEURONS. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7(2007), 1–26. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. eLife Assessment

      This important model-based study seeks to mimic bat echolocation behavior and flight under conditions of high interference, such as when large numbers of bats leave their roost together. The simulations convincingly suggest that the problem of acoustic jamming in these situations may be less severe than previously thought. This finding will be of broad interest to scientists working in the fields of bat biology and collective behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      Mazar & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      The paper builds on a biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest.

      The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

    3. Reviewer #2 (Public review):

      This manuscript describes a detailed model for bats flying together through a fixed geometry. The model considers elements which are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the group's ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      The work relies on a thoughtful and detailed model which faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      With respect to the first version of the manuscript, the authors have remedied all my outstanding questions or concerns in the current version. The new supplementary figure 5 is especially helpful in understanding the geometry.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mazar & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      The paper builds on a biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest.

      The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      Authors have not yet provided convincing justification for the use of different echolocation phases during emergence and in cave behaviour. In the previous modelling paper cited for the details - here the bat-agents are performing a foraging task, and so the switch in echolocation phases is understandable. While flying with conspecifics, the lab's previous paper has shown what they call a 'clutter response' - but this is not necessarily the same as going into a 'buzz'-type call behaviour. As pointed out by another reviewer - the results of the simulations may hinge on the fact that bats are showing this echolocation phase-switching, and thus improving their echo-detection. This is not necessarily a major flaw - but something for readers to consider in light of the sparse experimental evidence at hand currently.

      The use of echolocation phases—defined as the sequential search, approach, and buzz call patterns—has been documented not only during foraging but also in tasks such as landing, obstacle avoidance, clutter navigation, and drinking. Bat call structure has been shown to vary systematically with object proximity, not exclusively in response to prey. During obstacle avoidance, phase transitions were observed, with approach calls emitted in grouped sequences and with reduced durations (Gustafson & Schnitzler, 1979; Schnitzler et al., 1987). In landing contexts, bats have been reported to emit short-duration calls and decrease inter-pulse intervals—buzz-like patterns also observed during prey capture— suggesting shared acoustic strategies across behaviors (Hagino et al., 2007; Hiryu et al., 2008; Melcón et al., 2007, 2009). Comparable patterns have been reported during drinking maneuvers, where “drinking buzzes” have been proposed to guide a precise approach to the water surface, analogous to landing buzzes (Griffiths, 2013; Russo et al., 2016). In response to environmental complexity, bats were found to shorten calls and increase repetition rates when navigating cluttered spaces compared to open ones (Falk et al., 2014; Kalko & Schnitzler, 1993).

      Moreover, field recordings from our study of Rhinopoma microphyllum (Goldshtein et al., 2025) revealed shortened call durations and inter-pulse intervals during dense group flight outside the cave during emergence, patterns consistent with the terminal-approach phase that is typical when coming very close to an object (another bat in this case). Author response image 1 shows an approach sequence recorded from a tagged bat approximately 20 meters from the cave entrance, with self-generated echolocation calls marked. The inter-pulse interval of ca. 20 ms is used by these bats when a reflective object (another bat in this case) is nearby.

      Author response image 1.

      These results provide direct evidence that bats actively employ approach-phase echolocation during swarming, likely to avoid collision with other bats. This supports the view that echolocation phase transitions are a general proximity-based sensing strategy, adapted across a variety of behavioral scenarios and not limited to hunting alone.

      In our simulations, bats predominantly emitted calls in the approach phase, with only rare occurrences of buzz-phase calls.

      See lines 355-363 in the revised manuscript.

      The decision to model direction-of-arrival with such high angular resolution (1-2 degrees) is not entirely justifiable - and the authors may wish to do simulation runs with lower angular resolution. Past experimental paradigms haven't really separated out target-strength as a confounding factor for angular resolution (e.g. see the cited Simmons et al. 1983 paper). Moreover, to this reviewer's reading of the cited paper - it is not entirely clear how this experiment provides source-data to support the DoA-SNR parametrisation in this manuscript. The cited paper has two array-configurations, both of which are measured to have similar received levels upon ensonification. A relationship between angular resolution and signal-to-noise ratio is understandable perhaps - and one can formulate such a relationship, but here the reviewer asks that the origin/justification be made clear. On an independent line, also see the recent contrasting results of Geberl, Kugler, Wiegrebe 2019 (Curr. Biol.) - who suggest even poorer angular resolution in echolocation.

      We thank the reviewer for raising this important point. The acuity of 1.5–3° in horizontal direction-of-arrival (DoA) estimation is based on the classical work of Simmons et al. with Eptesicus fuscus (Simmons et al., 1983). Similar precision was later supported by Erwin et al. (Erwin et al., 2001), who modeled azimuth estimation from measured interaural intensity differences (IIDs), reporting an average error of 0.2° with a standard deviation of ~2.2°, consistent with the behavioral data found by Simmons. The decline in acuity with increasing arrival angle has also been demonstrated in behavioral and physiological studies of binaural IID processing (Erwin et al., 2001; Fay, 1995; Razak, 2012; Wohlgemuth et al., 2016). The error model itself was first introduced in our earlier work (Mazar & Yovel, 2020).

      Importantly, Geberl et al. (Geberl et al., 2019) examined the resolution of weak targets masked by nearby strong flankers and found poor spatial discrimination of ~45 degrees; however, they were studying a detection problem, rather than the horizontal acuity of azimuth estimation. Indeed, our model assumes there is no spatial discrimination at all.

      Overall, while our DoA–SNR parametrization can certainly be critiqued and alternative parameterizations could be tested in future work, we believe it reflects a reasonable and empirically supported assumption. 

      Reviewer #2 (Public review):

      This manuscript describes a detailed model for bats flying together through a fixed geometry. The model considers elements which are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the group's ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      The work relies on a thoughtful and detailed model which faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      With respect to the first version of the manuscript, the authors have remedied all my outstanding questions or concerns in the current version. The new supplementary figure 5 is especially helpful in understanding the geometry.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Data Availability: This reviewer lauds the authors for switching from a private commercial folder requiring login to one that does not. At the cost of being overtly pedantic - the Github repository is not a long-term archival resource. The ideal solution is to upload the code in an academic repository (Zenodo, OSF, etc.) to periodically create a 'static snapshot' of code for archival, while also hosting a 'live' version on Github.

      We have uploaded the code to a Zenodo repository and updated the link in the paper:

      How bats exit a crowded colony when relying on echolocation only - a modeling approach

      In one of the rebuttals to Reviewer #3- the authors have cited a wrong paper (Beleyur & Goerlitz 2019) - while discussing broad bandwidth calls improving detection - and may wish to correct this if possible on record.

      We have removed the incorrect citation from the revised version of the manuscript.

      Specific comments on the 2nd manuscript:

      Figure 5: Table 1 says 1, 2, 5, 10, 20, 40, 100 bats were simulated (line 138-139) but the conclusion (line 398) says '1 to 100 bats' per 3msq. However, the X-axis only stops at 40 and says 'number of bats', while the legend says bats/3msq... what is actually being plotted? Moreover, in the entire paper there is a constant back-and-forth between density and # of bats - perhaps it is explained beforehand, but it is a bit unsettling - and more can be done to clarify these two conventions.

      While most parameters were tested across the full range of 1 to 100 bats per 3 m², a subset of conditions—including misidentification, multi-call clustering, wall target strength, and conspecific target strength—were simulated only up to 40 bats due to significantly longer run-times. This is now clarified in both the main text and the Table 1 caption.

      In our simulations, the primary parameter was the number of bats placed within a 3 m² starting area, which directly determined the initial density (bats per 3 m²). Throughout the manuscript, we use “number of bats” to refer to the simulation input, while “density” denotes the equivalent ecological measure. Figure 5 and related captions have been revised accordingly to note these conventions and to indicate when results are shown only up to 40 bats (see lines 120–122, 314-317 in the revised text).
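
      The relationship between the two conventions described above can be sketched in a few lines. This is only an illustration: the bat counts are those quoted by the reviewer from Table 1, the 3 m² starting area comes from the response, and the conversion to bats per m² (as well as the names `AREA_M2`, `BAT_COUNTS` and `density_per_m2`) is an assumption added here for clarity, not something taken from the authors' code.

```python
# Illustrative conversion between "number of bats" (simulation input)
# and "density" (bats per unit area). Names are hypothetical.
AREA_M2 = 3.0  # starting area stated in the response (3 m^2)
BAT_COUNTS = [1, 2, 5, 10, 20, 40, 100]  # counts quoted from Table 1


def density_per_m2(n_bats, area_m2=AREA_M2):
    """Return the initial density in bats per square metre."""
    return n_bats / area_m2


densities = {n: density_per_m2(n) for n in BAT_COUNTS}

for n, d in densities.items():
    # "n bats per 3 m^2" and "d bats per m^2" describe the same condition.
    print(f"{n:>3} bats in 3 m^2 -> {d:.2f} bats/m^2")
```

      Because the area is fixed, the two quantities differ only by a constant factor, which is why the manuscript can use them interchangeably once the convention is stated.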

      Table 1: This was made considerably difficult to read given the visual clutter - and I hope I've understood these changes correctly.

      What is in the square brackets of the effect size (e.g. the first row, 'Exit prob. (%)', says -0.37/bat [63:100])? What does this 63:100 refer to?

      What is the 'process flag'?

      Values in square brackets indicate the minimum and maximum values of the metric across the tested range (e.g., [63:100] shows the range of exit probabilities observed across different bat densities).

      The term “process flag” has been replaced with “with and without multi-call clustering” for clarity.

      Both the table layout and caption have been revised to reduce visual clutter and to make these conventions clearer to the reader. 

      Lines 562-3: "In our study, due to the dense cave environment, the bats are found to operate in the approach phase nearly all of the time, which is consistent with natural cave emergence behavior" - bats are 'found to' implies there is some experimental data or it is an emergent property. See above for the point questioning the implementation of multiple echolocation phases in the model, but also - here the bat-agents are allowed to show different phases and thus they do so -- it is a constraint of the implementation and not a result per se given the size of the cave and the number of bats involved...

      We removed the sentence from the Methods section, since it could be misinterpreted as an experimental finding rather than a model outcome. Instead, we now discuss this in the Discussion, clarifying that the predominance of the approach phase arises from the cluttered cave environment in our simulations, which is consistent with natural emergence behavior (see lines 355-363). In this context, the use of echolocation phases is presented as a biologically plausible modeling choice rather than an empirical result.

      Lines 659-660: The parametrisation between DoA and SNR is supposedly found in 'Equation 10' - which this reviewer could not find in the manuscript

      The equation was accidentally omitted in the previous revision and has now been reinserted into the manuscript. It defines how direction-of-arrival (DoA) error depends on SNR and azimuth angle (see lines 603-605).

    1. eLife Assessment

      This important mouse study shows that wild-type female progeny of Khdc3 mutants have abnormal gene expression relating to hepatic metabolism, which persists over multiple generations and passes through both female and male lineages. A role for small RNAs in this phenomenon is proposed, and evidence supporting the authors' claims is convincing. Further experiments are required to functionally validate the role of small RNAs in transmission of the phenotype. The work will be of interest to researchers in the field of DNA-independent mechanisms of inheritance.

    2. Reviewer #1 (Public review):

      The key discovery of the manuscript is that genetically wild type females descended from Khdc3 mutants show abnormal gene expression relating to hepatic metabolism, which persists over multiple generations and passes through both female and male lineages. The authors also find dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with Khdc3 mutant ancestry. These data provide solid evidence further supporting that phenotypes can be transmitted across multiple generations without altering DNA sequence, supporting the involvement of epigenetic mechanisms. The authors further performed exploratory studies on the small RNA profiles in the oocytes of Khdc3-null females and their wild type descendants, suggesting that altered small RNA expression could be a contributor to the observed phenotype transmission, although this has not been functionally validated.

      Comments on revisions:

      My previous comments are addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript aimed to investigate the non-genetic impact of KHDC3 mutation on liver metabolism. To do so, the authors analyzed the female liver transcriptome of genetically wild type mice descended from female ancestors with a mutation in the Khdc3 gene. They found that genetically wild type females descended from Khdc3 mutants have hepatic transcriptional dysregulation, which persists over multiple generations in the progeny descended from female ancestors with a mutation in the Khdc3 gene. This transcriptomic dysregulation was associated with dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with female mutational ancestry. Furthermore, to determine whether small non-coding RNAs could be involved in the maternal non-genetic transmission of the hepatic transcriptomic dysregulation, they performed small RNA-seq of oocytes from Khdc3-/- mice and genetically wild type female mice descended from female ancestors with a Khdc3 mutation and claimed that oocytes of wild type female offspring from Khdc3-null females have dysregulation of multiple small RNAs.

      Finally, they claimed that their data demonstrate that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes.

      Comments on revisions:

      I thank the authors for their detailed response to my comments. I have nothing to add.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The key discovery of the manuscript is that genetically wild type females descended from Khdc3 mutants show abnormal gene expression relating to hepatic metabolism, which persists over multiple generations and passes through both female and male lineages. The authors also find dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with Khdc3 mutant ancestry. These data provide solid evidence further supporting that phenotypes can be transmitted across multiple generations without altering DNA sequence, supporting the involvement of epigenetic mechanisms. The authors further performed exploratory studies on the small RNA profiles in the oocytes of Khdc3-null females and their wild type descendants, suggesting that altered small RNA expression could be a contributor to the observed phenotype transmission, although this has not been functionally validated.

      Reviewer #2 (Public review):

      Summary:

      This manuscript aimed to investigate the non-genetic impact of KHDC3 mutation on liver metabolism. To do so, the authors analyzed the female liver transcriptome of genetically wild type mice descended from female ancestors with a mutation in the Khdc3 gene. They found that genetically wild type females descended from Khdc3 mutants have hepatic transcriptional dysregulation, which persists over multiple generations in the progeny descended from female ancestors with a mutation in the Khdc3 gene. This transcriptomic dysregulation was associated with dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with female mutational ancestry. Furthermore, to determine whether small non-coding RNAs could be involved in the maternal non-genetic transmission of the hepatic transcriptomic dysregulation, they performed small RNA-seq of oocytes from Khdc3-/- mice and genetically wild type female mice descended from female ancestors with a Khdc3 mutation and claimed that oocytes of wild type female offspring from Khdc3-null females have dysregulation of multiple small RNAs.

      Finally, they claimed that their data demonstrate that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes.

      However, at this stage and considering the information provided in the paper, I think that these conclusions are too preliminary. Indeed, several controls/experiments need to be added to reach those conclusions.

      Additional context you think would help readers interpret or understand the significance of the work

      Line 25: this first sentence is very strong and needs to be documented in the introduction.

      Line 48: Reference 5 is not appropriate since the paper shows the remodeling of small RNAs during post-testicular maturation of mammalian sperm and their sensitivity to the environment. Please change it.

      Line 51: "implies" is too strong and should be replaced by "suggests".

      Line 67: reference is missing

      Database, the accession numbers are lacking.

      References showing the maternal transmission of non-genetically inherited phenotypes in mice via small RNA need to be added

      Line 378: All RNA-Seq and small RNA-Seq data are available in the NCBI GEO

      We have changed references as requested, and updated portions of the introduction in order to mention specifically genes that seem to regulate an RNA-based genetic nurture effect.  We are not aware of any published work that has demonstrated maternal transmission of non-genetic phenotypes via small RNAs; if the reviewer has a specific reference in mind, we would be happy to read it and add it to our manuscript.  We did add a few sentences describing why this work has primarily been performed in males/fathers.

      Reviewer #1 (Recommendations for the authors):

      (1) In addition to the altered hepatic gene expression and metabolites, did the authors notice any overall phenotypes? including body weight, overall growth, eating behavior, etc?

      We have added information on more general phenotypes of the mice, including litter size, birth weights, and weights at 3 and 8 weeks of age.  We have also performed a metabolic analysis of WT****** mice at 8 months of age.  Overall, there are no striking differences in the WT* mice in these broad phenotypic measures, and also no indication that a smaller litter size or larger birthweight are the drivers of our observed hepatic abnormalities.

      (2) When analyzing the small RNAs, the authors mentioned that they mapped the reads against rRNAs. This should have resulted in the identification of many rRNA-derived small RNAs (rsRNAs). The authors should also perform analyses on the differential expression of rsRNAs in this context. Both tsRNAs and rsRNAs have been shown to be involved in epigenetic inheritance (at least in sperm) (Nat Cell Biol 2018, PMID: 29695786).

      In the oocyte small RNA data, we did not notice many differences in either piRNAs or rRNAs between either the WT and KO oocytes, or the WT and WT** oocytes.  The most significant differences by far were in miRNA and tsRNA.  We have added that we do not see any differences in rRNAs.

      Reviewer #2 (Recommendations for the authors):

      To support your conclusion, you should include the following Data/experiments:

      (1) In the abstract, you wrote "Our results demonstrate that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes". The full phenotypic description (weight at birth, at 3 weeks, and at 8 weeks old, phenotype of the liver...) of each progeny should be carefully described/analyzed.

      Female KHDC3-deficient mice showed reduced fertility with smaller litters. Given that litter size influences early growth and adult physiology (DOI: 10.1016/j.cmet.2020.07.014), all the metabolic effects observed in the paper could be the result of the litter size. Information about the litter size should be provided. Without this information, it is difficult to evaluate the non-genetic impact of KHDC3 mutation on the metabolism of the progenies.

      We have added information on more general phenotypes of the mice, including litter size, birth weights, and weights at 3 and 8 weeks of age (Figure 3). We have also performed a metabolic analysis of WT****** mice at 8 months of age.  Overall, there were no striking differences in the WT* mice in these broad phenotypic measures, and also no indication that a smaller litter size or larger birthweight are the drivers of our observed hepatic abnormalities.

      We have also added a new figure in order to examine the mechanism of transmission of our observed transcriptional abnormalities (Figure 5).  By transferring serum from WT* mice into wild type recipients, we observe alterations to hepatic gene expression, suggesting that serum-based molecules are driving the altered non-genetic factors in the oocyte.  This lends further support to the conclusion that the observed changes in WT* mice are from inherited germ cell abnormalities (informed by somatic metabolic abnormalities and communicated via blood), and not a consequence of litter sizes or growth rates.

      (2) In addition to the lack of phenotypic information of the progenies, the DEG for the small RNA-seq should be filtered on padj(FDR)<0.05 and not on pvalue<0.05. In Figure 4a, the legend is missing.

      We did not alter the filtering on the small RNA-Seq data.  We are not focusing on any specific small RNA; rather, we are stating that these groups (miRNA, tsRNA) of small RNAs are dysregulated. Accordingly, we believe that using pval is not inappropriate in this circumstance.  The analysis was performed similarly to the 4-cell embryo RNA-Seq performed by Harris et al, Cell Reports (PMID 38573852).
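
      To make the distinction raised in point (2) concrete, the following minimal sketch contrasts filtering on raw p-values with filtering on Benjamini-Hochberg adjusted p-values (one common way of controlling the FDR). It is not the authors' pipeline: the p-values are invented for illustration, and `benjamini_hochberg` is a hypothetical helper written here in plain Python rather than a call into any specific differential-expression package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # are monotone in the rank (the standard step-up procedure).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for ten small RNAs (illustration only).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
padj = benjamini_hochberg(raw_p)

n_raw = sum(p < 0.05 for p in raw_p)   # hits at p < 0.05
n_fdr = sum(q < 0.05 for q in padj)    # hits at padj < 0.05

print(f"significant at p < 0.05:    {n_raw}")
print(f"significant at padj < 0.05: {n_fdr}")
```

      With these made-up values, fewer features pass the adjusted threshold than the raw one, which illustrates why the choice of filter can change how many small RNAs are reported as dysregulated.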

    1. eLife Assessment

      The study presents data on the possible connection of respiratory pathologies like pneumonia in a cohort of dolphins with altered composition and concomitant perturbed biophysical properties of pulmonary surfactant complexes. Overall, it is a valuable contribution that could be of interest to scientists in the field. However, the study as it is appears somewhat incomplete and additional clarification and discussions are required in order to explain a few methodological questions that may limit the impact of the work considerably.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes a number of alterations in pulmonary surfactant recovered from bottlenose dolphins. Although the sample consists of only seven diseased and two control animals, due to the difficulty in obtaining these animals, this is considered adequate. However, conclusions must be considered in view of this small sample size. The authors employ a number of sophisticated techniques to show differences in the composition and in the structure of bilayers formed by these two surfactant samples.

      Strengths:

      The availability of these samples makes this study quite original. The authors apply mass spectrometry to observe an increase of an acidic phospholipid and in the level of plasmalogens in the diseased (i.e. pneumonia) aquatic animals. They suggest these increases contribute to hampered function in vivo. They show alterations in lipid bilayers formed from lipid extracts of these surfactants by electron microscopy, by atomic force microscopy and by small- and wide-angle X-ray scattering (SAXS/WAXS). They have previously shown that adding small amounts of cardiolipin to the clinical surfactant BLES results in altered bilayer structure, consistent with the current study.

      Weaknesses:

      It seems surprising to me that the small changes in cardiolipin can alter surfactant function, i.e., reducing surface tension to near zero. As it happens, no surfactant function tests monitoring the reduction in surface tension were conducted. This would add a great deal to the paper. Further, the paper would benefit greatly from the inclusion of a table listing the lipid composition of surfactant recovered from diseased and normal animals and comparing this to the composition of BLES, a clinical surfactant. Finally, there is a possibility that the minor lipid identified by mass spec is the lysosomal marker, bis(monoacylglycero)phosphate, rather than the mitochondrial marker, cardiolipin.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Porras-Gómez et al. analyse the lipid composition and biophysical properties of pulmonary surfactant obtained by bronchoalveolar lavage (BAL) from a group of bottlenose dolphins (Tursiops truncatus), including two healthy individuals and five affected by pneumonia. Through lipidomic analysis, the authors report an exacerbated presence of cardiolipin species in the BAL lipid extracts from diseased dolphins compared to healthy ones. Structural analyses using electron microscopy, atomic force microscopy, and X-ray scattering on rehydrated membrane samples reveal that lipids from diseased animals form membranes with a more pronounced Lβ phase and reduced fluidity. Moreover, the membranes from affected lungs appear more interconnected and less hydrated, as indicated by the X-ray scattering data. These findings provide valuable and convincing insights into how pulmonary disease alters the lipid composition and structural properties of surfactant in diving mammals, and may have broader implications for understanding surfactant dysfunction in marine mammals.

      Strengths:

      The study is well designed, and the experimental techniques were applied in a logical and coherent manner. The results are thoroughly analysed and discussed, and the manuscript is clearly written and well organized, making it both easy to follow and scientifically robust. Although the number of samples is limited, the rarity and logistical challenges of obtaining bronchoalveolar lavage material, particularly from animals affected by respiratory disease, make this study especially valuable and relevant.

      Weaknesses:

      In my opinion, the main issue lies in the treatment of the samples. Pulmonary surfactant is a lipoprotein complex produced by type II pneumocytes of the alveolar epithelium in the form of compact and highly dehydrated structures known as tubular myelin. Once secreted, these structures unfold and, upon contact with the air-liquid interface, form an interfacial monolayer connected to surfactant membranes in the subphase, thereby facilitating respiratory dynamics throughout the breathing cycle.

      When bronchoalveolar lavages are treated using the Bligh and Dyer method to extract the hydrophobic fraction of these samples, the structural complexity of the surfactant is disrupted, and this organization cannot be completely restored once the lipids are rehydrated. Although these extracts contain the hydrophobic proteins SP-B and SP-C, the hydrophilic protein SP-A may play an essential role in the formation of pulmonary surfactant structures. It is well established that SP-A is crucial for the formation of tubular myelin, an intermediate structure between the lamellar bodies newly secreted by type II cells and the interfacial surfactant layers.

      Moreover, and more importantly, bronchoalveolar lavage fluid may contain cells, tissue debris, and even bacteria that can alter the lipid composition of the samples used in the study after extraction by the Bligh and Dyer method. For this reason, most studies include a density gradient centrifugation step to isolate the surfactant membranes. Consequently, the samples used may be contaminated with phospholipids originating from other cells, such as macrophages, pneumocytes, or bacterial cells, particularly in lavages obtained from diseased animals.

      Although the techniques employed provide valuable information about the behaviour of surfactant membranes and allow certain inferences regarding their functionality, no functional studies of these samples have been conducted using methods such as the constrained drop surfactometer or the captive bubble surfactometer. The observed alterations do not necessarily demonstrate that surfactant modulates its properties, as claimed by the authors, but rather indicate that it is altered by the presence of other lipids.

      The spin-coating technique used to form lipid films for analysis by atomic force microscopy is not the most suitable approach to reproduce the structures generated by pulmonary surfactant. However, the results obtained may still provide valuable insights into the biophysical behaviour of its components. The analysis of lung tissue shown in Supplementary Figure S3 presents the same limitation, as the samples were embedded in a cutting compound, and the measurements may have been taken from different regions of the tissue. Therefore, it cannot be ensured that the analysed structures correspond to those generated by pulmonary surfactant.

      The finding that the structures formed in samples obtained from diseased animals are more tightly packed and dehydrated than those derived from the surfactant of healthy animals contrasts with the notion that the high efficiency of lamellar bodies in generating interfacial structures is related to their high degree of packing and dehydration. The formation of these structures involves the participation of the ABCA3 protein, which pumps phospholipids into the interior of lamellar bodies, and SP-B, which facilitates the formation of close membrane contacts.

      While the results are interesting from a comparative perspective, the implications for surfactant performance and respiratory dynamics should be interpreted with caution.

    4. Reviewer #3 (Public review):

      In this manuscript, the authors present data on the supposed composition of pulmonary surfactant obtained from bronchoalveolar lavages (BALs) of a small cohort of dolphins, a group of them suffering from pneumonia. The lipid compositional differences of the sample group are consistent with the different pathological situations of the specimens, suggesting that differences in surfactant composition are somehow associated (as a cause or as a consequence) with the particular pathophysiological contexts. It is particularly remarkable that an increase in cardiolipins and plasmalogens appears as an abnormal composition in pathological surfactants. The study is completed by analyzing the differences in membrane properties (order, packing, phase) of abnormal versus "control" membranes, concluding that pneumonia in dolphins is associated with a significant alteration of surfactant membranes that become more rigid, packed and thicker than those in surfactant from animals with no lung disease.

      In general terms, the data provided are of interest as they somehow offer a framework of effects that may extend what is known about alterations of composition, biophysical properties and functional performance of pulmonary surfactant as a consequence of respiratory pathologies. A collection of pertinent biophysical methodologies (fluorescence, X-ray scattering, AFM) have been applied to complete a full characterization of membrane properties in the different samples.

      However, the way the samples have been processed, i.e. by making organic extracts of hydrophobic (lipid and protein) components before surfactant membranes have been purified or at least separated from bulk lavage, opens the question of how much of the altered composition actually occurs in surfactant or comes from other membranes (from cells, bacteria) that have been completely intermixed as a consequence of the organic extraction. Without appropriate isolation of surfactant membranes, the results of the study should be taken with caution and await confirmation. Specific questions that need to be considered include:

      (1) As said, the direct organic extract of BAL samples ends in a full mix of lipid and protein components that in origin could be part of different membranes, either from different surfactant assemblies, or even from pulmonary cells or membrane debris, or microorganisms, collected within the lavage. Obtaining conclusions about the structure and properties of membranes artefactually reconstituted from such lipid and protein mixtures is far from correct.

      It is mentioned that "subsequentially" to the organic extraction, the samples were subjected to ultracentrifugation to separate debris and membrane cells. I do not see what the ultracentrifugation is going to change if it is done after the organic extraction. It should have been done before the extraction, for the organic solvents to solubilize exclusively the large, and relatively light, surfactant membrane complexes.

      On the other hand, the subsequent reconstitution of the obtained full lipid mixture surely ends in membrane assemblies whose compositional distribution and organization may differ significantly from those in the original membranes.

      Taking all this into account, statements such as "These aggregate forms reproduce the expected membrane microstructures observed in native alveolar hypophase" or "pulmonary membranes can be successfully extracted and reconstituted from BALs of Navy dolphins" are simply not true and should be rephrased.

      One can understand that the limitation of material may make it difficult to obtain first the purified surfactant membranes and then their organic extract. However, the limitation should be acknowledged to make clear to readers that the actual compositional effects caused in surfactant by pneumonia need confirmation.

      (2) In some of the experiments, i.e. in the AFM characterization, supported membranes were prepared by the spray-dry method applied to organic solutions. Again, the spray-dry of organic lipid solutions ends in a lipid dispersion that may be very far from the real organization of the lipids in actual surfactant membranes.

      (3) When it is stated that phospholipid concentrations are greater in BAL from pinnipeds than in humans, how has the actual concentration been determined? BAL volumes are typically subject to large variations depending on the conditions used to obtain the lavage (including volume of saline instilled, level of atelectasis in the lung tissue, presence of inflammation and edema, etc). If total amounts of phospholipids in BAL are to be compared, certain normalization procedures should be applied, for instance, with respect to the urea concentration in serum.

      (4) All the differences regarding membrane phase and lipid order/packing have been interpreted in terms of the potential coexistence of Lbeta (gel)/Lalpha (liquid crystalline) phases. However, it has been well established that in lipid systems containing cholesterol, such as pulmonary surfactant, phase coexistence can actually be of the type liquid-ordered (Lo)/liquid-disordered (Ld), very different in terms of mobility and true molecular order. Why do the authors consider that Lbeta is the phase observed in the surfactant membranes they have reconstituted? The presence of round-shaped domains seems to indicate that a liquid/liquid phase segregation is actually occurring.

      (5) In the same line as the previous comment, the authors state that SAXS shows that bovine-extracted pulmonary membranes exhibit a coexistence of two lamellar phases, one rich in unsaturated lipids and one in saturated lipids. SAXS and WAXS cannot provide compositional information, but structural parameters such as membrane thickness, or molecular order. This should be clarified.

      (6) It is mentioned that the surfactant monolayer at the air-liquid interface is interconnected to tubular membranous structures (tubular myelin, TM). It is true that TM, when present, appears interconnected with the interface. However, it is widely recognized that there are many other structures connected with the interfacial film, including multilamellar membrane arrays or reservoirs that have not been mentioned here. Furthermore, TM is not required for surfactant function, because it is absent, for instance, in mice lacking expression of surfactant protein SP-A, which can breathe perfectly.

      (7) In the Discussion, the authors mention that "...after squeeze-out, the excluded multilayers remain closely associated with the interfacial monolayer rather than escaping into the subphase". The authors may like to complete this discussion by specifying that the stable association of excluded assemblies with the interfacial film is actually possible thanks to the surfactant proteins.
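
      The normalization suggested in point (3) can be sketched as follows. This is a minimal illustration of the urea dilution approach, assuming that urea equilibrates between serum and epithelial lining fluid so that the serum-to-lavage urea ratio estimates how much the lavage diluted the lining fluid. The function name `elf_concentration` and all numeric values are hypothetical and are not taken from the manuscript.

```python
def elf_concentration(c_bal, urea_bal, urea_serum):
    """Estimate an analyte's concentration in epithelial lining fluid.

    c_bal      -- analyte concentration measured in the lavage fluid
    urea_bal   -- urea concentration measured in the same lavage fluid
    urea_serum -- urea concentration measured in serum
    (urea_bal and urea_serum must share the same units)
    """
    if urea_bal <= 0 or urea_serum <= 0:
        raise ValueError("urea concentrations must be positive")
    dilution_factor = urea_serum / urea_bal
    return c_bal * dilution_factor


# Two hypothetical lavages with the same measured phospholipid
# concentration but different degrees of dilution.
sample_a = elf_concentration(c_bal=20.0, urea_bal=0.5, urea_serum=30.0)
sample_b = elf_concentration(c_bal=20.0, urea_bal=1.0, urea_serum=30.0)

print(f"sample A, dilution-corrected: {sample_a:.1f}")
print(f"sample B, dilution-corrected: {sample_b:.1f}")
```

      In this made-up example the raw lavage concentrations are identical, yet the corrected values differ twofold, which is the kind of discrepancy the reviewer's normalization is meant to expose.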

    1. eLife Assessment

      This is a potentially important paper attempting to identify neural correlates of memory engram expression in humans, and how they change during forgetting. The questions posed are clear and novel. The methods employed, namely behavioral analysis, high-resolution functional magnetic resonance imaging, and representational similarity analysis, are advanced, integrative, and appropriate. The experiments are well designed and combine analysis of recollection and familiarity of object/face associations. However, substantial questions remain as to the validity of the incomplete statistical analyses applied to the imaging data, as well as the parsing and interpretation of the behavioral data.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents an ambitious attempt to examine whether episodic memory traces ("engrams") of forgotten associations persist in the human brain and whether these traces continue to influence behavior implicitly. Using 7T fMRI, the authors track 96 one-shot face-object associations across learning, 30-minute retrieval, and 24-hour retrieval, complemented by a recognition test. Participants classify each memory as sure, unsure, or guess, enabling an operational dissociation between consciously accessible and inaccessible memories.

      Strengths:

      The study addresses a timely and theoretically important question arising from rodent engram research, i.e., whether forgotten human memories leave detectable neural signatures. The use of high-resolution 7T fMRI, representational similarity analysis (RSA), and gPPI connectivity analyses aims at a detailed systems-level perspective. The results suggest that correct guess responses (i.e., when participants believe they are guessing) are accompanied by hippocampal activity and connectivity patterns that correlate with behavioral performance, potentially pointing to residual memory traces. The study also presents evidence for divergent consolidation trajectories: consciously accessible memories become more neocortically distributed after sleep, whereas inaccessible memories exhibit strengthened hippocampal signatures.

      Weaknesses:

      Despite the methodological rigor, some interpretational issues merit caution. First, the reliance on participants' subjective "guess" reports to categorize trials as forgotten is problematic. Guess responses at the 30-minute retrieval were at chance level, whereas guess responses during recognition were above chance; interpreting both as "implicit episodic memory" may conflate different mechanisms (episodic retrieval, familiarity, associative priming).

      Second, several analyses raise concerns about circularity or insufficient independence, for example, when contrasting correct vs. incorrect guess trials to locate "engram" activity and then correlating that activity with guessing accuracy. Similarly, the behavioral analyses are fragmented (multiple t-tests across conditions) rather than using a factorial model that accounts for dependencies among confidence levels and timepoints.

      Third, the choice to include only "sure" and "guess" responses discards a substantial portion of trials ("unsure"), reducing power and complicating interpretation, especially given that unsure responses show above-chance performance.

      Finally, the study's two-scanner-sequence design (small-FOV vs. whole-brain) is challenging as it complicates comparisons across analyses, especially when some critical results (e.g., hippocampal reinstatement patterns) do not consistently replicate across sequences.

      Conclusion:

      Overall, the manuscript provides preliminary evidence that neural traces of forgotten episodic memories might persist in humans and could guide behavior in the absence of conscious awareness. While interpretational caution is warranted, especially regarding the nature of "guess"-based retrieval and the independence of neural contrasts, the study makes a valuable contribution to debates on engram persistence, systems consolidation, and the role of consciousness in episodic memory.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of the experiment was to identify the fMRI neural correlates of persistence and recovery of forgotten memories. A forgotten memory was defined behaviorally as successful learning, followed by failure in a recall format task, followed by next-day success in a recognition format task. The comparison is to memories that were not forgotten at any stage of the task. Various univariate, connectivity, and multivariate analyses were used to identify neural correlates of forgotten memories that were recovered, that remained forgotten, and successful memory. Some claims are made about how activity of the "episodic memory network" predicts the persistence of forgotten memories.

      Strengths:

      Studies on the persistence of forgotten memories in rodent models have been used to make some novel claims about the potential properties of engrams. Attempting similar research in humans is a laudable goal.

      Patterns of behavioral responses are consistent across subjects.

      Weaknesses:

      I do not find that the fMRI results fit the narrative provided.

      A major issue is that primary results do not replicate across the two fMRI datasets that were collected using the same task. For example, hippocampal activity associated with correct responses (confident and guess) was identified in the group receiving the fMRI scan that used a small FOV, but not in the group that received an fMRI scan of the whole brain, for both 30-min and 24-hr delays (lines 202-217). This suggests that the main findings are not even replicable internally within the same experiment. There is no reasonable justification for this.

      Next, most of the reported fMRI findings do not meet reasonable thresholds for statistical significance. In many places, the authors acknowledge this in the text by saying that a difference in the fMRI metric "tended towards significant correlation" or that comparisons "revealed non-significant mean value comparisons". It is not clear why these non-significant findings are interpreted as though they are positive findings. In addition, many of the reported findings do not meet the threshold (e.g., p=0.058), without any acknowledgement that they are marginal. Moreover, the majority of comparisons that are interpreted in the main text are not significant based on the companion information provided in the supplementary tables. That is, they are totally non-significant when using FWE or FDR correction at either the cluster or peak levels.

      Beyond this, the supplementary tables indicate that "clusters identified solely within white matter regions have been excluded." The fact that there are any findings in white matter to ignore indicates that the statistical thresholds are inappropriate. It's tantamount to seeing activation in the brain of a dead fish.

      The overall picture based on these factors is that the statistical tests did not use sufficiently stringent safeguards against false positives given the multiple comparison problem that plagues fMRI. So, there are tons of false positives, which are being selectively interpreted to tell a particular story. That is, each comparison yields lots of findings in many brain areas, and those that do not fit the particular narrative are being ignored (including those in white matter). What's more, when the small FOV fMRI scan is done, the imaging volume is centered on the hippocampus and its close network, so all false positives appear to be exactly in those brain regions about which the authors want to make conclusions. When throwing darts, you will always hit a bullseye if that is all that exists. The fact that the same comparisons done in the companion whole-brain dataset do not yield the same results is telling: the analysis plan is not sufficiently rigorous to yield findings that are replicable.
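
      The scale of the multiple-comparison problem described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not a reanalysis of the manuscript's data: the voxel count and both thresholds are hypothetical, and every voxel is simulated under the null hypothesis, where p-values are uniformly distributed.

```python
import random


def count_null_detections(n_voxels, alpha, seed=0):
    """Count voxels passing `alpha` when no voxel carries a true effect.

    Under the null hypothesis a p-value is uniform on [0, 1), so each
    voxel's p-value is simulated as a single uniform draw.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_voxels))


# Hypothetical numbers chosen for illustration only.
N_VOXELS = 100_000
UNCORRECTED_ALPHA = 0.001
BONFERRONI_ALPHA = 0.05 / N_VOXELS  # family-wise error control at 0.05

uncorrected_hits = count_null_detections(N_VOXELS, UNCORRECTED_ALPHA)
corrected_hits = count_null_detections(N_VOXELS, BONFERRONI_ALPHA)

print(f"uncorrected (p < {UNCORRECTED_ALPHA}): {uncorrected_hits} false positives")
print(f"Bonferroni  (p < {BONFERRONI_ALPHA:.1e}): {corrected_hits} false positives")
```

      With these assumed settings, the uncorrected threshold is expected to pass about 100 null voxels (voxel count times alpha), whereas the Bonferroni threshold is expected to pass about 0.05. If the imaging volume covers only the regions of interest, every such false positive necessarily falls inside them.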

      Further, I think that it is highly debatable whether the task measures the recovery of forgotten memories at all. Forgotten memories are defined as those that fail when tested using a recollection format but succeed when tested using a recognition format. The well-characterized distinction between recollection and recognition is thus being construed as telling us something about the fate of engrams. I think the much more likely alternative is that "forgotten" memories are just relatively weak memories that don't meet whatever criteria subjects typically use when making recollection judgments, and not some special category of memory. In terms of brain activation, they seem for the most part to follow the pattern of stronger memory, but weaker.

      Finally, many hypotheses are used as though they are proven. For instance, fMRI activity patterns are called "engrams" even though there are no tests to determine whether they meet reasonable criteria that have been adopted in the engram literature (e.g., necessity, sufficiency). Whatever happens over the 24-hour delay is called "consolidation" even if there is no test that consolidation has occurred. Etc. It becomes hard to differentiate what is an assumption, versus a hypothesis, versus an inference/conclusion.

    1. eLife Assessment

      This valuable study links psychological theories of chunking with a physiological implementation based on short-term synaptic plasticity and synaptic augmentation. The theoretical derivation for increased memory capacity via hierarchical chunking is solid. However, the model robustness and biological grounding of the mechanism - including many aspects that were hard-wired, chunking cues, and parameter ranges - as well as its evaluation in the task settings that motivated the study, are incomplete. Additional simulations to test robustness in more cognitively and biologically realistic settings, a systematic parameter analysis, and stronger links to prior work would substantially strengthen the manuscript and increase its impact across disciplines.

    2. Reviewer #1 (Public review):

      Summary:

      This study extends the short-term synaptic plasticity (STP)-based theory of activity-silent working memory (WM) by introducing a physiological mechanism for chunking that relies on synaptic augmentation (SA) and specialized chunking clusters. The model consists of a recurrent neural network comprising excitatory clusters representing individual items and a global inhibitory pool. The self-connections within each cluster dynamically evolve through the combined effects of STP and SA. When a chunking cue, such as a brief pause in a stimulus sequence, is presented, the chunking cluster transiently suppresses the activity of the item clusters, enabling the grouped items to be maintained as a coherent unit and subsequently reactivated in sequence. This mechanism allows the network to enhance its effective memory capacity without exceeding the number of simultaneously active clusters, which defines the basic capacity. The authors further derive a new upper limit of WM capacity, the new magic number. When the basic capacity is four, the upper bound for complete recall becomes eight, and the optimal hierarchical structure corresponds to a binary tree of two-item pairs forming four chunks that combine into two meta-chunks. Reanalysis of linguistic data and single-neuron recordings from human epilepsy patients (identifying boundary neurons) provides qualitative support for the model's predictions.

      Strengths:

      This study makes an important contribution to theoretical and computational neuroscience by proposing a physiologically grounded mechanism for chunking based on STP and SA. By embedding these processes in a recurrent neural network, the authors provide a unified account of how chunks can be formed, maintained, and sequentially retrieved through local circuit dynamics, rather than through top-down cognitive strategies. The work is conceptually original, analytically rigorous, and clearly presented, deriving a simple yet powerful capacity law that extends the classical magic number framework from four to eight items under hierarchical chunking. The modeling results are further supported by preliminary empirical evidence from linguistic data and single-neuron recordings in the human medial temporal lobe, lending credibility to the proposed mechanism. Overall, this is a well-designed and well-written study that offers novel insights into the neural basis of working-memory capacity and establishes a solid bridge between theoretical modeling and experimental findings.

      Weaknesses:

      This study is conceptually strong and provides an elegant theoretical framework, but several aspects limit its biological and empirical grounding.

      First, the control mechanism that triggers and suppresses chunking clusters remains only schematically defined. The model assumes that chunking events are initiated by pauses, prosodic cues, or internal control signals, but does not specify the underlying neural circuits (e.g., prefrontal-basal ganglia loops) that could mediate this gating in the brain. Clarifying where, when, and how the chunking clusters are turned on and off will be critical for establishing biological plausibility.

      Second, the network representation is simplified: item clusters are treated as non-overlapping and homogeneous, whereas real cortical circuits exhibit overlapping representations, distinct excitatory/inhibitory populations, and multiscale local and long-range connectivity. It remains unclear how robust the proposed dynamics and derived capacity limit would be under such biologically realistic conditions.

      Third, the model heavily relies on SA operating over a timescale of several seconds, yet in vivo, the time constants and prevalence of SA can vary widely across cortical regions and neuromodulatory states. The stability of the predicted "new magic number" under realistic noise levels and modulatory influences, therefore, needs to be systematically evaluated.

    3. Reviewer #2 (Public review):

      Summary:

      This work extends a previous recurrent neural network model of activity-silent working memory to account for well-established findings from psychology and neuroscience suggesting that working memory capacity constraints can be partially overcome when stimuli can be organized into chunks. This is accomplished via the introduction of specialized chunking clusters of neurons to the original model. When these chunking clusters are activated by a cue (such as a longer delay between stimuli), they rapidly suppress recently active stimulus clusters. This makes these stimulus clusters available for later retrieval via a synaptic augmentation mechanism, thereby expanding the network's overall effective capacity. Furthermore, these chunking clusters can be arranged in a hierarchical fashion, where chunking clusters are themselves chunked by higher-level chunking clusters, further expanding the network's overall effective capacity to a new "magic number", 2^{C-1} (where C is the basic capacity without chunking). In addition to illustrating the basic dynamics of the model with detailed simulations (Figures 1 and 2), the paper also utilizes qualitative predictions from the model to (re-)analyze data collected in previous experiments, including single-unit recordings from human medial temporal lobe as well as behavioral findings from a classic study of human memory.
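
      The capacity bound quoted above can be made concrete with a short sketch. It assumes only the formula 2^{C-1} stated in this summary and the pairwise (binary) grouping that Reviewer #1 describes for C = 4; the function names are hypothetical and are not taken from the authors' code.

```python
def chunked_capacity(basic_capacity):
    """Upper bound on recallable items under hierarchical chunking: 2**(C - 1)."""
    if basic_capacity < 1:
        raise ValueError("basic capacity must be a positive integer")
    return 2 ** (basic_capacity - 1)


def pair_up(units):
    """Group adjacent units into pairs, forming the next level of the hierarchy."""
    return [tuple(units[i:i + 2]) for i in range(0, len(units), 2)]


def binary_hierarchy(items):
    """Return successive levels (items, chunks, meta-chunks, ...) until at most two units remain."""
    levels = [list(items)]
    while len(levels[-1]) > 2:
        levels.append(pair_up(levels[-1]))
    return levels


capacity = chunked_capacity(4)  # basic capacity C = 4
levels = binary_hierarchy(range(capacity))
for depth, level in enumerate(levels):
    print(depth, len(level), level)
```

      For C = 4 this yields eight items, four chunks, and two meta-chunks, matching the structure described in the reviews.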

      Strengths:

      The writing and figures are very clear, and the general topic is relevant to a broad interdisciplinary audience. The work is strongly theory-driven, but also makes some effort to engage with existing data from two empirical studies. The basic results showcasing how chunking can be achieved in an activity-silent working memory model via suppression and synaptic augmentation dynamics are interesting. Furthermore, we agree with the authors that the derivation of their new "magic number" is relatively general and could apply to other models, so those findings in particular may be of interest even to researchers using different modeling frameworks.

      Weaknesses:

      (1) Very important aspects of the model are assumed / hard-coded, raising the concern that it relies too much on an external controller, and that it would therefore be difficult to implement the same principles in a fully behaving model responsible for producing its own outputs from a sequence of stimuli (i.e., without a priori knowledge of the structure of incoming sequences).

      (i) One such aspect is the use of external chunking cues provided to the model at critical times to activate the chunking clusters. The simulations reported in the paper were conducted in a setting where signals to chunk are conveniently indicated by longer delays between stimuli. In this case, it is not difficult to imagine how an external component could detect the presence of such a delay and activate a chunking cluster in response. However, in order for the model to be more broadly applicable to different memory tasks that elicit chunking-related phenomena, a more general-purpose detector would be required (see further comments below and alternative models).

      (ii) Relatedly, and as the authors acknowledge in the discussion, the network relies on a pretty sophisticated external controller that decides when the individual chunking clusters are activated or deactivated during readout/retrieval. This seems especially complex in the hierarchical case. How might a network decide which chunking/meta-chunking clusters are activated/deactivated in which order? This was hard-coded in their simulations, but we imagine that it would be difficult to implement a general solution to this problem, especially in cases where there is ambiguity about which stimuli should be chunked, or where the structure of the incoming sequence is not known in advance.

      (iii) One of the central mechanisms of the model is the rapid synaptic plasticity in the inhibitory connections responsible for binding chunking clusters to their corresponding stimulus clusters. This mechanism again appears to have been hard-coded in the main simulations. Although we appreciate that the authors worked on one possible way that this could be implemented (Methods section D, Supplementary Figure S2), in the end, their solution seems to rely on precisely fine-tuning the timing with which stimuli are presented - a factor that seems unlikely to matter very much in humans/animals. This stands in contrast with models of working memory that rely on persistent activity, which are more robust to changes in timing. Note that we do not discount the possibility of activity-silent WM, and indeed it should be studied in its own right, but it is then even more important to highlight which of its features are dependent on the time constants, etc.

      (2) Another key shortcoming of this work is its limited direct engagement with empirical evidence and alternative computational accounts of chunking in WM. Although the efforts to re-analyze existing empirical results in light of the new predictions made by the model are commendable, in the end, we think they fall short of being convincing. As noted above, the model doesn't actually perform the same two tasks used in the human experiments, so direct quantitative comparisons between the model and human behavior or neural data are not possible. Instead, the authors rely on isolating two qualitative predictions of the model - the "dip" and "ramp" phenomena observed after a chunking cluster is activated (Figure 3), and the new magic number for effective capacity derived from the model in the case where stimuli are chunkable, which approximately converges with human recall performance in a memory study (Figure 4). Below, we highlight some specific issues related to these two sets of analyses, but the larger point is that if the model is making a commitment about how these neural mechanisms relate to behavioral phenomena, it would be important to test if the model can produce the behavioral patterns of data in experimental paradigms that have been extensively used to characterize those phenomena. For example, modern paradigms characterizing capacity limits have been more careful to isolate the contributions of WM per se (whereas the original magic number 7 is now thought to reflect a combination of episodic and working memory; see Cowan 2010). There are several existing models that more directly engage with this literature (e.g., Edin et al., 2009; Matthey et al., 2015; Nassar et al., 2018; Soni & Frank, 2025; Swan & Wyble, 2014; van den Berg et al., 2014; Wei et al., 2012), some of which also account for chunking-related phenomena (e.g., Wei et al., 2012; Nassar et al., 2018; Panichello et al., 2019; Soni & Frank, 2025).

      A number of related proposals suggest that WM capacity limits emerge from fundamentally different mechanisms than the one considered here - for example, content-related interference (Bays, 2014; Ma et al., 2014; Schurgin et al., 2020), or limitations in the number of content-independent pointers that can be deployed at a given time (Awh & Vogel, 2025), and/or the inherent difficulty of learning this binding problem (Soni & Frank, 2025). We think it would be worth discussing how these ideas could be considered complementary or alternatives to the ones presented here.

      (i) Single unit recordings. We found it odd that the authors chose to focus on evidence from single-unit recordings in the medial temporal lobe from a study focused on episodic memory. It was unclear how exactly these data are supposed to relate to their proposal. Is the suggestion that a mechanism similar to the boundary neurons might be operative in the case of working memory over shorter timescales in WM-related areas such as the prefrontal cortex, or that their chunking mechanism may relate not only to working memory but also to episodic memory in the medial temporal lobe?

      (ii) N-gram memory experiment. Our main complaint about the analysis of the behavioral data from the human memory study (Figure 4) is that the model clearly does not account for the main effect observed in that study - namely, the better recall observed for higher-order n-gram approximations to English. We acknowledge that this was perhaps not the main point of the analysis (which related more to the prediction about the absolute capacity limit M*), but it relates to a more general criticism that the model cannot account for chunking behavior associated with statistical learning or semantic similarity. Most of the examples used in the introduction and discussion are of this kind (e.g., expressions such as "Oh my God" or "Easier said than done", etc.). However, the chunking mechanism of the model should not have any preference for segmenting based on statistical regularities or semantic similarity - it should work just as well if statistical anomalies or semantic dissimilarity were used as external chunking cues. In our view, these kinds of effects are likely to relate to the brain's use of distributed representations that can capture semantic similarity and learn statistical regularities in the environment. Although these kinds of effects may be beyond the scope of this model, some effort could be made to highlight this in the discussion. But again, more generally, the paper would be more compelling if the model were challenged to simulate more modern experimental paradigms aimed at testing the nature of capacity limits in WM, or chunking, etc.

      (iii) There are a number of other empirical phenomena that we're not sure the model can explain. In particular, one of the hallmarks of WM capacity limits is a recency bias, where people are more likely to remember the most recent items at the expense of items presented prior to that (Oberauer et al., 2012). [There are also studies showing primacy effects in addition to recency effects, but the primacy effects are generally attributed to episodic rather than working memory - for example, introducing a distractor task abolishes the recency but not primacy effect.] But the current model seems to make the opposite prediction: when the stimuli exceed its base capacity, it appears to forget the most recent stimuli rather than the earliest ones (Figure 1d). This seems to result from the number of representations that can be reactivated within a cycle and thus seems inherent to the dynamics of the model, but the authors can clarify if, instead, it depends on the particular values of certain parameters. (In contrast, this recency effect is captured in other models with chunking capabilities based on attractor dynamics and/or gating mechanisms - e.g., Boboeva et al., 2023; Soni & Frank, 2025.) Relatedly, we're not sure if the model could account for the more recent finding that recall is specifically enhanced when chunks occur in early serial positions compared to later ones (Thalmann, Souza, & Oberauer, 2019).

    4. Reviewer #3 (Public review):

      The paper presents a synaptic mechanism for chunking in working memory, extending previous work of the last author by introducing specialized "chunking clusters", neural populations that can dynamically segment incoming items into chunks. The idea is that this enables hierarchical representations that increase the effective capacity of working memory. They also derive a theoretical bound for working memory capacity based on this idea, suggesting that hierarchical chunking expands the number of retrievable items beyond the basic WM capacity. Finally, they present neural and behavioral data related to their hypothesis.

      Strengths

      A major strength of the paper is its clear theoretical ambition of developing a mechanistic model of working memory chunking.

      Weaknesses

      Despite drawing inspiration from biophysical mechanisms (short-term synaptic plasticity with different time constants), the model is "cartoonish". It is unclear whether the proposed mechanism would work reliably in the presence of noise and non-zero background activity or in a more realistic implementation (e.g., a spiking network).

      As far as I know, there is no evidence for cyclic neural activation patterns, which are supposed to limit WM capacity (such as in Figure 1d). In fact, I believe there is no evidence for population bursts in WM, which are a crucial ingredient of the model. For example, Panichello et al. 2024 have found evidence for periods during which working memory decoding accuracy decreases, but no population bursts were observed in their data. In brief, my critique is that including some biophysical mechanism in an abstract model does not make the model plausible per se.

      It is claimed that "our proposed chunking mechanism applies to both the persistent-activity and periodic-activity regimes, with chunking clusters serving the same function in each", but this is not shown. If the results and model predictions are the same, irrespective of whether WM is activity-silent or persistent, I suggest highlighting this more and including the corresponding simulations.

      The empirical validations of the model are weak. The single-unit analysis is purely descriptive, without any statistical quantification of the apparent dip-ramp pattern. I agree that the dip-ramp pattern may be consistent with the proposed model, but I don't believe that this pattern is a specific prediction of the proposed model. It seems just to be an interesting observation that may be compatible with several network mechanisms involving some inhibition and a rebound.

      Moreover, the reanalyses of n-gram behavioral data do not constitute a mechanistic test of the model. The "new magic number" depends strongly on structural assumptions about how chunking operates, and it is unclear whether human working memory uses the specific hierarchical scheme required to achieve the predicted limit.

      The presentation of the modeling results is highly compressed into two figures and is rather hard to follow. Plotting the activity of different neural clusters in separate subplots or as heatmaps (x-axis time, y-axis neural population, color = firing rate) would help to clarify (Figure 1d). Also, control signals that activate the chunking clusters should be shown.

      Overall, the theoretical proposal is interesting, but its empirical grounding and biological plausibility need to be substantially reinforced.

    1. eLife Assessment

      The granularity with which neural activity in the sensorimotor cortex of mice corresponds to voluntary forelimb motion is a key open question. This paper provides compelling evidence for the encoding of low-level features like joint angles and represents an important step forward toward understanding cortical limb control signals.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the encoding of forelimb movement parameters using a reach-to-grasp task in mice. The authors use a modified version of the water-reaching paradigm developed by Galinanes and Huber. Two-photon calcium imaging was then performed with GCaMP6f to measure activity across both the contralateral caudal forelimb area (CFA) and the forelimb portion of primary somatosensory cortex (fS1) as mice perform the reaching behavior. Established methods were used to extract the activity of imaged neurons in layer 2/3, including methods for deconvolving the calcium indicator's response function from fluorescence time series. Video-based limb tracking was performed to track the positions of several sites on the forelimb during reaching and extract numerous low-level (joint angle) and high-level (reach direction) parameters. The authors find substantial encoding of parameters for both the proximal and distal parts of the limb across both CFA and fS1, with individual neurons showing heterogeneous parameter encoding. Limb movement can be decoded similarly well from both CFA and fS1, though CFA activity enables decoding of reach direction earlier and for a more extended duration than fS1 activity. Collectively, these results indicate involvement of a broadly distributed sensorimotor region in mouse cortex in determining low-level features of limb movement during reach-to-grasp.

      Strengths:

      The technical approach is of very high quality. In particular, the decoding methods are well designed and rigorous. The use of partial correlations to distinguish correlation between cortical activity and either proximal or distal limb parameters or either low- or high-level movement parameters was very nice. The limb tracking was also of extremely high quality, and critical here to revealing the richness of distal limb movement during task performance.

      The task itself also reflects an important extension of the original work by Galinanes and Huber. The demonstration of a clear, trackable grasp component in a paradigm where mice will perform hundreds of trials per day expands the experimental opportunities for the field. This is an exciting development.

      The findings here are important and the support for them is solid. The work represents an important step forward toward understanding the cortical origins of limb control signals. One can imagine numerous extensions of this work to address basic questions that have not been reachable in other model systems.

      Collectively, these strengths made this manuscript a pleasure to read and review.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Grier, Salimian, and Kaufman characterize the relationship between the activity of neurons in sensorimotor cortex and forelimb kinematics in mice performing a reach-to-grasp task. First, they train animals to reach to two cued targets to retrieve water reward, measure limb motion with high resolution, and characterize the stereotyped kinematics of the shoulder, elbow, wrist, and digits. Next, they find that inactivation of the caudal forelimb motor area severely impairs coordination of the limb and prevents successful performance of the task. They then use calcium imaging to measure the activity of neurons in motor and somatosensory cortex, and demonstrate that fine details of limb kinematics can be decoded with high fidelity from this activity. Finally, they show reach direction (left vs right target) can be decoded earlier in the trial from motor than from somatosensory cortex.

      Strengths:

      In my opinion, this manuscript is technically outstanding and really sets a new bar for motor systems neurophysiology in the mouse. The writing and figures are clear, and the claims are supported by the data. This study is timely, as there has been a recent trend towards recording large numbers of neurons across the brain in relatively uncontrolled tasks and inferring a widespread but coarse encoding of high-level task variables. The central finding here, that sensorimotor cortical activity reflects fine details of forelimb movement, argues against the resurgent idea of cortical equipotentiality, and in favor of a high degree of specificity in the responses of individual neurons and of the specialization of cortical areas.

      Comment on revised version:

      The authors addressed all my concerns, and in my opinion, the manuscript is suitable for publication of the Version of Record in its current form.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the encoding of forelimb movement parameters using a reach-to-grasp task in mice. The authors use a modified version of the water-reaching paradigm developed by Galinanes and Huber. Two-photon calcium imaging was then performed with GCaMP6f to measure activity across both the contralateral caudal forelimb area (CFA) and the forelimb portion of primary somatosensory cortex (fS1) as mice perform the reaching behavior. Established methods were used to extract the activity of imaged neurons in layer 2/3, including methods for deconvolving the calcium indicator's response function from fluorescence time series. Video-based limb tracking was performed to track the positions of several sites on the forelimb during reaching and extract numerous low-level (joint angle) and high-level (reach direction) parameters. The authors find substantial encoding of parameters for both the proximal and distal parts of the limb across both CFA and fS1, with individual neurons showing heterogeneous parameter encoding. Limb movement can be decoded similarly well from both CFA and fS1, though CFA activity enables decoding of reach direction earlier and for a more extended duration than fS1 activity. Collectively, these results indicate involvement of a broadly distributed sensorimotor region in mouse cortex in determining low-level features of limb movement during reach-to-grasp.

      Strengths:

      The technical approach is of very high quality. In particular, the decoding methods are well designed and rigorous. The use of partial correlations to distinguish correlation between cortical activity and either proximal or distal limb parameters or either low- or high-level movement parameters was very nice. The limb tracking was also of extremely high quality, and critical here to revealing the richness of distal limb movement during task performance.

      The task itself also reflects an important extension of the original work by Galinanes and Huber. The demonstration of a clear, trackable grasp component in a paradigm where mice will perform hundreds of trials per day expands the experimental opportunities for the field. This is an exciting development.

      The findings here are important and the support for them is solid. The work represents an important step forward toward understanding the cortical origins of limb control signals. One can imagine numerous extensions of this work to address basic questions that have not been reachable in other model systems.

      Collectively, these strengths made this manuscript a pleasure to read and review.

      Thank you!

      Weaknesses:

      In the last section of the results, the authors purport to examine the representation of "higher-level target-related signals," using the decoding of reach direction. While I think the authors are careful in their phrasing here, I think they should be more explicit about what these signals could be reflecting. The "signals" here that are used to decode direction could relate to anything - low-level signals related to limb or postural muscles, or true high-level commands that dictate only what movement downstream motor centers should execute, rather than the muscle commands that dictate how. One could imagine using a partial correlation-type approach again here to extract a signal uncorrelated with all the measured low-level parameters, but there would still be all the unmeasured ones. Again, I think it is still ok to call these "high-level signals," but I think some explicit discussion of what these signals could reflect is necessary.

      Thank you for this excellent suggestion. We have followed both pieces of the reviewer’s advice. First, we performed the suggested analysis, partialing off the kinematics then performing target classification on the residuals. This is now Figure 6S1. The analysis revealed the presence of target-related information in the neural activity after subtracting off all linear correlations with kinematics, supporting our claims that higher-level information is present in both populations. The exact timing of classifier performances varied substantially across mice, potentially due to differences in reach-to-grasp strategy, kinematic tracking fidelity, and exact spatial locations of each recorded FOV. Following the second suggestion, we have made the relevant text more careful. We now conclude simply that higher-level signals, meaning those signals that are largely unrelated to forelimb joint angle kinematics, are present but with variable timing and strengths in each area. That text now reads:

      “Target decoding performance could result from truly higher-level signals that code abstractly for target location, or alternatively could be supported by strong encoding of kinematic variables that differed between targets. To disambiguate these possibilities, we refit the linear classifier to neural data after regressing off variance related to the joint angle kinematics. The strength and exact time course of the resulting target decoding varied somewhat across animals, but the earliest portion of target decoding performance persisted in all animals after the removal of kinematics and performance remained stronger for M1-fl than S1-fl (Fig. 6S1B). We thus conclude that higher-level signals are present in both areas, but differ in their exact timing and strength. However, we note that other possible signals, such as postural changes, could not be controlled for here.”
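
      The logic of this control analysis can be illustrated with a minimal sketch on synthetic data. It is not the authors' code: the array sizes, the least-squares classifier, and the function names regress_off and linear_classifier_accuracy are assumptions made for illustration. Neural activity is first regressed on the kinematics, and the target is then classified from what is left over.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (all sizes and names are hypothetical).
n_trials, n_neurons, n_kin = 400, 30, 6
target = rng.integers(0, 2, n_trials)            # 0 = left, 1 = right
kin = rng.normal(size=(n_trials, n_kin))         # one kinematic summary per trial
kin[:, 0] += 1.0 * target                        # kinematics differ between targets
# Target-related signal that is not carried by the kinematics.
abstract = np.outer(target - 0.5, rng.normal(size=n_neurons))
neural = (kin @ rng.normal(size=(n_kin, n_neurons))
          + abstract
          + 0.5 * rng.normal(size=(n_trials, n_neurons)))


def regress_off(neural, kin):
    """Return neural activity with all linear dependence on kinematics removed."""
    X = np.column_stack([np.ones(len(kin)), kin])
    beta, *_ = np.linalg.lstsq(X, neural, rcond=None)
    return neural - X @ beta


def linear_classifier_accuracy(features, labels, train_frac=0.8):
    """Fit a least-squares linear classifier on a training split; score on the rest."""
    n_train = int(train_frac * len(labels))
    X = np.column_stack([np.ones(len(features)), features])
    w, *_ = np.linalg.lstsq(X[:n_train], 2.0 * labels[:n_train] - 1.0, rcond=None)
    pred = (X[n_train:] @ w > 0).astype(int)
    return float(np.mean(pred == labels[n_train:]))


residual = regress_off(neural, kin)
acc_raw = linear_classifier_accuracy(neural, target)
acc_residual = linear_classifier_accuracy(residual, target)
```

      In this toy setting the residuals remain classifiable only because a target signal independent of the kinematics was built in. As the response notes, unmeasured variables such as posture are not removed by this step.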

      Related to this, I think the manuscript in general does not do an adequate job of explicitly raising the important caveats in interpreting parametric correlations in motor system signals, like those raised by Todorov, 2000. The authors do an expert job of handling the correlations, using PCA to extract uncorrelated components and using the partial correlation approach. However, more clarity about the range of possible signal types the recorded activity could reflect seems necessary.

      This is an important point, and our text could have unintentionally misled readers. We have now attempted to make this point explicit in the Discussion and in the Results for Figure 6. This Discussion text now reads:

      “Moreover, as is widely known (Todorov 2000), the exact role of these kinematically-related signals is challenging to determine from correlative measures alone; thus, determining whether these signals are used for direct movement control or instead indirectly reflect control performed elsewhere is left as a topic for future work.”

      The manuscript could also do a better job of clarifying relevant similarities and differences between the rodent and primate systems, especially given the claims about the rodent being a "first-class" system for examining the cellular and circuit basis of motor control, which I certainly agree with. Interspecies similarities and differences could be better addressed both in the Introduction, where results from both rodents and primates are intermixed (second paragraph), and in the Discussion, where more clarity on how results here agree and disagree with those from primates would be helpful. For example, the ratio of corticospinal projections targeting sensory and motor divisions of the spinal cord differs substantially between rodents and primates. As another example, the relatively high physical proximity between the typical neurons in mouse M1 and S1 compared to primates seems likely to yoke their activity together to a greater extent. There is also the relatively large extent of fS1 from which forelimb movements can be elicited through intracortical microstimulation at current levels similar to those for evoking movement from M1. All of these seem relevant in the context of findings that activity in mouse M1 and S1 are similar.

      We understand two points to address here. The first point is that we needed to be more careful to attribute previous results as being from the rodent vs. monkey. We agree. We have now revised several parts of the paper to make these distinctions clearer. The second point is about the potential benefit of a thorough review of the many ways in which primate and rodent sensorimotor systems differ. We entirely agree that this could be useful for the field. However, this is a sizable endeavor and doing it full justice is beyond what we know how to fit in the space allotted for framing our results here. We therefore sought a compromise, acknowledging how our results correspond to existing results in the primate without exhaustively accounting for how they differ. Future work will be necessary to more carefully disambiguate whether species-specific differences are due to biomechanical, neurological, ethological, or as-of-yet undetermined sources. We have incorporated your final specific points about what could produce similar information in M1 and S1 into the Discussion.

      “This may simply be a consequence of widely distributed representations of movement across mouse cortex (Musall et al. 2019; Steinmetz et al. 2019; Stringer et al. 2019), including forelimb somatosensory areas, or may be a consequence of the close physical proximity of M1-fl and S1-fl hindering development of functionally distinct representations (Tennant et al. 2011).”

      In addition, there are a number of other issues related to the interpretation of findings here that are not adequately addressed. These are described in the Recommendations for improvement.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Grier, Salimian, and Kaufman characterize the relationship between the activity of neurons in sensorimotor cortex and forelimb kinematics in mice performing a reach-to-grasp task. First, they train animals to reach to two cued targets to retrieve water reward, measure limb motion with high resolution, and characterize the stereotyped kinematics of the shoulder, elbow, wrist, and digits. Next, they find that inactivation of the caudal forelimb motor area severely impairs coordination of the limb and prevents successful performance of the task. They then use calcium imaging to measure the activity of neurons in motor and somatosensory cortex, and demonstrate that fine details of limb kinematics can be decoded with high fidelity from this activity. Finally, they show reach direction (left vs right target) can be decoded earlier in the trial from motor than from somatosensory cortex.

      Strengths:

      In my opinion, this manuscript is technically outstanding and really sets a new bar for motor systems neurophysiology in the mouse. The writing and figures are clear, and the claims are supported by the data. This study is timely, as there has been a recent trend towards recording large numbers of neurons across the brain in relatively uncontrolled tasks and inferring a widespread but coarse encoding of high-level task variables. The central finding here, that sensorimotor cortical activity reflects fine details of forelimb movement, argues against the resurgent idea of cortical equipotentiality, and in favor of a high degree of specificity in the responses of individual neurons and of the specialization of cortical areas.

      Thank you!

      Weaknesses:

      It would be helpful for the authors to be more explicit about which models of mouse cortical function their results support or rule out, and how their findings break new conceptual ground.

      We appreciate this feedback and have attempted to make these details clearer through changes to the Introduction and Discussion. One key change is noted below:

      “The presence of detailed kinematic signals in the sensorimotor cortex supports a model of mouse sensorimotor cortex in which M1-fl and S1-fl play a strong role in shaping the fine details of reaching and grasping movements.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In addition to the weaknesses noted above, I suggest the authors also address the following:

      The last results section is generally lacking in statistical support for claims. Statistical support should be added.

      Thank you for pointing this out; we have added more statistical support to this section.

      The consideration in the Discussion of relevant previous findings and potential explanations for the distal limb signals in mouse sensorimotor cortex is somewhat lacking. There are several specific issues:

      (1) In contrast to the present study, the studies cited in regards to a lack of motor cortical involvement did not involve dexterous movements - in fact, Kawai et al. explicitly engineered a task that did not involve dexterity to distinguish the role of motor cortex in learning from its known role in dextrous movement execution. In Kawai et al., the authors note one rat who adopted a more dexterous approach to the lever pressing task; in this rat, a motor cortical lesion did cause a longer-lasting reduction in task performance. In additional experiments reported in Kawai's PhD thesis, performance of a dextrous task does erode with motor cortex lesion, as seen in other studies, like the early rodent reaching work of Whishaw and colleagues.

      (2) Other possible explanations for the persistence of non-dexterous tasks following motor cortical removal are compensation by, or redundant functionality in, other motor system regions.

      (3) It is also worth noting that stimulation in different regions of mouse M1 and S1 evokes, alternately, digit, wrist, and elbow movements in fairly similar proportions (Tennant, 2011), suggesting that descending pathways substantially target spinal circuits that control all forelimb joints.

      (4) It also seems relevant that although the recovery time course is longer, nonhuman primates also retain substantial hand control after motor cortical removal (e.g. Lashley, 1925; Glees and Cole, 1950; Passingham et al., 1983). Humans of course, appear to be a different story.

      These are good points. We have tried to make the Discussion better reflect the tension in the literature, including with this new text:

      “However, several other previous results have indirectly suggested that M1 and S1 may be involved in the details of forelimb movement. Performance suffers with inactivation or lesioning of M1 and S1 in skilled, complex manual behaviors (Guo et al 2015, Mizes et al 2024, Whishaw et al 1990) or idiosyncratic use of digits to accomplish non-dexterous tasks (Kawai 2014). The sparing of non-dexterous tasks with these lesions may also reflect redundancy in control as opposed to irrelevance of M1 and S1. Nevertheless, our finding of low-level kinematic information in sensorimotor cortex supports a role for cortex beyond simply providing redundant high-level commands to these subcortical areas.”

      We have avoided mentioning points 3 and 4 in the paper; the stimulation results might follow from activating projections not normally involved in this behavior, and discussing primates in this context would require a long list of caveats. We agree that these points are worth thinking about, but are concerned that they are too circumstantial to include in interpreting the results formally.

      Although similar decoding performance is achieved using neurons from both CFA and fS1, I am left wondering whether you would do substantially better with CFA using activity at additional preceding time points, or when using exclusively time points from the past. The primary model used here appears to use neural signals from corresponding time points to decode limb parameters, but results seemingly could be different when using preceding time points as regressors.

      We appreciate this suggestion and have added the analysis to an additional supplementary panel for Figure 5 (Figure 5S3). Incorporating lags into the decoder via a Wiener filter does indeed improve the decoding performance, but this could simply be due to the increase in the number of predictor variables. This analysis did not, however, further disambiguate M1-fl and S1-fl: the performance improvement was similar across areas for both causal and acausal lag configurations. This could be a consequence of the time resolution of calcium imaging, so further experiments with electrophysiology would be required to rule this possibility out. We now note this new result:

      “Including additional causal (-100 ms preceding) and/or acausal (-100 ms preceding to 100 following) lags improved decoding performance modestly and similarly for both areas (Fig. 5S3E-F).”
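
      A lagged decoder of this kind can be sketched by stacking time-shifted copies of the neural activity into one design matrix. The helper lagged_design and the toy array are hypothetical, and the sketch assumes that the lags cover every imaging frame within the stated window, which the quoted text does not specify.

```python
import numpy as np


def lagged_design(activity, lags):
    """Stack time-shifted copies of `activity` (time x neurons) side by side.

    A lag of -k uses activity k frames before the decoded time point (causal);
    a lag of +k uses activity k frames after it (acausal). Rows whose lagged
    samples would fall outside the recording are dropped.
    """
    lags = list(lags)
    n_time = activity.shape[0]
    start = max(0, -min(lags))           # earliest row with all preceding samples
    stop = n_time - max(0, max(lags))    # one past the last row with all following samples
    rows = np.arange(start, stop)
    X = np.hstack([activity[rows + lag] for lag in lags])
    return X, rows


frame_rate_hz = 31                       # imaging rate reported in the response
k = round(0.1 * frame_rate_hz)           # 100 ms is about 3 frames
causal_lags = range(-k, 1)               # -100 ms ... 0
acausal_lags = range(-k, k + 1)          # -100 ms ... +100 ms

activity = np.arange(40, dtype=float).reshape(20, 2)   # toy 20 frames x 2 neurons
X_causal, rows_causal = lagged_design(activity, causal_lags)
X_acausal, rows_acausal = lagged_design(activity, acausal_lags)
```

      Fitting the filter then amounts to an ordinary (or ridge) regression of the kinematics at the returned rows onto X. The acausal design has more columns than the causal one, which is the increase in predictor count that the response cautions about.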

      Related to this, I am also worried about the bleeding of signals across time here. If you deconvolve and interpolate between time points, the interpolation seemingly will pull information into the past, up to half the sampling period, which here is on the order of how long it takes signals to travel to and from the limb. The authors do not make any inappropriate claims about the neural signals here reflecting causes or consequences of what is happening at the limb, but readers (like me) will still try to draw these sorts of conclusions. Is it possible that, although decoding from instantaneous signals is similar for the two regions, the M1 signals are actually motor signals related to future limb state while the S1 signals are sensory consequences? Even if many of the relevant details related to conduction times are not known, perhaps the authors could clarify what can and can't be said related to causal interpretation here.

      Thank you for suggesting further explanation here. We agree that our interpretation could be made more specific. We have added text in the Discussion section to speak more directly to what can and cannot be concluded from our analyses. In short, it is hard to be certain of lags in calcium imaging data for many reasons, and using recording methods with finer temporal resolution (like electrophysiology) will be necessary for determining the precise temporal relationships between kinematics and neural activity. In the absence of these recordings, we limit our claim to kinematic information being present in M1-fl and S1-fl neural activity and leave determining the causal role of this information to future work.

      New clarifying text in the Discussion:

      “The use of calcium imaging further prevents strong conclusions about whether activity reflects future limb states or sensory consequences. Confirming this limitation, inclusion of lagged data in the decoding models, whether causal or acausal, resulted in similar performance changes in both areas.”

      An alternative reason why lift onset is less decodable in CFA is that CFA activates substantially before lift onset, as has been observed in previous rodent studies (Kargo and Nitz, 2004; Miri et al., 2017; Veuthey et al., 2020), perhaps as some sort of movement preparation. S1, on the other hand, may not have this early activity, and so may show a clearer transient at onset when the hand and limb start to move. This seems more likely than the explanations provided by the authors.

      This is a valid possible alternative explanation and we have updated the Discussion to reflect this. This difference in the structure of M1-fl activity versus S1-fl is apparent in the projections of Figure 6A, which show M1-fl projections more clearly aligned to cue-onset than S1-fl projections.

      “Our lift time decoding results are consistent with this view and align with recent observations characterizing mouse proprioceptive forelimb cortex (Alonso et al 2023), although an alternative explanation may be simply that M1-fl activates earlier than S1-fl during reaching (Kargo and Nitz 2004; Miri et al 2017; Veuthey et al 2020).”

      To better clarify relevant similarities and differences between the rodent and primate systems, the Introduction could include some of these similarities and differences exposed by the literature currently cited, and the Discussion could include an additional paragraph specifically relating findings here to previous observations in the primate.

      We appreciate the reviewer’s thoughtfulness on possible framings of our results. When writing this paper, framing was a major challenge for us and we drafted quite a few versions of the Introduction including some that focused more on mouse-primate comparison. In the end, we decided the most critical function of the Intro was to set up our central question, of “levels-of-sensorimotor-control”. The rich primate literature was valuable here, but getting into a protracted compare-and-contrast exercise quickly became a distraction from the point. Further, we sought to highlight the relevance and importance of the question answered in our work as the mouse has gained prominence for filling gaps that are challenging to address with primates. This paper serves as one of many early steps towards the ultimate goal of revealing general properties of sensorimotor cortical function with the mouse model. We have made some subtle changes to the Introduction that we hope will more clearly communicate this narrative. 

      We agree that a Discussion paragraph directly relating our results to those in primates would benefit our conclusions and have added one:

      “These results expand our understanding of the rodent sensorimotor system and highlight similarities to nonhuman primates. We show here evidence in mice of detailed joint angle kinematic signals from the full forelimb in M1 and S1, as has been shown in macaque cortex during tasks involving reaching and grasping objects (Vargas-Irwin et al. 2010; Saleh et al. 2010, 2012; Goodman et al. 2019; Okorokova et al. 2020). Additionally, the earlier onset of movement-related activity in M1-fl compared to S1-fl is similar to macaque M1 and S1 (Tanji and Evarts 1976). Taken together these results suggest that the mouse can be employed to address questions traditionally explored in primates about how cortical activity encodes detailed movement commands.”

      Although this is outside the scope of the present study, it would be interesting to image descending projection neurons to see what signals are conveyed downstream, and to what targets. Some signals observed in layer 2/3 may not be strongly reflected in descending projections.

      We agree that recording from descending projection neurons in this task would be of deep interest – and also agree that these experiments are beyond the scope of the present study. We look forward to performing these additional experiments in future work.

      Minor:

      (1) The use of "CFA" and “fS1” is a bit confusing. S1, like M1, is defined primarily based on histological criteria, while CFA is defined by intracortical microstimulation. CFA contains a substantial fraction of fS1, seemingly most of it based on the maps shown in Tennant et al., 2011. This is not really a criticism, as the field has not reached any sort of consensus on this nomenclature yet.

      We are similarly unhappy with the inconsistency of the terminology in the field, and struggled with how not to make it worse. After much debate and consultation with colleagues, we decided to use “M1” and “S1” to evoke the century of literature on these areas; and “-fl” to indicate forelimb because it is more intuitive than “-ul” and avoids using the illegible “-ll” for hindlimb (relevant to our subsequent paper). For what we called M1-fl, we recorded where we did because anecdotally we saw similar responses across that swath; but note that this definition is also consistent with the definition of “MOp-ul” found with multimodal mapping by Munoz-Castaneda (2021), which extends a little anteriorly of MOp as defined by the Allen CCF. As the field continues to mature, we hope future work can converge on a set of shared terms.

      (2) Page 4: "Inactivations and lesions of M1 and S1 have shown that M1 is required for the execution of dexterous reach-to-grasp movements" - to me, earlier work from Whishaw and colleagues deserves to be cited here.

      We appreciate the suggestion and have updated the references in this section to better reflect the prior work from Whishaw and other researchers.

      (3) Page 5: "evoking sufficient trial-to-trial variability to avoid model overfitting." - what I think the authors are referring to here is a particular kind of "overfitting," the consequence of not exploring the full movement space, as opposed to model overfitting from issues with the model-fitting method itself. Rather than just saying overfitting, the authors could be clearer about what they are referring to.

      The reviewer is right; the phenomenon we intended to refer to is not properly termed overfitting. Specifically, we meant that data with restricted range does not necessarily express global structure, and models can therefore incorrectly fit them. For example, fitting a linear model to data including many periods of a sine wave will correctly show a zero-slope linear component, but fitting to only a portion of a single cycle will typically yield a nonzero slope. This is not overfitting, is not exactly underfitting (because the relevant structure is barely present in the data, as opposed to missed by an insufficiently powerful model), is not bias (the data are fit well), and is not even necessarily a problem (the local relationship may be what you are interested in). Yet, it does not reflect the larger structure of the data.
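
      The sine-wave example can be checked numerically with a short sketch. The sampling ranges below are arbitrary choices made for illustration.

```python
import numpy as np


def fitted_slope(x):
    """Least-squares slope of a straight line fit to sin(x) over the samples x."""
    slope, _intercept = np.polyfit(x, np.sin(x), deg=1)
    return slope


many_periods = np.linspace(0, 20 * 2 * np.pi, 5000)      # 20 full cycles
part_of_cycle = np.linspace(-np.pi / 4, np.pi / 4, 500)  # rising part of one cycle

slope_global = fitted_slope(many_periods)   # close to zero
slope_local = fitted_slope(part_of_cycle)   # clearly positive
```

      Both fits describe their own data well; they differ because the restricted sample never exposes the periodic structure.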

      We do not know of a standard term for this phenomenon, so instead of dragging the reader through this tangential argument, we have tried to offer a simpler motivation for using multiple targets:

      “Assessing the relationship between neural activity and the details of movement requires striking a balance between achieving repeatable behavior and evoking sufficient trial-to-trial variability to broadly sample movement space”.

      (4) Page 5: Caudal Forelimb Area should not be capitalized.

      Obviated with the change in area nomenclature.

      (5) Page 7: "of linearly independent degrees of freedom" - for a neuroscience audience, I think it is better to explicitly mention that the resulting PCs are uncorrelated.

      We agree that this section could benefit from clarification. We have attempted to provide additional nuance to indicate what the analysis was intended to test.

      “Despite the strong coupling between the proximal and distal joint angles, rich variation remained in the action of different joints over time. The presence of strong correlations across joints suggested that the kinematics may be well described by a smaller number of independent degrees of freedom than the total number of recorded angles. To assess the number of linearly independent (uncorrelated) degrees of freedom amongst the 24 joint angles and velocities, we used double-cross-validated PCA (Yu et al. 2009; Methods; Fig. 3D), finding intermediate dimensionalities of 7 (median for joint angles) and 10 (velocities; Fig. 3E). This is consistent with the idea that joint angles across the limb are coordinated instead of controlled independently, and that this coordination is flexible enough over time to enable accurately performing reaching and grasping to different targets.”
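
      The reviewer's point that principal components are uncorrelated, and the idea of counting degrees of freedom among correlated joint angles, can be illustrated with plain PCA on synthetic data. This sketch is not the double-cross-validated procedure cited in the quoted text: the latent dimensionality of 7 is built into the simulation, and the variance threshold is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 24 correlated "joint angles" driven by 7 latent factors.
n_samples, n_angles, n_latent = 2000, 24, 7
latent = rng.normal(size=(n_samples, n_latent))
angles = latent @ rng.normal(size=(n_latent, n_angles))
angles += 0.01 * rng.normal(size=angles.shape)        # small measurement noise

centered = angles - angles.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
scores = centered @ components.T                       # PC time courses

variance_fraction = singular_values**2 / np.sum(singular_values**2)
n_dims = int(np.sum(variance_fraction > 1e-3))         # crude threshold, not cross-validated

score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))
```

      The off-diagonal covariances of the scores are zero up to floating-point error, which is the sense in which the components are uncorrelated. With real, noisier data a fixed threshold is unreliable, which is one motivation for cross-validated estimates of dimensionality.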

      (6) Page 7: In the Results, the authors should mention what indicator is being used, the imaging frame rate, and summarize briefly how cells were defined.

      Thank you for the suggestion; these details have been added to the relevant results section for clarity.

      “To do so, we recorded neural activity from neurons in layer 2/3 M1-fl extending into the immediately adjacent secondary motor cortex (M2), and the forelimb region of S1 (S1-fl) using two-photon calcium imaging of GCaMP6f-expressing neurons in layer 2/3 (185-230 μm deep, imaged at 31 Hz, cells extracted with Suite2p (Pachitariu et al 2017)).”

      (7) Page 7: "corrected at n=2" - n doesn't typically refer to the number of tests, so for clarity I would say "corrected for dual tests."

      Thank you for pointing this out; we have corrected the text and added additional explanation in the Methods for our approach to determining statistical significance across the targets and locking events.

      “P-values obtained through the ZETA were then Bonferroni corrected for dual tests when measuring the number of cells modulated to a given event and corrected for six tests (2 targets and 3 events) when measuring the overall number of modulated cells.”
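
      The two correction levels described in the quoted text can be sketched as follows. The p-values and the significance level of 0.05 are made up for illustration, and reading the "dual tests" as the two targets for a given event is an assumption.

```python
import numpy as np


def bonferroni(p_values, n_tests):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    return np.minimum(np.asarray(p_values, dtype=float) * n_tests, 1.0)


raw_p = np.array([0.004, 0.02, 0.3])        # made-up p-values for three cells
p_per_event = bonferroni(raw_p, n_tests=2)  # two tests for a given event
p_overall = bonferroni(raw_p, n_tests=6)    # 2 targets x 3 events
significant_overall = p_overall < 0.05      # alpha = 0.05 is an assumption
```

      A cell with a raw p-value of 0.02 passes the two-test correction (0.04) but not the six-test correction (0.12).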

      (8) Page 7: In the Results, when the decoding is introduced, it would be helpful to have a few details without having to hunt through the Methods. For example, were things regularized, how was cross-validation handled, etc?

      Thank you for the suggestion; these details have been added to the relevant results section for clarity.

      A simple linear regression model related the single-trial joint angles at all time points to single-trial neural activity at the corresponding moments. The model was fit with ridge regression, the ridge penalty was determined via a heuristic (Karabatsos 2018), and performance was measured on held-out trials (80/20 train/test split, 50 folds).
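
      A decoder of this general form can be sketched with closed-form ridge regression and repeated trial-wise splits. The data are synthetic, the fixed penalty stands in for the heuristic cited in the text (which is not implemented here), and treating "50 folds" as 50 random 80/20 splits is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: trials x time points x neurons / joint angles.
n_trials, n_time, n_neurons, n_angles = 100, 20, 40, 5
neural = rng.normal(size=(n_trials, n_time, n_neurons))
true_weights = rng.normal(size=(n_neurons, n_angles))
angles = neural @ true_weights + 2.0 * rng.normal(size=(n_trials, n_time, n_angles))


def ridge_fit(X, Y, penalty):
    """Closed-form ridge solution; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, y_mean - x_mean @ W


def heldout_r2(neural, angles, penalty=1.0, n_folds=50, test_frac=0.2):
    """Mean R^2 over random trial-wise train/test splits."""
    n_trials = neural.shape[0]
    n_test = int(test_frac * n_trials)
    scores = []
    for _ in range(n_folds):
        order = rng.permutation(n_trials)
        test, train = order[:n_test], order[n_test:]
        # Concatenate time points across trials so each row is one moment.
        X_train = neural[train].reshape(-1, neural.shape[2])
        Y_train = angles[train].reshape(-1, angles.shape[2])
        X_test = neural[test].reshape(-1, neural.shape[2])
        Y_test = angles[test].reshape(-1, angles.shape[2])
        W, b = ridge_fit(X_train, Y_train, penalty)
        resid = Y_test - (X_test @ W + b)
        total = Y_test - Y_test.mean(axis=0)
        scores.append(1.0 - np.sum(resid**2) / np.sum(total**2))
    return float(np.mean(scores))


r2 = heldout_r2(neural, angles)
```

      Splitting by trial rather than by time point keeps neighbouring samples from the same reach out of both the training and test sets.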

      (9) Page 8: I think it is worth noting how much mouse reaching involves shoulder rotation as opposed to movement in other joints, as this seems very different from primates.

      Thank you for pointing this out. We think this is mostly a task difference: our mice were in a quadrupedal stance, whereas monkeys are typically asked to reach from a sitting position. We now mention this in the Results. 

      “Reaching evoked particularly large rotation of the shoulder, likely because the mice reached from a quadrupedal position to targets on either side of the snout.”

      (10) Page 8: Should provide quantification to clarify what is meant by "closely tracked."

      We have updated the text to indicate that this claim was meant to be qualitative, and to more clearly highlight that the interest here is the first demonstration of the ability to reconstruct valid forelimb postures from decoded joint angles in the mouse. Quantifying the reconstruction properly would require substantially more manual data labeling, and the successful decoding itself demonstrates indirectly that the reconstructions are good enough to obtain the results of interest.

      Additionally, we reconstructed the skeletal representation of the forelimb from the decoded joint angles and found that, as intended, the reconstructed postures had strong qualitative resemblance to the true postures, even of “minor” angles like cylindrical paw deformation or digit splay (Fig. 5C,G).

      (11) Page 8: "Overall, these results suggest that instantaneous movement-related signals are similarly distributed across CFA and fS1." - I know we are being succinct here, but this sentence sounds like a non sequitur in the context of this paragraph - perhaps include a conclusion from the results in this paragraph first, then summarize the whole section.

      Thank you for the suggestion; we have updated this text to more clearly conclude the results of this section.

      Overall, these results reveal that neural activity in M1-fl and S1-fl is closely related to the kinematic details of reach-to-grasp movements. The ability to decode substantial variance in proximal and distal joints suggests that this relationship extends to the entire forelimb and the similar performance obtained from each area suggests that this information is similarly distributed across M1-fl and S1-fl. 

      (12) Page 10: Mention of projections from fS1 does not explicitly specify their preferential targeting of the dorsal horn, which seems relevant.

      We appreciate the suggestion and have added this detail to the text.

      Rodent S1-fl is known to influence interneuron populations in the spinal cord through direct and indirect projections that predominantly target the dorsal horn (Ueno et al. 2018), thus these signals may also reflect S1-fl’s important role in modulating reflex circuits to coordinate sensory feedback with movement generation (Moreno-López et al. 2016; Moreno-Lopez et al. 2021; Seki et al. 2003).

      (13) Page 31: Labels on the figure indicating what blue and red stand for would be helpful.

      Thank you for the suggestion; labels have been added to indicate left and right trials for Figure 5 C/F and Figure 6A.

      (14) Page 32: Legend does not include panel D.

      Thank you for catching this; the corresponding caption has been added.

      Reviewer #2 (Recommendations for the authors):

      (1) The Introduction could perhaps set the central question in starker relief. What specifically do the authors mean by high- vs low-level control? As suggested by the cited studies, this has been a fraught issue in primate work for decades, and I think a finer-grained framing of alternative hypotheses would help set up the results. For example, would better performance at decoding joint angles than paw position be evidence for lower-level control? The clarity of the Introduction might also be improved if the facts and unknowns were broken down by species throughout.

      We have tried to further improve the focus of the Introduction on the central question, clarify what we mean, and make clearer in the review of the literature which species a finding comes from.

      The clarifying text from the introduction is quoted below:

      Extensive motor mapping experiments in rodents have revealed that activating different parts of the sensorimotor cortex evokes movements of different body parts or different kinds of movements of the same body part, as it does in primates (for review, see (Harrison and Murphy 2014)). Yet it is unclear how the topography of stimulation-evoked movements relates to the roles of these areas during volitional actions. Perturbations during behavioral tasks in mice involving forelimb lever or reaching movements have provided a coarse-level understanding of how these areas contribute during behavior. Inactivations and lesions of M1 and S1 have shown that M1 is required for the execution of dexterous reach-to-grasp movements (Guo et al. 2015; Sauerbrei et al. 2020; Galiñanes et al. 2018; Wang et al. 2017; Whishaw et al. 1991; Whishaw 2000) and that S1 is essential for adapting learned movements to external perturbations of a joystick (Mathis et al. 2017). However, spinal cord projections from mouse M1 and S1 primarily target spinal interneurons rather than directly synapsing onto motor neurons (Gu et al. 2017; Ueno et al. 2018; Wang et al. 2017), suggesting cortical activity might play a more modulatory role. Further, stimulation of brainstem nuclei alone can evoke naturalistic forelimb actions, including realistic reaching movements involving coordinated flexion and extension of the proximal and distal limb (Esposito et al. 2014; Ruder et al. 2021; Yang et al. 2023). Taken together, these results have raised the question of what role mouse M1 and S1 play in the control of goal-directed forelimb movements. 

      One route to answering this question involves characterizing the signals present in mouse M1 and S1 during movement. If mouse M1 and S1 were to control only high-level aspects of forelimb movements, activity should be dominated by ‘abstract’ signals like target location and reflect little trial-to-trial variability in reach kinematics. If instead M1 and S1 control low-level movement features then activity should correlate strongly with forelimb joint angle kinematics and their trial-to-trial variation when reaching to different targets. While the presence of high- or low-level signals in a cortical area does not necessarily imply that they are causally responsible for these aspects of movement, characterizing what signals are present serves as a first step toward determining how these areas relate to movement.

      (2) The kinematics and calcium traces appear to be highly stereotyped across trials. If the population encodes joint angles, would one expect to find correlations between the neural and kinematic residuals after subtraction of the time-varying means? Some additional analysis and/or discussion on this point would be helpful, especially as there are only two targets.

      This is a great idea. As suggested, we implemented regression models on the residuals for each target in the new Figure 5S3. Figure 5S3 A and B show the performance when decoding the residuals for right trials and C and D show performance for left trials. Decoding remained well above chance, despite shrinking down due to predicting this relatively small within-target variation. This analysis supports our claims from the main regression models in Figure 5 and 5S1-2, and also suggests that movements ipsilateral to the reaching limb (contralateral to the recording hemisphere) may be better encoded than movements contralateral to the reaching limb. We have added a reference to this additional residual analysis in the final paragraph of the decoding section of the Results section:

      “Finally, we tested whether the ability to decode these many joint angles was a direct consequence of inter-joint correlations, and might not be indicative of the presence of “real” information about some of these joints. To do so, we fit partial correlation models that removed correlations between proximal and distal joints, or removed correlations of the joint angles with a high-level parameter – the overall distance of the paw centroid to the spout. Despite substantially lowering the behavioral variance, in each case the residuals could still be decoded from neural activity (Fig 5S2A-D). Similar decoding performance for M1-fl and S1-fl was obtained from models fit to decode single-trial residuals separately for left and right trials (Fig 5S3A-D), indicating that trial-to-trial variations on each basic movement were decodable from these populations.”
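
A minimal sketch of the residual-decoding idea described above, not the authors' pipeline: the per-target, time-varying mean is subtracted from both neural activity and kinematics, and the remaining trial-to-trial variation is decoded with cross-validated linear regression. The array shapes, the ridge penalty `alpha`, the fold count, and all function names are assumptions made for illustration.

```python
import numpy as np

def subtract_condition_mean(x, targets):
    """Remove the time-varying, per-target mean from trial data.

    x:       array (n_trials, n_timepoints, n_features)
    targets: array (n_trials,) of target labels (e.g. 0 = left, 1 = right)
    """
    residuals = np.empty_like(x, dtype=float)
    for t in np.unique(targets):
        idx = targets == t
        residuals[idx] = x[idx] - x[idx].mean(axis=0, keepdims=True)
    return residuals

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights (no intercept; residuals are zero-mean)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def cross_validated_r2(neural, kin, n_folds=5, alpha=1.0, seed=0):
    """Decode kinematic residuals from neural residuals, holding out whole trials."""
    n_trials = neural.shape[0]
    order = np.random.default_rng(seed).permutation(n_trials)
    pred = np.zeros_like(kin, dtype=float)
    for test in np.array_split(order, n_folds):
        train = np.setdiff1d(order, test)
        X_train = neural[train].reshape(-1, neural.shape[2])
        Y_train = kin[train].reshape(-1, kin.shape[2])
        W = ridge_fit(X_train, Y_train, alpha)
        pred[test] = neural[test] @ W  # (trials, time, units) @ (units, joints)
    ss_res = ((kin - pred) ** 2).sum()
    ss_tot = ((kin - kin.mean(axis=(0, 1))) ** 2).sum()
    return 1.0 - ss_res / ss_tot
```

Calling `cross_validated_r2` on the residuals of left and right trials separately mirrors the per-target comparison in the response; holding out whole trials keeps samples from a single reach from appearing in both the training and test sets.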

      Along similar lines, binary classification is used to characterize cue-, lift-, and contact-responsive neurons. Is it possible to exploit trial-to-trial variation in the cue-lift and lift-contact latencies to extract the time-varying marginal effects of each event (e.g., using a GLM)?

      For the detection of single-cell modulations by different events, we have elected to retain our simple statistical test to determine modulation; in our experience, encoding models typically involve a surprising number of steps to get them to do what you actually intend. We leave more extensive encoding model-style analysis to future work, currently in progress.

      (3) The authors mention prior studies suggesting that the control of some forelimb tasks can be gradually transferred from the cortex to the subcortical centers. Have they performed the inactivation at different time points across learning, and if so, do they have evidence for a diminishing effect over time (e.g., blocking of both initiation and coordination early in training)? In addition, the effects of motor cortex inactivation are similar to, but slightly different from, effects shown in reaching tasks in prior studies. Some additional discussion on this point would be useful.

      Our inactivation experiments in this study were intended to coarsely demonstrate the involvement of mouse forelimb sensorimotor cortex in our task. We have not performed the inactivations over learning and leave such experiments to future work. 

      We agree that a little more clarity relating our results to previous ones was warranted. Previous studies (Guo et al. 2015 and Galiñanes et al. 2018) have demonstrated inactivation impacts on similar tasks, but for thoroughness we sought to show the same for our task as it varied from the pellet and motorized water spout tasks in both training time and target configurations. Our results are strongly in line with those of Galiñanes et al. 2018, which used a fairly similar water spout target configuration. In the inactivation experiments of that paper, 3 out of 13 animals with initiation-triggered inactivations were able to initiate reaching within a time window similar to control trials. Additionally, a proportion of trials across multiple mice proceeded with little perturbation from the inactivations. This is consistent with our observation that M1-fl inactivations may either abolish movement initiation or allow movement initiation but impair task completion on a trial-by-trial and animal-to-animal basis. Further work is required to determine what factors influence these differential responses to inactivation and to determine how these effects differ across task variations (i.e., pellet vs water spout). We have added a brief description of these nuances to the text for clarity. 

      “These inactivations blocked the execution of the reach to grasp sequence, preventing the animal from making contact with the spout during the 3-second laser stimulation period (Fig. 1F; 86.5% control trials with contact within 3 seconds of cue, 5.1% inactivation trials with contact, P < 10<sup>-191</sup>, Mann-Whitney U test, 2 mice, 495 stimulation trials). Interestingly, inactivation at the time of cue often did not prevent reach initiation (mouse 1: 54.7%, mouse 2: 34.2% of inactivation trials with lift within 3 seconds; 93.5%, 86.2% control trials). Yet the movement stalled once the paw and digits extended towards the spout, producing uncoordinated and unsuccessful reaching trajectories (Fig. 1I, two representative datasets). Taken together, these results support the involvement of M1-fl in the water-reaching task and suggest that the strength of inactivation effects may depend on specific task details like training time or target configuration (c.f. Galinanes et al. 2018).”

      Minor points

      (1) The rationale for the multiple comparisons procedure in identifying event-locked responses should be explained in more detail. If I understand correctly, the authors are not correcting for comparisons across ROIs, but instead control the family-wise error rate across brain regions and event types (dividing alpha by two or six). Why not instead control the false discovery rate across ROIs? 

      Thank you for pointing this out, it was confusing as written and we received a similar comment from Reviewer 1. We have fixed the wording now to make it clearer why we did this. We simply aimed to describe how many of the recorded neurons in each area were modulated by the task as a proxy for the engagement of these areas during the behavior, and to use this measure of modulation as a criterion for including the neuron in subsequent analysis. In other words, if the question had been “are any neurons in this area modulated by the task?” then correcting for the number of ROIs would be the correct method; but if the question is, “is this neuron probably modulated and therefore worth including in my decoder?” correcting for the number of ROIs will typically be much too conservative. Thus, we only sought to correct for the false discovery rate across events and targets for each ROI. We have added additional text in the methods to clarify these choices, below. Please also see response to (7) from Reviewer 1 above.

      “Note that we did not correct for the number of ROIs tested for two reasons. First, the goal of this testing was to serve as a criterion for inclusion in subsequent decoding analyses, not to determine whether any neurons in the area at all were modulated; and second, correcting for the number of ROIs would bias comparison between areas if different numbers of ROIs were recorded in one area vs. the other.”
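
The inclusion criterion described above can be sketched as follows, assuming one p-value per event and target for each ROI and a Bonferroni division of alpha by the number of tests within that ROI (the "two or six" mentioned by the reviewer). The default `alpha = 0.05` and the function name are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def modulated_rois(p_values, alpha=0.05):
    """Flag ROIs that pass a within-ROI Bonferroni criterion.

    p_values: array (n_rois, n_tests), where n_tests is the number of
              event x target comparisons run for each ROI.
    The threshold is alpha / n_tests. It is deliberately NOT divided by the
    number of ROIs, so the criterion does not depend on how many ROIs were
    recorded in a given area.
    """
    p_values = np.asarray(p_values, dtype=float)
    n_tests = p_values.shape[1]
    return (p_values < alpha / n_tests).any(axis=1)
```

Because the threshold depends only on the number of tests per ROI, the fraction of modulated ROIs stays comparable between areas with different ROI counts, which is the second reason given in the quoted Methods text.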

      (2) It appears joint angles are treated as linear variables in the decoding analysis; is this correct? This seems reasonable as long as the range of motion is not too large, but the authors might briefly comment on the issue in the Methods. 

      Yes, all joint angles are treated as linear variables in the linear regression model. We observed empirically (as can be seen in Figure 3B and Figure 5B/F) that the joint angle variables were relatively constrained to specific ranges during the task, with no angles displaying substantial wrap-around during the reaching and grasping movements. It is true that use of nonlinear decoding would almost surely improve performance further. Future work could also compare decoding of joint angles with muscle forces, which correlate and which we made no effort to distinguish here. In this work, though, the demonstration of a substantial relationship between neural activity and kinematics already tells us that fine details of movement are present in the M1 and S1-fl populations, which is a critical fact to understand these areas and was not previously known. We now comment explicitly on this, as suggested.

      “Joint angle or velocity kinematics were linearly interpolated from their original 6.66 ms to 10 ms and smoothed with a Gaussian (15 ms s.d.). These angular variables were then treated linearly in decoding analyses as their ranges were relatively constrained during the reaching and grasping movements; although the true relationships are likely nonlinear, this serves as a sufficient approximation to demonstrate the presence of a relationship between neural activity and kinematics.”
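
The resampling and smoothing step quoted above might look like the following sketch for a single joint-angle trace. Only the 6.66 ms and 10 ms sample intervals and the 15 ms s.d. come from the text; the kernel truncation at 4 s.d., the edge padding, and the function name are assumptions.

```python
import numpy as np

def resample_and_smooth(angles, dt_in=0.00666, dt_out=0.010, sigma=0.015):
    """Linearly resample one joint-angle trace and smooth it with a Gaussian.

    angles: array (n_samples,) sampled every dt_in seconds.
    Returns (t_out, smoothed), both on the dt_out grid.
    """
    angles = np.asarray(angles, dtype=float)
    t_in = np.arange(angles.size) * dt_in
    t_out = np.arange(0.0, t_in[-1] + 1e-12, dt_out)
    resampled = np.interp(t_out, t_in, angles)

    # Gaussian kernel truncated at +/- 4 s.d. and normalized to unit sum
    half = int(np.ceil(4 * sigma / dt_out))
    k_t = np.arange(-half, half + 1) * dt_out
    kernel = np.exp(-0.5 * (k_t / sigma) ** 2)
    kernel /= kernel.sum()

    # Edge-pad so the output keeps its length and does not droop at the ends
    padded = np.pad(resampled, half, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    return t_out, smoothed
```

Treating the angle as an ordinary real-valued signal here is only safe under the condition the authors state, namely that the angles stay within a constrained range and do not wrap around.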

      (3) Are the limb pose estimates mirrored along the mediolateral axis? Figures 1C and 2D appear to show reaches to the left spout on the animal's right.

      Thank you for pointing out the ambiguity in the display of these data. The reach trajectories were not mirrored along the mediolateral axis, but they are displayed from the perspective of the behavioral imaging cameras as shown in Figure 1A. Thus the right target reaches (ipsilateral to the animal’s reaching arm) are on the left side of the camera image and the left target reaches (contralateral to the animal’s reaching arm) are on the right side of the image. We have clarified this in the figure captions.

    1. eLife Assessment

      This important study uses an original method to address the longstanding question of why reaching movements are often biased. The combination of a wide range of experimental conditions and computational modeling is a strength. Convincing evidence is presented in support of the main claim that most of the biases in 2-D movement planning originate in misalignment between visuo-proprioceptive reference frames.

    2. Reviewer #1 (Public review):

      Wang et al. studied an old, still unresolved problem: Why are reaching movements often biased? Using data from a set of new experiments and from earlier studies, they identified how the bias in reach direction varies with movement direction and movement extent, and how this depends on factors such as the hand used, the presence of visual feedback, the size and location of the workspace, the visibility of the start position and implicit sensorimotor adaptation. They then examined whether a target bias, a proprioceptive bias, a bias in the transformation from visual to proprioceptive coordinates and/or biomechanical factors could explain the observed patterns of biases. The authors conclude that biases are best explained by a combination of transformation and target biases.

      A strength of this study is that it used a wide range of experimental conditions with also a high resolution of movement directions and large numbers of participants, which produced a much more complete picture of the factors determining movement biases than previous studies did. The study used an original, powerful and elegant method to distinguish between the various possible origins of motor bias, based on the number of peaks in the motor bias plotted as a function of movement direction. The biomechanical explanation of motor biases could not be tested in this way, but this explanation was excluded in a different way using data on implicit sensorimotor adaptation. This was also an elegant method as it allowed the authors to test biomechanical explanations without the need to commit to a certain biomechanical cost function.

      Overall, the authors have done a good job mapping out reaching biases in a wide range of conditions, revealing new patterns in one of the most basic tasks, and the evidence for the proposed origins is convincing. The study will likely have substantial impact on the field, as the approach taken is easily applicable to other experimental conditions. As such, the study can spark future research on the origin of reaching biases.

      Comments on revisions:

      The authors have addressed my concerns convincingly. The inclusion of the data on movement extent, and the comparison with the data and explanation of Gordon et al. (1994), has strengthened the paper, as it shows that the proposed model can also explain biases in movement extent. I also appreciate the addition of the mathematical analysis, although I suspect that this analysis can be developed further to yield more detailed insights into the conditions under which the 1-, 2- and 4-peaked patterns arise, but that is a more suitable question for follow-up work.

    3. Reviewer #2 (Public review):

      Summary:

      This work examines an important question in the planning and control of reaching movements - where do biases in our reaching movements arise and what might this tell us about the planning process. They compare several different computational models to explain the results from a range of experiments including those within the literature. Overall, they highlight that motor biases are primarily caused by errors in the transformation between eye and hand reference frames. One strength of the paper is the large numbers of participants studied across many experiments. However, one weakness is that most of the experiments follow a very similar planar reaching design - with slicing movements through targets rather than stopping within a target. This is partially addressed with Exp 4. This work provides a valuable insight into the biases that govern reaching movements. While the evidence is solid for planar reaching movements, further support in the manner of 3D reaching movements would help strengthen the findings.

      Strengths:

      The work uses a large number of participants both with studies in the laboratory which can be controlled well and a huge number of participants via online studies. In addition, they use a large number of reaching directions allowing careful comparison across models. Together these allow a clear comparison between models which is much stronger than would usually be performed.

      Comments on revisions:

      I thank the authors for all the additions to the manuscript, which have addressed my concerns.

    4. Reviewer #3 (Public review):

      This study makes excellent use of a uniquely large dataset of reaching movements collected over several decades to evaluate the origins of systematic motor biases. The analyses convincingly demonstrate that these biases are not explained by errors in sensed hand position or by biomechanical constraints, but instead arise from a misalignment between eye-centric and body-centric representations of position. By testing multiple computational models across diverse contexts (including different effectors and visible versus occluded start positions), the authors provide strong evidence for their transformation model. My earlier concerns have been addressed, and I find the work to be a significant and timely contribution that will be of broad interest to researchers studying visuomotor control, perception, and sensorimotor integration.

      Comments on revisions:

      None

    5. Author response:

      The following is the authors’ response to the previous reviews

      General recommendations (from the Reviewing Editor):

      The reviewers agreed that addressing some specific concerns would improve the clarity of the paper and the strength of the conclusions. These points are listed below, and described in more detail in the reviewer-specific 'Recommendations for Authors':

      We thank the editor and reviewers for the encouraging feedback and constructive comments. We provide our point-by-point response below.

      (1) The details of the new experiment including number of subjects and a description of the analysis should be provided in the main text.

      We now provide a detailed description of the methods (including the number of subjects; N = 30) and analyses for the new experiment. See our response to Reviewer 2 for more details.

      (2) It would be informative to see how the amplitude biases observed agree with those found by Gordon et al. 1994.

      Addressed. Please see our response to Reviewer 1, comment 1.

      (3) Each of the models leads to different bias patterns. It would be very helpful to hear the authors' interpretation, ideally with a mathematical explanation, of what leads to these distinct patterns.

      Addressed. Please see our response to Reviewer 1, comment 2.

      Reviewer #1 (Recommendations for the authors):

      (1) Most of my points have been addressed convincingly in this revision. The new experiment in which also biases in movement amplitude were determined is a welcome addition to the paper. However, I could not see the results of this study, as the authors did not include Fig. 4 in the manuscript, but repeated Fig. 3. That's unfortunate as I would have liked to see the similarity between the biases in direction and amplitude. Moreover, I would have liked to see how the amplitude biases agree with those found by Gordon et al. EBR (1994) 99:112-130, and to which extent Gordon et al.'s explanation can explain the pattern.

      We apologize for including the incorrect figure in the previous version of our manuscript. We did make a correction and submitted a corrected version, but it appears that it didn’t make its way to you. The correct Figure 4 is now in the manuscript.

      The motor biases in amplitude (extent) observed in Experiment 4 (Author response image 1) are qualitatively similar to the pattern reported by Gordon et al. 1994. While the exact peaks do not match perfectly, both datasets show a two-peaked pattern.

      Gordon et al. (1994) attributed the bias in amplitude to direction-dependent variation in movement speed which, in their view, arises from anisotropies in limb inertia. Specifically, moving the upper arm along its quasiorthogonal direction (i.e., rotation about the elbow) requires lower effective inertia than moving parallel to the upper-arm axis. Given the arm posture in both datasets, the upper limb points toward ~135°/315°, with the orthogonal direction corresponding to ~45°/225°. The two-peaked speed profiles in both our data (Author response image 1) and Gordon et al. are consistent with this prediction.

      Author response image 1.

      Gordon et al. (1994) noted that, while the extent bias function should mirror the speed bias function, the motor planning system might proactively compensate for the speed bias. Indeed, while the extent and speed bias functions are roughly aligned in their study, the two are misaligned in our Experiment 4. For example, the speed function peaks around 45°, which corresponds to a valley in the extent bias function. The difference between their data and ours could be due to a difference in the starting point configuration. However, their model predicts alignment of the speed and extent functions independent of starting point configuration. In contrast, the TR+TG model does predict our observed extent bias function and yields predictions about how this should change with different start point configurations. As such, while heterogeneity in movement speed may contribute to extent bias to some degree, we think the transformation bias and visual-target bias likely play a larger role in determining the extent bias observed at movement endpoint.

      We have added a discussion section about the bias function reported by Gordon et al. (1994) and their account in the manuscript (lines 482-493). We do not repeat it here, as the content largely overlaps with the response above.

      (2) One of the most important new insights from this study is that the three single-source models lead to different bias patterns, with 1, 2 or 4 peaks. However, what I miss in the paper is an intuitive explanation why they do so. Now, the models are described and their predictions are shown, but it remains unclear where these distinct patterns come from. As scientists, we want to understand things, so I would very much appreciate if the authors can provide such an intuitive explanation, for instance using a mathematical proof. That could also identify how general these patterns are, or if there are certain requirements for them to occur (such as a certain shape of the transformation bias).

      Note that the closed-form mathematical expression for the motor bias function is not straightforward. As such, the intuition comes primarily from inspection, that is, from the model simulations themselves, which we show in Figure 1 of the paper. Importantly, the model predictions are insensitive to the parameter values over a reasonable range. Thus, the number of peaks predicted by each model is a core distinguishing feature. We present in the Supplementary Results a formalized mathematical analysis to illustrate how different models produce different numbers of peaks in the movement-bias function.
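
Since the number of peaks is the distinguishing feature, a simple way to read it off a simulated or measured bias function is to count local maxima around the circle of movement directions. The sketch below is an illustration under stated assumptions: the sinusoids are toy stand-ins with 1, 2, or 4 cycles, not the paper's models, and the 5° sampling and function name are hypothetical.

```python
import numpy as np

def count_circular_peaks(values):
    """Count strict local maxima of a function sampled evenly around a circle.

    The first and last samples are treated as neighbours, so a peak that
    straddles 0/360 degrees is counted once. Flat-topped peaks (exact ties
    between neighbours) are not counted by this strict comparison.
    """
    v = np.asarray(values, dtype=float)
    prev, nxt = np.roll(v, 1), np.roll(v, -1)
    return int(np.sum((v > prev) & (v > nxt)))

# Toy bias functions with 1, 2 and 4 cycles per revolution (NOT the models
# from the paper); the phase offset just avoids ties on the sampling grid.
theta = np.deg2rad(np.arange(0, 360, 5))
toy_peak_counts = {k: count_circular_peaks(np.sin(k * theta + 0.3)) for k in (1, 2, 4)}
```

On noisy empirical data the function would need smoothing first, because every noise-induced wiggle is a strict local maximum.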

      (3) I think it's a good idea to change the previous "Visual Bias" into a "Target Bias". This raises the question whether the "Proprioceptive Bias" should not be changed into a "Hand Bias" or "Start Bias"?

      While we appreciate the reviewer’s point here, we prefer the term “Proprioceptive Bias” given that this term has been used in the literature and provides a contrast with sources of bias arising from vision. “Hand Bias” and "Start Bias” seem more ambiguous.

      L51: I think "would fall short" should be replaced by "would overshoot".

      L127: I think "biased toward the vertical axis" should be replaced by "biased away from the vertical axis".

      Figure 3 still contains the old terminology like T+V. Please replace by the new terminology.

      L255: Replace "Exp 1a" by "Exp 1b".

      L376: Replace 60 by 6.

      L831-2: I hope the summed LL was maximized, not minimized.

      Thanks for catching the typos. We have corrected all of them.

      Reviewer #2 (Recommendations for the authors):

      I think that Experiment 4 does not mention how many participants performed the study. (Only in the response to the reviewers I found this)

      We have added information regarding the number of participants to Fig. 4 (N = 30).

      I am very happy that the authors added the biomechanical simulation into the paper. I am not convinced that this addressed my concerns exactly but it is an excellent addition and the authors have now adjusted the text appropriately.

      We appreciate the positive response to our additional assessment of biomechanical factors. We welcome any additional information on how we might fully address this issue.

      line 826: extend -> extent

      Corrected.

      Figure 4. I think that the authors have put the wrong figure here. I cannot see any data for extent. I would need to see this figure (or please correct me - but the caption doesn't match the figure and I don't see the results clearly). (I think the review might have the correct figure).

      We apologize for this mistake. We have now provided the correct Figure 4 in the paper (also included in the first page of the response letter).

      I am missing the detailed description on when the direction error and distance error were calculated for exp 4 - and what exactly was used? How did the authors examine the values without correction? What time point was used? Did I miss the analysis section for this?

      Participants were instructed to make fast, straight movements without any corrections and were given up to 1 s to complete the movement. Hand position was recorded once the movement speed dropped below 1 cm/s. On 99.8% of trials, movement speed did not increase once this threshold was passed, indicating that the participants adhered to the instructions. On the remaining trials, we detected a secondary corrective movement (increase in speed >5 cm/s). On these trials, we used the position recorded when the movement speed initially dropped below 1 cm/s as the endpoint position. The pattern of results would be the same were we to exclude these trials.

      This information has been added to the Methods section (lines 661-666).
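
The endpoint rule described above can be sketched as follows. The 1 cm/s and 5 cm/s thresholds come from the response; the input format, the finite-difference speed estimate, the reading of "increase in speed >5 cm/s" as speed rising above 5 cm/s after the stop, and the function name are assumptions.

```python
import numpy as np

def find_endpoint(pos, dt, stop_speed=1.0, correction_speed=5.0):
    """Return (endpoint_xy, has_correction) for one reach.

    pos: array (n_samples, 2) of hand positions in cm
    dt:  sample interval in seconds
    The endpoint is the position at the first sample, after movement onset,
    at which speed falls below stop_speed. A trial is flagged as containing
    a secondary correction if speed later exceeds correction_speed.
    """
    pos = np.asarray(pos, dtype=float)
    # speed[i] is the speed between pos[i] and pos[i + 1], in cm/s
    speed = np.linalg.norm(np.diff(pos, axis=0), axis=1) / dt
    moving = np.flatnonzero(speed >= stop_speed)
    if moving.size == 0:
        raise ValueError("no movement above the stop threshold")
    onset = moving[0]
    below = np.flatnonzero(speed[onset:] < stop_speed)
    if below.size == 0:
        raise ValueError("speed never dropped below the stop threshold")
    stop = onset + below[0]
    has_correction = bool(np.any(speed[stop:] > correction_speed))
    return pos[stop], has_correction
```

Returning the flag separately from the endpoint makes it easy to rerun the analysis with the flagged trials excluded, which is the robustness check the authors mention.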

    1. eLife Assessment

      This valuable study assesses through simulations how several features of local cortical circuits - interneuron subtypes, their specific targeting of dendritic compartments, and certain brain rhythms - together affect the integration of synaptic inputs by a pyramidal cell. Employing several carefully considered simulation setups, the authors convincingly demonstrate that beta rhythms are best suited to modulate and control dendritic Ca-spikes while gamma rhythms affect their coupling to somatic spiking, or how basal inputs are directly integrated into somatic spikes. However, the baseline setup may be idealized for the generation of the events in question and it would be beneficial if the similarity to the in-vivo activity regime was demonstrated further. The results will be relevant for neuroscientists studying local circuits or developing more abstract theories at the systems level.

    2. Reviewer #1 (Public review):

      In this study, the authors explore the implications of two types of rhythmic inhibition - "gamma" (30-80 Hz) and "beta" (13-30 Hz) - for synaptic integration. They study this in a multi-compartmental model L5 pyramidal neuron with Poisson excitation and rhythmic inhibition (16 Hz and 64 Hz), applied either to the perisomatic or apical tuft regions in the neuron. They find that 64 Hz inhibition applied to the cell body is effective in phasic modulation of AP generation, while 16 Hz inhibition applied to the apical tufts is effective in phasic modulation of dendritic spikes (in addition to APs). Switching the location of the two kinds of rhythmic inhibition reduces the overall excitability, but is not effective in phasic modulation of dendritic spikes and only weakly so for somatic APs.

      Strengths:

      The effect of the timescale of rhythmic inhibition on synaptic integration is an interesting question, since a) rhythmic spiking is most strongly evident in inhibitory populations, and b) rhythmic spiking is modulated by behavioral states and the sensory environment. The methods are clear and data are well-presented. The study systematically explores the effect of two frequencies of rhythmic inhibition in a biophysically detailed model. The study considers not only idealized rhythmic inhibition but also the bursty kind that is observed in in-vivo conditions. Both distributed and clustered excitatory synaptic organization are simulated, which covers the two extremes of the spatial organization of excitatory inputs in-vivo.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript illustrates how the spatial targeting (perisomatic vs distal, apical and basal dendritic) and timing of inhibition are crucial to distinct effects on neuronal integration, and shows that beta and gamma oscillations differentially engage dendritic spiking mechanisms.

      Strengths:

      The strength of this study lies in the integrative biophysical modelling of a layer 5 pyramidal neuron by bringing together in vitro and in vivo observations.

      Weaknesses:

      The weaknesses are probably in some of the parameterization of inhibitory synaptic dynamics. A unitary peak conductance of 1 nS is very high for inhibitory synapses. This high value could invariably skew some of the network-level predictions. The authors could obtain specific parameters from the Neocortical Collaboration Portal (https://bbp.epfl.ch/nmc-portal/microcircuit.html), which comes across as an incredible resource for cortical neurons and synapses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      SOM+ interneurons such as Martinotti cells target the apical tufts of pyramidals in the cortex. Since interneurons in general are strongly implicated in mediating rhythmic population activity over a range of timescales, it is quite appropriate to study the consequence of rhythmic inhibition provided by SOM+ interneurons for synaptic integration, including the phenomenon of dendritic spikes. However, using conclusions from a singular study (ref 22) to identify the beta band as the rhythm mediated by SOM+ is not very accurate. SOM+ interneurons have been implicated in regulating rhythms centered just below 30 Hz (refs 22, 21). It is a range that lies in the grey zone of the traditional definition of beta and gamma. However, it is significantly higher than the 16 Hz rhythms explored in this study. It thus remains unknown how a 25-30 Hz rhythmic inhibition (that has an experimentally suggested role for dendrite targeting SOM+ INs) in apical tufts regulates dendritic spikes.

      We agree with the reviewer that the rhythms arising from SOM+ interneurons can extend their frequencies higher than the 16 Hz analyzed in this study. To address this, we have conducted a new set of simulations where we delivered distal dendritic inhibition across a range of frequencies, from 0.5 to 80 Hz (see new Results section “Frequency specific effects of rhythmic inhibition on neuronal integration”). These results revealed, surprisingly, that at 30 Hz their ability to entrain Ca<sup>2+</sup> and NMDA spikes degrades (but not Na<sup>+</sup> spikes). This suggests that beta rhythms in the 20-30 Hz range are operating at the highest frequency for which dendritically targeting inhibition will be effective. The implications are covered in the Discussion section “Interaction with microcircuitry”. They are:

      “Particularly in the visual cortex, SOM interneurons can generate a rhythm in the 25-30 Hz range [22]. We found this to be at the upper end of the frequency range for dendritic inhibitory rhythms to be effective in modulating NMDA and Ca<sup>2+</sup> spikes. If this rhythm solely recruited SOM interneurons, its effectiveness would be marginal. Potentially compensating for this, recent work has found that PV interneurons also participate in beta/low-gamma [23, 24] (but see [21, 22]). In our model, on its own when beta rhythmic inhibition was delivered perisomatically we found that it was less able to entrain spiking and had an overall hyperpolarizing effect. However, if delivered in conjunction with the distal dendritic inhibition arising from SOM interneurons, this may strengthen entrainment.”

      Distal dendritic inhibition has been previously shown to be more effective in controlling dendritic spikes. However, given the slow timescale of dendritic spikes, it can be hypothesized that high-frequency rhythmic inhibition would be ineffective in entraining the dendritic spikes either in distal or proximal location, as demonstrated by 4H and 5F, and vice versa. A computational study can take this further by exploring the robustness of this hypothesis. By sticking to a single-frequency definition of what constitutes Gamma (64 Hz) and Beta (16 Hz) inhibition, the current exploration does support the core hypothesis. However, given the temporal dynamics of dendritic spikes, it is valuable to learn, for example, the upper bound of "Beta" range (13-30Hz) inhibition that fails to phasically modulate them. In addition to the reason stated in the earlier paragraph, Alpha band activity (8-12 Hz), has been implicated (e.g. van Kerkoerle, 2014) in signaling of inter-areal feedback to the superficial layer in the cortex, potentially targeting apical tufts of pyramidals from multiple layers and resulting in alpha-range rhythmic inhibition. To make the findings significant, it might therefore be more pertinent to understand the consequences of ~10Hz rhythmic inhibition (in addition to the ~25-30 Hz Beta/Gamma) in the apical tufts for phasic modulation of dendritic spikes.

      We added an additional set of simulations that address this in the Results section ‘Frequency specific effects of rhythmic inhibition on neuronal integration’. In general, we found that dendritic and perisomatic inhibitory rhythms at lower frequencies could entrain AP generation, but with less functional specialization. This is explored in our Discussion section ‘Interneuron specializations and rhythm timescales’.

      The differential effect of Gamma and Beta range inhibition on basal and apical excitatory clusters is not convincing from the information provided. The basal cluster appears to overlap with perisomatic inhibitory synapses. The description in the methods does not have enough information to negate the visual perception (ln 979-81). With this understanding, it is not surprising that the correlation between excitation and APs is high (during the trough of gamma) for basal and not apical excitation. A more comparable scenario would be a more distal location of the basal excitatory cluster.

      While we stated in the original manuscript that we were contrasting ‘basal’ vs. ‘apical’ clustered inputs, this terminology did not reflect our intent with these analyses. We meant to contrast proximal vs. distal dendritic clustered synaptic inputs, which the reviewer correctly noted is confounded in the apical vs. basal comparison. We have rewritten these results, their discussion, and corresponding figure, to clearly state that we are contrasting proximal vs. distal synaptic input.

      Reviewer #2:

      The weaknesses are probably in some of the parameterizations of inhibitory synaptic dynamics. A unitary peak conductance of 1 nS is very high for inhibitory synapses. This high value could invariably skew some of the network-level predictions. The authors could obtain specific parameters from the Neocortical Collaboration Portal (https://bbp.epfl.ch/nmcportal/microcircuit.html), which is an incredible resource for cortical neurons and synapses.

      We appreciate the valuable resource mentioned by the reviewer and will consult it when constructing future models. Regarding the present one, our choice of peak conductance was based on previous studies, namely:

      Egger R, Narayanan RT, Guest JM, Bast A, Udvary D, Messore LF, Das S, de Kock CPJ, Oberlaender M (2020) Cortical output is gated by horizontally projecting neurons in the deep layers. Neuron 105, 122-137.e128.

      and

      Xiang Z, Huguenard JR, Prince DA (2002) Synaptic inhibition of pyramidal cells evoked by different interneuronal subtypes in layer v of rat visual cortex. J Neurophysiol 88, 740-750.

      The study by Egger et al. used an inhibitory peak conductance of 1 nS and was simulating circuitry very similar to ours. We validated these synapses in pilot simulations that sought to characterize the resulting IPSPs and IPSCs, and whose results can be seen in Table 1 of our methods. These synapses exhibited IPSCs whose peak amplitudes ranged over values (~24-162 pA) that agreed with the experimental literature, such as Xiang et al.

      Given this, we feel our parameterization of inhibitory synapses does not warrant any changes.

      Reviewer #3:

      What disappointed me a bit was the lack of a concise summary of what we learned beyond the fact that beta and gamma act differently on dendritic integration. The individual paragraphs of the discussion often are 80% summary of existing theories and only a single vague statement about how the results in this study relate. I think a summarizing schematic or similar would help immensely.

      We agree with the reviewer that a summary schematic would help the reader. This has been added to the manuscript as Figure 11. It demonstrates the principal findings of the paper and is referenced in the opening paragraph of the discussion section.

      Orthogonal to that, there were some points where the authors could have offered more depth on specific features. For example, the authors summarized that their "results suggest that the timescales of these rhythms align with the specialized impacts of SOM and PV interneurons on neuronal integration". Here they could go deeper and try to explain why SOM impact is specialized at slower time scales. (I think their results provide enough for a speculative outlook.)

      This discussion has been expanded under the section “Interneuron specializations and rhythm timescales”. The added text is:

      “So, while our results suggest that spatial targeting of SOM and PV interneurons aligns with the timescales of their network-level rhythms, it could also be that their timing and subcellular localization interact to produce specialized neuron-level functions [85]. For instance, NMDA and Ca<sup>2+</sup> spikes in the distal dendrites last for ~50 ms, making the slower beta rhythm more appropriate for bidirectionally controlling them. Both can be described as dynamical systems with distinct phases with differing sensitivity to inhibition. Ca<sup>2+</sup> spikes are dynamical events comprised of an initiation, plateau, and termination phase. Inhibition delivered during the plateau phase shortens their duration [86]. If the beta rhythm is comprised of cycling between periods of elevated excitation (increased NMDA spike generation) followed by elevated inhibition, then Ca<sup>2+</sup> spike initiation will tend to occur during the excitatory phase, and its plateau during the subsequent inhibitory phase. A plateau during the inhibitory phase will more quickly enter termination. This is bidirectional control. On the other hand, slower rhythms (e.g. 1 Hz) initiate Ca<sup>2+</sup> spikes during the excitatory phase that plateau and enter termination autonomously, before the inhibitory phase is reached. The same principle holds for NMDA spikes [87]. As a result, rhythms in the range from 15-30 Hz are optimal for synchronizing the onsets and offsets of dendritic spikes across a population of neurons.

      The integrative effects of gamma (>40 Hz) are also specialized. Low frequency inhibitory rhythms delivered to the soma tended to shift the membrane potential higher or lower with the rhythm’s phase, effectively bringing it closer or farther from AP generation but not changing the neuron’s sensitivity to fast synaptic inputs. In the gamma frequency range, this is reversed, with the mean membrane potential not varying with rhythm phase but with a shifting bias to positive or negative membrane potential fluctuations. In addition, the trough phase of gamma lowers the threshold for AP generation, while slower rhythms like beta only raise the threshold. Consequently, the timing of gamma is ideal for increasing the sensitivity of the neuron to rapid excitation. This agrees with the observation that gamma oscillations accompany rapid excitation-inhibition balancing [88].”

      We also extended our discussion section ‘Relevance to coding’ to explore how beta and gamma rhythms can support sparse vs. dense population coding, respectively. It reads:

      “One interpretation of rhythms arising from local inhibitory feedback is that they maintain the balance between excitation and inhibition. This can be thought of as a normalization operation that maintains activity within a set range. Normalization can be achieved either through a subtractive effect that raises the threshold for initiating an action potential, or a multiplicative effect that lowers the slope of the relationship between excitation and action potential firing rate. When considered at the population level, these normalization effects impact coding in different ways. Subtractive normalization increases sparsity by dropping out neurons whose excitation is below the raised threshold. Multiplicative normalization, however, encourages dense codes by scaling down firing rates and compressing the range of firing rates. This study found that while both perisomatic and distal dendritic inhibition produced subtractive effects, only perisomatic had a multiplicative effect. Tying this to beta and gamma, beta rhythms may encourage sparse population codes while gamma allows for dense.”
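The distinction drawn in the quoted passage between subtractive and multiplicative normalization can be illustrated with a minimal sketch. It assumes a threshold-linear relationship between excitatory drive and firing rate; the function names, drive values, threshold, and gain below are hypothetical and are not taken from the authors' model.

```python
def rates(drives, threshold=0.0, gain=1.0):
    """Threshold-linear firing rates: gain * max(0, drive - threshold)."""
    return [gain * max(0.0, d - threshold) for d in drives]


def active_fraction(r):
    """Fraction of neurons with a non-zero firing rate."""
    return sum(x > 0 for x in r) / len(r)


def rate_range(r):
    """Spread between the highest and lowest firing rate."""
    return max(r) - min(r)


# Hypothetical excitatory drives to a small population (arbitrary units).
drives = [0.5, 1.0, 2.0, 4.0]

baseline = rates(drives)
subtractive = rates(drives, threshold=1.5)   # inhibition raises the threshold
multiplicative = rates(drives, gain=0.5)     # inhibition lowers the slope

for name, r in [("baseline", baseline),
                ("subtractive", subtractive),
                ("multiplicative", multiplicative)]:
    print(name, r, "active:", active_fraction(r), "range:", rate_range(r))
```

In this toy setting, raising the threshold silences the weakly driven neurons (a sparser code), whereas lowering the gain keeps every neuron active and only compresses the range of rates (a denser code).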

      Beyond that, the authors invite the community to reappraise the role of gamma and beta in coding. This idea seems to be hindered by the fact that I cannot find a mention of a release of the model used in this work. The base pyramidal cell model is of course available from the original study, but it would be helpful for follow-up work to release the complete setup including excitatory and inhibitory synapses and their activation in the different simulation paradigms used. As well as code related to that.

      We have added a Code and Data Availability section that addresses this. It reads: “Simulation code is deposited at ModelDB at https://modeldb.science/2019883. The raw simulation data are available from DBH upon request. Analysis code is posted as a github repo at https://github.com/dbheadley/InhibOnDendComp.”

    1. eLife Assessment

      The presented findings are important for the field of cell-cycle control. They provide new insights into the origin of cell size variability in budding yeast. The strength of evidence is solid. However, the conclusions could be more strongly supported by additional analysis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the determinants of population-level cell size variability, quantified via the coefficient of variation, in budding yeast populations. Using a combination of computational modeling and experimental readouts, they conclude that mother-daughter division asymmetry is the dominant factor shaping the coefficient of variation of cell size. In particular, through parameter sensitivity analysis of the Chandler-Brown model and empirical perturbations, the authors show that size-control mutations have limited effects on CV, whereas modulating mother-daughter asymmetry, by changing the growth environment, produces substantially larger shifts.

      Strengths:

      (1) The study addresses a fundamental question in biophysics, i.e., what are the mechanisms that produce and maintain population size heterogeneity?

      (2) It provides a conceptual reconciliation for previous observations that size-control mutants often alter mean size but not CV.

      (3) The modeling framework is clearly explained and compared to the data.

      (4) The parameter sensitivity analysis is thoughtfully performed and provides transparent intuition about which parameters influence variability.

      (5) The writing is clear, and the figures are well-organized.

      Weaknesses:

      (1) The work focuses on the Chandler-Brown model, so it is not clear to what extent the conclusions depend on it. A sensitivity or robustness check using an alternative model would strengthen generality.

      (2) CV is the sole descriptor used to quantify heterogeneity; while this is an efficient descriptor, it must be handled with care when used on experimental data, as it may vary due to differences in the chosen observables (e.g., if size is identified via cell volume, length, area, number of proteins, etc.) instead of real differences in the distribution.

      (3) The experimental validation using varied nutrient conditions is interesting; however, the statistical significance of the found correlations should be provided/discussed.
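Weakness (2) can be made concrete with a minimal sketch of how the CV depends on the chosen size observable. It assumes that a linear cell dimension is lognormally distributed and that area and volume scale as its square and cube; both are illustrative assumptions, not taken from the manuscript, and the value of `sigma_log` is hypothetical.

```python
import math


def lognormal_cv(sigma_log, power=1):
    """CV of d**power when ln(d) ~ Normal(mu, sigma_log**2).

    d**power is itself lognormal with log-scale SD power * sigma_log,
    so its CV is sqrt(exp((power * sigma_log)**2) - 1), independent of mu.
    """
    return math.sqrt(math.exp((power * sigma_log) ** 2) - 1.0)


sigma_log = 0.1  # hypothetical log-scale spread of the linear dimension

cv_length = lognormal_cv(sigma_log, power=1)
cv_area = lognormal_cv(sigma_log, power=2)
cv_volume = lognormal_cv(sigma_log, power=3)

print(f"CV(length) = {cv_length:.3f}")
print(f"CV(area)   = {cv_area:.3f}")
print(f"CV(volume) = {cv_volume:.3f}")
```

Under these assumptions the same population yields a CV roughly three times larger when size is reported as volume rather than length, so CVs are only comparable across studies that use the same observable.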

    3. Reviewer #2 (Public review):

      Summary:

      This paper provides a new framework for understanding how cell size variability arises in budding yeast populations. Whereas previous studies emphasized G1/S size control in daughter cells as the main regulator of size homeostasis, the authors show that perturbations to this control checkpoint have only modest effects on population-wide size variability.

      By extending a stochastic model of the yeast cell cycle to include both mother and daughter lineages, the authors demonstrate that division asymmetry, stemming from slower growth and longer post-Start phases in mother cells, is the key factor determining the population coefficient of variation (CV). As mothers grow larger and daughters smaller, the overall size distribution broadens. Experimental measurements across multiple mutants and conditions support the predicted correlation between asymmetry and CV.

      Strengths:

      The main conceptual advance of this study is to consider the full proliferating population, and in particular the dominant mother lineages, rather than single-cycle daughters, thereby offering a population-level explanation for size variability that is consistent with several previous but seemingly conflicting results.

      Weaknesses:

      Nevertheless, the modelling is described superficially and has notable limitations.

      (1) The extended Chandler-Brown model was originally parameterized only for daughter cells, and its generalization to mothers introduces several new assumptions that are not directly tested.

      (2) The model treats asymmetry phenomenologically, without a mechanistic basis, so while it correctly identifies correlations, causality remains uncertain.

      (3) Moreover, since population CVs emerge from steady-state lineage dynamics, they could be sensitive to parameter choices or growth-related details not fully explored in the current analysis.

      In summary, this study provides a useful conceptual synthesis and quantitative framework, but readers should interpret the modeling as heuristic. The central message, that division asymmetry dominates population size variability, remains interesting and well supported at the phenomenological level.

    4. Reviewer #3 (Public review):

      Summary:

      The article studies the origins of random cell size variability in budding yeast. Different strains with different average cell sizes have very similar noise, measured using the coefficient of variation, defined as the standard deviation over the mean. Manipulating the noise in key variables such as the duration of cell stages, the growth rate or the division strategy (adder, timer, sizer) was not enough to explain the observed noise in mutants. The proposed solution for the origin of most of the cell size noise is related to the asymmetry in the average cell size for cells with two different phenotypes: 'daughter cells' (new cells that have not passed the first division) and 'mother cells' (the rest). The origin of the cell size noise is mainly related to the fact that these phenotypes have different cell size distributions. The article includes simple statistical methods for hypothesis analysis and explanatory figures.

      Strengths:

      The article provides different approaches, experimental (mutants and different growth conditions) and computational (simulations), to explain and test the hypothesis. The methods are based on previous articles, with simple conclusions and explanations that are easy to follow.

      The rigor level in both mathematical and biological approaches looks fair to me. The terms are well defined and consistent throughout the article. Authors use well-established analysis techniques.

      The proposed theoretical analysis is coarse-grained and therefore can explain different strains and mutations using mathematical tools (noise analysis), aiming to reach general (mathematically) claims. This approach strengthens the conclusions and provides a good language to set a bridge between the biological community and mathematicians (quantitative biologists).

      The concept that the population heterogeneity (mothers vs daughters) is a fundamental reason behind the cell size variability is not new, but this article presents a clear experimental justification for the development of complete models of cell size regulation. I consider this contribution very relevant to the community modelling cell size.

      Weaknesses:

      The central concept is that population heterogeneity (mothers and daughters), with different cell size distributions, explains the observed size variability in a heterogeneous population. However, it is not clear how the population composition can affect this heterogeneity. Intuitively, I would expect that the fraction (number of daughters)/(number of mothers) changes in different stages of the population expansion, because the mean duration of both stages can change in different growth conditions. I would suggest studying how different (or not) these fractions are in different conditions. The authors should acknowledge this effect and briefly discuss, using, for instance, simple models of random variable addition (adding different fractions of individuals with different cell size distributions), in which cases (different fractions or different means and noises in their respective distributions) their contribution is relevant. Finally, do different simulations (gradient or sizer, timer) predict different moments (mean and CV) in the distributions of both mother size and daughter size?

      Related to the previous comment, I would also include the fraction (number of daughters)/(number of mothers) or the percentage in different growth conditions with their respective size moments (mean and CV) to test whether the resultant cell size moments are related to the addition of two variables with different fractions with their respective moments.
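The simple random-variable model suggested in the two comments above can be sketched as a two-component mixture. This is a minimal sketch under stated assumptions: the population is treated as a fraction of daughters plus a complementary fraction of mothers, each described only by its mean and CV. The function name and all numerical values are hypothetical and do not come from the manuscript.

```python
import math


def mixture_moments(frac_daughters, mean_d, cv_d, mean_m, cv_m):
    """Mean and CV of a population mixing daughters and mothers.

    Uses the law of total variance: the mixture variance is the
    fraction-weighted second moment of the components minus the
    squared mixture mean.
    """
    f = frac_daughters
    var_d = (cv_d * mean_d) ** 2
    var_m = (cv_m * mean_m) ** 2
    mean = f * mean_d + (1.0 - f) * mean_m
    second_moment = f * (var_d + mean_d ** 2) + (1.0 - f) * (var_m + mean_m ** 2)
    variance = second_moment - mean ** 2
    return mean, math.sqrt(variance) / mean


# Hypothetical sizes (arbitrary units): daughters are smaller than mothers,
# and both subpopulations have the same within-group CV of 0.2.
for frac in (0.3, 0.5, 0.7):
    mean, cv = mixture_moments(frac, mean_d=30.0, cv_d=0.2, mean_m=60.0, cv_m=0.2)
    print(f"daughter fraction {frac:.1f}: mean = {mean:.1f}, CV = {cv:.3f}")
```

With identical component means the mixture CV reduces to the within-group CV, so in this sketch any excess population CV comes from the mother-daughter size difference, weighted by the population fractions.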

      It is interesting how the G1 timer and G1 Sizer are located in different quadrants of Figure 4D, while the studied mutants belong to the other quadrant. I expected them to be closer to the G1 timer, similar to that observed in Figure 4G. I think the authors should discuss this dissimilarity.

      Although the authors are working using a definite model, other models would predict different results, especially in synthetic data. For instance, the same models for obtaining sizers can predict different noise levels.

      Nieto, C. et al., 2024. npj Systems Biology and Applications, 10(1), p.61.

      Barber, Felix, et al., Frontiers in cell and developmental biology 5 (2017): 92.

      Teimouri, H. et al., 2020. The Journal of Physical Chemistry Letters, 11(20), pp.8777-8782.

      I would mention that the noise level also depends on whether the population has reached steady-state conditions. This would require multiple generations, and measurements over at least a couple of thousand cells. Therefore, experiments with single-cell-derived colonies would present different levels of noise than the noise in steady-state conditions, especially if few cells were sampled. However, I acknowledge that the purpose of the article is not a detailed description of the system but rather the presentation of the concept and, for that matter, this level of detail is not mandatory.

    1. eLife Assessment

      This important paper presents the discovery of the molecular basis of differential apterous expression during early Drosophila wing disc development. The evidence supporting these conclusions is compelling, ranging from classical genetic approaches to state-of-the-art genetic engineering techniques. By opening new questions, this paper is expected to be of broad interest to developmental biologists and geneticists working on transcriptional regulation.

    2. Reviewer #1 (Public review):

      Summary:

      The Drosophila wing disc is an epithelial tissue, the study of which has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript, the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address a problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      Weaknesses:

      The previously noted weaknesses (vg expression, P compartment specific effects, early vs late analysis of ap expression in mutants) have been thoroughly and satisfactorily addressed by the authors.

    3. Reviewer #3 (Public review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development. The paper is subdivided into the following chapters/figures:

      (1) In the first couple of figures, the authors describe the methodology to genetically manipulate the apE enhancer (a cartoon summarizing all the previous work with this enhancer might help) and identify two well-conserved domains in the OR463 enhancer required for wing development (the m3 region, whose deletion phenocopies OR463 deletion: loss of wing, and the m1 region, whose deletion gives rise to AP identity changes in the P compartment).

      (2) In the following three figures, the authors characterize the m1 regulatory region, identify HOX and ETS binding sites, and functionally validate their role in wing development and the activity of the genes/proteins regulating their activity (e.g., Hth and Pointed) by their ability to phenocopy (when depleted) the m1 loss-of-function wing phenotype. The authors conclude that Hth and Pointed regulate apterous expression through the m1 region.

      (3) In the last few figures, the authors perform similar experiments with the m3 regulatory region to conclude that Grn and Antennapedia regulate apterous expression through the m3 enhancer.

      My comments:

      Technically sound: As stated in my previous review, the work is technically excellent (authors use state-of-the-art genetic engineering to manipulate the enhancer and combine it with genetic analysis through RNAi and CRISPR/Cas9 and phenotypic characterization to functionally validate their findings), figures are nicely done and cartoons are self-explanatory.

      Poor paper writing: The paper is too long and difficult to read/understand, many grammatical mistakes are found, and formatting is in some cases heterodox.

      Science:

      (1) The question of "who is locating the relative position of the AP and DV boundaries in the developing wing?" is not resolved. I would then change the intro or reduce the tone of this question. Having said that, I agree that these results shed light on the wing phenotypes of some apterous alleles related to AP identity and growth and, as such, I congratulate the authors.

      (2) Identification of two TFs (Grain and Antp) mediating the regulation of apterous expression is interesting but some contextualization might be required. Data on Antp is not as convincing as data on Grn. I wonder whether the Antp data could be removed altogether.

      (3) I am not sure whether the term hemizygous is used properly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The Drosophila wing disc is an epithelial tissue, the study of which has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript, the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address the problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factors binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they recur to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      We thank the reviewer for these positive comments on the manuscript.

      Weaknesses:

      There are several caveats with the data that might be construed as weaknesses. Some of them are intrinsic to this detailed analysis or to the experimental difficulties of dealing with the wing disc in its earliest stages, and others are more conceptual and are offered here in case the authors may wish to consider them.

      (1) The primordium of the wing region of the wing imaginal disc is defined by the expression of the gene vestigial, which is regulated by inputs coming from the dorsal-ventral boundary (Notch and wg) and from the anterior-posterior boundary (Dpp). Given its principal role in wing primordium specification and expansion, I am surprised that this manuscript does not mention this gene in the main text and only contains indirect references to it. I consider that the manuscript would have benefited a lot by including vestigial in the analysis, at least as a marker of early wing primordium. This might allow us to visualize directly the positioning of the primordium in the apterous mutants generated in this study, adding more verisimilitude to the interpretations that place this domain based on indirect evidence.

      Vg does indeed play a critical role in the formation of the wing disc, and it is an ideal marker for the identification of the wing pouch. In the updated version of the article, we have now followed the expression of vg in some of the OR463 mutants via immunostaining of the Vg protein (Supplementary Figure 6). Cells within posterior wing outgrowths in Δm1 flies were invariably positive for Vg. This result further supports our previous identification of these cells as pouch cells. In those mutants in which no cross-over between DV and AP was observed, vg expression was severely reduced or absent, indicating that the wing pouch had not been specified. We thank the reviewer for this experimental idea, which we believe strengthens the final manuscript.

      We have added to the text:

      “To identify the nature of the posterior outgrowths, we performed anti-Vestigial (Vg) antibody staining of Δm1 mutants (Supplementary Figure 6). Vg is a key regulator of wing specification and also participates in wing growth and patterning (Baena-Lopez & García-Bellido, 2006; Kim et al., 1996; Zecca & Struhl, 2007a). In those discs in which the stripe was extended and the P compartment was enlarged, Vg was detected throughout the outgrowth, supporting the wing pouch identity of this region (Supplementary Figure 6B). Hemizygous Δm3 mutants presented a highly reduced anti-Vg signal, which suggests that no wing pouch is specified in these mutants (Supplementary Figure 6C).”

      (2) The authors place some emphasis on the idea that their work addresses possible coordination between setting the D/V boundary and the A/P boundary:

      Abstract: "Thus, the correct establishment of ap expression pattern with respect to en must be tightly controlled", "...challenging the mechanism by which apE miss-regulation leads to AP defects." "Detailed mutational analyses using CRISPR/Cas revealed a role of apE in positioning the DV boundary with respect to the AP boundary"

      Introduction: "However, little is known about how the expression pattern of ap is set up with respect that of en. In other words, how is the DV boundary positioned with respect to the AP boundary?"

      "How such interaction between ap and the AP specification program arises is unknown."

      Results: "Some of these phenotypes are reminiscent of those reported for apBlot (Whittle, 1979) and point towards a yet undescribed crosstalk between ap early expression and the AP specification program."

      At the same time, they express the notion, with which this reviewer agrees, that all defects observed in A/P patterning arising as a result of apterous misregulation are due to the fact that, in their mutants, apterous expression is lost mainly in the posterior dorsal compartment, bringing novel confrontations between the A/P and the D/V boundaries.

      To me, the key point is why the expression of apterous in different mutants of the OR463 enhancer affects only the posterior compartment. This should be discussed because it is far from obvious that apterous expression has different regulatory requirements in the anterior and posterior compartments.

      We agree with the reviewer that the differential effect of the mutations on the expression of ap in the A and P compartment is a key factor underlying our explanation of how the phenotypes arise. To clarify this point, we have now extended our first discussion point. Moreover, we have included some other references of differential enhancer regulation in different wing disc compartments. In addition, we have discussed whether this effect has to do with the different regulation of the enhancer in the A and P compartment or due to regulation of downstream effectors.

      Added paragraph:

      “Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      (3) The description of gene expression in the wing disc of novel apterous mutants is only carried out in late third instar discs (Figs. 2, 3, 5, and 7). This is understandable given the technical difficulties of dealing with early discs, as those shown in the analysis of candidate apterous regulatory transcription factors (Fig. 4F, Fig. 6 C-D). However, because the effects of the mutants on apterous expression are expected to occur much earlier than the time of expression analysis, this fact should be discussed.

      We agree with the reviewer regarding the limitations of analyzing third instar larvae to assess the expression of the OR463 enhancer. We have included a statement acknowledging this in the discussion:

      “It is important to acknowledge that all expression analyses were conducted in third-instar discs, a stage that follows the initial establishment of ap expression. Earlier effects are therefore inferred rather than directly observed, as imaging and staging of early discs present significant technical challenges due to their small size and fragility. A direct observation of the early wing disc across mutant conditions would likely help to clarify the role of the discovered factors during early ap expression.”

      Reviewer #2 (Public Review):

      In their manuscript, "Transcriptional control of compartmental boundary positioning during Drosophila wing development," Aguilar and colleagues do an exceptional job of exploring how tissue axes are established across Drosophila development. The authors perform a series of functional perturbations using mutational analyses at the native locus of apterous (ap), and perform tissue-specific enhancer disruption via dCas9 expression. This innovative approach allowed them to explore the spatio-temporal requirements of an apterous enhancer. Combining these techniques allowed the authors to explore the molecular basis of apterous expression, connecting the genotypes to the phenotypical effects of enhancer perturbations. To me, this paper was a beautiful example of what can be done using modern Drosophila genetics to understand classic questions in developmental biology and transcriptional regulation.

      In sum, this was a rigorous paper bridging scales from the molecular to phenotypes, with new insight into how enhancers control compartmental boundary positioning during Drosophila wing development.

      We would like to thank the reviewer for their positive and encouraging comments, as well as for the careful review of the manuscript and figures. We have adopted most of the suggestions in the new manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development.

      (1) The authors raise two very important questions in the Introduction: (1) who is locating the relative position of the AP and DV boundaries in the developing wing, and (2) who is responsible for the maintenance of the apterous expression domain late in larval development. None of these two questions have been responded to and, indeed, the summary of the work (as stated in the conclusions of the last paragraph of the Introduction) does not resolve any of these questions.

      We believe the results presented, together with those added during the revision, shed some light on the positioning of the boundary. We proposed that the combined integration of four TFs by the OR463 enhancer is fundamental for the correct positioning. Additionally, we proposed a model of how these positioning problems result in the phenotypes observed (Supplementary figure 7, now also shown in Figure 2D). Our results indicate that ap expression in the PD quadrant is particularly sensitive to mutations in the enhancer, which we have now further elaborated on in the first part of the discussion. Together, we believe that our results do tackle the first problem posed in the introduction, while not completely solving it. As for the second question, we have tried to remove any suggestion that this article tries to explain the later regulation of apterous. This misunderstanding probably arose from a sentence in the introduction, which has now been deleted. The means by which ap expression is maintained in later stages has been partially explored previously (see Bieli et al., 2015) and is the subject of our current studies.

      (2) The authors have identified two different regions whose deletions give very interesting phenotypes in the adult wing (AP identity change & outgrowths, and loss of wing), and have bioinformatically identified and functionally verified 4 TFs that mediate the activity of these regions by their capacity to phenocopy the wing phenotype. While identification of the 2 TFs acting on the m1 is incremental with respect to previous work on the identification of the enhancer responsible for the early expression of Ap, identification of Antp and Grn does not explain the loss of function phenotype of the m3 enhancer. Do any of these results shed any light on the first two Qs? Do these results explain the compartment boundary position in the wing as stated in the title? Expression of lacZ reporter assays is fundamental to demonstrate their model of Figure 8. The reduction of the PD compartment is difficult to understand by the sole reduction in ap expression in this region (which has not been demonstrated).

      We agree that the identification of Antp and Grn does not by itself explain the loss-of-function phenotype of the m3 enhancer. However, these transcription factors represent the best current candidates for direct regulators for this enhancer. We have clarified in the text that Antp and Grn may not act as instructive inputs but rather play a permissive role in enabling ap expression through m3. Importantly, the dCas9-mediated perturbation experiments directly demonstrate that targeted manipulation of apE in this region is sufficient to produce the characteristic duplications, providing functional evidence that apE activity underlies the observed phenotypes. In addition, lacZ reporter assays confirm that apE expression is indeed affected in all cases where the experimental setup permitted detection. Together, these results validate that the observed morphological phenotypes stem from perturbation of apE activity and support the proposed model for enhancer regulation and its role in compartment boundary maintenance.

      (3) The authors state in one of the sections "Spatio-temporal analysis of apE via dCas9 ". No temporal manipulation of gene activity is shown. The authors should combine GAL4/UAs with the Gal80ts to demonstrate the temporal requirements of Antp/Grn and Pnt/Hth as depicted in their model of Figure 8.

      We agree with the reviewer that the temporal dimension was not explored in the first version of the manuscript (aside from the temporal constraints of the en-Gal4 driver). As suggested by the reviewer, we have now used a tub-Gal80ts allele to temporally control the enhancer perturbation and delimit its window of activity. The results are included in two new panels in Figure 3 (H and H’). The new data agree with the notion that the apE enhancer is important up to L2 stages but dispensable later in development. We have added the following paragraph to the text:

      “To define the developmental time window during which the apE enhancer remains sensitive to repression, we combined the temperature-sensitive tub-Gal80<sup>ts</sup> system with temporally controlled expression of dCas9. Animals carrying the en-Gal4, tub-Gal80<sup>ts</sup>, UAS-dCas9 and U6-OR463gRNA(4x) transgenes were maintained at 18 °C to suppress dCas9 expression. Independent sets of embryos were then shifted to 29 °C at successive developmental intervals ranging from 0 to 168 h after egg laying (AEL), so that dCas9 induction occurred at distinct time points in development (Figure 3H). Under these conditions, dCas9 transcription was induced only after the temperature shift, while the gRNAs were expressed constitutively. Wing phenotypes were quantified in adult progeny as a readout of apE enhancer perturbation. When dCas9 was expressed from embryonic or early larval stages (0–48 h AEL), nearly all wings (70–90%) displayed severe ap-like phenotypes, including posterior compartment duplication and loss of anterior–posterior boundary integrity. Shifting animals later (48–72 h AEL) still produced a majority (~66%) of abnormal wings, whereas induction after 72 h AEL resulted in progressively weaker effects and complete loss of phenotypes by 96 h AEL (Figure 3H’).

      These results delineate the developmental period during which apE activity is required for proper wing patterning. Perturbation during the first half of the second larval instar (≤ 96 h at 18 °C) was sufficient to elicit strong ap-like transformations, consistent with the enhancer being functionally required during early larval stages and becoming dispensable thereafter. The temporal decline in phenotype penetrance thus reflects the progressive loss of apE sensitivity to dCas9-mediated repression, providing a precise estimate of when its activity is no longer required for wing morphogenesis.”

      (4) The authors have not managed to explain the AP phenotype. Thus, this work opens many unresolved questions and does not resolve the title, which is a big overstatement. Thus, strengths (technically excellent), weakness (there is not much to learn about wing development and apterous regulation from these results besides the incremental identification of 4 additional TFs mediating the regulation of ap expression by their ability to phenocopy regulatory mutations of the apterous gene).

      As mentioned in response to reviewer 1, we indeed have no concrete explanation for why the P compartment seems more sensitive to mutations. We have now further discussed this point (see the paragraph below, now included in the discussion). As for how the adult phenotypes arise from the mutant wing discs, we have a good idea (see Supplementary figure 7 and Figure 2).

      We are pleased to hear that the reviewer considers our article technically valuable. Therefore, we have reformulated the title so that the technical merits play a bigger role in it:

      “in situ mutational screening and CRISPR interference demonstrate that the apterous Early enhancer is required for developmental boundary positioning”

      Paragraph added to the discussion:

      “Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Formatting of references should be checked throughout the manuscript

      Reviewer #2 (Recommendations For The Authors):

      Here, I note a few points that would help clarify the manuscript and connect it with a broader community.

      Figure 1: it could help the reader to add the landing site genetic scheme to the main figure.

      In a first draft, that was exactly the original configuration, but after comparing both versions we concluded that including the landing site draws some of the focus away from the phenotypes.

      Figure 1: what species were used for the conservation alignment? Further details would be nice to add here.

      We have now added a section on bioinformatic analysis, which was missing in the original manuscript:

      Sequence conservation of the OR463 fragment within the ap upstream intergenic region was analysed across different dipteran species using the “Cons 124 Insects” multiple-alignment track of the D. melanogaster dm6 genome on the UCSC Genome Browser (Kent et al., 2002, https://genome.ucsc.edu). Conservation scores were obtained from phastCons (Siepel et al., 2005) and used to delineate conserved and less conserved blocks within OR463. Conserved transcription factor binding sites were predicted with MotEvo (Arnold et al., 2011), which defined four conserved modules (m1–m4) and six inter-modules (N1–N6). Additional motif analysis was performed using the JASPAR CORE Insecta database and the Target Explorer tool to cross-validate conserved binding-site predictions and refine motif assignments within the enhancer.

      From Figure 2: I would consider moving the model or portions of it to a main figure. These models, while descriptive, really help make the manuscript more approachable. Note that eLife does not have forced figure requirements.

      We have adopted the reviewer’s suggestion and we are very grateful for it. We think the figure has greatly improved. The final figure now highlights a small part of the model, which is still included in the Supplementary Figure.

      Figure 5: This figure is fantastic, and the results are particularly important. I would recommend increasing the weight of the arrows from D to E, making it more obvious. Did the authors consider any temperature or other perturbations to look at robustness? They mention "robustness" a few times, and this could be an excellent system to explore a bit further. For panels F and G, it would be nice to have a bit of biochemistry here to test the spacing requirements' effects on the distances (but it's great phenotypical data, regardless).

      We have chosen a darker grey to highlight the lines. 

      We appreciate the reviewer’s suggestions. With respect to robustness assays, such as temperature perturbations, we agree that the apE enhancer would be a suitable system for such experiments. However, these analyses would move the study beyond its current scope, which is focused on defining the regulatory logic of boundary positioning through mutational dissection and CRISPRi. We therefore prefer not to expand the work in this direction here, but we note that this would be an interesting avenue for future investigation.

      Similarly, biochemical assays probing spacing requirements would provide additional mechanistic insight but would represent a separate line of work. In this manuscript, we aimed to establish the functional consequences of motif spacing using in vivo genetic and phenotypic analyses, which we believe sufficiently support our conclusions.

      Thank you for the insight.

      Discussion: To the point "most point mutations or short deletions in enhancer regions have little effect on gene expression" I would push the authors to discuss their work in relation to Fuqua et al., (Nature 2020) and Kvon et al., (Cell 2020). Their work is consistent with enhancers being sensitive to mutations, and this warrants further discussion because it could be important for the transcription field.

      Hox genes as pioneer factors, I would recommend citing Loker et al., (Curr Biol 2021), as an example of Hox genes functioning as a pioneer factor.

      We thank the reviewer for this suggestion. We have now added a short paragraph in the Discussion noting how our observations may relate to the mutational patterns described in Fuqua et al. (2020) and Kvon et al. (2020), while keeping the interpretation tentative. The text now says:

      “Recent large-scale enhancer mutagenesis studies have shown that the mutational consequences within enhancers can vary widely. In some cases, many nucleotide positions appear tolerant to single-base changes and only a small subset of mutations produce clear functional effects (Kvon et al., 2020). In other enhancers, regulatory information is distributed more densely, and mutations at multiple positions can alter output (Fuqua et al., 2020). Together, these studies illustrate that enhancer sensitivity is not uniform but depends on enhancer-specific features such as motif organization, cooperativity, and redundancy. Within this broader landscape, the apE enhancer appears to represent a particularly sensitive case.”

      We also included a citation to Loker et al. (2021) in connection with the possible pioneer-like contribution of HOX input to apE.

      We would like to thank all reviewers for their effort.

    1. eLife Assessment

      In this valuable study, Parrotta et al. showed that it is possible to modulate pain perception and heart rate by providing false heart rate (HR) acoustic feedback before administering electrical cutaneous shocks. The evidence supporting the claims of the authors is rather solid, although what they consider an interoceptive signal is not necessarily supported as such by the results. In this regard, including a larger number of trials per participant, increasing the sample size, and adding a measure of actual pain perception after its induction would have strengthened the study. Although mechanisms and some alternative explanations for this effect remain to be addressed, the work will nonetheless be of interest to neuroscientists working on predictions and perception, health psychologists, pain researchers, and placebo researchers.

    2. Reviewer #1 (Public review):

      Summary:

      I read the paper by Parrotta et al with great interest. The authors are asking an interesting and important question regarding pain perception, which is derived from predictive processing accounts of brain function. They ask: If the brain indeed integrates information coming from within the body (interoceptive information) to comprise predictions about the expected incoming input and how to respond to it, could we provide false interoceptive information to modulate its predictions, and subsequently alter the perception of such input? To test this question, they use pain as the input and the sounds of heartbeats (falsified or accurate) as the interoceptive signal.

      Strengths:

      I found the question well-established, interesting and important, with important implications and contributions for several fields, including neuroscience of prediction-perception and pain research. The study is clearly written, the methods are generally adequate, and the results indeed support the claim that false cardiac feedback modulates both pain perception and anticipatory cardiac frequency. Importantly, the authors include a control experiment using exteroceptive auditory feedback to test whether effects are specific to heartbeat-like cues. This addition substantially strengthens interpretability.

      Weaknesses:

      In my view, the authors' central interpretation, namely that the effects arise because the manipulation targets interoceptive rather than exteroceptive or high-level threat-related cues, cannot be fully supported by the current design. The evidence does not rule out the possibility that participants interpret increased heartbeat sounds as a generic danger/threat cue rather than as (manipulated) interoceptive input. I also disagree with several other choices, though these are less critical: for example, the use of specific comparisons without pre-registering them, the use of sensitivity analysis to justify sample size, and the intentional use of only 6 trials per participant.

      Conclusion:

      To conclude, the authors have shown in their findings that predictions about an upcoming aversive (pain) stimulus - and its subsequent subjective perception - can be altered not only by external expectations, or manipulating the pain cue, as was done in studies so far, but also by manipulating a cue that has fundamental importance to human physiological status, namely heartbeats. Whether this is a manipulation of actual interoception as sensed by the brain is, in my view, left to be proven.

      Even if the authors drop this claim, the paper has important implications in several fields of science, ranging from neuroscience prediction-perception research, to pain research, and may have implications for clinical disorders, as the authors propose. Furthermore, it may lead - either the authors or someone else - to further test this interesting question of manipulation of interoception in a different or more controlled manner.

      I salute the authors for coming up with this interesting question and encourage them to continue and explore ways to study it and related follow-up questions.

    3. Reviewer #3 (Public review):

      Parrotta et al provide a convincing and thorough revision of their manuscript "Exposure to false cardiac feedback alters pain perception and anticipatory cardiac frequency". The authors addressed my previous concerns regarding theoretical framing and methodological clarity. For example:

      They provided additional detail on the experimental design, procedure and statistical analyses.

      The predictive coding rationale for the hypotheses has been clarified.

      The limitations of the study are discussed comprehensively.

      Additional analyses were performed to investigate the role of learning effects and across-experiment effects.

      New supplementary figures allow a closer look at the feedback-related response patterns.

      In sum, the revisions improve the manuscript. However, some issues remain present.

      (1) Potential learning/habituation effects. In my first review of the manuscript, I raised the concern that learning effects may have contributed to the observed differences between interoceptive & exteroceptive cues. The authors argue that the small number of six trials per condition could limit aversive effects of differential learning between experiments. However, electric nociceptive stimuli are exceptionally potent in classical conditioning experiments and humans can develop conditioned responses to these types of stimuli after a single trial [1-2]. Therefore, six trials are sufficient to allow for associative or expectancy-based learning processes.

      The authors also present additional analyses, i.e. LME models which included trial rank as a predictor. While these models do not show a statistically significant learning effect, they do indicate a notably larger effect in earlier trials compared to later ones. In my reading, this speaks towards the presence of unspecific effects of attention or arousal: the pattern is compatible with early learning or, alternatively, with non-specific attentional or arousal responses that diminish across repetitions. This is potentially a limitation of the design: repetition-related effects (attention reduction, arousal habituation, early learning) may contribute to the results, and distinguishing between interoceptive inference and non-specific effects remains challenging within this paradigm.

      [1] Haesen K, Beckers T, Baeyens F, Vervliet B. One-trial overshadowing: Evidence for fast specific fear learning in humans. Behav Res Ther. 2017 Mar;90:16-24. doi: 10.1016/j.brat.2016.12.001. Epub 2016 Dec 8. PMID: 27960093.

      [2] Glenn CR, Lieberman L, Hajcak G. Comparing electric shock and a fearful screaming face as unconditioned stimuli for fear learning. Int J Psychophysiol. 2012 Dec;86(3):214-9. doi: 10.1016/j.ijpsycho.2012.09.006. Epub 2012 Sep 21. PMID: 23007035; PMCID: PMC3627354.

      (2) SESOI and power rationale. The authors elaborated on the sensitivity analyses and the rationale of reporting SESOI rather than traditional a-priori power analyses and included this information in the manuscript, which improves transparency.

      (3) Unspecific arousal/attention mechanisms. The authors argue against unspecific arousal mechanisms based on the absence of main effects in pain ratings and heart rate. This reduces the likelihood of a purely unspecific arousal account; however, these unspecific effects need not manifest as main effects. Unspecific mechanisms are likely adding (at least residual) effects onto the results.

      Regarding attention-based mechanisms, the authors have clarified that in Experiment 2 (exteroceptive cue), the participants are instructed that the sound does not have any relation with their heart rate. If participants did not receive any instructions on the meaning of the knocking sounds, they may have simply ignored it - not unlikely, also because the exteroceptive feedback did not elicit any systematic effect on the outcome variables (minus the slowing of HR with slower exteroceptive feedback, which may reflect noise, altering, multiple comparisons?). Ultimately, how the participants did or did not process the exteroceptive cue is unclear.

      (4) The authors provided more context to their hypothesis and strengthened its theoretical motivation (increased pain intensity with incongruent-high cardiac feedback), rooting it in predictive coding accounts of interoception. For instance, their prior study shows that participants report an increased cardiac frequency while anticipating pain. The reasoning behind this study is hence that if pain shapes cardiac perception, cardiac perception should in turn shape pain perception. The introduction has been revised accordingly, adding more references on the interplay between cardiac feedback and pain and emotional responses. While this rooting within the predictive processing framework is now clearly developed, it also underscores a gap between the proposed theoretical mechanism and the current analytical approach. The hypothesis is formulated in a mechanistic, computational-level language, yet the statistical analysis remains primarily descriptive, at a group level, and does not directly test the predictive-coding account.

      New concerns introduced by the revision:

      (1) Some of the newly added paragraphs interrupt the narrative flow. For example, the justification of the supradiaphragmatic focus based on the BPQ questionnaire feels too long for this section and might fit more naturally in the theoretical background or introduction. Similarly, the predictive-coding paragraph appearing after the hypotheses seems better suited to the earlier conceptual framing rather than following the hypothesis statements. It would be better for the argumentative flow if hypotheses followed from theoretical considerations.

      (2) The authors now note that the administration of the BPQ questionnaire was exploratory, explaining the null-results in the methods section as resulting from an underpowered design. But if the design is not appropriate for discovering a connection between self-reported body awareness and pain ratings, why was it administered in the first place? The rationale here is unclear.

      (3) The discussion is longer than before and would benefit greatly from streamlining the arguments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I read the paper by Parrotta et al with great interest. The authors are asking an interesting and important question regarding pain perception, which is derived from predictive processing accounts of brain function. They ask: If the brain indeed integrates information coming from within the body (interoceptive information) to comprise predictions about the expected incoming input and how to respond to it, could we provide false interoceptive information to modulate its predictions, and subsequently alter the perception of such input? To test this question, they use pain as the input and the sounds of heartbeats (falsified or accurate) as the interoceptive signal.

      Strengths:

      I found the question well-established, interesting, and important, with important implications and contributions for several fields, including neuroscience of prediction-perception, pain research, placebo research, and health psychology. The paper is well-written, the methods are adequate, and the findings largely support the hypothesis of the authors. The authors carried out a control experiment to rule out an alternative explanation of their finding, which was important.

      Weaknesses:

      I will list here one theoretical weakness or concern I had, and several methodological weaknesses.

      The theoretical concern regards what I see as a misalignment between a hypothesis and a result, which could influence our understanding of the manipulation of heartbeats, and its meaning: The authors indicate from prior literature and find in their own findings, that when preparing for an aversive incoming stimulus, heartbeats *decrease*. However, in their findings, manipulating the heartbeats that participants hear to be slower than their own prior to receiving a painful stimulus had *no effect* on participants' actual heartbeats, nor on their pain perceptions. What authors did find is that when listening to heartbeats that are *increased* in frequency - that was when their own heartbeats decreased (meaning they expected an aversive stimulus) and their pain perceptions increased.

      This is quite complex - but here is my concern: If the assumption is that the brain is collecting evidence from both outside and inside the body to prepare for an upcoming stimulus, and we know that *slowing down* of heartbeats predicts an aversive stimulus, why is it that participants responded with a change in pain perception and physiological response when they listened to *increased heartbeats* and not decreased? My interpretation is that the manipulation did not fool the interoceptive signals that the brain collects, but rather the more conscious experience of participants, which may then have been translated to fear/preparation for the incoming stimulus. As the authors indicate in the discussion (lines 704-705), participants do not *know* that decreased heartbeats indicate upcoming aversive stimulus, and I would even argue the opposite - the common knowledge or intuitive response is to increase alertness when we hear increased heartbeats, like in horror films or similar scenarios. Therefore, the unfortunate conclusion is that what the authors assume is a manipulation of interoception - to me seems like a manipulation of participants' alertness or conscious experience of possible danger. I hope the (important) distinction between the two is clear enough because I find this issue of utmost importance for the point the paper is trying to make. To summarize in one sentence: if it is decreased heartbeats that lead the brain to predict an approaching aversive input, and we assume the manipulation is altering the brain's interoceptive data collection, why isn't it responding to the decreased signal? My conclusion is that this is, unfortunately, not in fact a manipulation of interoception.

      We thank the reviewer for their comment, which gives us the opportunity to clarify what we believe is a theoretical misunderstanding that we have not sufficiently made clear in the previous version of the manuscript. The reviewer suggests that a decreased heart rate itself might act as an internal cue for a forthcoming aversive stimulus, and questions why our manipulation of slower heartbeats then did not produce measurable effects.

      The central point is this: decreased heart rate is not a signal the brain uses to predict a threat, but is a consequence of the brain having already predicted the threat. This distinction is crucial. The well-known anticipatory decrease of heartrate serves an allostatic function: preparing the body in advance so that physiological responses to the actual stressor (such as an increase in sympathetic activation) do not overshoot. In other words, the deceleration is an output of the predictive model, not an input from which predictions are inferred. It would be maladaptive for the brain to predict threat through a decrease in heartrate, as this would then call for a further decrease, creating a potential runaway cycle.

      Instead, increased heart rate is a salient and evolutionarily conserved cue for arousal, threat, and pain. This association is reinforced both culturally - for example, through the use of accelerating heartbeats in films and media to signal urgency, as R1 mentions - and physiologically, as elevated heart rates reliably occur in response to actual (not anticipated) stressors. Decreased heartrates, in contrast, are reliably associated with the absence of stressors, for example during relaxation and before (and during) sleep. Thus, across various everyday experiences, increased (instead of decreased) heartrates are robustly associated with actual stressors, and there is no a priori reason to assume that the brain would treat decelerating heartrates as a cue for threat. As we argued in previous work, “the relationship between the increase in cardiac activity and the anticipation of a threat may have emerged from participants’ first-hand experience of increased heart rates to actual, not anticipated, pain” (Parrotta et al., 2024). The changes in heart rate and pain perception that we hypothesize (and observe) are therefore fully in line with the prior literature on the anticipatory compensatory heartrate response (Bradley et al., 2008, 2005; Colloca et al., 2006; Lykken et al., 1972; Taggart et al., 1976; Tracy et al., 2017; Skora et al., 2022), as well as with Embodied Predictive Coding models (Barrett & Simmons, 2015; Pezzulo, 2014; Seth, 2013; Seth et al., 2012), which assume that our body is regulated through embodied simulations that anticipate likely bodily responses to upcoming events, thereby enabling anticipatory or allostatic regulation of physiological states (Barrett, 2017).

      We now add further explanation to this point to the Discussion (lines 740-758) and Introduction (lines 145-148; 154-156) of our manuscript to make this important point clearer.

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature reviews neuroscience, 16(7), 419-429.

      Barrett, L. F. (2017). The theory of constructed emotion: An active inference account of interoception and categorization. Social cognitive and affective neuroscience, 12(1), 1-23.

      Bradley, M. M., Moulder, B., & Lang, P. J. (2005). When good things go bad: The reflex physiology of defense. Psychological science, 16(6), 468-473.

      Bradley, M. M., Silakowski, T., & Lang, P. J. (2008). Fear of pain and defensive activation. PAIN®, 137(1), 156-163.

      Colloca, L., Petrovic, P., Wager, T. D., Ingvar, M., & Benedetti, F. (2010). How the number of learning trials affects placebo and nocebo responses. Pain®, 151(2), 430-439.

      Lykken, D., Macindoe, I., & Tellegen, A. (1972). Preception: Autonomic response to shock as a function of predictability in time and locus. Psychophysiology, 9(3), 318-333.

      Taggart, P., Hedworth-Whitty, R., Carruthers, M., & Gordon, P. D. (1976). Observations on electrocardiogram and plasma catecholamines during dental procedures: The forgotten vagus. British Medical Journal, 2(6039), 787-789.

      Tracy, L. M., Gibson, S. J., Georgiou-Karistianis, N., & Giummarra, M. J. (2017). Effects of explicit cueing and ambiguity on the anticipation and experience of a painful thermal stimulus. PloS One, 12(8), e0183650.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Pezzulo, G. (2014). Why do you fear the bogeyman? An embodied predictive coding model of perceptual inference. Cognitive, Affective & Behavioral Neuroscience, 14(3), 902-911.

      Seth, A., Suzuki, K., & Critchley, H. (2012). An Interoceptive Predictive Coding Model of Conscious Presence. Frontiers in Psychology, 2. https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00395

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Skora, L. I., Livermore, J. J. A., & Roelofs, K. (2022). The functional role of cardiac activity in perception and action. Neuroscience & Biobehavioral Reviews, 104655.

      I will add that the control experiment - with an exteroceptive signal (knocking of wood) manipulated in a similar manner - could be seen as evidence of the fact that heartbeats are regarded as an interoceptive signal, and it is an important control experiment. However, to me it seems that what it is showing is the importance of human-relevant signals to pain prediction/perception, and it does not directly prove that the signal is considered interoceptive. For example, it could be experienced as a social cue of human anxiety/fear etc., and induce alertness.

      The reviewer asks us to consider whether our measured changes in pain response happen not because the brain treats the heartrate feedback in Experiment 1 as interoceptive stimulus, but because heartbeat sounds could have signalled threat on a more abstract, perhaps metacognitive or affective, level, in contrast to the less visceral control sounds in Experiment 2. We deem this highly unlikely for several reasons.

      First, as we point out in our response to Reviewer 3 (Point 3), if this were the case, the different sounds in both experiments should have induced overall (between-experiment) differences in pain perception and heart rate, induced by the (supposedly) generally more threatening heartbeat sounds. However, when we added such comparisons, no such between-experiment differences were obtained (see Results Experiment 2, and Supplementary Materials, Cross-experiment analysis between-subjects model). Instead, we only find a significant interaction between experiment and feedback (faster, slower). Thus, it is not the heartbeat sounds per se that induce the measured changes to pain perception, but the modulation of their rate, and identical changes to the rate of non-heartbeat sounds produce no such effects. In other words, pain perception is sensitive to a change in heart rate feedback, as we predicted, instead of the overall presence of heartbeat sounds (as one would need to predict if heartbeat sounds had more generally induced threat or stress).

      Second, one may suspect that it is precisely the acceleration of heartrate feedback that could act as a cue to arousal, while accelerated exteroceptive feedback would not. However, if this were the case, one would need to predict a general heart rate increase with accelerated feedback, as this is the general physiological marker of increasing alertness and arousal (e.g., Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022). Yet the data show the opposite, with real heartrates decreasing when the heartrate feedback increases. This result is again fully in line with the predicted interoceptive consequences of accelerated heartrate feedback, which mandates an immediate autonomic regulation, especially when preparing for an anticipated stressor.

      Third, our view is further supported by neurophysiological evidence showing that heartbeat sounds, particularly under the belief they reflect one’s own body, are not processed merely as generic aversive or “human-relevant” signals. For instance, Vicentin et al. (2024) showed that simulated faster heartbeat sounds elicited stronger EEG alpha-band suppression, indicative of increased cortical activation over frontocentral and right frontal areas, compatible with the localization of brain regions contributing to interoceptive processes (Kleint et al., 2015). Importantly, Kleint et al. also demonstrated via fMRI that heartbeat sounds, compared to acoustically matched tones, selectively activate bilateral anterior insula and frontal operculum, key hubs of the interoceptive network. This suggests that the semantic identity of the sound as a heartbeat is sufficient to elicit internal body representations, despite its exteroceptive nature. Further evidence comes from van Elk et al. (2014), who found that heartbeat sounds suppress the auditory N1 component, a neural marker of sensory attenuation typically associated with self-generated or predicted stimuli. The authors interpret this as evidence that the brain treats heartbeat sounds as internally predicted bodily signals, supporting interoceptive predictive coding accounts in which exteroceptive cues (i.e., auditory cardiac feedback) are integrated with visceral information to generate coherent internal body representations.

      Finally, it is worth noting that the manipulation of heartrate feedback in our study elicited measurable compensatory changes in participants’ actual heart rate. This is striking compared to our previous work (Parrotta et al., 2024), in which we used a design highly similar to the present one, combined with a very strong threat manipulation. Specifically, we presented participants with highly salient threat cues (knives directed at an anatomical depiction of a heart), which predicted forthcoming pain with 100% validity (compared to flowers that predicted the absence of pain with 100% validity). In other words, these cues perfectly predicted actual pain, through highly visceral stimuli. Nevertheless, we found no measurable decrease in actual heartrate. From an abstract threat perspective, it is therefore striking that the much weaker manipulation of slightly increased or decreased heartrates we used here would induce such a change. The difference therefore suggests that the response here was not caused by an abstract feeling of threat, but arose because the brain indeed treated the increased heartrate feedback as an interoceptive signal for (stressor-induced) sympathetic activation, which would then be immediately down-regulated.

      Together, we hope you agree that these considerations make a strong case against a non-specific, arousal or alertness-related explanation of our data. We now make this point clearer in the new paragraph of the Discussion (Accounting for general unspecific contributions, lines 796-830), and have added the relevant between-experiment comparisons to the Results of Experiment 2.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Several additional, more methodological weaknesses include the very small number of trials per condition - the methods mention 18 test trials per participant for the 3 conditions, with varying pain intensities, which are later averaged (and whether this is appropriate is a different issue). This means 6 trials per condition, and only 2 trials per condition and pain intensity. I thought that this number could be increased, though it is not a huge concern of the paper. It is, however, needed to show some statistics about the distribution of responses, given the very small trial number (see recommendations for authors). The sample size is also rather small, on the verge of "just right" to meet the required sample size according to the authors' calculations.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Finally, and just as important, the data exists to analyze participants' physiological responses (ECG) after receiving the painful stimulus - this could support the authors' claims about the change in both subjective and objective responses to pain. It could also strengthen the physiological evidence, which is rather weak in terms of its effect. Nevertheless, this is missing from the paper.

      This is indeed an interesting point, and we agree that analyzing physiological responses such as ECG following the painful stimulus could offer additional insights into the objective correlates of pain. However, it is important to clarify that the experiment was not designed to investigate post-stimulus physiological responses. Our primary focus was on the anticipatory processes leading up to the pain event. Notably, in the time window immediately following the stimulus - when one might typically expect to observe physiological changes such as an increase in heart rate - participants were asked to provide subjective ratings of their nociceptive experience. It is therefore not a “clean” interval that would lend itself to measurement, especially as a substantial body of evidence indicates that one’s heart rate is strongly modulated by higher-order cognitive processes, including attentional control, executive functioning, decision-making and action itself (e.g., Forte et al., 2021a; Forte et al., 2021b; Luque-Casado et al., 2016).

      This limitation is particularly important as the induced change in pain ratings by our heart rate manipulation is substantially smaller than the changes in heart rate induced by actual pain (e.g., Loggia et al., 2011). To confirm this for our study, we simply estimated how much change in heart rate is produced by a change in actual stimulus intensity in the initial no-feedback phase of our experiment. There, we find that a change between stimulus intensities 2 and 4 induces an NPS change of 32.95 and a heart rate acceleration response of 1.19 (difference in heart rate response relative to baseline, Colloca et al., 2006), d = .52, p < .001. The change of NPS induced by our implicit heart rate manipulation, however, is only a seventh of this (4.81 on the NPS). This means that the expected effect size of heart rate acceleration produced by our manipulation would only be d = .17. A power analysis, using G*Power, reveals that a sample size of n = 266 would be required to detect such an effect, if it exists. Thus, while we agree that this is an exciting hypothesis to be tested, it requires a specifically designed study, and a much larger sample than was possible here.
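
      The arithmetic in the paragraph above can be retraced with a rough sketch. The scaling values (4.81, 32.95, 1.19, d = .17) are taken from the response; the alpha level, the target power, the two-sided paired test and the normal approximation are assumptions made here for illustration, since the response does not state which settings were entered into G*Power. The function name `required_n_paired` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_paired(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a two-sided paired (one-sample) t-test.

    Uses the normal approximation n = ((z_alpha + z_power) / d) ** 2.
    alpha and power are assumed defaults, not values from the response.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(((z_alpha + z_power) / d) ** 2)


# Scaling reported in the response: the feedback manipulation shifts ratings
# by 4.81, versus 32.95 for a change between stimulus intensities 2 and 4.
rating_ratio = 4.81 / 32.95               # roughly one seventh
scaled_hr_response = 1.19 * rating_ratio  # about 0.17

n_needed = required_n_paired(0.17)
print(round(rating_ratio, 3), round(scaled_hr_response, 2), n_needed)
```

      Under these assumed settings the approximation gives a required sample in the region of 270 participants, the same order of magnitude as the n = 266 reported above. An exact match is not expected, because G*Power works with the noncentral t distribution and the authors' settings may differ.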

      Colloca, L., Benedetti, F., & Pollo, A. (2006). Repeatability of autonomic responses to pain anticipation and pain stimulation. European Journal of Pain, 10(7), 659-665.

      Forte, G., Morelli, M., & Casagrande, M. (2021a). Heart rate variability and decision-making: Autonomic responses in making decisions. Brain sciences, 11(2), 243.

      Forte, G., Favieri, F., Oliha, E. O., Marotta, A., & Casagrande, M. (2021b). Anxiety and attentional processes: the role of resting heart rate variability. Brain sciences, 11(4), 480.

      Loggia, M. L., Juneau, M., & Bushnell, M. C. (2011). Autonomic responses to heat pain: Heart rate, skin conductance, and their relation to verbal ratings and stimulus intensity. PAIN®, 152(3), 592-598.

      Luque-Casado, A., Perales, J. C., Cárdenas, D., & Sanabria, D. (2016). Heart rate variability and cognitive processing: The autonomic response to task demands. Biological psychology, 113, 83-90.

      I have several additional recommendations regarding data analysis (using an ANOVA rather than multiple t-tests, using raw normalized data rather than change scores, questioning the averaging across 3 pain intensities) - which I will detail in the "recommendations for authors" section.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Conclusion:

      To conclude, the authors have shown in their findings that predictions about an upcoming aversive (pain) stimulus - and its subsequent subjective perception - can be altered not only by external expectations, or manipulating the pain cue, as was done in studies so far, but also by manipulating a cue that has fundamental importance to human physiological status, namely heartbeats. Whether this is a manipulation of actual interoception as sensed by the brain is - in my view - left to be proven.

      Still, the paper has important implications in several fields of science ranging from neuroscience prediction-perception research, to pain and placebo research, and may have implications for clinical disorders, as the authors propose. Furthermore, it may lead - either the authors or someone else - to further test this interesting question of manipulation of interoception in a different or more controlled manner.

      I salute the authors for coming up with this interesting question and encourage them to continue and explore ways to study it and related follow-up questions.

      We sincerely thank the reviewer for the thoughtful and encouraging feedback. We hope our responses to your points below convince you a bit more that what we are measuring does indeed capture interoceptive processes, but we of course fully acknowledge that additional measures - for example from brain imaging (or computational modelling, see Reviewer 3) - could further support our interpretation, and we highlight this in the Limitations and Future directions section.

      Reviewer #2 (Public Review):

      In this manuscript, Parrotta et al. tested whether it is possible to modulate pain perception and heart rate by providing false HR acoustic feedback before administering electrical cutaneous shocks. To this end, they performed two experiments. The first experiment tested whether false HR acoustic feedback alters pain perception and the cardiac anticipatory response. The second experiment tested whether the same perceptual and physiological changes are observed when participants are exposed to a non-interoceptive feedback. The main results of the first experiment showed a modulatory effect for faster HR acoustic feedback on pain intensity, unpleasantness, and cardiac anticipatory response compared to a control (acoustic feedback congruent to the participant's actual HR). However, the results of the second experiment also showed an increase in pain ratings for the faster non-interoceptive acoustic feedback compared to the control condition, with no differences in pain unpleasantness or cardiac response.

      The main strengths of the manuscript are the clarity with which it was written, and its solid theoretical and conceptual framework. The researchers make an in-depth review of predictive processing models to account for the complex experience of pain, and how these models are updated by perceptual and active inference. They follow with an account of how pain expectations modulate physiological responses and draw attention to the fact that most previous studies focus on exteroceptive cues. At this point, they make the link between pain experience and heart rate changes, and introduce their own previous work showing that people may illusorily perceive a higher cardiac frequency when expecting painful stimulation, even though anticipating pain typically goes along with a decrease in HR. From here, they hypothesize that false HR acoustic feedback evokes more intense and unpleasant pain perception, although the actual HR actually decreases due to the orienting cardiac response. Furthermore, they also test the hypothesis that an exteroceptive cue will lead to no (or less) changes in those variables. The discussion of their results is also well-rooted in the existing bibliography, and for the most part, provides a credible account of the findings.

      Thank you for the clear and thoughtful review. We appreciate your positive comments on the manuscript’s clarity, theoretical framework, and interpretation of results.

      The main weaknesses of the manuscript lie in a few choices in methodology and data analysis that hinder the interpretation of the results and the conclusions as they stand.

      The first peculiar choice is the convoluted definition of the outcomes. Specifically, pain intensity and unpleasantness are first normalized and then transformed into variation rates (sic) or deltas, which makes the interpretation of the results unnecessarily complicated. This is also linked to the definitions of the smallest effect of interest (SESOI) in terms of these outcomes, which is crucial to determining the sample size and gauging the differences between conditions. However, the choice of SESOI is not properly justified, and strangely, it changes from the first experiment to the second.

      We thank the reviewer for this important observation. In the revised manuscript, we have made substantial changes and clarifications to address both aspects of this concern: (1) the definition of outcome variables and their normalization, and (2) the definition of the SESOI.

      First, as explained in our response to Reviewer #1, we have revised the analyses and removed the difference-based change scores from the main results, addressing concerns about interpretability. However, we retained the normalization procedure: all variables (heart rate, pain intensity, unpleasantness) are normalized relative to the no-feedback baseline using a standard proportional change formula, (X − bX)/bX, where X is the feedback-phase mean and bX is the no-feedback baseline. This is a widely used normalization procedure (e.g., Bartolo et al., 2013; Cecchini et al., 2020). This method controls for interindividual variability by expressing responses relative to each participant’s own baseline. The resulting normalized values are then used directly in all analyses, and not further transformed into deltas.
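
      As a minimal sketch of the proportional change formula (X − bX)/bX described above: the function name and the example ratings below are hypothetical and serve only to show how a feedback-phase mean is expressed relative to a participant's own no-feedback baseline.

```python
def proportional_change(feedback_mean: float, baseline_mean: float) -> float:
    """Return (X - bX) / bX: the feedback-phase mean relative to baseline."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero for proportional change")
    return (feedback_mean - baseline_mean) / baseline_mean


# Hypothetical participant: mean pain rating of 50 without feedback,
# 55 under faster feedback and 49 under slower feedback.
baseline_rating = 50.0
faster_norm = proportional_change(55.0, baseline_rating)
slower_norm = proportional_change(49.0, baseline_rating)
print(faster_norm, slower_norm)
```

      A positive value means the rating rose relative to the participant's baseline and a negative value means it fell, so participants with very different raw rating levels end up on a common scale.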

      To address potential concerns about this baseline correction approach and its interpretability, we also conducted a new set of supplementary analyses that include the no-feedback condition explicitly in the models, rather than treating it as a baseline for normalization. These models confirm that our main effects are not driven by the choice of normalization and hold even when no-feedback is analyzed as an independent condition. The new analyses and results are now reported in the Supplementary Materials.

      Second, concerning the SESOI values and their justification: The difference in SESOI values between Experiment 1 and Experiment 2 reflects the outcome of sensitivity analyses conducted for each dataset separately, rather than a post-hoc reinterpretation of our results. Specifically, we followed current methodological recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), which advise against estimating statistical power based on previously published effect sizes, especially when working with novel paradigms or when effect sizes in the literature may be inflated or imprecise. Instead, we used the sensitivity analysis function in G*Power (Version 3.1) to determine the smallest effect size our design was capable of detecting with high statistical power (90%), given the actual sample size, test type, and alpha level used in each experiment. This is a prospective, design-based estimation rather than a post-hoc analysis of observed effects. The slight differences in SESOI are due to more participants falling below our exclusion criteria in Experiment 2, leading to slightly larger effect sizes that can be detected (d = 0.62 vs d = 0.57). Importantly, both experiments remain adequately powered to detect effects of a size commonly reported in the literature on top-down pain modulation. For instance, Iodice et al. (2019) reported effects of approximately d = 0.7, which is well above the minimum detectable thresholds of our designs.
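
      The logic of a sensitivity analysis, and why a smaller final sample raises the smallest detectable effect, can be illustrated with a rough sketch. This is not the authors' G*Power computation: it assumes a two-sided paired test at alpha = .05 with 90% power and uses a normal approximation, and the sample sizes in the loop are hypothetical because the final n of each experiment is not given in this part of the response.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n: int, alpha: float = 0.05, power: float = 0.90) -> float:
    """Smallest standardized effect detectable with the given n.

    Normal approximation for a two-sided paired (one-sample) test:
    d = (z_alpha + z_power) / sqrt(n). alpha is an assumed value.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


# Hypothetical sample sizes: fewer retained participants means that only
# larger effects can be detected at the same power.
for n in (28, 32, 36):
    print(n, round(min_detectable_d(n), 2))
```

      The detectable effect shrinks with the square root of n, which is why excluding a few more participants in one experiment shifts its threshold upward only slightly.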

      We have now clarified the logic in the Participant section of Experiment 1 (lines 193-218).

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback.

      We very much disagree that the natural comparison is congruent vs incongruent feedback. First, please note that congruency simply refers to whether the heartrate feedback was congruent with (i.e., matched) the participant’s heartrate measurements in the no feedback trials, or whether it was incongruent, and was therefore either faster or slower than this baseline frequency. As such, simply comparing congruent with incongruent feedback could only indicate that pain ratings change when the feedback does not match the real heart rate, irrespective of whether it is faster or slower. Such a test can therefore only reveal potential general effects of surprise or salience, when the feedback heartrate does not match the real one.

      We therefore assume that the reviewer specifically refers to the comparison of congruent vs incongruent faster feedback. However, this is not a good test either, as this comparison is, by necessity, confounded with the factor of surprise described above. In other words, if a difference were found, it would not be clear whether it emerges because, as we assume, faster feedback is represented as an interoceptive signal for threat, or simply because participants are surprised about heartrate feedback that diverges from their real heartrate. Note that even a non-significant result in the analogous comparison of congruent vs incongruent slower feedback would not be able to resolve this confound, as in null hypothesis testing the absence of a significant effect does, per definition, not indicate that there is no effect - only that it could not be detected here.

      Instead, the only possible test of our hypothesis is the one we have designed our experiment around and focussed on with our central t-test: the comparison of incongruent faster with incongruent slower feedback. This keeps any possible effects of surprise/salience from generally altered feedback constant and allows us to test our specific hypothesis: that real heart rates will decrease and pain ratings will increase when receiving false interoceptive feedback about increased compared to decreased heartrates. Note that this test of faster vs slower feedback is also statistically the most appropriate, as it collapses our prediction onto a single and highest-powered hypothesis test: As faster and slower heartrate feedback are assumed to induce effects in opposite directions, the effect size of their difference is, per definition, double the averaged effect size for the two separate tests of faster vs congruent feedback and slower vs congruent feedback.
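
      The claim about the faster vs slower contrast can be made concrete with a small worked example. The condition means below are hypothetical, and the sketch covers raw mean differences only; a standardized effect size additionally depends on the variability of the difference scores, which is not modelled here.

```python
from math import isclose

# Hypothetical normalized pain ratings per feedback condition.
faster, congruent, slower = 0.08, 0.02, -0.04

faster_vs_congruent = faster - congruent  # shift upward under faster feedback
congruent_vs_slower = congruent - slower  # shift downward under slower feedback
faster_vs_slower = faster - slower        # the contrast tested directly

average_single_contrast = (faster_vs_congruent + congruent_vs_slower) / 2
print(faster_vs_slower, average_single_contrast)
```

      Because the congruent mean cancels out, the faster vs slower difference equals the sum of the two single contrasts, that is, twice their average. It also leaves the mismatch with the real heart rate present in both conditions being compared.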

      That being said, we also included comparisons with the congruent condition in our revised analysis, in line with the reviewer’s suggestion and previous studies. These analyses help explore potential asymmetries in the effect of false feedback. While faster feedback (both interoceptive and exteroceptive) significantly modulated pain relative to congruent feedback, the slower feedback did not, consistent with previous literature showing stronger effects for arousal-increasing cues (e.g., Valins, 1966; Iodice et al., 2019). To address this point, in the revised manuscript we have added a paragraph to the Data Analysis section of Experiment 1 (lines 405-437) to make this logic clearer.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect on pain intensity compared to congruent HR feedback, which puts into question the hypothesized differences between interoceptive vs. exteroceptive cues. These results could also be influenced by the specific choice of exteroceptive cue: the researchers imply that the main driver of the effect is the nature of the cue (interoceptive vs. exteroceptive) and not its frequency. However, they attempt to generalize their findings using knocking wood sounds to all possible sounds, but it is possible that some features of these sounds (e.g., auditory roughness or loomingness) could be the drivers behind the observed effects.

      We appreciate this thoughtful comment. We agree that low-level auditory features can potentially introduce confounds in the experimental design, and we acknowledge the importance of distinguishing these factors from the higher-order distinction that is central to our study: whether the sound is perceived as interoceptive (originating from within the body) or exteroceptive (perceived as external). To this end, the knocking sound was chosen not for its specific acoustic profile, but because it lacked bodily relevance, thus allowing us to test whether the same temporal manipulations (faster, congruent, slower) would have different effects depending on whether the cue was interpreted as reflecting an internal bodily state or not. In this context, the exteroceptive cue served as a conceptual contrast rather than an exhaustive control for all auditory dimensions.

      Several aspects of our data make it unlikely that the observed effects are driven by unspecific acoustic characteristics of the sounds used in the exteroceptive and interoceptive experiments (see also our responses to Reviewer 1 and Reviewer 3 who raised similar points).

      First, if the knocking sound had inherent acoustic features that strongly influenced perception or physiological responses, we would expect it to have produced consistent effects across all feedback conditions (Faster, Slower, Congruent), regardless of the interpretive context. This would have manifested as an overall difference between experiments in the between-subjects analyses and in the supplementary mixed-effects models that included Experiment as a fixed factor. Yet, we observed no such main effects in any of our variables. Instead, significant differences emerged only in specific theoretically predicted comparisons (e.g., Faster vs. Slower), and critically, these effects depended on the cue type (interoceptive vs. exteroceptive), suggesting that perceived bodily relevance, rather than a specific acoustic property, was the critical modulator. In other words, any alternative explanation based on acoustic features would need to be able to explain why these acoustic properties would induce not an overall change in heart rate and pain perception (i.e., similarly across slower, faster, and congruent feedback), but the brain’s response to changes in the rate of this feedback – increasing pain ratings and decreasing heartrates for faster relative to slower feedback. We hope you agree that a simple effect of acoustic features would not predict such a sensitivity to the rate with which the sound was played.

      Please refer to our responses to Reviewers 1 and 3 for further aspects of the data that argue strongly against the possibility that other features associated with the sounds (e.g., alertness, arousal) are responsible for the results, as the data pattern again goes in the opposite direction to that predicted by such accounts (e.g., faster heart rate feedback decreased real heart rates, instead of increasing them, as would be expected if accelerated heart rate feedback increased arousal).

      Finally, to further support this interpretation, we refer to neurophysiological evidence showing that heartbeat sounds are not processed as generic auditory signals, but as internal, bodily relevant cues, especially when believed to reflect one’s own physiological state. For instance, fMRI research (Kleint et al., 2015) shows that heartbeat sounds engage key interoceptive regions such as the anterior insula and frontal operculum more than acoustically matched control tones. EEG data (Vicentin et al., 2024) showed that faster heartbeat sounds produce stronger alpha suppression over frontocentral areas, suggesting enhanced processing in networks associated with interoceptive attention. Moreover, van Elk et al. (2014) found that heartbeat sounds attenuate the auditory N1 response, a neural signature typically linked to self-generated or predicted bodily signals. These findings consistently demonstrate that heartbeat sounds are processed as interoceptive and self-generated signals, which is in line with our rationale that the critical factor at play is whether the sound is semantically perceived as reflecting one’s own bodily state, rather than its physical properties.

      We now explicitly discuss these issues in the revised Discussion section (lines 740-758).

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Finally, it is noteworthy that the researchers divided the study into two experiments when it would have been optimal to test all the conditions with the same subjects in a randomized order in a single cross-over experiment to reduce between-subject variability. Taking this into consideration, I believe that the conclusions are only partially supported by the evidence. Despite the outcome transformations, a clear effect of faster HR acoustic feedback can be observed in the first experiment, which is larger than the proposed exteroceptive counterpart. This work could be of broad interest to pain researchers, particularly those working on predictive coding of pain.

      We appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such a design indeed offers increased statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally opted for a between-subjects design due to theoretical and methodological considerations specific to studies involving deceptive feedback. Most importantly, carryover effects are a major concern in deception paradigms. Participants exposed to one type of feedback initially (e.g., interoceptive), and then the other (exteroceptive) would be more likely to develop suspicion or adaptive strategies that would alter their responses. Such expectancy effects could contaminate results in a crossover design, particularly when participants realize that feedback is manipulated. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to mitigate this risk.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Reviewer #3 (Public Review):

      In their manuscript titled "Exposure to false cardiac feedback alters pain perception and anticipatory cardiac frequency", Parrotta and colleagues describe an experimental study on the interplay between false heart rate feedback and pain experience in healthy, adult humans. The experimental design is derived from Bayesian perspectives on interoceptive inference. In Experiment 1 (N=34), participants rated the intensity and unpleasantness of an electrical pulse presented to their middle fingers. Participants received auditory cardiac feedback prior to the electrical pulse. This feedback was congruent with the participant's heart rate or manipulated to have a higher or lower frequency than the participant's true heart rate (incongruent high/ low feedback). The authors find heightened ratings of pain intensity and unpleasantness as well as a decreased heart rate in participants who were exposed to the incongruent-high cardiac feedback. Experiment 2 (N=29) is equivalent to Experiment 1 with the exception that non-interoceptive auditory feedback was presented. Here, mean pain intensity and unpleasantness ratings were unaffected by feedback frequency.

      Strengths:

      The authors present interesting experimental data that was derived from modern theoretical accounts of interoceptive inference and pain processing.

      (1) The motivation for the study is well-explained and rooted within the current literature, whereas pain is the result of a multimodal, inferential process. The separation of nociceptive stimulation and pain experience is explained clearly and stringently throughout the text.

      (2) The idea of manipulating pain-related expectations via an internal, instead of an external cue, is very innovative.

      (3) An appropriate control experiment was implemented, where an external (non-physiological) auditory cue with parallel frequency to the cardiac cue was presented.

      (4) The chosen statistical methods are appropriate, albeit averaging may limit the opportunity for mechanistic insight, see weaknesses section.

      (5) The behavioral data, showing increased unpleasantness and intensity ratings after exposure to incongruent-high cardiac feedback, but not exteroceptive high-frequency auditory feedback, is backed up by ECG data. Here, the decrease in heart rate during the incongruent-high condition speaks towards a specific, expectation-induced physiological effect that can be seen as resulting from interoceptive inference.

      We thank the reviewer for their positive feedback. We are glad that the study’s theoretical foundation, innovative design, appropriate control conditions, and convergence of behavioral and physiological data were well received.

      Weaknesses:

      Additional analyses and/ or more extensive discussion are needed to address these limitations:

      (1) I would like to know more about potential learning effects during the study. Is there a significant change in ∆ intensity and ∆ unpleasantness over time; e.g. in early trials compared to later trials? It would be helpful to exclude the alternative explanation that over time, participants learned to interpret the exteroceptive cue more in line with the cardiac cue, and the effect is driven by a lack of learning about the slightly less familiar cue (the exteroceptive cue) in early trials. In other words, the heartbeat-like auditory feedback might be "overlearned", compared to the less naturalistic tone, and more exposure to the less naturalistic cue might rule out any differences between them w.r.t. pain unpleasantness ratings.

      We thank the reviewer for raising this important point. Please note that the repetitions in our task were relatively limited (6 trials per condition), which limits the potential influence of such differential learning effects between experiments. To address this concern, we performed an additional analysis, reported in the Supplementary Materials, using a Linear Mixed-Effects Model approach. This method allowed us to include "Trial" (the rank order of each trial) as a variable to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). All feedback conditions (no-feedback, congruent, faster, slower) and all stimulus intensity levels were included.

      Specifically, we tested the following models:

      Likert Pain Unpleasantness Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)

      Numeric Pain Scale of Intensity Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)
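
      For readers less familiar with this formula notation, the structure shared by the two models can be written out explicitly. This is only a sketch: the symbols below are introduced here for illustration, and StimInt and Trial are assumed to enter the random part as numeric predictors, which the formulas above do not state.

```latex
% Rating y_{ij} given by subject j on observation i (same structure for both outcomes)
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
       + u_{0j} + u_{1j}\,\mathrm{StimInt}_{ij} + u_{2j}\,\mathrm{Trial}_{ij}
       + \varepsilon_{ij},
\qquad
(u_{0j}, u_{1j}, u_{2j})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

      Here $\mathbf{x}_{ij}$ collects the intercept, the main effects of Experiment, Feedback, StimInt, and Trial, and all of their two-, three-, and four-way interactions, while $u_{0j}$, $u_{1j}$, and $u_{2j}$ are the by-subject random intercept and random slopes for StimInt and Trial.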

      In both models, no significant interactions involving Trial × Experiment or Trial × Feedback × Experiment were found. Instead, we found generally larger effects in early trials than in later ones (main effect of Trial within each Experiment), similar to other cognitive illusions in which repeated exposure diminishes effects. Thus, although some unspecific changes over time may have occurred (e.g., due to general task exposure), these changes did not differ systematically across experimental conditions (interoceptive vs. exteroceptive) or feedback types. However, we are fully aware that the absence of significant higher-order interactions does not conclusively rule out the possibility of learning-related effects. It is possible that our models lacked the statistical power to detect more subtle or complex time-dependent modulations, particularly if such effects differ in magnitude or direction across feedback conditions.

      We report the full description of these analyses and results in the Supplementary materials 1. Cross-experiment analysis (between-subjects model).

      (2) The origin of the difference in Cohen's d (Exp. 1: .57, Exp. 2: .62) and subsequently sample size in the sensitivity analyses remains unclear, it would be helpful to clarify where these values are coming from (are they related to the effects reported in the results? If so, they should be marked as post-hoc analyses).

      Following recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we do not report theoretical power based on previously reported effect sizes, as this neglects uncertainty around effect size measurements, especially for new effects for which no reliable expected effect size estimates can be derived from the literature. Instead, the power analysis is based on a sensitivity analysis, conducted in G*Power (Version 3.1). Importantly, these are not post-hoc analyses, as they are not based on observed effect sizes in our study, but derived a priori. Sensitivity analyses estimate the effect sizes that our design is well-powered (90%) to detect (i.e., given target power, sample size, and type of test), for the crucial comparison between faster and slower feedback in both experiments (Lakens, 2022). Following recommendations, we also report the smallest effect size this test can in principle detect in our study (SESOI, Lakens, 2022). This yields effect sizes of d = .57 in Experiment 1 and d = .62 in Experiment 2 at 90% power and SESOIs of d = .34 and .37, respectively. Note that values are slightly higher in Experiment 2, as more participants were excluded based on our exclusion criteria. Importantly, detectable effect sizes in both experiments are smaller than reported effect sizes for comparable top-down effects on pain measurements of d = .7 (Iodice et al., 2019). We have now added more information to the power analysis sections to make this clearer (lines 208-217).
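
      The logic of such a sensitivity analysis can be sketched in a few lines. The example below is a rough illustration, not the G*Power computation itself: it assumes a two-tailed paired comparison at α = .05, uses a normal approximation in place of the noncentral t distribution, and takes hypothetical sample sizes (n = 34 and n = 29), so its output only approximates the values reported above. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def detectable_d(n, power=0.90, alpha=0.05):
    """Approximate Cohen's d (paired design) detectable with the given power.

    Normal approximation: d = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


def smallest_significant_d(n, alpha=0.05):
    """Approximate smallest observed d that would reach significance."""
    return NormalDist().inv_cdf(1 - alpha / 2) / sqrt(n)


if __name__ == "__main__":
    for n in (34, 29):  # hypothetical analysed sample sizes (assumption)
        print(n, round(detectable_d(n), 2), round(smallest_significant_d(n), 2))
```

      Because the t distribution has heavier tails than the normal distribution, exact values are somewhat larger than this approximation, and more so for smaller samples.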

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      (3) As an alternative explanation, it is conceivable that the cardiac cue may have just increased unspecific arousal or attention to a larger extent than the exteroceptive cue. It would be helpful to discuss the role of these rather unspecific mechanisms, and how it may have differed between experiments.

      We thank the reviewer for raising this important point. We agree that, in principle, unspecific mechanisms such as increased arousal or attention driven by cardiac feedback could be an alternative explanation for the observed effects. However, several aspects of our data indicate that this is unlikely:

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed when we compared between experiments (see between-experiment t-tests in results, and in supplementary analyses). Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates:

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rates both when compared to slower feedback and to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had only generally increased arousal, heart rates should have increased instead of decreased, as indicated by several prior studies (Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022). An arousal account therefore predicts the opposite pattern of responses to that found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework.

      We have now integrated these considerations in the revised discussion (lines 796-830), and added the relevant between-experiment comparisons to the Results of Experiment 2 and the supplementary analysis.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      (4) The hypothesis (increased pain intensity with incongruent-high cardiac feedback) should be motivated by some additional literature.

      We thank the reviewer for this helpful suggestion. Please note that the current phenomenon was tested in this experiment for the first time. Therefore, there is no specific prior study that motivated our hypotheses; they were driven theoretically, and derived from our model of interoceptive integration of pain and cardiac perception. The idea that accelerated cardiac feedback (relative to decelerated feedback) will increase pain perception and reduce heart rates is grounded in embodied predictive coding frameworks. Accordingly, expectations and signals from different sensory modalities (sensory, proprioceptive, interoceptive) are integrated both to efficiently infer crucial homeostatic and physiological variables, such as hunger, thirst, and, in this case, pain, and to regulate the body’s own autonomic responses based on these inferences.

      Within this framework, the concept of an interoceptive schema (Tschantz et al., 2022; Iodice et al., 2019; Parrotta et al., 2024; Schoeller et al., 2022) offers the basis for understanding interoceptive illusions, wherein inferred levels of interoceptive states (i.e., pain) deviate from the actual physiological state. Cardiac signals conveyed by the feedback manipulation act as a misleading prior, shaping the internal generative model of pain. Specifically, an increased heart rate may signal a state of threat, establishing a prior expectation of heightened pain. Building on predictive models of interoception, we predict that this cardiac prior is integrated with interoceptive (i.e., actual nociceptive signal) and exteroceptive inputs (i.e., auditory feedback input), leading to a subjective experience of increased pain even when there is no corresponding increase in the nociceptive input.

      This idea is not completely new, but it is based on our previous findings of an interoceptive cardiac illusion driven by misleading priors about anticipated threat (i.e., pain). Specifically, in Parrotta et al. (2024), we tested whether a common false belief that heart rate increases in response to threat leads to an illusory perception of accelerated cardiac activity when anticipating pain. In two experiments, we asked participants to monitor and report their heartbeat while their ECG was recorded. Participants performed these tasks while visual cues reliably predicted a forthcoming harmless (low-intensity) vs. threatening (high-intensity) cutaneous electrical stimulus. We showed that anticipating a painful vs. harmless stimulus causes participants to report an increased cardiac frequency, which does not reflect their real cardiac response, but the common (false) belief that heart rates would accelerate under threat, reflecting the hypothesised integration of prior expectations and interoceptive inputs when estimating cardiac activity.

      Here we tested the counterpart of such a cardiac illusion. We reasoned that if cardiac interoception is shaped by expectations about pain, then the inverse should also be true: manipulating beliefs about cardiac activity (via cardiac feedback) in the context of pain anticipation should influence the perception of pain. Specifically, we hypothesized that presenting accelerated cardiac feedback would act as a misleading prior, leading to an illusory increase in pain experience, even in the absence of an actual change in nociceptive input.

      Moreover, in addition to the references already provided in the previous version of the manuscript, there is ample prior research that provides more general support for such relationships. Specifically, studies have shown that providing mismatched cardiac feedback in contexts where cardiovascular changes are typically expected (e.g., sexual arousal, Rupp & Wallen, 2008; Valins, 1966; physical exercise, Iodice et al., 2019) can enhance the perception of interoceptive states associated with those experiences. Furthermore, findings that false cardiac feedback can influence emotional experience suggest that it is the conscious perception of physiological arousal, combined with the cognitive interpretation of the stimulus, that plays a key role in shaping emotional responses (Crucian et al., 2000).

      This point is now addressed in the revised Introduction, wherein additional references have been integrated (lines 157-170).

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Rupp, H. A., & Wallen, K. (2008). Sex differences in response to visual sexual stimuli: A review. Archives of sexual behavior, 37(2), 206-218.

      Schoeller, F., Horowitz, A., Maes, P., Jain, A., Reggente, N., Moore, L. C., Trousselard, M., Klein, A., Barca, L., & Pezzulo, G. (2022). Interoceptive technologies for clinical neuroscience.

      Tschantz, A., Barca, L., Maisto, D., Buckley, C. L., Seth, A. K., & Pezzulo, G. (2022). Simulating homeostatic, allostatic and goal-directed forms of interoceptive control using active inference. Biological Psychology, 169, 108266.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      (5) The discussion section does not address the study's limitations in a sufficient manner. For example, I would expect a more thorough discussion on the lack of correlation between participant ratings and self-reported bodily awareness and reactivity, as assessed with the BPQ.

      We thank the reviewer for this valuable observation. In response, we have revised the Discussion section to explicitly acknowledge and elaborate on the lack of significant correlations between participants’ pain ratings and their self-reported bodily awareness and reactivity as assessed with the BPQ.

      We now clarify that the inclusion of this questionnaire was exploratory. While it would be theoretically interesting to observe a relationship between subjective pain modulation and individual differences in interoceptive awareness, detecting robust correlations between within-subject experimental effects and between-subjects trait measures such as the BPQ typically requires much larger sample sizes (often exceeding N = 200) due to the inherently low reliability of such cross-level associations (see Hedge, Powell & Sumner, 2018; the “reliability paradox”). As such, the absence of a significant correlation in our study does not undermine the conclusions we draw from our main findings. Future studies with larger samples will be needed to systematically address this question. We now acknowledge this point explicitly in the revised manuscript (lines 501-504; 832-851).

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1

      (a) Some short, additional information on why the authors chose to focus on body awareness and supradiaphragmatic reactivity subscales would be helpful.

      We chose to focus on the body awareness and supradiaphragmatic reactivity subscales because these aspects are closely tied to emotional and physiological processing, particularly in the context of interoception. Body awareness plays a critical role in how individuals perceive and interpret bodily signals, which in turn affects emotional regulation and self-awareness. Supradiaphragmatic reactivity refers specifically to the reactivity of organs located above the diaphragm (i.e., the muscle that separates the chest cavity from the abdomen), including the heart, in contrast to the subdiaphragmatic reactivity subscale, which concerns organs below it. Our decision to include these subscales is further motivated by recent research, including the work by Petzschner et al. (2021), which demonstrates that the focus of attention can modulate the heartbeat-evoked potential (HEP), and that this modulation is predicted by participants’ responses on the supradiaphragmatic reactivity subscale. Thus, this subscale, and the more general body awareness scale, allow us to explore the interplay between bodily awareness, physiological reactivity, and emotional processing in our study. We now clarify this point in the revised version of the Methods - Body Perception Questionnaire (lines 384-393).

      (6) The analyses presented in this version of the manuscript allow only limited mechanistic conclusions - a computational model of participants' behavior would be a very strong addition to the paper. While this may be out of the scope of the article, it would be helpful for the reader to discuss the limitations of the presented analyses and outline avenues towards a more mechanistic understanding and analysis of the data. The computational model in [7] might contain some starting ideas.

      Thank you for your valuable feedback. We agree that a computational model would enhance the mechanistic understanding of our findings. While this is beyond the current scope, we now discuss the limitations of our analysis in the Limitations and Future directions section (lines 852-863). Specifically, we acknowledge that future studies could use computational models to better understand the interactions between physiological, cognitive, and perceptual factors.

      Some additional topics were not considered in the first version of the manuscript:

      (1) The possible advantages of a computational model of task behavior should be discussed.

      We agree that a computational model of task behavior could provide several advantages. By formalizing principles of predictive processing and active inference, such a model could generate quantitative predictions about how heart rate (HR) and feedback interact, providing a more precise understanding of their respective contributions to pain modulation. However, this is a first demonstration of a theoretically predicted phenomenon, and computationally modelling it is currently outside the scope of the article. We would be excited to explore this in the future. We have added a brief discussion of these potential advantages in the revised manuscript and suggest that future work could integrate computational modelling to further deepen our understanding of these processes (lines 852-890).

      (2) Across both experiments, there was a slightly larger number of female participants. Research suggests significant sex-related differences in pain processing [1,2]. It would be interesting to see what role this may have played in this data.

      Thank you for your insightful comment. While we acknowledge that sex-related differences in pain processing are well-documented in the literature, we do not have enough participants in our sample to test this in a well-powered way. As such, exploring the role of sex differences in pain perception will need to be addressed in future studies with more balanced samples. It would be interesting if more sensitive individuals, with a more precise representation of pain, also show smaller effects on pain perception. We have noted this point in the revised manuscript (lines 845-851) and suggest that future research could specifically investigate how sex differences might influence the modulation of pain and physiological responses in similar experimental contexts.

      (3) There are a few very relevant papers that come to mind which may be of interest. These sources might be particularly useful when discussing the roadmap towards a mechanistic understanding of the inferential processes underlying the task responses [3,4] and their clinical implications.

      Thank you for highlighting these relevant papers. We appreciate your suggestion and have now cited them in the Limitations and Future directions paragraph (lines 852-863).

      (4) In this version of the paper, we only see plots that illustrate ∆ scores, averaged across pain intensities - to better understand participant responses and the relationship with stimulus intensity, it would be helpful to see a more descriptive plot of task behavior (e.g. stimulus intensity and raw pain ratings)

      To directly address the reviewer’s request, we now provide additional descriptive plots in the supplementary material of the revised manuscript, showing raw pain ratings across different stimulus intensities and feedback conditions. These plots offer a clearer view of participant behavior without averaging across pain levels, helping to better illustrate the relationship between stimulus intensity and reported pain.

      [1] Mogil, J. S. (2020). Qualitative sex differences in pain processing: emerging evidence of a biased literature. Nature Reviews Neuroscience, 21(7), 353-365. https://www.nature.com/articles/s41583-020-0310-6

      [2] Sorge, R. E., & Strath, L. J. (2018). Sex differences in pain responses. Current Opinion in Physiology, 6, 75-81. https://www.sciencedirect.com/science/article/abs/pii/S2468867318300786?via%3Dihub

      [3] Unal, O., Eren, O. C., Alkan, G., Petzschner, F. H., Yao, Y., & Stephan, K. E. (2021). Inference on homeostatic belief precision. Biological Psychology, 165, 108190.

      [4] Allen, M., Levy, A., Parr, T., & Friston, K. J. (2022). In the body's eye: the computational anatomy of interoceptive inference. PLoS Computational Biology, 18(9), e1010490.

      [5] Stephan, K. E., Manjaly, Z. M., Mathys, C. D., Weber, L. A., Paliwal, S., Gard, T., ... & Petzschner, F. H. (2016). Allostatic self-efficacy: A metacognitive theory of dyshomeostasis-induced fatigue and depression. Frontiers in human neuroscience, 10, 550.

      [6] Friston, K. J., Stephan, K. E., Montague, R., & Dolan, R. J. (2014). Computational psychiatry: the brain as a phantastic organ. The Lancet Psychiatry, 1(2), 148-158.

      [7] Eckert, A. L., Pabst, K., & Endres, D. M. (2022). A Bayesian model for chronic pain. Frontiers in Pain Research, 3, 966034.

      We thank the reviewer for highlighting these relevant references which have now been integrated in the revised version of the manuscript.

      Recommendations For The Authors: 

      Reviewer #1 (Recommendations For The Authors):

      At the time I was reviewing this paper, I could not think of a detailed experiment that would answer my biggest concern: Is this a manipulation of the brain's interoceptive data integration, or rather a manipulation of participants' alertness which indirectly influences their pain prediction?

      One incomplete idea that came to mind was delivering this signal in a more "covert" manner (though I am not sure it will suffice), or perhaps correlating the effect size of a participant with their interoceptive abilities, as measured in a different task or through a questionnaire. Another potential idea is to tell participants that this is someone else's HR that they hear and see if that changes the results (though requires further thought). I leave it to the authors to think further, and perhaps this is to be answered in a different paper - but if so, I am sorry to say that I do not think the claims can remain as they are now, and the paper will need a revision of its arguments, unfortunately. I urge the authors to ask further questions if my point about the concern was not made clear enough for them to address or contemplate it.

      We thank the reviewer for raising this important point. As detailed in our previous response, this point invites an important clarification regarding the role of cardiac deceleration in threat processing. Rather than serving as an interoceptive input from which the brain infers the likelihood of a forthcoming aversive event, heart rate deceleration is better described as an output of an already ongoing predictive process, as it reflects an allostatic adjustment of the bodily state aimed at minimizing the impact of the predicted perturbation (e.g., pain) and preventing sympathetic overshoot. It would be maladaptive for the brain to use a decelerating heart rate as evidence of impending threat, since this would paradoxically trigger further parasympathetic activation, initiating a potentially destabilizing feedback loop. Conversely, increased heart rate represents an evolutionarily conserved cue for arousal, threat, and pain. Our results therefore align with the idea that the brain treats externally manipulated increases in cardiac signals as congruent with anticipated sympathetic activation, prompting a compensatory autonomic and perceptual response consistent with embodied predictive processing frameworks (e.g., Barrett & Simmons, 2015; Seth, 2013).

      We would also like to reiterate that our results cannot be explained by general differences induced by the heart rate sounds relative to the exteroceptive sounds (see also our detailed comments on your point above, and our response to a similar point from Reviewer 3), for three main reasons.

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed. Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates:

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rate, both when compared to slower feedback and when compared to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had merely increased general arousal, heart rates should have increased rather than decreased, as indicated by several prior studies (for a review, see Forte et al., 2022); this is the opposite of the pattern found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework. We now integrate these considerations in the general discussion (lines 796-830).

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature reviews neuroscience, 16(7), 419-429.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Additional recommendations:

      Major (in order of importance):

      (1) Number of trials per participant, per condition: as I mentioned, having only 6 trials for each condition is very little. The minimum requirement to accept so few trials would be to show data about the distribution of participants' responses to these trials, both per pain intensity (which was later averaged across - another issue discussed later), and across pain intensities, and see that it allows averaging across and that it is not incredibly variable such that the mean is unreliable.

      We appreciate the reviewer’s concern regarding the limited number of trials per condition. This choice was driven by both theoretical and methodological considerations.

      First, as is common in body illusion paradigms (e.g., the Rubber Hand Illusion, Botvinick & Cohen, 1998; the Full Body Illusion, Ehrsson, 2007; the Cardio-visual full body illusion, Pratviel et al., 2022) only a few trials are typically employed due to the immediate effects these manipulations elicit. Repetition can reduce the strength of the illusion through habituation, increased awareness, or loss of believability.

      Second, the experiment was already quite long (1.5h to 2h per participant) and cognitively demanding. It would not have been feasible to expand it further without compromising data quality due to fatigue, attentional decline, or participant disengagement.

      Third, the need for a large number of trials is more relevant when using implicit measures such as response times or physiological indices, which are typically indirectly related to the psychological constructs of interest. In contrast, explicit ratings are often more sensitive and less noisy, and thus require fewer repetitions to yield reliable effects (e.g., Corneille et al., 2024).

      Importantly, we also addressed your concern analytically. We therefore ran linear mixed-effects model analyses across all dependent variables (see Supplementary materials), with Trial (i.e., the rank order of each trial) included as a predictor to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). These models captured trial-by-trial variability and allowed us to test for systematic changes in heart rate (HR) and pain ratings, including interactions with feedback conditions (e.g., Kliegl et al., 2011; Baayen & Milin, 2010; Ambrosini et al., 2019). The consistent effects of Trial suggest that repetition dampens the illusion, reinforcing our decision to limit the number of exposures.

      In the interoceptive experiment, these analyses revealed a significant Feedback × Trial interaction (F(3, 711.19) = 6.16, p < .001), indicating that the effect of feedback on HR was not constant over time. As we suspected, and in line with other illusion-like effects, the difference between Faster and Slower feedback, which was significant early on (estimate = 1.68 bpm, p = .0007), decreased by mid-session (estimate = 0.69 bpm, p = .0048), and was no longer significant in later trials (estimate = 0.30 bpm, p = .4775). At the end of the session, HR values in the Faster and Slower conditions even numerically converged (Faster: M = 74.4, Slower: M = 74.1), and the non-significant contrast confirms that the difference had effectively vanished (for further details about slope estimation, see Supplementary material).

      The same pattern emerged for pain-unpleasantness ratings. A significant Feedback × Trial interaction (F(3, 675.33) = 3.44, p = .0165) revealed that the difference between Faster and Slower feedback was strongest at the beginning of the session and progressively weakened. Specifically, Faster feedback produced higher unpleasantness than Slower in early trials (estimate = -0.28, p = .0058) and mid-session (estimate = -0.19, p = .0001), but this contrast was no longer significant in the final trials, wherein all the differences between active feedback conditions vanished (all ps > .55).

      Finally, similar results emerged for pain intensity ratings. A significant Feedback × Trial interaction (F(3, 669.15) = 9.86, p < .001) showed that the Faster vs Slower difference was greatest at the start of the session and progressively vanished over trials. In early trials Faster feedback exceeded Slower (estimate = -8.33, p = .0001); by mid-session this gap had shrunk to 4.48 points (p < .0001); and in the final trials it was no longer significant (all ps > .94).

      Taken together, our results show that the illusion induced by Faster relative to Slower feedback fades with repetition; adding further trials would likely have masked this key effect, which supports the methodological choice to restrict each condition to fewer exposures. To conclude, given that this is the first study to investigate an illusion of pain using heartbeat-based manipulation, we intentionally limited repeated exposures to preserve the integrity of the illusion. The use of mixed models as complementary analyses strengthens the reliability of our conclusions within these necessary design constraints. We now clarify this point in the Procedure paragraph (lines 328-335).

      Ambrosini, E., Peressotti, F., Gennari, M., Benavides-Varela, S., & Montefinese, M. (2023). Aging-related effects on the controlled retrieval of semantic information. Psychology and Aging, 38(3), 219.

      Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12-28.

      Botvinick, M., & Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391(6669), 756-756.

      Corneille, O., & Gawronski, B. (2024). Self-reports are better measurement instruments than implicit measures. Nature Reviews Psychology, 3(12), 835–846.

      Ehrsson, H. H. (2007). The experimental induction of out-of-body experiences. Science, 317(5841), 1048-1048.

      Kliegl, R., Wei, P., Dambacher, M., Yan, M., & Zhou, X. (2011). Experimental effects and individual differences in linear mixed models: Estimating the relation of spatial, object, and attraction effects in visual attention. Frontiers in Psychology, 1, 238. https://doi.org/10.3389/fpsyg.2010.00238

      Möckel, T., Beste, C., & Wascher, E. (2015). The effects of time on task in response selection-an ERP study of mental fatigue. Scientific reports, 5(1), 10113.

      Pratviel, Y., Bouni, A., Deschodt-Arsac, V., Larrue, F., & Arsac, L. M. (2022). Avatar embodiment in VR: Are there individual susceptibilities to visuo-tactile or cardio-visual stimulations?. Frontiers in Virtual Reality, 3, 954808.

      (2) Using different pain intensities: what was the purpose of training participants on correctly identifying pain intensities? You state that the aim of having 5 intensities is to cause ambiguity. What is the purpose of making sure participants accurately identify the intensities? Also, why then only 3 intensities were used in the test phase? The rationale for these is lacking.

      We thank the reviewer for raising these important points regarding the use of different pain intensities. The purpose of using five levels during the calibration and training phases was to introduce variability and increase ambiguity in the participants’ sensory experience. This variability aimed to reduce predictability and prevent participants from forming fixed expectations about stimulus intensity, thereby enhancing the plausibility of the illusion. It also helped prevent habituation to a single intensity and made the manipulation subtler and more credible. We had no specific theoretical hypotheses about this manipulation. Regarding the accuracy training, although the paradigm introduced ambiguity, it was important to ensure that participants developed a stable and consistent internal representation of the pain scale. This step was essential to control for individual differences in sensory discrimination and to ensure that illusion effects were not confounded by participants’ inability to reliably distinguish between intensities.

      As for the use of only three pain intensities in the test phase, the rationale was to focus on a manageable subset that still covered a meaningful range of the stimulus spectrum. This approach followed the same logic as Iodice et al. (2019, PNAS), who used five (rather than all seven) intensity levels during their experimental session. Specifically, they excluded the extreme levels (45 W and 125 W) used during baseline, to avoid floor and ceiling effects and to ensure that each test intensity could be paired with both a “slower” and a “faster” feedback from an adjacent level. This would not have been possible at the extremes of the intensity range, where no adjacent level exists in one direction. We adopted the same strategy to preserve the internal consistency and plausibility of our feedback manipulation.

      We further clarified these points in the revised manuscript (lines 336-342).

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      (3) Averaging across pain intensities: this is, in my opinion, not the best approach as by matching a participant's specific responses to a pain stimulus before and after the manipulation, you can more closely identify changes resulting from the manipulation. Nevertheless, the minimal requirement to do so is to show data of distributions of pain intensities so we know they did not differ between conditions per participant, and in general - as you indicate they were randomly distributed.

      We thank the reviewer for this thoughtful comment. The decision to average across pain intensities in our main analyses was driven by the specific aim of the study: we did not intend to determine at which exact intensity level the illusion was most effective, and the limited number of trials makes such an analysis difficult. Rather, we introduced variability in nociceptive input to increase ambiguity and reduce predictability in the participants’ sensory experience. This variability was critical for enhancing the plausibility of the illusion by preventing participants from forming fixed expectations about stimulus strength. Additionally, using a range of intensities helped to minimize habituation effects and made the feedback manipulation subtler and more credible.

      That said, we appreciate the reviewer’s point that matching specific responses before and after the manipulation at each intensity level could provide further insights into how the illusion operates across varying levels of nociceptive input. We therefore conducted supplementary analyses using linear mixed-effects models in which all three stimulus intensities were included as a continuous fixed factor. This allowed us to examine whether the effects of feedback were intensity-specific or generalized across different levels of stimulation.

      These analyses revealed that, in both the interoceptive and exteroceptive experiments, the effect of feedback on pain ratings was significantly modulated by stimulus intensity, as indicated by a Feedback × Stimulus Intensity interaction (Interoceptive: unpleasantness F(3, 672.32)=3.90, p=.0088; intensity ratings F(3, 667.07)=3.46, p=.016. Exteroceptive: unpleasantness F(3, 569.16)=8.21, p<.0001; intensity ratings F(3, 570.65)=3.00, p=.0301). The interaction term confirmed that the impact of feedback varied with stimulus strength, yet the pattern that emerged in each study diverged markedly.

      In the interoceptive experiment, the accelerated-heartbeat feedback (Faster) systematically heightened pain relative to the decelerated version (Slower) at every level of noxious input: for low-intensity trials Faster exceeded Slower by 0.22 ± 0.08 points on the unpleasantness scale (t = 2.84, p = .0094) and by 3.87 ± 1.69 units on the numeric intensity scale (t = 2.29, p = .0448); at the medium intensity the corresponding differences were 0.19 ± 0.05 (t = -4.02, p = .0001) and 4.52 ± 1.06 (t = 4.28, p < .0001); and even at the highest intensity, Faster still surpassed Slower by 0.17 ± 0.08 on unpleasantness (t = 2.21, p = .0326) and by 5.16 ± 1.67 on intensity (t = 3.09, p = .0032). This uniform Faster > Slower pattern indicates that the interoceptive manipulation amplifies perceived pain in a stimulus-independent fashion.

      The exteroceptive control experiment told a different story: the Faster-Slower contrast reached significance only at the most noxious setting (unpleasantness: estimate = 0.24 ± 0.07, t = -3.24, p = .0019; intensity: estimate = -5.14 ± 1.82, t = 2.83, p = .0072) and was absent at the medium level (intensity, p = .29; unpleasantness, p = .45), while at the lowest level Slower actually produced numerically higher unpleasantness (2.56 versus 2.40) and intensity ratings (44.7 versus 42.2).

      Thus, although both studies show that feedback effects depend on the actual nociceptive level of the stimulus, the results suggest that the faster vs. slower interoceptive feedback manipulation delivers a robust and intensity-invariant enhancement of pain, whereas the exteroceptive cue exerts a sporadic influence that surfaces solely under maximal stimulation.

      These new results are now included in the Supplementary Materials, where we report the detailed analyses for both the Interoceptive and Exteroceptive experiments on the Likert unpleasantness ratings and the numeric pain intensity ratings.

      (4) Sample size: It seems that the sample size was determined after the experiment was conducted, as the required N is identical to the actual N. I would be transparent about that, and say that retrospective sample size analyses support the ability of your sample size to support your claims. In general, a larger sample size than is required is always recommended, and if you were to run another study, I suggest you increase the sample size.

      As also addressed in our responses to your later comments (see our detailed reply regarding the justification of SESOI and power analyses), the power analyses reported here were not post-hoc power analyses based on obtained results. In line with current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we did not base our analyses on previously reported effect sizes, as these can carry considerable uncertainty, particularly for novel effects where robust estimates are lacking. Instead, we used sensitivity analyses, conducted using the sensitivity analysis function in G*Power (Version 3.1). Sensitivity analyses allow us to report effect sizes that our design was adequately powered (90%) to detect, given the actual sample size, desired power level, and the statistical test used in each experiment (Lakens, 2022). Following further guidance (Lakens, 2022), we also report the smallest effect size of interest (SESOI) that these tests could reliably detect.

      This approach indicated that our design was powered to detect effect sizes of d = 0.57 in Experiment 1 and d = 0.62 in Experiment 2, with corresponding SESOIs of d = 0.34 and d = 0.37, respectively. The slightly higher value in Experiment 2 reflects the greater number of participants excluded (from an equal number originally tested) based on pre-specified criteria. Importantly, both experiments were well-powered to detect effects smaller than those typically reported in similar top-down pain modulation studies, where effect sizes around d = 0.7 have been observed (Iodice et al., 2019).
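
The logic of a sensitivity analysis can be sketched in a few lines. The snippet below is only an illustration, not the G*Power computation itself: it uses a normal approximation to a two-sided paired t-test, whereas G*Power works with the noncentral t distribution and therefore returns slightly larger values. The function name `detectable_d` and the sample sizes in the loop are hypothetical and are not the sample sizes of the two experiments.

```python
from math import sqrt
from statistics import NormalDist


def detectable_d(n, alpha=0.05, power=0.90):
    """Smallest standardized paired difference detectable with the given power.

    Normal approximation to a two-sided paired t-test on n participants.
    An exact calculation (noncentral t) gives a slightly larger value.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


# Hypothetical sample sizes, chosen only to show how the detectable
# effect size shrinks as the sample grows.
for n in (20, 30, 40):
    print(n, round(detectable_d(n), 2))
```

Fixing the sample size, alpha, and power and solving for the effect size is what distinguishes a sensitivity analysis from an a priori analysis (which solves for N) and from a post-hoc analysis (which solves for power given an observed effect).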

      We have now clarified this rationale in the revised manuscript, Experiment 1- Methods - Participants (lines 208-217).

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562. https://doi.org/10.1177/0956797617723724

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      (5) Analysis: the use of change scores instead of the actual scores is not recommended, as it is a loss of data, but could have been ignored if it didn't have a significant effect on the analyses conducted. Instead of conducting an RM-ANOVA of conditions (faster, slower, normal heartbeats) across participants, finding significant interaction, and then moving on to specific post-hoc paired comparisons between conditions, the authors begin with the change score but then move on to conduct the said paired comparisons without ever anchoring these analyses in an appropriate larger ANOVA. I strongly recommend the use of an ANOVA but if not, the authors would have to correct for multiple comparisons at the minimum.

      We thank the reviewer for their comment regarding the use of change scores. These were originally derived from the difference between the slower and faster feedback conditions relative to the congruent condition. In line with the reviewer’s recommendation, we have now removed these difference-based change scores from the main analysis. The results remain identical. Please note that we have retained the normalization procedure, relative to each participant’s initial baseline in the no feedback trials, as it is widely used in the interoceptive and pain literature (e.g., Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019). This approach helps to control for interindividual variability and baseline differences by expressing each participant’s response relative to their no-feedback baseline. As before, normalization was applied across all dependent variables (heart rate, pain intensity, and pain unpleasantness).

      To address the reviewer’s concern about statistical validity, we now first report a 1-factor repeated-measures ANOVA (Greenhouse-Geisser corrected) for each dependent variable, with feedback condition (slower, congruent, faster) as the within-subject factor.

      These show in each case a significant main effect, which we then follow with planned paired-sample t-tests comparing:

      Faster vs. slower feedback (our main hypothesis, as this contrast is expected to produce the largest difference and therefore the most powerful test of our hypothesis; see response to Reviewer 3),

      Faster vs. congruent and slower vs. congruent (to test for potential asymmetries, as suggested by previous false heart rate feedback studies).

      The rationale of these analyses is further discussed in the Data Analysis of Experiment 1 (lines 405-437).

      Although we report the omnibus one-factor RM-ANOVAs to satisfy conventional expectations, we note that such tests are not statistically necessary, nor even optimal, when the research question is fully captured by a priori, theory-driven contrasts. Extensive methodological work shows that, in this situation, going straight to planned contrasts maximises power without inflating Type I error and avoids the logical circularity of first testing an effect one does not predict (e.g., Rosenthal & Rosnow, 1985). In other words, an omnibus F is warranted only when one wishes to protect against unspecified patterns of differences. Here our hypotheses were precise (Faster ≠ Slower; potential asymmetry relative to Congruent), so the planned paired comparisons would have sufficed statistically. We therefore include the RM-ANOVAs solely for readers who expect to see them, but our inferential conclusions rest on the theoretically motivated contrasts.

      Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis. New York: Cambridge.

      (6) Correlations: were there correlations between subjects' own heartbeats (which are considered a predictive cue) and pain perceptions? This is critical to show that the two are in fact related.

      We thank the reviewer for this thoughtful suggestion. We agree that testing for a correlation between anticipatory heart rate responses and subjective pain ratings is theoretically relevant. However, we have not conducted this analysis in the current manuscript, as our study was not designed or powered to reliably detect such individual differences. As noted by Hedge, Powell, and Sumner (2018), robust within-subject experimental designs tend to minimize between-subject variability in order to detect clear experimental effects. This reduction in variance at the between-subject level limits the reliability of correlational analyses involving trait-like or individual response patterns. This issue, known as the reliability paradox, highlights that measures showing robust within-subject effects may not show stable individual differences. Correlations with other individual-level variables (like the subjective ratings used here) therefore require much larger samples to produce interpretable results than are available here (and than are commonly used in the literature), typically more than 200 participants. For these reasons, we believe that running such an analysis in our current dataset would not yield informative results and could be misleading.
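
The argument can be made concrete with a small sketch. It assumes the classical test-theory definitions (reliability as the share of observed variance due to stable between-subject differences, and Spearman's attenuation formula for the expected observed correlation). The function names and all numbers are hypothetical and are not estimates from our data.

```python
from math import sqrt


def reliability(var_between, var_error):
    """Share of observed variance due to stable between-subject differences."""
    return var_between / (var_between + var_error)


def attenuated_r(true_r, rel_x, rel_y):
    """Spearman's attenuation formula: expected observed correlation."""
    return true_r * sqrt(rel_x * rel_y)


# Hypothetical values: a task with little between-subject variance
# (as in robust within-subject designs) correlated with a reliable rating.
rel_task = reliability(var_between=1.0, var_error=3.0)
print(rel_task, attenuated_r(true_r=0.5, rel_x=rel_task, rel_y=1.0))
```

Under these assumptions, low between-subject variance in the experimental measure lowers its reliability, which in turn shrinks any observable correlation with a second variable and raises the sample size needed to detect it.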

      We now explicitly acknowledge this point in the revised version of the manuscript (Limitations and future directions, lines 832-851) and suggest that future studies specifically designed to examine individual variability in anticipatory physiological responses and pain perception would be better suited to address this question.

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1

      (7) The direct comparison between studies is great! and finally the use of ANOVA - but why without the appropriate post-hoc tests to support the bold claims in lines 542-544? This is needed. Same for 556-558.

      We apologize if our writing was not clear here, but the result of the ANOVAs fully warrants the claims in 542-544 (now lines 616-618) and 556-558 (now lines 601-603).

      In a 2x2 design, the interaction term is mathematically identical to comparing the difference induced by Factor 1 at one level of Factor 2 with the same difference induced at the other level of Factor 2. In our 2x2 analysis with the factors Experiment (Cardiac feedback, Exteroceptive feedback - between participants) and Feedback Frequency (faster, slower - within participants), the interaction therefore directly tests whether the effect of Feedback frequency differs statistically (i.e., is larger or smaller) in the participants in the interoceptive and exteroceptive experiments. Thus, the conclusion that “faster feedback affected the perceptual bias more strongly in the Experiment 1 than in Experiment 2” captures the outcome of the significant interaction exactly. Indeed, this test would be statistically equivalent (and would produce identical p values) to a simple between-group t-test between each participant’s difference between the faster and slower feedback in the interoceptive group and the analogous differences between the faster and slower feedback in the exteroceptive group, as illustrated in standard examples of factorial analysis (see, e.g., Maxwell, Delaney and Kelley, 2018).

      Please note that, for the above reason, mathematically the conclusion of larger effects in one experiment than the other is licensed by the significant interaction even without follow-up t-tests. However, if the reader would like to see these tests, they are simply the main analysis results reported in each of the two experiment sections, where significant (t-test) differences between faster and slower feedback were induced with interoceptive cues (Experiment 1) but not exteroceptive cues (Experiment 2). Reporting them in the between-experiment comparison section again would therefore be redundant.
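
The equivalence described above can be checked numerically. The following sketch uses made-up ratings (not data from either experiment) and hypothetical helper names; it computes the interaction F of a balanced 2 (between) x 2 (within) design from its sums of squares and compares it with the squared pooled-variance t statistic on the Faster minus Slower difference scores.

```python
from statistics import mean


def pooled_t(d1, d2):
    """Student's t (pooled variance) for two independent samples."""
    n1, n2 = len(d1), len(d2)
    m1, m2 = mean(d1), mean(d2)
    ss = sum((x - m1) ** 2 for x in d1) + sum((x - m2) ** 2 for x in d2)
    sp2 = ss / (n1 + n2 - 2)
    return (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5


def interaction_f(groups):
    """Interaction F for a 2 (between) x 2 (within) mixed design.

    groups: two lists of (faster, slower) pairs, one pair per participant.
    Assumes equal group sizes (balanced design).
    """
    rows = [row for g in groups for row in g]
    grand = mean(v for row in rows for v in row)
    level_means = [mean(row[j] for row in rows) for j in range(2)]
    ss_int = 0.0
    ss_err = 0.0
    for g in groups:
        g_mean = mean(v for row in g for v in row)
        cell = [mean(row[j] for row in g) for j in range(2)]
        for j in range(2):
            ss_int += len(g) * (cell[j] - g_mean - level_means[j] + grand) ** 2
        for row in g:
            s_mean = mean(row)
            for j in range(2):
                ss_err += (row[j] - s_mean - cell[j] + g_mean) ** 2
    # Interaction df = 1; error df = N - 2 for two groups and two levels.
    return ss_int / (ss_err / (len(rows) - 2))


# Made-up (faster, slower) ratings for illustration only.
intero = [(5.0, 4.0), (6.0, 4.5), (4.5, 4.0), (5.5, 4.0)]
extero = [(4.0, 4.2), (5.0, 4.8), (4.5, 4.5), (5.2, 5.0)]

t = pooled_t([f - s for f, s in intero], [f - s for f, s in extero])
f_stat = interaction_f([intero, extero])
print(f"t^2 = {t ** 2:.6f}, F = {f_stat:.6f}")
```

Because the interaction has one numerator degree of freedom and both tests share the same error degrees of freedom, the two statistics coincide up to floating-point error and so do their p values.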

      To avoid this lack of clarity, we have now re-written the results section of each experiment. First, as noted above, we now precede our main hypothesis test - the crucial t-test comparing heart rate and pain ratings after faster vs slower feedback - with an ANOVA including all three levels (faster, congruent, slower feedback). Moreover, we removed the separate between-experiment comparison section. Instead, in the Results section of the exteroceptive Experiment 2, we now directly compare the (absent or reversed) effects of faster vs slower feedback, using a between-groups t-test, with the effects present in the interoceptive Experiment 1. This shows conclusively, and hopefully more clearly, that the effects in the two experiments differ. We hope that this makes the logic of our analyses clearer.

      Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing experiments and analyzing data: A model comparison perspective. Routledge.

      (8) The discussion is missing a limitation paragraph.

      Thank you for the suggestion. We have now added a dedicated limitations paragraph in the Discussion section (lines 832-890).

      Additional recommendations:

      Minor (chronological order):

      (1) Sample size calculations for both experiments: what was the effect size based on? A citation or further information is needed. Also, clarify why the effect size differed between the two experiments.

      Please see above

      (2) "Participants were asked to either not drink coffee or smoke cigarettes" - either is implying that one of the two was asked. I suspect it is redundant as both were not permitted.

      The intention was to restrict both behaviors, so we have corrected the sentence to clarify that participants were asked not to drink coffee or smoke cigarettes before the session.

      (3) Normalization of ECG - what exactly was normalized, namely what measure of the ECG?

      The normalized measure was the heart rate, expressed in beats per minute (bpm). We now clarify this in the Data Analysis section of Experiment 1 (“Measures of the heart rate recorded with the ECG (beats per minute) in the feedback phase were normalized”).

      (4) Line 360: "Mean Δ pain unpleasantness ratings were analysed analogously" - this is unclear, if already described in methods then should be removed here, if not - should be further explained here.

      Thank you for your observation. We are no longer using change scores.

      (5) Lines 418-420: "Consequently, perceptual and cardiac modulations associated with the feedback manipulation should be reduced over the exposure to the faster exteroceptive sound." - why reduced and not unchanged? I didn't follow the logic.

      We chose the term “reduced” rather than “unchanged” to remain cautious in our interpretation. Statistically, the absence of a significant effect in one experiment does not necessarily mean that no effect is present; it simply means we did not detect one. For this reason, we avoided using language that would suggest complete absence of modulation. It also more closely matches the results of the between-experiment comparisons that we report in the Results section of Experiment 2, which can in principle only show that the effect in Experiment 2 was smaller than that of Experiment 1, not that it was absent. Even the TOST analysis that we use to show the absence of an effect can only show that any effect that is present is smaller than we could reasonably expect to detect with our experimental design, not that it is completely absent.

      Also, on a theoretical level, pain is a complex, multidimensional experience influenced not only by sensory input but also by cognitive, emotional, social and expectancy factors. For this reason, we considered it important to remain open to the possibility that other mechanisms beyond the misleading cardiac prior induced by the feedback might have contributed to the observed effects. If such other influences had contributed to the induced differences between faster and slower feedback in Experiment 1, some remainder of this difference could have been observed in Experiment 2 as well.

      Thus, for both statistical and theoretical reasons, we were careful to predict a reduction of the crucial difference, not its complete elimination. However, to allow for the possibility that effects could be completely eliminated, we now write that “perceptual and cardiac modulations associated with the feedback manipulation should be reduced or eliminated with exteroceptive feedback”.

      (6) Study 2 generation of feedback - was this again tailored per participants (25% above and beyond their own HR at baseline + gradually increasing or decreasing), or identical for everyone?

      Yes, in Study 2, the generation of feedback was tailored to each participant, mirroring the procedure of Experiment 1. Specifically, the feedback was set to be 25% above or below their baseline heart rate, with the feedback gradually increasing or decreasing. This individualized approach ensured that each participant experienced feedback relative to their own baseline heart rate. We now clarify this in the Methods section (lines 306-318).
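
      The individualized manipulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear ramp from baseline toward the ±25% target, the number of steps, and the names `feedback_schedule` and `bpm_to_interval_ms` are assumptions made only for illustration.

```python
def feedback_schedule(baseline_bpm, direction, n_steps=10, max_shift=0.25):
    """Return feedback rates (bpm) that move gradually away from baseline.

    Assumption (not stated in the source): the ramp is linear and ends
    exactly at baseline * (1 +/- max_shift).
    """
    if direction not in ("faster", "slower"):
        raise ValueError("direction must be 'faster' or 'slower'")
    sign = 1.0 if direction == "faster" else -1.0
    target_bpm = baseline_bpm * (1.0 + sign * max_shift)
    step = (target_bpm - baseline_bpm) / n_steps
    # Step i (1-based) lies i/n_steps of the way from baseline to target.
    return [baseline_bpm + step * (i + 1) for i in range(n_steps)]


def bpm_to_interval_ms(bpm):
    """Convert a rate in beats per minute to an inter-beat interval in ms."""
    return 60000.0 / bpm


if __name__ == "__main__":
    # Hypothetical participant with a 60 bpm baseline.
    for bpm in feedback_schedule(60.0, "faster", n_steps=5):
        print(f"{bpm:.1f} bpm -> {bpm_to_interval_ms(bpm):.0f} ms between sounds")
```

      Because the target is defined as a proportion of each participant's own baseline, two participants with different resting heart rates receive different absolute feedback rates but the same relative deviation.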

      (7) I did not follow why we need the TOST and how to interpret its results.

      We thank the reviewer for raising this important point. In classical null hypothesis significance testing (NHST), a non-significant p-value (e.g., p > .05) only indicates that we failed to find a statistically significant difference, not that there is no difference. It therefore does not allow us to conclude that two conditions are equivalent – only that we cannot confidently say they are different. In our case, to support the claim that exteroceptive feedback does not induce perceptual or physiological changes (unlike interoceptive feedback), we needed a method to test for the absence of a meaningful effect, not just the absence of a statistically detectable one.

      The TOST (Two One-Sided Tests) procedure reverses the logic of NHST by testing whether the observed effect falls within a predefined equivalence interval, whose bounds are set by the smallest effect size of interest (SESOI), here the smallest effect that is in principle measurable with our design parameters (e.g., type of test, number of participants). This approach is necessary when the goal is not to detect a difference, but rather to demonstrate that an observed effect is so small that it can be considered negligible – or at least smaller than we could in principle expect to observe in the given experiment. We used the TOST procedure in Experiment 2 to test for statistical equivalence between the effects of faster and slower exteroceptive feedback on pain ratings and heart rate.

      We hope that the clearer explanation now provided in the Data Analysis section of Experiment 2 (lines 5589-563) fully addresses the reviewer’s concern.
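
      The TOST logic explained above can be sketched in a few lines. This is a generic paired-samples illustration, not the authors' analysis script: the function names, the choice to express the equivalence bound in standardized units of the difference scores (`sesoi_dz`), and the numerical integration of the t distribution (used only to stay within the standard library) are all assumptions.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """Student's t CDF via Simpson's rule on [0, |t|] plus symmetry."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    p = min(1.0, 0.5 + total * h / 3)
    return p if t >= 0 else 1 - p


def paired_tost(faster, slower, sesoi_dz, alpha=0.05):
    """Two one-sided paired t-tests against bounds of +/- sesoi_dz."""
    diffs = [f - s for f, s in zip(faster, slower)]
    n = len(diffs)
    sd = stdev(diffs)
    se = sd / sqrt(n)
    bound = sesoi_dz * sd  # equivalence bound in raw units
    m = mean(diffs)
    # Test 1, H0: true difference <= -bound (reject if t is large).
    p_lower = 1 - t_cdf((m + bound) / se, n - 1)
    # Test 2, H0: true difference >= +bound (reject if t is very negative).
    p_upper = t_cdf((m - bound) / se, n - 1)
    return {
        "p_lower": p_lower,
        "p_upper": p_upper,
        # Equivalence is claimed only if BOTH one-sided tests reject.
        "equivalent": max(p_lower, p_upper) < alpha,
    }
```

      Reading the result: if both one-sided p-values fall below alpha, the observed difference is statistically smaller than the SESOI in either direction. A non-significant TOST does not show that an effect exists, only that equivalence could not be established.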

      (8) Lines 492-3: authors say TOST significant, while p value = 0.065

      We thank the reviewer for spotting this inconsistency. The discrepancy was due to a typographical error in the initial manuscript. During the revision of the paper, we rechecked and fully recomputed all TOST analyses, and the results have now been corrected throughout the manuscript to accurately reflect the statistical outcomes. In particular, for the comparison of heart rate between faster and slower exteroceptive feedback in Experiment 2, the corrected TOST analysis now shows a significant equivalence, with the observed effect size being d = -0.19 (90% CI [-0.36, -0.03]) and both one-sided tests yielding p = .025 and p < .001. These updated results are reported in the revised Results section.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest the authors revise their definition of pain in the introduction, since it is not always a protective experience. The new IASP definition specifically takes this into consideration.

      We thank the reviewer for this suggestion. We have updated the definition of pain in the Introduction (lines 2-4) to align with the most recent IASP definition (2020), which characterizes pain as “an unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage” (lines 51-53).

      The work on exteroceptive cues does not necessarily neglect the role of interoceptive sources of information, although it is true that it has been comparatively less studied. I suggest rephrasing this sentence to reflect this.

      We thank the reviewer for pointing out this important nuance. We agree that studies employing exteroceptive cues to modulate pain perception do not necessarily neglect the role of interoceptive sources, even though these are not always the primary focus of investigation. Our intention was not to imply a strict dichotomy, but rather to highlight that interoceptive mechanisms have been comparatively under-investigated. We have revised the sentence in the Introduction accordingly to better reflect this perspective (Introduction, lines 110-112, “Although interoceptive processes may have contributed to the observed effects, these studies did not specifically target interoceptive sources of information within the inferential process.”).

      The last paragraph of the introduction (lines 158-164) contains generalizations beyond what can be supported by the data and the results, about the generation of predictive processes and the origins of these predictions. The statements regarding the understanding of pain-related pathologies in terms of chronic aberrant predictions in the context of this study are also unwarranted.

      We have deleted this paragraph now.

      I could not find the study registration (at least in clinicaltrials.gov). This is curious considering that the hypothesis and the experimental design seem in principle well thought out, and a study pre-registration improves the credibility of the research (Nosek et al., 2018). I also find the choice for the smallest effect of interest (SESOI) odd. Besides the unnecessary variable transformations (more on that later), there is no justification for why that particular SESOI was chosen, or why it changes between experiments (Dienes, 2021; King, 2011), which makes the choice look arbitrary. The SESOI is a fundamental component of a priori power analysis (Lakens, 2022), and without rationale and preregistration, it is impossible to tell whether this is a case of SPARKing or not (Sasaki & Yamada, 2023).

      We acknowledge that the study was not preregistered. Although our hypotheses and design were developed a priori and informed by established theoretical frameworks, the lack of formal preregistration is a limitation.

      The SESOI values for Experiments 1 and 2 were derived from sensitivity analyses based on the fixed design parameters (type of test, number of participants, alpha level) of our study, not from any post-hoc interpretation based on observed results; they therefore cannot be a case of SPARKing. Following current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), we avoided basing power estimates on published effect sizes, as no such values exist for novel paradigms, and published estimates are typically inflated due to publication and other biases. Instead, sensitivity analyses (using G*Power, v 3.1) allow us to calculate, prospectively, the smallest effect each design could detect with 90% power, given the actual sample size, test type, and α level. Because more participants were excluded in Experiment 2, the smallest effect this design can detect is slightly larger (d = 0.62) than in Experiment 1 (d = 0.57). Please note that both studies therefore remain well-powered to capture effects of the magnitude typically reported in previous research using feedback manipulations to explore interoceptive illusions (e.g., Iodice et al., 2019, d ≈ 0.7).

      We have added this clarification to the Participants section of Experiment 1 (Lines 208-217).
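
      The idea behind such a sensitivity analysis can be sketched as follows. This is only a large-sample normal approximation for a paired (one-sample) design, not the exact noncentral-t computation that G*Power performs, so it will slightly underestimate the detectable effect for small samples. The function name and the sample sizes in the usage example are hypothetical and are not taken from the manuscript.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_dz(n, alpha=0.05, power=0.90, two_sided=True):
    """Approximate smallest standardized effect detectable with given power.

    Normal approximation: dz ~= (z_alpha + z_power) / sqrt(n), where n is
    the number of participants contributing paired observations.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) / sqrt(n)


if __name__ == "__main__":
    # Hypothetical sample sizes, chosen only to show the direction of change:
    # fewer retained participants means a larger minimum detectable effect.
    for n in (28, 32, 36):
        print(n, round(min_detectable_dz(n), 3))
```

      The key property is that the inputs are fixed by the design (sample size, test type, alpha, desired power) rather than by the observed results, which is why the resulting SESOI cannot depend on the data.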

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      In the Apparatus subsection, it is stated that the intensity of the electrical stimuli was fixed at 2 ms. I believe the authors refer to the duration of the stimulus, not its intensity.

      You are right, thank you for pointing that out. The text should refer to the duration of the electrical stimulus, not its intensity. We have corrected this wording in the revised manuscript to avoid confusion.

      It would be interesting to report (in graphical form) the stimulation intensities corresponding to the calibration procedure for the five different pain levels identified for all subjects.

      That's a good suggestion. We have included a supplementary figure showing the stimulation intensities corresponding to the five individually calibrated pain levels across all participants (Supplementary Figure 11).

      It is questionable that researchers state that "pain and unpleasantness should be rated independently" but then the first level of the Likert scale for unpleasantness is "1=no pain". This is particularly relevant since stimulation (and specifically electrical stimulation) can be unpleasant but non-painful at the same time. Since the experiments were already performed, the researchers should at least explain this choice.

      Thank you for raising this point. You are right that the label “no pain” in the pain unpleasantness scale was not ideal, and we now acknowledge this in the text (lines 886-890). Please note that unpleasantness was always the second rating that participants gave (after pain intensity), and the strongest results come from the first rating (pain intensity).

      Discussion.

      I did not find in the manuscript the rationale for varying the frequency of the heart rate by 25% (instead of any other arbitrary quantity).

      We thank the Reviewer for this observation, which prompted us to clarify the rationale behind our choice of a ±25% manipulation of heart rate feedback. False feedback paradigms have historically relied on a variety of approaches to modulate perceived cardiac signals. Some studies have adopted non-individualised values, using fixed frequencies (e.g., 60 or 110 bpm) to evoke states of calm or arousal, independently of participants’ actual physiology (Valins, 1966; Shahidi & Baluch, 1991; Crucian et al., 2000; Tajadura-Jiménez et al., 2008). Others have used the participant’s real-time heart rate as a basis, introducing accelerations or decelerations without applying a specific percentage transformation (e.g., Iodice et al., 2019). More recently, a growing body of work has employed percentage-based alterations of the instantaneous heart rate, offering a controlled and participant-specific manipulation. These include studies using −20% (Azevedo et al., 2017), ±30% (Dey et al., 2018), and even ±50% (Gray et al., 2007).

      These different methodologies - non-individualised, absolute, or proportionally scaled - have all been shown to effectively modulate subjective and physiological responses. They suggest that the impact of false feedback does not depend on a single fixed method, but rather on the plausibility and salience of the manipulation within the context of the task. We chose to apply a ±25% variation because it falls well within the most commonly used range and strikes a balance between producing a detectable effect and maintaining the illusion of physiological realism. The magnitude is conceptually justified as being large enough to shape interoceptive and emotional experience (as shown by Azevedo and Dey), yet small enough to avoid implausible or disruptive alterations, such as those approaching ±50%. We have now clarified this rationale in the revised Procedure paragraph of Experiment 1 (lines 306-318).

      Azevedo, R. T., Bennett, N., Bilicki, A., Hooper, J., Markopoulou, F., & Tsakiris, M. (2017). The calming effect of a new wearable device during the anticipation of public speech. Scientific reports, 7(1), 2285.

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Dey, A., Chen, H., Billinghurst, M., & Lindeman, R. W. (2018, October). Effects of manipulating physiological feedback in immersive virtual environments. In Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play (pp. 101-111).

      Gray, M. A., Harrison, N. A., Wiens, S., & Critchley, H. D. (2007). Modulation of emotional appraisal by false physiological feedback during fMRI. PLoS one, 2(6), e546.

      Shahidi, S., & Baluch, B. (1991). False heart-rate feedback, social anxiety and self-attribution of embarrassment. Psychological reports, 69(3), 1024-1026.

      Tajadura-Jiménez, A., Väljamäe, A., & Västfjäll, D. (2008). Self-representation in mediated environments: the experience of emotions modulated by auditory-vibrotactile heartbeat. CyberPsychology & Behavior, 11(1), 33-38.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      The researchers state that pain ratings collected in the feedback phase were normalized to the no-feedback phase to control for inter-individual variability in pain perception, as established by previous research. They cite three studies involving smell and taste, of which the last two contain the same normalization presented in this study. However, unlike these studies, the outcomes here require no normalization whatsoever, because there should be no (or very little) inter-individual variability in pain intensity ratings. Indeed, pain intensity ratings in this study are anchored to 30, 50, and 70 / 100 as a condition of the experimental design. The researchers go to extreme lengths to ensure this is the case, by adjusting stimulation intensities until at least 75% of stimulation intensities are correctly matched to their pain ratings counterpart in the pre-experiment procedure. In other words, inter-individual variability in this study is in stimulation intensities, and not pain intensity ratings. Even if it could be argued that pain unpleasantness and heart rate still need to account for inter-individual variability, the best way to do this is by using the baseline (no-feedback) measures as covariates in a mixed linear model. Another advantage of this approach is that all the effects can be described in terms of the original scales and are readily understandable, and post hoc tests between levels can be corrected for multiple comparisons. On the contrary, the familywise error rate for the comparisons between conditions in the current analysis is larger than 5% (since there is a "main" paired t-test and additional "simple" tests).

      We disagree that there is little to no variability in the no feedback phase. Participants were tested in their ability to distinguish intensities in an initial pre-experiment calibration phase. In the no feedback phase, participants rated the pain stimuli in the full experimental context.

      In the pre-experiment calibration phase, participants were tested only once in their ability to match five electrical-stimulation levels to the 0-100 NPS scale, before any feedback manipulation started. During this pre-experiment calibration we required that each level was classified correctly on ≥ 75% of the four repetitions; “correct” meant falling within ± 5 NPS units of the target anchor (e.g., a response of 25–35 was accepted for the 30/100 anchor). This procedure served one purpose only: to make sure that every participant entered the main experiment with three unambiguously distinguishable stimulation levels (30 / 50 / 70). We have integrated this point in the revised manuscript (lines 263-270).
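
      The acceptance criterion described above (at least 75% of four repetitions within ± 5 NPS units of the target anchor) can be written as a short check. This is a sketch for illustration only; the function names and the example ratings are hypothetical, and the default anchors follow the 30 / 50 / 70 levels named in the response.

```python
def level_calibrated(ratings, anchor, tolerance=5, min_proportion=0.75):
    """True if enough ratings fall within +/- tolerance of the anchor."""
    if not ratings:
        raise ValueError("at least one rating is required")
    hits = sum(1 for r in ratings if abs(r - anchor) <= tolerance)
    return hits / len(ratings) >= min_proportion


def participant_calibrated(ratings_by_anchor, anchors=(30, 50, 70)):
    """True only if every anchored stimulation level passes the criterion."""
    return all(level_calibrated(ratings_by_anchor[a], a) for a in anchors)


if __name__ == "__main__":
    # Hypothetical ratings: four repetitions per stimulation level.
    example = {
        30: [28, 33, 36, 30],  # 3 of 4 within 25-35, so this level passes
        50: [50, 47, 55, 52],
        70: [66, 71, 74, 69],
    }
    print(participant_calibrated(example))
```

      Note that passing this check constrains ratings only during calibration; it does not constrain how participants rate the same stimuli later in the full experimental context, which is the variability the normalisation is meant to absorb.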

      Once the real task began, the context changed: shocks are unpredictable, attention is drawn to the heartbeat, and participants must judge both intensity and unpleasantness. In this full experimental setting the no-feedback block indeed shows considerable variability, even for the pain intensity ratings. Participants' mean rating on the NPS scale was 46.4, with a standard deviation of 11.9 - thus participants vary quite strongly in their mean ratings (range 14.5 to 70). Moreover, while all participants show a positive correlation between actual intensities and their ratings (i.e., they rate the higher intensities as more intense than the lower ones), they vary in how much of the scale they use, with differences between the highest and lowest reported intensities ranging between 8 and 91 for the participants showing the smallest and largest differences, respectively.

      Thus, while we simplified the analysis to remove the difference scoring relative to the congruent trials and now use these congruent trials as an additional condition in the analysis, we retained the normalisation procedure to account for the in-fact-existing between-participant variability, and ensure consistency with prior research (Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019) and our a priori analysis plan.

      However, to ensure we fully address your point here (and the other reviewers’ points about potential additional factors affecting the effects, like trial number and stimulus intensity), we also report an additional linear mixed-effects model analysis without normalization. It includes every feedback level as condition (No-Feedback, Congruent, Slower, Faster), plus additional predictors for actual stimulus intensity and trial rank within the experiment (as suggested by the other reviewers). This confirms that all relevant results remain intact once baseline and congruent trials are explicitly included in the model.

      In brief, cross‐experiment analyses demonstrated that the Faster vs Slower contrast was markedly larger when the feedback was interoceptive than when it was exteroceptive. This held for heart-rate deceleration (b = 0.94 bpm, p = .005), for increases in unpleasantness (b = -0.16 Likert units, p = .015), and in pain-intensity ratings (b = -3.27 NPS points, p = .037).

      These findings were then further confirmed by within-experiment analyses. Within the interoceptive experiment, the mixed model on raw scores replicated every original effect: heart rate was lower after Faster than Slower feedback (estimate = -0.69 bpm, p = .005); unpleasantness was higher after Faster than Slower feedback (estimate = 0.19, p < .001); pain intensity rose after Faster versus Slower feedback (estimate = -4.285, p < .001). In the exteroceptive experiment, however, none of these Faster–Slower contrasts reached significance for heart rate (all ps > .33), unpleasantness (all ps > .43) or intensity (all ps > .10). Because these effects remain significant even with No-Feedback and Congruent trials explicitly included in the model and vanish under exteroceptive control, the supplementary, non-normalised analyses confirm that faster vs. slower interoceptive feedback uniquely lowers anticipatory heart rate while amplifying both intensity and unpleasantness of pain, independent of data transformation or reference conditions. Please see Supplementary analyses for further details.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Riello, M., Cecchini, M. P., Zanini, A., Di Chiappari, M., Tinazzi, M., & Fiorio, M. (2019). Perception of phasic pain is modulated by smell and taste. European Journal of Pain, 23(10), 1790-1800.

      I could initially not find a rationale for bringing upfront the comparison between faster vs. slower HR acoustic feedback when in principle the intuitive comparisons would be faster vs. congruent and slower vs. congruent feedback. This is even more relevant considering that in the proposed main comparison, the congruent feedback does not play a role: since Δ outcomes are calculated as (faster - congruent) and (slower - congruent), a paired t-test between Δ faster and Δ slower outcomes equals (faster - congruent) - (slower - congruent) = (faster - slower). I later realized that the statistical comparison (paired t-test) of pain intensity ratings of faster vs. slower acoustic feedback is significant in experiment 1 but not in experiment 2, which in principle would support the argument that interoceptive, but not exteroceptive, feedback modulates pain perception. However, the "simple" t-tests show that faster feedback modulates pain perception in both experiments, although the effect is larger in experiment 1 (interoceptive feedback) compared to experiment 2 (exteroceptive feedback).

      The comparison between faster and slower feedback is indeed crucial, and we regret not having made this clearer in the first version of the manuscript. As noted in our response to your point in the public review, this comparison is both statistically the most powerful and theoretically the most appropriate, as it controls for any influence of salience or surprise when heart rates deviate (in either direction) from what is expected. It therefore provides a clean measure of how much an accelerated heart rate affects pain perception and physiological response, relative to an equal change in the opposite direction. However, as noted above, in the new version of the manuscript we have now removed the analysis via difference scores, and directly compared all three relevant conditions (faster, congruent, slower), first via an ANOVA and then with follow-up planned t-tests.

      Please refer to our previous response for further details (i.e., Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback [..]).

      The design of experiment two involves the selection of knocking wood sounds to act as exteroceptive acoustic feedback. Since the purpose is to test whether sound affects pain intensity ratings, unpleasantness, and heart rate, it would have made sense to choose sounds that would be more likely to elicit such changes, e.g. Taffou et al. (2021), Chen & Wang (2022), Zhou et al. (2022), Tajadura-Jiménez et al. (2010). Whereas I acknowledge that there is a difference in effect sizes between experiment 1 and experiment 2 for the faster acoustic feedback, I am not fully convinced that this difference is due to the nature of the feedback (interoceptive vs. exteroceptive), since a similar difference could arguably be obtained by exteroceptive sound with looming or rough qualities. Since the experiment was already carried out and this hypothesis cannot be tested, I suggest that the researchers moderate the inferences made in the Discussion regarding these results.

      Please refer to our earlier detailed answer to this point in the Public Review (i.e., This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect [..]). As we describe there, we see little grounds to suspect such a non-specific influence of acoustic parameters, as it is specifically the sensitivity to the change in heart rate (faster vs slower) that is affected by our between-experiment manipulation, not the overall response to the different exteroceptive or interoceptive sounds. Moreover, the specific change induced by the faster interoceptive feedback - a heart rate deceleration - is not consistent with a change in arousal or alertness (which would have predicted an increase in heart rate with increasing arousal). See also Discussion-Accounting for general unspecific contributions.

      Additionally, the fact that no significant effects were found for unpleasantness ratings or heart rate (absence of evidence) should not be taken as proof that faster exteroceptive feedback does not induce an effect on these outcomes (evidence of absence). In this case, it could be that there is actually no effect on these variables, or that the experiment was not sufficiently powered to detect those effects. This would depend on the SESOIs for these variables, which as stated before, was not properly justified.

      We very much agree that the absence of significant effects should not be interpreted as definitive evidence of absence. Indeed, we were careful not to overinterpret the null findings for heart rate and unpleasantness ratings, and we conducted additional analyses to clarify their interpretation. First, the TOST analysis shows that any effects in Experiment 2 are (significantly) smaller than the smallest effect size that can possibly be detected in our experiment, given the experimental parameters (number of participants, type of test, alpha level). Second, and more importantly, we ran between-experiment comparisons (see Results Experiment 2, and Supplementary materials, Cross-experiment analysis between-subjects model) of the crucial difference in the changes induced by faster and slower feedback. This showed that the differences were larger with interoceptive (Experiment 1) than exteroceptive cues (Experiment 2). Thus, even if the exteroceptive cues in Experiment 2 induce an effect that is too small to be detectable in principle, that effect is smaller than the one induced by interoceptive cues in Experiment 1.

      To ensure we fully address this point, we have now simplified our main analysis (main manuscript) and replicated it with a different analysis (Supplementary material). We also motivate more clearly why the comparison between faster and slower feedback is crucial (Methods Experiment 1), and we make clearer that the difference between these conditions is larger in Experiment 1 than in Experiment 2 (Results Experiment 2). Moreover, we went through the manuscript and ensured that our wording does not over-interpret the absence of effects in Experiment 2 as an absence of a difference.

      The section "Additional comparison analysis between experiments" encompasses in a way all possible comparisons between levels of the different factors in both experiments. My original suggestion regarding the use of a mixed linear model with covariates is still valid for this case. This analysis also brings into question another aspect of the experimental design: what is the rationale for dividing the study into two experiments, considering that variability and confounding factors would have been much better controlled in a single experimental session that includes all conditions?

      We thank the reviewer for their comment. We would like to note, first, that the between-experiment analyses did not encompass all possible comparisons between levels, as they included only faster and slower feedback for the within-experiment comparison. Instead, they focus on the specific interaction between faster and slower feedback on the one hand, and interoceptive vs exteroceptive cues on the other. This interaction essentially compares, for each dependent measure (HR, pain unpleasantness, pain intensity), the difference between faster and slower feedback in Experiment 1 with the same difference in Experiment 2 (and would produce identical p values to a between-experiment t-test). The significant interactions therefore indicate larger effects of interoceptive cues than exteroceptive ones for each of the measures. To make this clearer, we have now replaced the analysis with between-experiment t-tests of the difference between faster and slower feedback for each measure (Results Experiment 2), producing identical results. Moreover, as suggested, we also now report linear mixed model analyses (see Supplementary Materials), which provide a comprehensive comparison across experiments.

      Regarding the experimental design, we appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such an approach indeed offers greater statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally chose a between-subjects design due to theoretical and methodological considerations specific to deceptive feedback paradigms. First, carryover effects are a major concern in deception studies. Participants exposed to one type of feedback could develop suspicion or adaptive strategies that would alter their responses in subsequent conditions (Martin & Sayette, 1993). Expectancy effects could thus contaminate results in a crossover design, particularly when feedback manipulation becomes apparent. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to maintain the ecological validity of the illusion.

      Charness, G., Gneezy, U., & Kuhn, M. A. (2012). Experimental methods: Between-subject and within-subject design. Journal of economic behavior & organization, 81(1), 1-8.

      Martin, C. S., & Sayette, M. A. (1993). Experimental design in alcohol administration research: limitations and alternatives in the manipulation of dosage-set. Journal of studies on alcohol, 54(6), 750-761.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      References

      Chen ZS, Wang J. Pain, from perception to action: A computational perspective. iScience. 2022 Dec 1;26(1):105707. doi: 10.1016/j.isci.2022.105707.

      Dienes Z. Obtaining Evidence for No Effect. Collabra: Psychology 2021 Jan 4; 7 (1): 28202. doi: 10.1525/collabra.28202

      King MT. A point of minimal important difference (MID): a critique of terminology and methods. Expert Rev Pharmacoecon Outcomes Res. 2011 Apr;11(2):171-84. doi: 10.1586/erp.11.9.

      Lakens D. Sample Size Justification. Collabra: Psychology 2022 Jan 5; 8 (1): 33267. doi: 10.1525/collabra.33267

      Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2600-2606. doi: 10.1073/pnas.1708274114.

      Sasaki K, Yamada Y. SPARKing: Sample-size planning after the results are known. Front Hum Neurosci. 2023 Feb 22;17:912338. doi: 10.3389/fnhum.2023.912338.

      Taffou M, Suied C, Viaud-Delmon I. Auditory roughness elicits defense reactions. Sci Rep. 2021 Jan 13;11(1):956. doi: 10.1038/s41598-020-79767-0.

      Tajadura-Jiménez A, Väljamäe A, Asutay E, Västfjäll D. Embodied auditory perception: The emotional impact of approaching and receding sound sources. Emotion. 2010, 10(2), 216-229. https://doi.org/10.1037/a0018422

      Zhou W, Ye C, Wang H, Mao Y, Zhang W, Liu A, Yang CL, Li T, Hayashi L, Zhao W, Chen L, Liu Y, Tao W, Zhang Z. Sound induces analgesia through corticothalamic circuits. Science. 2022 Jul 8;377(6602):198-204. doi: 10.1126/science.abn4663.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript would benefit from some spelling- and grammar checking.

      Done

      Discussion:

      The discussion section is rather lengthy and would benefit from some re-structuring, editing, and sub-section headers.

      In response, we have restructured and edited the Discussion section to improve clarity and flow.

      I personally had a difficult time understanding how the data relates to the rubber hand illusion (l.623-630). I would recommend revising or deleting this section.

      We thank the reviewer for this valuable feedback. We have revised the paragraph and made the parallel clearer (lines 731-739).

      Other areas are a bit short and might benefit from some elaboration, such as clinical implications. Since they were mentioned in the abstract, I had expected a bit more thorough discussion here (l. 718).

      Thank you for this suggestion. We have expanded the discussion to more thoroughly address the clinical implications of our interoceptive pain illusion (See Limitations and Future Directions paragraph).

      Further, clarification is needed for the following:

      I would like some more details on participant instructions; in particular, the potential difference in instruction between Exp. 1 and 2, if any. In Exp. 1, it says: (l. 280) "Crucially, they were also informed that over the 60 seconds preceding the administration of the shock, they were exposed to acoustic feedback, which was equivalent to their ongoing heart rate". Was there a similar instruction for Exp. 2? If yes, it would suggest a more specific effect of cardiac auditory feedback; if no, the ramifications of this difference in instructions should be more thoroughly discussed.

      Thank you for this suggestion. We have clarified this point in the Procedure of Experiment 2 (548-550).

    1. eLife Assessment

      Using their unique Fish-On-Chips optofluidics platform, the authors make three important findings: the presence of precise coupling between saccades and tail flips can be used to discriminate between turning or gliding behaviours; aversive and appetitive chemosensory cues differentially modulate these behaviours; transformation from cue valence to behaviour is encoded by the pallium. The evidence supporting these findings is solid. The work advances our understanding of the ancient interplay between chemosensation and motor output through the modulation of eye-body coordination.

    2. Reviewer #1 (Public review):

      Summary:

      This study was designed to manipulate and analyze the effects of chemosensory cues on visuomotor control. They approach this by analyzing how eye-body coordination and brain-wide activity are altered with specific chemosensation in larval zebrafish. After analyzing the dynamics of coupled saccade-tail coordination sequences - directionally linked and typically coupled to body turns - the authors investigated the effects of sensory cues shown to be either aversive or appetitive on freely swimming zebrafish on the eye-body coordination. Aversive chemicals lead to an increase in saccade-tail sequences in both number and dynamics, seemingly facilitating behaviors like escape. Brain-wide imaging led the authors to neurons in the telencephalic pallium as a target to study eye-body coordination. Pallium neuron activity correlated with both aversive chemicals and coupled saccade-tail movements.

      Recommendations for improvement are minimal. So much of the data is ultimately tabular, and the figures are an impenetrable wall of datapoints. 1c is an excellent example: three concentrations are presented, but it's rare for the three averages to trend appropriately. The key point, which is that aversive odors are repulsive and attractive odors (sometimes) attractive, just gets lost in showing the three concentrations individually; it also makes direct comparisons impossible. Similar challenges abound in the violin plots in 4e-4h, the error bars on the "fits" in 4i-4m, and so on. We recommend selecting an illustrative subset of data to present to permit interpretation and putting the rest in a supplemental table. (Presenting) less is more (effective).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Sy SKH. et al. on pallium encoded chemosensory impact of eye-body coordination describes how the valence of chemosensory stimuli can affect the coordination of eye saccades with tail flips. They show that aversive valence stimuli can increase both the strength and frequency of tail flips through a pallium-mediated circuit.

      Overall, the manuscript is well-written and easy to follow, although the figures are quite dense. The methodology is mostly sound, and the improvement to the fish on chips system is very interesting. The methods description is thorough and welcome, making the experiments clear. The limited number of animals, and the spread between 5 and 6 dpf, is a concern, as most of the statistics seem to have been done on the individual events, and not the number of biological samples.

      The initial behavioural experiments are very promising. However, the conclusions surrounding the role of the pallium are a lot more speculative and not supported by the results.

      Comments:

      (1) The fish on chips 2.0 methods show a lot of promise for future studies of chemosensory stimuli, combined with whole-brain imaging. This will provide new avenues of research for zebrafish neuroscientists.

      (2) Chemosensory cues would have a very different timing than visual cues; timing is very important for multisensory integration. How do the authors suggest those are integrated? How would they differentiate between an integration of various cues or a different arousal state, as they describe in the introduction?

      (3) Studies have looked at chemosensation in Drosophila, including multisensory integration, which should be discussed by the authors (see the work of Mark Frye, amongst others).

      (4) In the brain imaging methods, there is a mention of robustly behaving larvae. Does that mean that an exclusion criterion was used to select only 5 larvae? If so, this should be stated clearly. The authors also do not mention how they avoid the switch to a passive state that one of the coauthors has observed in a closed-loop setup. The authors should comment on this point.

      (5) Were the statistics in Figure 2 done with an n of 5, or do they assume that each tail flip and saccade is an independent event? I would imagine the latter would have inflated p-values and should be avoided.

      (7) Page 7: Why do the authors think that the cumulative effect of these minor differences could lead to very different behavioural goals? Especially when comparing to actual startle responses, which are extremely strong and stereotypical. How do their observations compare to the thermosensory navigation of larval zebrafish observed by Martin Haesemeyer, for example, or the work of the RoLi lab?

      (8) Page 8: Figure 5, I am confused by the y-axis of g, in e and f, the values are capped at 2, whereas in g they go up to 6, with apparently a number of cells whose preference is out of the y-axis limit (especially in Q2). Having the number of cells in each quadrant would also help to assess if indeed there is some preference in the pallium towards Q1.

      (9) Figure 6: How is the onset of neuronal activity determined compared to the motor stimulus? Looking at Supplementary Figure 8, it is quite unclear how the pallium is different from the OB or subpallium. The label of onset delay is also confusing in this figure.

      (10) Page 9: I do not think that the small differences observed in the pallium are as clear-cut as the authors make them out to be, or that they provide such strong evidence of their importance. As there are no interventions showing any causality in the presence of these pallium responses and the sensorimotor responses, these could represent different arousal states rather than any integration of sensory information.

    4. Reviewer #3 (Public review):

      The manuscript investigates the coupling of saccadic eye movements (S) with directed tail flips (T). The remarkable discovery is that tail flips that are preceded by a conjugate saccade (S-T) can be credibly classified as specific "volitional" turns that are distinguished from the standard tail movements that seem to be more of a spontaneous and "impulsive" nature.

      They show that 'turning intent', as indicated by a small increase in S, is elevated by aversive odors, while 'gliding intent', as indicated by a decrease in S and an increase in undulation cycles, is elevated by appetitive odors.

      This is a very important finding, which is backed up by a thorough behavioral analysis, and the identification of neural populations in the pallium and sub-pallium that clearly distinguish between these kinds of turns is very promising. Here they identify neuronal populations that are preferentially active during - and predictive of - coupled (S-T) versus isolated (T) tail flips.

      Especially the fact that S-T turns (but not T turns) can be predicted already by pre-event, ramping, pallial activity is intriguing.

      The authors then go on and demonstrate that the frequency of (S-T) turns is modulated in fish exposed to appetitive or aversive odors. Specifically, they quantify the aversiveness and appetitive-ness of several odors in a free swimming assay. They select a couple of these odors based on their valence, and they demonstrate that these odors induce moderate modulation in the frequency of eye saccades (S) and tail flips (T) and (S-T) turns.

      The study is rigorous and thorough, and the findings are informative and novel.

      In important controls, they confirm that brain-wide imaging can distinguish between appetitive and aversive contexts, and they show that pallial activation by aversive odors is consistent with neural activity in the rhombencephalon that correlates with turning activity, whereas sub-pallial activation by appetitive odors correlates with rhombencephalic activity related to gliding.

      Overall, this manuscript is very good.

    5. Author response:

      We thank the editors and all reviewers for the detailed evaluation of the work and the overall positive remarks, as well as the constructive feedback to improve our manuscript. Based on the integrated comments of the reviewers and advice of the reviewing editor, we will suitably address all comments raised by the reviewers, and we outline our revision plan below:

      Interpretation of findings

      ● We will carefully reframe our interpretation of the data regarding the role of the pallium in the coupled saccade-tail turning events, and clearly state that we have not established a causal role, which requires additional perturbation experiments.

      ● We will also acknowledge the confounding interpretation that the pallial activities recorded may also represent or include arousal state signals.

      Streamlining the presentation

      ● In the introduction, we will better contextualize our study with additional discussions on (i) the advantageous use of zebrafish to study chemosensation, factoring in differences in the spread of chemical cues in water vs. air, and (ii) the disruption of eye-body coordination and underlying neural circuits.

      ● We will streamline the presentation of data in Fig. 1 by keeping the overall responses of the larvae to each chemical across concentrations in the main figure, while moving suitable additional details to a supplementary figure.

      ● Similarly, for each of the subsequent main figures, wherever suitable we will select an illustrative, core set of panels to retain in the main figure, and move other more detailed plots to supplementary figures.

      ● We will incorporate additional references and discussions of the past literature, including relating our findings to (i) chemosensation/multisensory integration in Drosophila, (ii) thermosensation-driven and navigational behavior in larval zebrafish, and (iii) fleeing or escape behavior in zebrafish and other species.

      ● We will clarify our animal subject inclusion criteria, that all larval subjects with sufficiently high-quality, stable imaging were included (i.e., we only excluded larvae because of insufficient quality of imaging, but not other factors).

      ● For applicable plots, we will add suitable additional details to the plots or legends (e.g., clarification of measures, specifying numbers of cells).

      Data analysis and statistics

      We will perform additional data analysis by making comparisons with statistics performed at the fish-subject level, and include confidence intervals wherever applicable.
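
      A minimal sketch of what a subject-level analysis could look like is given below. The event records, fish identifiers, and the helper name `per_fish_means` are hypothetical and do not reflect the authors' actual pipeline. The only assumption is that each event (for example, a tail flip) is first averaged within its fish, so that the statistical sample size becomes the number of fish rather than the number of events.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(events):
    """Collapse (fish_id, value) event records to one mean value per fish."""
    by_fish = defaultdict(list)
    for fish_id, value in events:
        by_fish[fish_id].append(value)
    return {fish_id: mean(values) for fish_id, values in by_fish.items()}


# Hypothetical event-level measurements (e.g., tail-flip amplitude per event).
events = [
    ("fish1", 10.0), ("fish1", 14.0),
    ("fish2", 20.0), ("fish2", 22.0), ("fish2", 24.0),
    ("fish3", 15.0),
]

fish_means = per_fish_means(events)
n_events = len(events)      # number of events (not independent observations)
n_fish = len(fish_means)    # number of biological replicates

print(f"{n_events} events collapse to {n_fish} fish-level values: {fish_means}")
```

      Group comparisons and confidence intervals would then be computed on the per-fish values, which avoids treating repeated events from the same animal as independent samples.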

    1. eLife Assessment

      This important study examined age-related changes in cerebellar function by testing a large sample of younger and older adults, including 30 over 80 years old, on motor and cognitive tasks linked to the cerebellum and conducting structural imaging. Their findings show that cerebellar-dependent functions are mostly maintained or even enhanced across the lifespan, with cerebellar-mediated motor abilities remaining intact despite degeneration, in contrast to non-cerebellar measures. Overall, the authors provide solid evidence in support of preserved cerebellar function with age. These results highlight the resilience and redundancy of cerebellar circuits and offer key insights into aging and motor behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Witte et al. examined whether canonical behavioral functions attributed to the cerebellum decline with age. To test this, they recruited younger, old, and older-old adults in a comprehensive battery of tasks previously identified as cerebellar-dependent in the literature. Remarkably, they found that cerebellar function is largely preserved across the lifespan and, in some cases, even enhanced. Structural imaging confirmed that their older adult cohort was representative in terms of both cerebellar gray- and white-matter volume. Overall, this is an important study with strong theoretical implications and convincing evidence supporting the motor reserve hypothesis, demonstrating that cerebellar-dependent measures remain largely intact with aging.

      Strengths:

      (1) Relatively large sample size.

      (2) Most comprehensive behavioral battery to date assessing cerebellar-dependent behavior.

      (3) Structural MRI confirmation of age-related decline in cerebellar gray and white matter, ensuring representativeness of the sample.

      Weaknesses:

      (1) Although the authors note this was outside the study's scope, the absence of a voxel-based morphometry (VBM) analysis limits anatomical and functional specificity. Such an analysis would clarify which functions are cerebellar-dependent rather than solely inferring this from prior neuropsychological literature.

      (2) As acknowledged in the Discussion, task classification (cerebellar-dependent vs. general measures) remains somewhat ambiguous. Some "general" measures may still rely on cerebellar processes based on the paper's own criteria - for example, tasks in which individuals with cerebellar degeneration show impairments.

      (3) Cerebellar-dependent and general measures may inherently differ in measurement noise, potentially biasing results toward detecting effects in general measures but not in cerebellar-dependent ones.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are investigating cerebellar-mediated motor behaviors in a large sample of adults, including 30 individuals over the age of 80 (a great strength of this work). They employed a large battery of motor tasks that are tied to cerebellar function, in addition to a cognitive task and motor tasks that are more general. They also evaluated cerebellar structure. Across their behavioral metrics, they found that even with cerebellar degeneration, cerebellar-mediated motor behavior remained intact relative to young adults. However, this was not the case for measures not directly tied to cerebellar function. The authors suggest that these functions are preserved and speak to the resiliency and redundancy of function in the cerebellum. They also speculate that cerebellar circuits may be especially good for preserving function in the face of structural change. The tasks are described very well, and their implementation is also well-done with consideration for rigor in the data collection and processing. The inclusion of Bayesian estimates is also particularly useful, given the theoretically important lack of age differences reported. This work is methodologically rigorous with respect to the behavior, and certainly thought-provoking.

      Strengths:

      The methodological rigor, inclusion of Bayesian statistics, and the larger sample of individuals over the age of 80 in particular are all great strengths of this work. Further, as noted in the text, the fact that all participants completed the full testing battery is of great benefit.

      Weaknesses:

      The suggestion of cerebellar reserve, given that at the group level there is a lack of difference for cerebellar-specific behavioral components, could be more robustly tested. That is, the authors suggest that this is a reserve given that the volume of cerebellar gray matter is smaller in the two older groups, though behavior is preserved. This implies volume and behavior are seemingly dissociated. However, there is seemingly a great deal of behavioral variability within each group and likewise with respect to cerebellar volume. Is poorer behavior associated with smaller volume? If so, this would still suggest that volume and behavior are linked, but rather than being age that is critical, it is volume. On the flip side, a lack of associations between behavior and volume would be quite compelling with respect to reserve. More generally, as explicated in the recommendations, there are analyses that could be conducted that, in my opinion, would more robustly support their arguments given the data that they have available. This is a well-executed and thought-provoking investigation, but there is also room for a bit more discussion.

    1. eLife Assessment

      This important study employs functional magnetic resonance spectroscopy (fMRS) to demonstrate that GABAergic inhibition in the parietal cortex actively suppresses goal-irrelevant distractors, thereby facilitating goal-directed visual tracking. The data and analyses are solid, and the methodology is validated. However, the link between the metabolic changes and the purported functional mechanisms is incomplete due to concerns with experimental design and interpretations. The study will be of interest to researchers studying goal-directed behavior and neurochemical dynamics in cognitive processing.

    2. Reviewer #1 (Public review):

      Summary

      Wang et al. address the challenge of tracking goal-relevant visual signals amidst distractions, a fundamental aspect of adaptive visual information processing. By employing functional magnetic resonance spectroscopy (fMRS) during a visual tracking task, they quantify changes in both inhibitory (GABA) and excitatory (glutamate) neurotransmitter concentrations in the parietal and visual cortices. The results reveal that increases in GABA and glutamate in the parietal cortex are closely tied to the number of targets, and individual differences in GABAergic and glutamatergic responses within the parietal cortex predict tracking performance and distractor suppression. These findings underscore a neural mechanism in which GABAergic inhibition in the parietal cortex actively suppresses goal-irrelevant distractors, thereby facilitating goal-directed visual tracking and highlighting the dynamic role of these key metabolites in cognitive control during visual processing. I found the study to be well-written and thoughtful from an experimental standpoint, although it would benefit from some targeted revisions.

      Strengths

      (1) The study employs robust and validated fMRS methodology, allowing for real-time monitoring of metabolite changes during goal-directed tasks.

      (2) Simultaneous measurement of both GABA and Glx in parietal and visual cortices yields nuanced insights into the neurochemical correlates of visual attention.

      (3) The link between neurochemical changes and behavioral performance is clearly established, providing strong evidence for GABAergic involvement in distractor suppression.

      (4) Experimental protocols align with current standards for MEGA-PRESS, bolstering the technical reliability of the findings.

      Weaknesses

      (1) Certain aspects of terminology, methodological reporting, and confound management are inconsistently described throughout the manuscript.

      (2) Important confounding factors are not systematically reported or controlled.

      (3) Opportunities for additional analysis (e.g., behavioral dynamics, use of alternate fitting methods, more comprehensive quality metrics) have not been fully explored.

      (4) Open access data and/or code for the analysis are not shared in the main manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates how the visual system is able to track target objects when these are presented in the visual field together with other irrelevant and distracting visual objects. The authors use functional Magnetic Resonance Spectroscopy to measure the two most important excitatory and inhibitory neurotransmitters, glutamate and GABA, in both the visual and parietal cortex.

      Strengths:

      (1) Well-designed functional challenge.

      (2) Number of subjects.

      (3) Good quality spectra and appropriate reporting of MRS methods and quality assurance.

      (4) Introduction and discussion are clear for non-experts in visual processing.

      Weaknesses:

      (1) Rejection of spectra based on high % CRLB may artificially remove data with the lowest metabolite concentration.

      (2) SN description as percentage does not make sense.

    4. Reviewer #3 (Public review):

      Wang et al. report multiple experiments using functional magnetic resonance spectroscopy (fMRS) in a multiple object tracking (MOT) task to investigate the effect of experimentally manipulating a) the number of targets, b) object size, and c) total number of objects in the display on GABA and glutamate (Glx) concentrations in parietal and visual cortex. Data is analyzed in two orthogonal ways throughout: via condition differences in behavioral performance (inverse efficiency), GABA, and Glx concentrations and through correlations between changes in inverse efficiency and GABA or Glx. All three experimental manipulations affected inverse efficiency, with worse performance with more targets, smaller objects, and a larger total number of objects. However, only the manipulation of the target number produced a condition difference in GABA and Glx, with higher concentrations of both in the parietal VOI and only of Glx in the visual VOI with more targets ('high load'). Correlational analyses revealed that participants with a larger change in GABA in the parietal VOI with a higher number of targets showed a smaller drop in behavioral performance with more targets. The opposite direction of correlation was observed for Glx in both the visual and parietal VOI.

      In the two control experiments, correlations were only investigated in the parietal VOI. There was a negative correlation between change in Glx and change in inverse efficiency with manipulation of object size, i.e. participants exhibiting a positive change in Glx showed no or little difference in performance, but those with an increase in Glx with smaller targets showed a more pronounced drop in performance. There was no correlation with GABA for the manipulation of object size. For the manipulation of total object number, participants exhibiting an increasing GABA concentration with more objects showed a smaller drop in performance.

      The authors' main claim is that GABAergic suppression of goal-irrelevant distractors in parietal cortex is key to goal-directed visual information processing.

      The study is, to my knowledge, the first to employ fMRS in an MOT paradigm, and I read it with great interest. I am admittedly not an expert on the fMRS technique and have therefore refrained from commenting on the technical aspects of its use. Although the application of fMRS to MOT is novel and adds new knowledge to the field, I have some critiques and believe that a much more nuanced interpretation of the findings is warranted.

      Major

      (1) Especially the control experiments lean heavily on Bettencourt and Somers (2009) and adopt and to some extent exaggerate claims from that paper uncritically. This is obvious in referring to the manipulations of object size and object number as high/low enhancement and high/low suppression, as if the association of these physical manipulations of the stimulus display with attentional mechanisms were so obvious and beyond doubt that drawing any distinction between these manipulations and their supposed effects is entirely superfluous. This seems far beyond what is warranted to me. It may seem plausible that adding distractors engages distractor suppression more, but whether this is truly the case is an empirical question, and Bettencourt and Somers (2009) have no direct measure of distractor suppression to substantiate this claim. Their study is purely behavioral, and there is no attempt to assess distractor processing separately. The case for the 'target enhancement' manipulation is even weaker: objects are of a sufficient size and at maximum contrast (white on black screen, but exact details are omitted) to be clearly visible in either condition, so why would smaller objects require more enhancement? Although the present data shows a clear effect of manipulating object size, the corresponding size of the effect in Bettencourt and Somers (2009) is rather underwhelming and does not warrant such a strong conclusion. In summary, the link between the object number and object size manipulations with suppression and enhancement is very far from the 1:1 that the authors seem to assume. Accordingly, I believe that the manipulations should be labelled as object number and object size rather than their hypothesized effects, throughout and that there should be a much more critical discussion as to whether these manipulations are indeed related to these effects as expected.

      (2) The authors' interpretation of the results seems rather uncritical. What is observed (at least in the first experiment) is a change in GABA and Glx concentrations with changes in the number of tracked targets. Is the only conceivable way in which this could happen through target enhancement and distractor suppression? The processing of targets and distractors is not measured directly, so any claims are indirect, at best. The authors cite the recent 'Ten simple rules to study distractor suppression' paper (Wöstmann et al., 2022), which presents a consensus between leading researchers in the field. Neither Bettencourt & Somers (2009) nor the design of the current study live up to the rules established in that paper, so a much more nuanced interpretation and discussion of the current findings seems warranted. It is anything but obvious to me that the only activity in the parietal cortex that could possibly be suppressed by GABA is the representation of distractors. Indeed, cueing more targets (high load) decreases the number of distractors in the first experiment, so the need for distractor suppression in the high load condition is less than in the low load condition. So, shouldn't we observe lower GABA concentrations in the 'high load' condition?

      (3) It seems that the authors included data from both correctly tracked and incorrectly tracked trials in their fMRS analysis. In MOT, attending target objects is the task per se, so task errors indicate that participants did not actually track the targets. So when comparing conditions with different error levels, it is ambiguous whether changes in brain activity reflect the experimental manipulation as such, or rather the different mix of correctly tracked and incorrectly tracked trials that result from this physical manipulation. Are the correlations perhaps driven by the inclusion of different proportions of correctly tracked trials across participants? It seems that the authors may have to separate correct and error trials in the analysis to check for the possibility that effects are due to the inclusion of data from trials in which participants may have stopped tracking at least some of the target objects. Of course, such an analysis is somewhat limited by the fact that only one target was probed, yielding a 50% guessing chance (i.e. even if the response is correct, we do not know whether the other, unprobed, objects were tracked correctly on that trial).

      (4) The key findings from the control experiments are purely correlational. The supposed cause may be what the authors claim, but there is an infinity of alternative explanations. Correlational findings cannot simply be interpreted as if they resulted from an experimental manipulation (...although this is, unfortunately, by no means rare in the cognitive neuroscience literature). The authors should make a rigorous effort to consider the most plausible alternative explanations for these correlations and argue why or why not they believe that they can be discounted.

      (5) Related to the previous point: the experimental manipulations did not produce mean differences in GABA/Glx in the control experiments. Doesn't this speak against the authors' interpretation? They briefly acknowledge this in the discussion, but I think there is a deeper problem. The absence of these effects casts doubt on what these manipulations actually do, and therefore also on the interpretation of the correlations in these experiments. For example, the authors might also have concluded from the same data that the absence of increased GABA in the 'high suppression' condition refutes the very idea that GABA concentrations are related to distractor suppression.

      (6) 'Inverse Efficiency' is a highly unusual measure of MOT performance in the literature, and its use reduces the comparability of the findings with previous work. The standard is to assess the correctness ('accuracy') of responses with no focus on speed. This makes sense as responses are given after the object motion has stopped. At the same time, reaction time can be informative too (e.g., Störmer et al., 2013). I think the authors should justify their use of inverse efficiency as the dependent variable.

      (7) The choice of variable names is problematic: it is sometimes misleading and makes understanding the findings harder (see also points 1 and 6). Obvious, unambiguous, and, importantly, interpretation-free names for conditions such as target number (2/4), object size (small/large), and total object number (8/12) become load (high/low), target enhancement (high/low), and distractor suppression (low/high). This reduces clarity and, especially in the case of enhancement and suppression, conflates the actual manipulation with its interpretation.

    1. eLife Assessment

      This important study shows that a controlled pause in gene reading is required for early heart cells to form during development. The authors demonstrate that loss of this pause prevents the proper activation of the heart-producing program across animal and stem cell systems. The evidence is compelling, supported by careful genomic and functional analyses that clearly define the developmental block. Overall, this work will interest developmental biologists and inspire further studies on the origins of early heart defects.

    2. Reviewer #1 (Public review):

      This is a highly original and impactful study that significantly advances our understanding of transcriptional regulation, in particular RNAPII pausing, during early heart development. The Chen lab has a long history of producing influential studies in cardiac morphogenesis, and this manuscript represents another thorough and mechanistically insightful contribution. The authors have thoroughly addressed this Reviewer's concerns and incorporated all of my suggestions in the revised manuscript. In addition, their responses to the other reviewer's comments are also very clear. As it is, this work is of great interest to the readership of eLife, as well as to the general scientific community.

      The authors reveal a fundamentally new role for Rtf1, a component of the PAF1 complex, in governing promoter-proximal RNAPII pausing in the context of myocardial lineage specification. While transcriptional pausing has been implicated in stress responses and inducible gene programs, its developmental relevance has remained poorly defined. This study fills that gap with rigorous in vivo evidence demonstrating that Rtf1-dependent pausing is indispensable for activating the cardiac gene program from the lateral plate mesoderm.

      Importantly, the study also provides compelling therapeutic implications. Showing that CDK9 inhibition, using either flavopiridol or targeted knockdown, can restore promoter-proximal pausing and rescue cardiomyocyte formation in Rtf1-deficient embryos suggests that modulation of pause-release kinetics may represent a new avenue for correcting transcriptionally driven congenital heart defects. Given that many CDK inhibitors are clinically approved or in active development, this connection significantly elevates the translational impact of the findings.

      In sum, this study is rigorous, innovative, and transformative in its implications for developmental biology and cardiac medicine. I strongly support its publication.

    3. Reviewer #2 (Public review):

      Summary:

      Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1 complex (PAF1C), which regulates transcriptional pausing in cardiac development. The authors first confirm that newly generated rtf1 mutant alleles recapitulate the defects in cardiac progenitor differentiation found using morpholinos from their previous work. The authors then show that conditional loss of Rtf1 in mouse embryos and depletion in mouse ESCs both demonstrate a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting vertebrate cardiac progenitor development. The authors then employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted zebrafish embryos at the 10-12 somite stage. These experiments corroborate that gene expression associated with cardiac progenitor differentiation is lost. Furthermore, analysis of differentiation trajectories suggests that the expression of genes associated with cardiac, blood, and endothelial progenitor differentiation is not initiated within the anterior lateral plate mesoderm. Structure-function analysis supports that the Rtf1 Plus3 domain is necessary for its function in promoting cardiac progenitor differentiation. ChIP-seq for RNA Pol II on 10-12 somite stage zebrafish embryos supports that Rtf1 is required for proper promoter pausing at the transcriptional start site. The transcriptional promoter pausing defect and cardiac differentiation can partially be rescued in zebrafish rtf1 mutants through pharmacological inhibition and depletion of Cdk9, a kinase that inhibits elongation. Thus, the authors have provided a clear analysis of the requirements and basic mechanism that Rtf1 employs in regulating cardiac progenitor development.

      Strengths and weaknesses:

      Overall, the data presented are strong and the message of the study is clear. The conclusions that Rtf1 is required for transcriptional pause release and promotes vertebrate cardiac progenitor differentiation are supported. Areas of strength include the complementary approaches in zebrafish and mouse embryos, and mouse embryonic stem cells, which together support the conserved requirement for Rtf1 in promoting cardiac differentiation. The bulk and single-cell RNA-sequencing analyses provide further support for this model via examining broader gene expression. In particular, the pseudotime analysis bolsters that there is a broader effect on differentiation of anterior lateral plate mesoderm derivatives. The structure-function analysis provides a relatively clean demonstration of the requirement of the Rtf1 Plus3 domain. The pharmacological and depletion epistasis of Cdk9 combined with the RNA Pol II ChIP-seq nicely support the mechanism implicating Cdk9 in the Rtf1-dependent RNA Pol II promoter pausing. Additionally, this is a revised manuscript. The authors were overall responsive to the previous critiques. The new analysis and revisions have helped to strengthen their hypothesis and improve the clarity of their study. While the revised manuscript is significantly improved, the limited analysis of the multiomic data still represents a lost opportunity to provide further insight into Rtf1 mechanisms within this study. However, the authors have nevertheless achieved their goal for this study. The data sets reported will also be useful tools for further analysis and integration by the cardiovascular development community. Thus, the study will be of interest to scientists studying cardiovascular development and those broadly interested in epigenetic regulation controlling vertebrate development.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      The manuscript submitted by Langenbacher et al., entitled "Rtf1-dependent transcriptional pausing regulates cardiogenesis", describes very interesting and highly impactful observations about the function of Rtf1 in cardiac development. Over the last few years, the Chen lab has published novel insights into the genes involved in cardiac morphogenesis. Here, they used the mouse model, the zebrafish model, cellular assays, single cell transcription, chemical inhibition, and pathway analysis to provide a comprehensive view of Rtf1 in RNAPII (Pol2) transcription pausing during cardiac development. They also conducted knockdown-rescue experiments to dissect the functions of Rtf1 domains.

      Strengths:

      The most interesting discovery is the connection between Rtf1 and CDK9 in regulating Pol2 pausing as an essential step in normal heart development. The design and execution of these experiments also demonstrate a thorough approach to revealing a previously underappreciated role of Pol2 transcription pausing in cardiac development. This study also highlights the potential amelioration of related cardiac deficiencies using small molecule inhibitors against cyclin dependent kinases, many of which are already clinically approved, while many other specific inhibitors are at various preclinical stages of development for the treatment of other human diseases. Thus, this work is impactful and highly significant. 

      We thank the reviewer for appreciating our work.

      Reviewer #2 (Public Review): 

      Summary: 

      Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1C, which regulates transcriptional pausing in cardiac development. The authors first confirm their previous morphant study with newly generated rtf1 mutant alleles, which recapitulate the defects in cardiac progenitor and differentiation gene expression observed previously in morphants. They then examine the conservation of Rtf1 in mouse embryos and embryonic stem cell-derived cardiomyocytes. Conditional loss of Rtf1 in mesodermal lineages and depletion in murine ESCs demonstrate a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting cardiac development. The authors subsequently employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted embryos at the 10-12 somite stage. These experiments corroborate that genes associated with cardiac and muscle development are lost. Furthermore, the differentiation trajectories suggest that the expression of genes associated with cardiac maturation is not initiated. Structure-function analysis supports that the Plus3 domain is necessary for its function in promoting cardiac progenitor formation. ChIP-seq for RNA Pol II on 10-12 somite stage embryos suggests that Rtf1 is required for proper promoter pausing. This defect can partially be rescued in rtf1 mutants through use of a pharmacological inhibitor of Cdk9, which inhibits elongation.

      Strengths: 

      Many aspects of the data are strong, which support the basic conclusions of the authors that Rtf1 is required for transcriptional pausing and has a conserved requirement in vertebrate cardiac development. Areas of strength include the genetic data supporting the conserved requirement for Rtf1 in promoting cardiac development, the complementary bulk and single-cell RNA-sequencing approaches providing some insight into the gene expression changes of the cardiac progenitors, the structure-function analysis supporting the requirement of the Plus3 domain, and the pharmacological epistasis combined with the RNA Pol II ChIP-seq, supporting the mechanism implicating Cdk9 in the Rtf1 dependent mechanism of RNA Pol II pausing. 

      We thank the reviewer for the summary and for recognizing many strengths of our work. 

      Weaknesses: 

      While most of the basic conclusions are supported by the data, there are a number of analyses where it is unclear why the authors chose to perform the experiments the way they did, and some places where the data presently do not support the interpretations. One of the conclusions is that the phenotype affects the maturation of the cardiomyocytes and they are arresting in an immature state. However, this seems to be mostly derived from picking a few candidates from the single cell data in Fig. 6. If that were the case, wouldn't the expectation be to observe relatively normal expression of earlier marker genes required for specification, such as Nkx2.5 and Gata5/6? The in situ expression analysis from fish and mice (Fig. 2 and Fig. 3) and bulk RNA-seq (Fig. 5) seems to suggest that there are pretty early specification and differentiation defects. While some genes associated with cardiac development are not changed, many of these are not specific to cardiomyocyte progenitors and are expressed broadly throughout the ALPM. Similarly, it is not clear why a consistent set of cardiac progenitor genes (for instance mef2ca, nkx2.5, and tbx20) was not analyzed for all the experiments, in particular with the single cell analysis.

      A major conclusion of our study is that Rtf1 deficiency impairs myocardial lineage differentiation from mesoderm, as suggested by the reviewer. Thus, the main goal of this study is to understand how Rtf1 drives cardiac differentiation from the LPM, rather than the maturation of cardiomyocytes.  Multiple lines of evidence support this conclusion:

      (a) In situ hybridization showed that Rtf1 mutant embryos do not have nkx2.5+ cardiac progenitor cells and subsequently fail to produce cardiomyocytes (Figs. 2, 3).

      (b) RT-PCR analysis showed that knockdown of Rtf1 in mouse embryonic stem cells causes a dramatic reduction of cardiac gene expression and production of significantly fewer beating patches (Fig.4).

      (c) Bulk RNA sequencing revealed significant downregulation of cardiac lineage genes, including nkx2.5 (Fig. 5).

      (d) Single cell RNA sequencing clearly showed that lateral plate mesoderm (LPM) cells are significantly more abundant in Rtf1 morphants, whereas cardiac progenitors are less abundant (Fig. 6 and Fig. 6 Supplement 1-5).

      When feasible, we used cardiac lineage restricted markers in our assays. Nkx2.5 and tbx5a are not highlighted in the single cell analysis because their expression in our sc-seq dataset was too low to examine in the clustering/trajectory analysis.  In this revised manuscript, we provide violin plots showing the low expression levels of these genes in single cells from Rtf1 deficient embryos (Figure 6 Supplement 5).

      The point of the multiomic analysis is confusing. RNA- and ATAC-seq were apparently done at the same time. Yet, the focus of the analysis that is presented is on a small part of the RNA-seq data. This data set could have been more thoroughly analyzed, particularly in light of how chromatin changes may be associated with the transcriptional pausing. This seems to be a lost opportunity. Additionally, how the single cell data is covered in Supplemental Fig. 2 and 3 is confusing. There is no indication of what the different clusters are in the Figure or the legend.

      In this study, we performed single cell multiome analysis and used both scRNAseq and scATACseq datasets to generate reliable clustering.  The scRNAseq analysis reveals how Rtf1 deficiency impacts cardiac differentiation from mesoderm, which inspired us to investigate the underlying mechanism and led to the discovery of defects in Rtf1-dependent transcriptional pause release.

      We agree with the reviewer that deep examination of Rtf1-dependent chromatin changes would provide additional insights into how Rtf1 influences early development and careful examination of the scATACseq dataset is certainly a good future direction.  

      In this revised manuscript, we have revised Fig.6 Supplement 1 to include the predicted cell types and provide an additional excel file showing the annotation of all 39 clusters (Supplementary Table 2). 

      While the effect of Rtf1 loss on cardiomyocyte markers is certainly dramatic, it is not clear how well the mutant fish have been analyzed and how specific the effect is to this population. It is interpreted that the effects on cardiomyocytes are not due to "transfating" of other cell fates, yet supplemental Fig. 4 shows numerous effects on potentially adjacent cell populations. Minimally, additional data needs to be provided showing the live fish at these stages and marker analysis to support these statements. In some images, it is not clear the embryos are the same stage (one can see pigmentation in the eyes of controls that is not in the mutants/morphants), causing some concern about developmental delay in the mutants.

      Single cell RNA sequencing showed an increased abundance of LPM cells and a reduced abundance of cardiac progenitors in Rtf1 morphants (Fig. 6 and Fig.6 Supplement 1-5). The reclustering of anterior lateral plate mesoderm (ALPM) cells and their derivatives further showed that cells representing undifferentiated ALPM were increased whereas cells representing all three ALPM derivatives were reduced. These findings indicate a defect in ALPM differentiation. 

      The reviewer questioned whether we examined stage-matched embryos. In our assay, Rtf1 mutant embryos were collected from crosses of Rtf1 heterozygotes. Each clutch from these crosses consists of ¼ embryos showing rtf1 mutant phenotypes and ¾ embryos showing wild type phenotypes which were used as control. Mutants and their wild type siblings were fixed or analyzed at the same time.

      The reviewer questioned the specificity of the Rtf1 deficient cardiac phenotype and pointed out that Rtf1 mutant embryos do not have pigment cells around the eye.  Rtf1 is a ubiquitously expressed transcriptional regulator.  Previous studies in zebrafish have shown that Rtf1 deficiency significantly impacts embryonic development. Rtf1 deficiency causes severe defects in cardiac lineage and neural crest cell development; consequently, Rtf1 deficient embryos do not have cardiomyocytes and pigmentation (Langenbacher et al., 2011, Akanuma et al., 2007, and Jurynec et al., 2019).  We now provide an image showing a 2-day-old Rtf1 mutant embryo and their wild type sibling to illustrate the cardiac, neural crest, and somitogenesis defects caused by loss of Rtf1 activity (Fig. 2 Supplement 1).

      With respect to the transcriptional pausing defects in the Rtf1 deficient embryos, it is not clear from the data how this effect relates to the expression of the cardiac markers. This could have been directly analyzed with some additional sequencing, such as PRO-seq, which would provide a direct analysis of transcriptional elongation.

      We showed that Rtf1 deficiency results in a nearly genome-wide decrease in promoter-proximal pausing and downregulation of cardiac markers. Attenuating transcriptional pause release could restore cardiomyocyte formation in Rtf1 deficient embryos. In this revised manuscript, we provide additional RNAseq data showing that the expression levels of critical cardiac development genes such as nkx2.5, tbx5a, tbx20, mef2ca, mef2cb, ttn.2, and ryr2b are significantly rescued. We agree with the reviewer that further analyses using the PRO-seq approach could provide additional insights, but it is beyond the scope of this manuscript.

      Some additional minor issues include the rationale that sequence conservation suggests an important requirement of a gene (line 137), for which there are many counterexamples; referencing figure panels out of order (in Figs. 4, 7, and 8) as described in the text; and using the morphants for some experiments, such as the rescue, that could have been done in a blinded manner with the mutants.

      We have clarified the rationale in this revised manuscript and made the effort to reference figures in order.

      The reviewer commented that rescue experiments “could have been done in a blinded manner with the mutants”. This was indeed how the flavopiridol rescue and cdk9 knockdown experiments were carried out. Embryos from crosses of Rtf1 heterozygotes were collected, fixed after treatment and subjected to in situ hybridization. Embryos were then scored for cardiac phenotype and genotyped (Fig.8 d-g). Morpholino knockdown was used in genomic experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest (Fig. 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This reviewer has a few suggestions below, aimed at improving the clarity and impact of the current study. Once these items are addressed, the manuscript should be of interest to the Elife reader. 

      Item 1. Strengthening the interaction between Rtf1 and CDK9 on Pol2 pausing.

      The authors have convincingly shown that the chemical inhibition of CDK9 by flavopiridol can partially rescue the expression of cardiac genes in the zebrafish model. Although flavopiridol is FDA approved and has been a classical inhibitor for the dissection of CDK9 function, it also inhibits related CDKs (flavopiridol (alvocidib) competes with ATP to inhibit CDKs including CDK1, CDK2, CDK4, CDK6, and CDK9, with IC50 values in the 20-100 nM range). Therefore, this study could be more impactful if the authors can provide evidence on which of these CDKs may be most relevant during Rtf1-dependent cardiogenesis. Whether the observed cardiac defect indicates a preferential role for CDK9, or whether other CDKs may also be able to provide partial rescue, may be clarified using additional, more selective small molecules (e.g., BAY1251152 and LDC000067 are commercially available).

      The reviewer raised a reasonable concern about the specificity of flavopiridol. We thank the reviewer for the insightful suggestion and share the concern about specificity. To address this question, we used an orthogonal approach, morpholino knockdown, in which we directly targeted CDK9 and observed the same level of rescue, supporting a critical role of transcription pausing in cardiogenesis.

      Item 2. Differences between CRISPR lines and morphants 

      Much of the work presented used Rtf1 morphants while the authors have already generated 2 CRISPR lines. What is the difference between morphants and mutants? The authors should comment on the similarities and/or differences between using morphants or mutants in their study and whether the same Rtf1-CDK9 connection also occurs in the CRISPR lines.

      The morphology of our mutants (rtf1<sup>LA2678</sup> and rtf1<sup>LA2679</sup>) resembles the morphants and the previously reported ENU-induced rtf1<sup>KT641</sup> allele. Extensive in situ hybridization analysis showed that the morphants faithfully recapitulate the mutant phenotypes (Fig.2). We have performed rescue experiments (flavopiridol and CDK9 morpholino) using Rtf1 mutant embryos and found that inhibiting Cdk9 restores cardiomyocyte formation (Fig.8). 

      Item 3. Discuss the therapeutic relevance of study 

      The authors have already generated a mouse model of Rtf1 Mesp1-Cre knockout where cardiac muscle development is severely derailed (Fig 3B). Thus, a demonstration of a conserved role for CDK9 inhibitor in rescuing cardiogenesis using mouse cells or the mouse model will provide important information on a conserved pathway function relevant to mammalian heart development. In the Discussion, how this underlying mechanistic role may be useful in the treatment of congenital heart disease should be provided.  

      Thank you for the insight. We have incorporated your comments in the discussion. 

      Item 4. Insights into the role of CDK9-Rtf1 in response to stress versus in cardiogenesis. 

      In the Discussion, the authors commented on the role of additional stress-related stimuli such as heat shock and inflammation that have been linked to CDK9 activity. However, the current ms provides the first, endogenous role of Pol2 pausing in a critical developmental step during normal cardiogenesis. The authors should emphasize the novelty and significance of their work by providing a paragraph on the state of knowledge on the molecular mechanisms governing cardiogenesis, then placing their discovery within this framework. This minor addition will also clarify the significance of this work to the broad readership of eLife. 

      Thank you for the suggestion. We have incorporated your comments and elaborate on the novelty and significance of our work in the discussion. 

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is difficult to assess what the overt defects are in the embryos at any stage. Images of live embryos were not included in the supplement. Do these have a small, malformed heart tube later or are the embryos just deteriorating due to broad defects?

      The Rtf1 deficient embryos do not produce nkx2.5+ cardiac progenitors. Consequently, we never observed a heart tube or detected cells expressing cardiomyocyte marker genes such as myl7. This finding is consistent with previous reports using rtf1 morphants and rtf1<sup>KT641</sup>, an ENU-induced point mutation allele (Langenbacher et al., 2011 and Akanuma, 2007). In this revised manuscript, we provide a live image of 2-day-old wild type and rtf1<sup>LA2679/LA2679</sup> embryos (Fig. 2 Supplement 1). After two days, rtf1 mutant embryos undergo broad cell death.

      (2) In Fig. 2, although the in situs are convincing, there is not a quantitative assessment of expression changes for these genes. This could have been done for the bulk or single cell RNA-seq experiments, but was not, and these genes were not included in the heat maps. A quantitative assessment of these genes would benefit the study.

      The top 40 most significantly differentially expressed genes are displayed in the heatmap presented in Fig.5d. The complete differential gene expression analysis results for our hand2 FACS-based comparison of rtf1 morphants and controls is presented in Supplementary Data File 1.  In this revised manuscript, we provide a new supplemental figure with violin plots showing the expression levels of genes of interest in our single cell sequencing dataset (Fig.6 Supplement 5).

      (3) It does not appear that any statistical tests were used for the comparisons in Fig. 2.

      We now provide the statistical data in the legend and Fig.2 b, d, f, h and i.

      (4) It's not clear the magnifications and orientations of the embryos in Fig. 3b are the same. 

      Embryos shown in Fig.3b are at the same magnification. However, because Rtf1 mutant embryos display severe morphological defects, the orientation of mutant embryos was adjusted to examine the cardiac tissue.

      (5) The n's for analysis of MLC2v in WT Rtf1 CKO embryos in Fig. 3b are only 1. At least a few more embryos should be analyzed to confirm that the phenotype is consistent. 

      We have revised the figure and present the number of embryos analyzed and statistics in Fig.3c. 

      (6) A number of figure panels are referred to out of order in the text: Fig. 4E-G are before Fig. 4C, D; Fig. 7C before 7B; Fig. 8D-I before 8A, B. In general, it is easier for the reader if the figure panels are presented in the order they are referred to in the text.

      Revised as suggested.

      (7) While additional genes can be included, it is not clear why the same sets of genes are not examined in the bulk or single-cell RNA-seq as with the in situs or expression was analyzed in embryos. I suggest including the genes like nkx2.5, tbx20, myl7, in all the sequencing analysis. 

      We used the same set of genes in all analyses when possible. However, the low expression of genes such as nkx2.5 and myl7 in our sc-seq dataset precludes them from the clustering/trajectory analysis. In this revised manuscript, we present violin plots showing their expression in wild type and rtf1 morphants (Fig. 6 Supplement 5).

      (8) If a multiomic approach was used, why wasn't its analysis incorporated more into the manuscript? In general, a clearer presentation and deeper analysis of the single cell data would benefit the study. The integration of the RNA and ATAC would benefit the analysis.

      As addressed in our response to the reviewer’s public review, both datasets were used in clustering. Examining changes in chromatin accessibility is certainly interesting, but beyond the scope of this study. 

      (9) Many of the markers analyzed are not cardiac specific or it is not clear they are expressed in cardiac progenitors at the stage of the analysis. Hand2 has broader expression. Additional confirmation of some of the genes through in situ would help the interpretations. 

      Markers used for the in situ hybridization analysis (myl7, mef2ca, nkx2.5, tbx5a, and tbx20) are known for their critical role in heart development. For sc-seq trajectory analyses, most displayed genes (sema3e, bmp6, ttn.2, mef2cb, tnnt2a, ryr2b, and myh7bb) were identified based on their differential expression along the LPM-cardiac progenitor pseudotime trajectory. Rather than selecting genes based on their cardiac specificity, our goal was to examine the progressive gene expression changes associated with cardiac progenitor formation and compare gene expression of wild type and rtf1 deficient embryos.

      (10) Additional labels of the cell clusters are needed for Supplemental Figs. 2 and 3. 

      The cluster IDs were presented on Supplementary Figures 2 and 3. In this revised version, we added predicted cell types to the UMAP (revised Fig.6 Supplement 1) and provided an excel file with this information (revised Supplementary Table 2). 

      (11) On lines 101-102, the interpretation from the previous data is that differentiation of the LPM requires Rtf1. However, later from the single cell data the interpretation based on the markers is that Rtf1 loss affects maturation. However, it is not clear this interpretation is correct or what changed from the single cell data. If that were the case, one would expect to see maintenance of more early markers and subsequent loss of maturation markers, which does not appear to be the case from the presented data.

      Our data suggests that cardiac progenitor formation is not accomplished by simultaneously switching on all cardiac marker genes. Our pseudotime trajectory analysis highlights tnnt2a, ryr2b, and myh7bb as genes that increase in expression in a lagged manner compared to mef2cb (Fig. 6). Thus, the abnormal activation of mef2cb without subsequent upregulation of tnnt2a, ryr2b, and myh7bb in rtf1 morphants suggests a requirement for rtf1 in the progressive gene expression changes required for proper cardiac progenitor differentiation. Our single cell experiment focuses on the process of cardiac progenitor differentiation and does not provide insights into cardiomyocyte maturation. We have edited the text to clarify these interpretations. 

      (12) The interpretation that there is not "transfating" is not supported by the shown data. Analysis of markers in other tissues, again with in situ, to show spatially would benefit the study. 

      As stated in our response to the reviewer’s public review, we observed a dramatic increase of ALPM cells, but a decrease of ALPM derivatives including the cardiac lineage. We did not observe the expansion of one ALPM-derived subpopulation at the expense of the others. These observations suggest a defect in ALPM differentiation and argue against the notion that the region of the ALPM that would normally give rise to cardiac progenitors is instead differentiating into another cell type.

      (13) The rationale that sequence conservation means a gene is important (lines 137-139) is not really true. There are many examples of highly conserved genes whose mutants don't have defects.

      We have revised the text to avoid confusion. 

      (14) The data showing that the 8 bp mutations do not affect the RNA transcript is not shown or at least indicated in Fig. 7. It would seem that this experiment could have been done in the mutant embryos, in which case the experiment would have been semi-blinded as the genotyping would occur after imaging.

      The modified Rtf1 wt RNA (Rtf1 wt* in revised Fig. 7) robustly rescued nkx2.5 expression in rtf1 deficient embryos, demonstrating that the 8 bp modifications do not negatively impact the activity of the injected RNA. As stated previously, morpholino knockdown was used in some experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest.

      (15) Using a technique like PRO-seq at the same stage as the ChIP-seq would complement the ChIP-seq and allow a more detailed analysis of the transcriptional pausing on specific genes observed in WT and mutant embryos. 

      As stated in our response to the reviewer’s public review, we appreciate the suggestion but PRO-seq is beyond the scope of this study.

    1. eLife Assessment

      This useful study reports that the exogenous expression of the microRNA miR-195 can partially compensate in early B cell development for the loss of EBF1, one of the key transcription factors in B cells. While this finding will be of interest to those studying lymphocyte development, the evidence, particularly with regard to the molecular mechanisms that underpin the effect of miR-195, is currently incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that miR-195, a microRNA that has been shown to bind and enhance degradation of mRNA targets in the regulation of cell processes, has a novel role in allowing the emergence of CD19+ cells in cells in which Ebf1, a critical B-cell transcription factor, has been genetically removed.

      Strengths:

      That over-expression of miR-195 can allow the emergence of CD19+ cells missing Ebf1 is somewhat novel.

      Their data does perhaps support to a degree the emergence of a transcriptional network that may bypass the absence of Ebf1, including the FOXO1 transcription factor, but this data is not strong or definitive.

      Weaknesses:

      It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system.

      The authors have provided insufficient data to allow a thorough appraisal of the step-wise molecular changes that could account for their observed phenotype.

      On review of the resubmitted manuscript, while I note the authors have attempted to address several of my comments, unfortunately, their resubmission is not sufficient to address several of the comments I had previously made.

      In particular, in the resubmitted data that includes western blots for PAX5 and ERG in their EBF1-/- model, Supp Fig S3, the bands they show imply that PAX5 and ERG expression can still be significantly detected in their EBF1-/- early B-cell model. This should not be the case, as no expression of PAX5 or ERG should be seen, as has been shown in prior literature.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate miRNA miR-195 in the context of B-cell development. They demonstrate that ectopic expression of miR-195 in hematopoietic progenitor cells can, to a considerable extent, override the consequences of deletion of Ebf1, a central B-lineage defining transcription factor, in vitro and upon short-term transplantation into immunodeficient mice in vivo. In addition, the authors demonstrate that the reverse experiment, genetic deletion of miR-195, has virtually no effect on B-cell development. Mechanistically, the authors identify Foxo1 phosphorylation as one pathway partially contributing to the rescue effect of miR-195. An additional analysis of epigenetics by ATACseq adds potential additional factors that might also contribute to the effect of ectopic expression of miR-195.

      Strengths:

      The authors employ a robust assay system, Ebf1-KO HPC, to test for B-lineage promoting factors. The manuscript overall takes on an interesting perspective rarely employed for analysis of miRNA by overexpressing the miRNA of interest. Ideally, this approach may reveal, if not the physiological function of this miRNA, the role of distinct pathways in developmental processes.

      Weaknesses:

      At the same time, this approach constitutes a major weakness: It does not reveal information on the physiological role of miR-195. In fact, the authors themselves demonstrate in their KO approach, that miR-195 has virtually no role in B-cell development, as has been demonstrated already in 2020 by Hutter and colleagues. While the authors cite this paper, unfortunately, they do so in a different context, hence omitting that their findings are not original.

      Conceptually, the authors stress that a predominant function of miRNA (in contrast to transcription factors, as the authors suggest) lies in fine-tuning. However, there appears to be a misconception. Misregulation of fine tuning of gene expression may result in substantial biological effects, especially in developmental processes. The authors want to highlight that miR-195 is somewhat an exception in that regard, but this is clearly not the case. In addition to miR-150, as referenced by the authors, also the miR-17-92 or miR-221/222 families play a significant role in B-cell development, their absence resulting in stage-specific developmental blocks, and other miRNAs, such as miR-155, miR-142, miR-181, and miR-223 are critical regulators of leukocyte development and function. Thus, while in many instances a single miRNA moderately affects gene expression at the level of an individual target, quite frequently targets converge in common pathways, hence controlling critical biological processes.

      The paper has some methodological weaknesses as well: For the most part, it lacks thorough statistical analysis and only representative FACS plots are provided. Many bar graphs are based on heavy normalization making the T-tests employed inapplicable. No details are provided regarding statistical analysis of microarrays. Generation of the miR-195-KO mice is insufficiently described and no validation of deletion is provided. Important controls are missing as well, the most important one being a direct rescue of Ebf1-KO cells by re-expression of Ebf1. This control is critical to quantify the extent of override of Ebf1-deficiency elicited by miR-195 and should essentially be included in all experiments. A quantitative comparison is essential to support the authors' main conclusion highlighted in the title of the manuscript. As the manuscript currently stands, only negative controls are provided, which, given the profound role of Ebf1, are insufficient, because many experiments, such as assessment of V(D)J recombination, IgM surface expression, or class-switch recombination, are completely negative in controls. In addition, the authors should also perform long-term reconstitution experiments. While it is somewhat surprising that the authors obtain splenic IgM+ B cells after just 10 days, these experiments would certainly be much more informative after longer periods of time. Using "classical" mixed bone marrow chimeras using a combination of B-cell defective (such as mb1/mb1) bone marrow and reconstituted Ebf1-KO progenitors would permit much more refined analyses.

      With regard to mechanism, the authors show that the Foxo1 phosphorylation pathway accounts for the rescue of CD19 expression, but not for other factors, as mentioned in the discussion. The authors then resort to epigenetic analysis, but their rationale remains somewhat vague. It remains unclear how miR-195 is linked to epigenetic changes.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Miyatake et al. present the interesting finding that ectopic expression of miR-195 in EBF1-deficient hematopoietic progenitor cells can partially rescue their developmental block and allows B cells to progress to a B220+ CD19+ cells stage. Notably, this is accompanied by an upregulation of B cell specific genes and, correspondingly, a downregulation of T, myeloid and NK lineage-related genes, suggesting that miR-195 expression is at least in part equivalent to EBF1 activity in orchestrating the complex gene regulatory network underlying B cell development. Strengthening this point, ATAC sequencing of miR-195-expressing EBF1-deficient B220+CD19+ cells and a comparison of these data to public datasets of EBF1-deficient and -proficient cells suggest that miR-195 indirectly regulates gene expression and chromatin accessibility of some, but not all regions regulated by EBF1.

      Mechanistically, the authors identify a subset of potential target genes of miR-195 involved in MAPK and PI3K signalling. Dampening of these pathways has previously been demonstrated to activate FOXO1, a key transcription factor for early B cells downstream of EBF1. Accordingly, the authors hypothesize that miR-195 exerts its function through FOXO1. Supporting this claim, also exogenous FOXO1 expression is able to promote the development of EBF1-deficient cells to the B220+CD19+ stage and thus recapitulates the miR-195 phenotype.

      Strengths:

      The strength of the presented study is the detailed assessment of the altered chromatin accessibility in response to ectopic miR-195 expression. This provides insight into how miR-195 impacts on the gene regulatory network that governs B cell development and allows the formation of mechanistic hypotheses.

      Weaknesses:

      The key weakness of this study is that its findings are based on the artificial and ectopic expression of a miRNA out of its normal context, which in my opinion strongly limits the biological relevance of the presented work.

      While the authors performed qPCRs for miR-195 on different B cell populations and show that its relative expression peaks in early B cells, it remains unclear whether the absolute miR-195 expression is sufficiently high to have any meaningful biological activity. In fact, other miRNA expression data from immune cells (e.g. DOI 10.1182/blood-2010-10-316034 and DOI 10.1016/j.immuni.2010.05.009) suggest that miR-195 is only weakly, if at all, expressed in the hematopoietic system.

      Update to this part after revision: The authors now state in the discussion that their study does not aim to uncover and characterize a physiological role of miR-195 in lymphocytes development, but rather reveals "the potential of miR-195 to compensate for EBF1 deficiency". However, in my opinion, the absence of any physiological context still limits this study's relevance.

      The authors support their finding by a CRISPR-derived miR-195 knockout mouse model which displays mild but significant differences in the hematopoietic stem cell compartment and in B cell development. However, they fail to acknowledge and discuss a lymphocyte-specific miR-195 knockout mouse that does not show any B cell defects in the bone marrow or spleen and thus contradicts the authors' findings (DOI 10.1111/febs.15493). Of note, B-1 B cells in particular have been shown to be elevated upon loss of miR-15-16-1 and/or miR-15b-16-2, which contradicts the data presented here for loss of the family member miR-195.

      A second weakness is that some claims by the authors appear overstated or at least not fully backed up by the presented data. In particular, the findings that miR-195-expressing cells can undergo VDJ recombination, express the pre-BCR/BCR and can class switch need to be strengthened. It would be beneficial to include additional controls to these experiments, e.g. a RAG-deficient mouse as a reference/negative control for the ddPCR and the surface IgM staining, and cells deficient in class switching for the IgG1 flow cytometric staining.

      Moreover, the manuscript would be strengthened by a more thorough investigation of the hypothesis that miR-195 promotes the stabilization and activity of FOXO1, e.g. by comparing the authors' ATACseq data to the FOXO1 signature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This useful study reports that the exogenous expression of the microRNA miR-195 can partially compensate in early B cell development for the loss of EBF1, one of the key transcription factors in B cells. While this finding will be of interest to those studying lymphocyte development, the evidence, particularly with regard to the molecular mechanisms that underpin the effect of miR-195, is currently incomplete. 

      Public Reviews: 

      Reviewer #1 (Public review):

      Summary: 

      Here, the authors propose that miR-195, a microRNA that has been shown to bind and enhance the degradation of mRNA targets in the regulation of cell processes, has a novel role in allowing the emergence of CD19+ cells in cells in which Ebf1, a critical B-cell transcription factor, has been genetically removed.

      Strengths: 

      That over-expression of miR-195 can allow the emergence of CD19+ cells missing Ebf1 is somewhat novel.

      Their data does perhaps support to a degree the emergence of a transcriptional network that may bypass the absence of Ebf1, including the FOXO1 transcription factor, but this data is not strong or definitive. 

      Weaknesses: 

      It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. 

      The authors have provided insufficient data to allow a thorough appraisal of the stepwise molecular changes that could account for their observed phenotype. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors investigate miRNA miR-195 in the context of B-cell development. They demonstrate that ectopic expression of miR-195 in hematopoietic progenitor cells can, to a considerable extent, override the consequences of deletion of Ebf1, a central B-lineage defining transcription factor, in vitro and upon short-term transplantation into immunodeficient mice in vivo. In addition, the authors demonstrate that the reverse experiment, genetic deletion of miR-195, has virtually no effect on B-cell development. Mechanistically, the authors identify Foxo1 phosphorylation as one pathway partially contributing to the rescue effect of miR-195. An additional analysis of epigenetics by ATACseq adds potential additional factors that might also contribute to the effect of ectopic expression of miR-195.

      Strengths: 

      The authors employ a robust assay system, Ebf1-KO HPC, to test for B-lineage promoting factors. The manuscript overall takes on an interesting perspective rarely employed for the analysis of miRNA by overexpressing the miRNA of interest. Ideally, this approach may reveal, if not the physiological function of this miRNA, the role of distinct pathways in developmental processes. 

      Weaknesses: 

      At the same time, this approach constitutes a major weakness: It does not reveal information on the physiological role of miR-195. In fact, the authors themselves demonstrate in their KO approach, that miR-195 has virtually no role in B-cell development, as has been demonstrated already in 2020 by Hutter and colleagues. While the authors cite this paper, unfortunately, they do so in a different context, hence omitting that their findings are not original. 

      Conceptually, the authors stress that a predominant function of miRNA (in contrast to transcription factors, as the authors suggest) lies in fine-tuning. However, there appears to be a misconception. Misregulation of fine-tuning of gene expression may result in substantial biological effects, especially in developmental processes. The authors want to highlight that miR-195 is somewhat of an exception in that regard, but this is clearly not the case. In addition to miR-150, as referenced by the authors, also the miR-17-92 or miR-221/222 families play a significant role in B-cell development, their absence resulting in stage-specific developmental blocks, and other miRNAs, such as miR-155, miR-142, miR-181, and miR-223 are critical regulators of leukocyte development and function. Thus, while in many instances a single miRNA moderately affects gene expression at the level of an individual target, quite frequently targets converge in common pathways, hence controlling critical biological processes. 

      The paper has some methodological weaknesses as well: For the most part, it lacks thorough statistical analysis, and only representative FACS plots are provided. Many bar graphs are based on heavy normalization making the T-tests employed inapplicable. No details are provided regarding the statistical analysis of microarrays. Generation of the miR-195-KO mice is insufficiently described and no validation of deletion is provided. Important controls are missing as well, the most important one being a direct rescue of Ebf1-KO cells by re-expression of Ebf1. This control is critical to quantify the extent of override of Ebf1-deficiency elicited by miR-195 and should essentially be included in all experiments. A quantitative comparison is essential to support the authors' main conclusion highlighted in the title of the manuscript. As the manuscript currently stands, only negative controls are provided, which, given the profound role of Ebf1, are insufficient, because many experiments, such as assessment of V(D)J recombination, IgM surface expression, or class-switch recombination, are completely negative in controls. In addition, the authors should also perform long-term reconstitution experiments. While it is somewhat surprising that the authors obtained splenic IgM+ B cells after just 10 days, these experiments would be certainly much more informative after longer periods of time. Using "classical" mixed bone marrow chimeras using a combination of B-cell defective (such as mb1/mb1) bone marrow and reconstituted Ebf1-KO progenitors would permit much more refined analyses. 

      With regard to mechanism, the authors show that the Foxo1 phosphorylation pathway accounts for the rescue of CD19 expression, but not for other factors, as mentioned in the discussion. The authors then resort to epigenetics analysis, but their rationale remains somewhat vague. It remains unclear how miR-195 is linked to epigenetic changes. 

      Reviewer #3 (Public review): 

      Summary: 

      In this study, Miyatake et al. present the interesting finding that ectopic expression of miR-195 in EBF1-deficient hematopoietic progenitor cells can partially rescue their developmental block and allow B cells to progress to a B220+ CD19+ cells stage. Notably, this is accompanied by an upregulation of B-cell-specific genes and, correspondingly, a downregulation of T, myeloid, and NK lineage-related genes, suggesting that miR-195 expression is at least in part equivalent to EBF1 activity in orchestrating the complex gene regulatory network underlying B cell development. Strengthening this point, ATAC sequencing of miR-195-expressing EBF1-deficient B220+CD19+ cells and a comparison of these data to public datasets of EBF1-deficient and -proficient cells suggest that miR-195 indirectly regulates gene expression and chromatin accessibility of some, but not all regions regulated by EBF1. 

      Mechanistically, the authors identify a subset of potential target genes of miR-195 involved in MAPK and PI3K signaling. Dampening of these pathways has previously been demonstrated to activate FOXO1, a key transcription factor for early B cells downstream of EBF1. Accordingly, the authors hypothesize that miR-195 exerts its function through FOXO1. Supporting this claim, also exogenous FOXO1 expression is able to promote the development of EBF1-deficient cells to the B220+CD19+ stage and thus recapitulates the miR-195 phenotype. 

      Strengths: 

      The strength of the presented study is the detailed assessment of the altered chromatin accessibility in response to ectopic miR-195 expression. This provides insight into how miR-195 impacts the gene regulatory network that governs B-cell development and allows the formation of mechanistic hypotheses. 

      Weaknesses: 

      The key weakness of this study is that its findings are based on the artificial and ectopic expression of a miRNA out of its normal context, which in my opinion strongly limits the biological relevance of the presented work. 

      While the authors performed qPCRs for miR-195 on different B cell populations and show that its relative expression peaks in early B cells, it remains unclear whether the absolute miR-195 expression is sufficiently high to have any meaningful biological activity. In fact, other miRNA expression data from immune cells (e.g. DOI 10.1182/blood-2010-10-316034 and DOI 10.1016/j.immuni.2010.05.009) suggest that miR-195 is only weakly, if at all, expressed in the hematopoietic system.

      The authors support their finding by a CRISPR-derived miR-195 knockout mouse model which displays mild, but significant differences in the hematopoietic stem cell compartment and in B cell development. However, they fail to acknowledge and discuss a lymphocyte-specific miR-195 knockout mouse that does not show any B cell defects in the bone marrow or spleen and thus contradicts the authors' findings (DOI 10.1111/febs.15493). Of note, B-1 B cells in particular have been shown to be elevated upon loss of miR-15-16-1 and/or miR-15b-16-2, which contradicts the data presented here for loss of the family member miR-195.

      A second weakness is that some claims by the authors appear overstated or at least not fully backed up by the presented data. In particular, the findings that miR-195-expressing cells can undergo VDJ recombination, express the pre-BCR/BCR and class switch need to be strengthened. It would be beneficial to include additional controls to these experiments, e.g. a RAG-deficient mouse as a reference/negative control for the ddPCR and the surface IgM staining, and cells deficient in class switching for the IgG1 flow cytometric staining.

      Moreover, the manuscript would be strengthened by a more thorough investigation of the hypothesis that miR-195 promotes the stabilization and activity of FOXO1, e.g. by comparing the authors' ATACseq data to the FOXO1 signature. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Miyatake et al., present a manuscript that explores the role of miR-195 in B cell development. 

      Their data suggests a role for this microRNA: 

      Using an Ebf1 knockout fetal liver model of B-cell differentiation, the authors show that a small population of CD19-expressing cells, with some evidence of V(D)J recombination and capable of class switching, can be derived by transduction of miR-195.

      In the emergent CD19+ Ebf1-/- cells, the authors provide some evidence that Mapk and Akt3 may be miR-195 targets whose downregulation allows the FOXO1 transcription factor pathway to contribute to the emergence of CD19+ cells following miR-195 transduction.

      Perhaps less compelling data is provided with regards to a role for miR-195 in normal B-cell development through analysis of a miR-195 knockout model.

      While there are some interesting preliminary data presented for a role for miR-195 in the context of Ebf1-/- cells, there are some questions I think the authors could consider. 

      Comments: 

      (1-1) It is difficult to ascertain the potential role of miR-195 transduction in allowing the emergence of CD19+ cells from the data provided. miR-195 has been generally shown to destabilize mRNA transcripts by 3' UTR binding that targets mRNA transcripts for degradation. The effect of transduction of miR-195 would therefore be expected to be related to the degradation of factors opposing aspects of B-lineage specification or maintenance. I would be particularly interested in transcriptional or epigenetic regulators that may be modified in this way, at an mRNA as well as protein level.

      We appreciate the reviewerʼs thoughtful comments and agree that miRNAs often exert their effects through the degradation or translational repression of mRNAs encoding regulatory factors. In our study, we attempted to address this point by combining predictive analysis (using TargetScan and starBase) with luciferase reporter assays and qPCR to validate several potential targets of miR-195, including Mapk3 and Akt3. We acknowledge that this is not a comprehensive mechanistic analysis. We agree that a broader and systematic identification of direct targets of miR-195, particularly those involved in transcriptional and epigenetic regulation, would further clarify the mechanisms involved. However, due to limitations in resources and time, we are currently unable to perform global proteomic or ChIP-based validations. Nevertheless, our ATAC-seq and microarray data indicate that miR-195 overexpression leads to increased accessibility and expression of several key B-lineage transcription factors (Pax5, Runx1, Irf8), suggesting that miR-195 indirectly activates transcriptional programs relevant to B cell commitment. We have now clarified this limitation in the revised Discussion section (lines 505‒524), and we emphasize that our current findings represent the potential of miR-195 rather than its physiological role. We hope that this clarification addresses the concern.

      (1-2) While I acknowledge the authors have undertaken TargetScan and starBase analysis to try and predict miR-195 interactions, they do not provide a comprehensive list of putative targets that can be referenced against their cDNA data. Though they postulate Mapk3 and Akt3 as putative miR-195 targets and assay these in luciferase reporter systems (Figure 4), these were not clearly differentially regulated in the microarray data they provided (Figure 1E) as being downregulated on miR-195 transduction in Ebf1-/- cells.

      We thank the reviewer for pointing out the need for a more comprehensive list of predicted miR-195 targets. In response, we have now included Supplementary Tables 4 (human) and 5 (mouse) listing all putative miR-195 targets predicted by TargetScan and starBase. As noted, Mapk3 expression was indeed downregulated upon miR-195 transduction, consistent with our luciferase reporter and qPCR results. For Akt3, we observed variability in the microarray data depending on the probe used, resulting in inconsistent expression levels. We acknowledge this and have added a clarification in the revised manuscript (lines 335‒339), noting that the regulation of Akt3 by miR-195 is potentially probe-dependent and may require further validation. We hope this clarification resolves the concern.

      (1-3) The authors should provide a more comprehensive analysis of transcriptional changes induced by miR-195 Ebf1-/- specifically in the preproB cell stage of development in Ebf1-/- and miR-195 Ebf1-/- cells. The differentially expressed gene list should be provided as a supplemental file. The gene expression data should be provided for the different B-cell differentiation stages, eg. Ebf1-/- preproB cells, and Ebf1-/- miR-195 preproB cells, CD19+ cells and more differentiated subsets induced by miR-195 transduction.

      We appreciate the reviewerʼs suggestion to provide a more comprehensive transcriptomic analysis at different B-cell differentiation stages. Unfortunately, due to the limited availability of cells and technical constraints, we were unable to perform RNA-seq on miR-195 transduced Ebf1<sup>−/−</sup> pre-pro-B or CD19+ cells. However, to address this point, we referenced publicly available RNA-seq data (GEO accession: GSE92434), which includes transcriptomic profiles of Ebf1<sup>−/−</sup> pro-B cells and wild-type controls. By comparing our microarray data from miR-195 transduced Ebf1<sup>−/−</sup> cells with this dataset, we found partial restoration of expression for several key B-lineage genes, such as Pax5, Runx1, and Irf8, which are normally downregulated in the absence of EBF1. This comparison supports the notion that miR-195 partially reactivates the transcriptional network essential for B cell development. We have added this interpretation to the Discussion section (lines 528‒533).

      (1-4) More replicates (at least 3 of each genotype) are required for their Western Blots for FOXO1 and pFOXO1 (Fig 4C, D). Western blots should also be provided for other known B-lineage transcriptional regulators such as PAX5 and ERG.

      We thank the reviewer for these valuable suggestions. In response, we have now quantified and added the relative band intensities of FOXO1 and pFOXO1 from three independent experiments in the revised Figure 4C, and we include statistical analysis to support the reproducibility of these results. Additionally, as requested, we performed western blotting for PAX5 and ERG using the same samples. The results showed no significant change in these protein levels between miR-195-transduced and control Ebf1<sup>−/−</sup> cells, consistent with the modest upregulation observed in our microarray data. We have included the PAX5 and ERG western blot images in Supplementary Figure S3 and have revised the text in the Results section (lines 351‒35)

      (1-5) The authors have not shown FOXO1 binding to B-lineage genes by ChIP-seq or other methods such as CUT&Tag/CUT&RUN in their Ebf1-/- miR-195 CD19+ cells. Without demonstrating direct binding to the cis-regulatory regions of "upregulated" genes in these cells, they cannot definitively show that this TF is critical for the emergence of the CD19+ cell phenotype.

      We appreciate the reviewerʼs suggestion regarding the use of ChIP-seq or related methods to demonstrate direct FOXO1 binding to cis-regulatory regions of B-lineage genes in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. We agree that such data would provide definitive evidence of FOXO1's direct involvement in promoting the B cell-like transcriptional program. However, due to current technical limitations, including the scarcity of CD19⁺ cells derived from Ebf1<sup>−/−</sup> miR-195 transduction and the requirement for large cell numbers in ChIP-seq or CUT&RUN protocols, we were unable to perform these assays in this study. Nevertheless, our current data provide multiple lines of indirect evidence supporting the involvement of FOXO1:

      miR-195 transduction leads to reduced phosphorylation and increased accumulation of FOXO1 protein (Fig. 4C).

      Overexpression of FOXO1 in Ebf1<sup>−/−</sup> HPCs partially recapitulates the miR-195 phenotype (Fig. 4D).

      ATAC-seq data show increased chromatin accessibility at known FOXO1 target gene loci (e.g., Pax5, Runx1, Irf8) in miR-195-induced CD19⁺ cells, many of which overlap with FOXO1 motifs (Fig. 5).

      These observations collectively suggest that FOXO1 activity is functionally important for the emergence of CD19⁺ cells, even though direct binding has not been confirmed. We have added this limitation to the Discussion (lines 531‒537), and we note that future studies using FOXO1 CUT&RUN in this system would be valuable to further define the underlying mechanism.

      (1-6) The authors have not shown significant upregulation of expression of other critical B-cell regulatory transcription factors in their Ebf1-/- miR-195 CD19+ cells that could account for the emergence of these cells such as Pax5 or Erg. The legend in Figure 1E suggests for example the change in expression of Pax5 is modest if anything at best as no LogFC or western blot data is presented. 

      We thank the reviewer for raising this point. In our microarray analysis (Figure 1D, original Figure 1E), we observed that both Pax5 and Erg mRNA levels were upregulated in Ebf1<sup>−/−</sup> cells upon miR-195 transduction. Specifically, Pax5 showed an increase of approximately log₂FC 1.2, and Erg was also consistently elevated across biological replicates. These changes, although modest, were statistically significant and consistent with the upregulation of other B-lineage-associated transcription factors, such as Runx1 and Irf8. We agree that the magnitude of Pax5 upregulation is not as high as typically seen during full B cell commitment, and therefore may not have been immediately apparent in Figure 1D (original Figure 1E). To clarify this point, we have now revised the text in the Results section (lines 170‒174) to highlight the observed changes in Pax5 and Erg expression. We believe that the upregulation of these transcription factors, together with increased FOXO1 activity and changes in chromatin accessibility (Figure 5), contributes to the partial reactivation of the B cell gene regulatory network in the absence of EBF1.

      (1-7) Which V(D)J transcripts have been produced? A more detailed analysis other than ddPCR is required to help understand the emergence of this population that can presumably proceed through the preBCR and BCR checkpoints.

      We appreciate the reviewer's interest in understanding the nature of the V(D)J rearrangements in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. As noted, our current data rely on droplet digital PCR (ddPCR), which was used to detect rearranged VH-JH segments in the bone marrow of engrafted mice. While this approach does not allow for detailed mapping of specific V, D, or J gene usage, it provides a sensitive and quantitative measure of V(D)J recombination activity. The detection of rearranged VH-JH fragments in miR-195-transduced Ebf1<sup>−/−</sup> cells suggests that at least partial recombination of the immunoglobulin heavy chain locus is occurring, which is an essential checkpoint for progression past the pro-B cell stage. Given the lack of such rearrangements in control-transduced Ebf1<sup>−/−</sup> cells, we interpret this as evidence that miR-195 enables cells to initiate the recombination process. We acknowledge the limitations of ddPCR and agree that a more detailed analysis using VDJ-seq or single-cell RNA-seq would be valuable in determining the diversity and completeness of the V(D)J transcripts produced. This is a direction we intend to pursue in future work. We have added this limitation to the Discussion section (lines 538‒543).

      (1-8) The authors reveal that the Foxo1 transduced Ebf1-/- cells (Fig. 4D) do not persist in vitro or be detected via transplant assay (line 256) and therefore does not represent a truly "rescued" B cell, suggesting that CD19+ cells Ebf1-/- miR-195 transduced cells have more B-cell potential. Further characterisation is therefore warranted of this cell population. For instance, can these cells be induced to undergo myeloid differentiation in myeloid cytokine conditions? What other B-lineage transcriptional regulators are expressed in this cell population that could account for VDJ recombination and expression of a B-lineage transcriptional program (see comments 1, 3, and 5) that allow transition through preBCR and BCR checkpoints as well as undergo class switching?

      We thank the reviewer for this insightful comment. We agree that the persistence and lineage potential of the CD19⁺ cells emerging from Ebf1<sup>−/−</sup> miR-195-transduced progenitors deserve further characterization. Although we were unable to perform additional lineage re-direction assays, our current data provide several lines of evidence suggesting that these cells are stably committed toward the B-lineage:

      Gene expression profiling revealed upregulation of multiple B cell transcriptional regulators, including Pax5, Runx1, and Irf8.

      ATAC-seq analysis showed increased chromatin accessibility at B cell‒specific loci and enrichment of motifs bound by key B-lineage factors such as FOXO1 and E2A.

      The cells express surface IgM and undergo class switch recombination to IgG1 upon stimulation, indicating successful transition through the pre-BCR and BCR checkpoints and acquisition of mature B cell functions.

      Importantly, no upregulation of myeloid- or T-lineage genes was detected in the microarray analysis, arguing against multipotency at this stage. We acknowledge that functional tests for lineage plasticity under altered cytokine conditions would provide important insights and plan to address this question in future studies. This limitation has now been noted in the revised Discussion (lines 544‒550).

      (1-9) In the original Ebf1-/- miR-195 CD19+ experiments, a wild-type control should be provided for each experiment. 

      We appreciate the reviewer's suggestion to include wild-type controls in all experiments. While we did not include wild-type samples side-by-side in every assay, we carefully designed our experiments to include biologically appropriate and informative comparisons. For example, in the bone marrow transplantation experiments (Figure 2), Ebf1<sup>−/−</sup> cells transduced with empty vector served as negative controls, clearly lacking CD19 expression, V(D)J recombination, IgM surface expression, and class switch capability. This allowed us to specifically assess the gain-of-function effects of miR-195 in the EBF1-deficient background. In several analyses, such as the ATAC-seq and microarray comparisons, we did incorporate or refer to existing wild-type datasets (e.g., GSE92434), providing context for the extent of recovery toward a WT-like profile. We agree, however, that including parallel WT controls across all experimental platforms would enhance interpretability.

      (1-10) For ATACseq data, a comparison between Ebf1-/- preproB cells and Ebf1-/- miR-195 CD19+ cells should be undertaken.

      We thank the reviewer for this important point. As suggested, we have performed a direct comparison of chromatin accessibility between Ebf1<sup>−/−</sup> pre-pro-B‒like cells (CD19<sup>−</sup>, control transduction) and Ebf1<sup>−/−</sup> miR-195‒transduced CD19⁺ cells. This comparison is shown in green in Figure 5B and represents the ATAC-seq peaks differentially accessible between these two populations.

      (1-11) I cannot agree with the authors with some of their statements such as Line 242 - "therefore miR-195 considered to have similar function with EBF1 to some extent" - how can this be the case when miR-195 is a miRNA and EBF1 is a transcription factor with pioneering transcriptional activity? Surely the effects of miR-195 must be secondary.

      We thank the reviewer for pointing out the inappropriateness of comparing miR-195 to EBF1 in terms of functional similarity. We agree that miR-195, as a microRNA, operates through post-transcriptional regulation and does not possess the pioneering transcriptional activity characteristic of EBF1. To avoid confusion or overstatement, we have removed the sentence in line 242 ("therefore miR-195 is considered to have similar function with EBF1 to some extent").

      (1-12) It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. The authors should comment on this observation in their discussion.  

      We thank the reviewer for this important observation. We agree that the mild phenotype observed in our miR-195 knockout mice suggests that miR-195 is not essential for B cell development under steady-state physiological conditions. Accordingly, we do not claim a physiological requirement for miR-195. Rather, our study demonstrates that miR-195 possesses the potential to activate a B-lineage program in the absence of EBF1 when ectopically expressed. This functional potential, rather than its endogenous necessity, is the main focus of our work. We have now clarified this distinction in the revised Discussion section (lines 551‒560), and we emphasize that our findings highlight an alternative regulatory pathway that can be artificially engaged under specific conditions.

      (1-13) I recommend the authors check spelling and grammar throughout their manuscript.

      We thank the reviewer for the suggestion. In response, we have carefully reviewed the manuscript for spelling, grammar, and clarity. Minor corrections have been made throughout the text to improve readability and ensure consistency. We hope that the revised version addresses any language-related concerns. In addition, the manuscript has been reviewed by a professional editing service to improve the language quality.

      (1-14) In general, I recommend more comprehensive primary data be presented in the manuscript or supplementary files to add value to their submission.

      We thank the reviewer for this helpful suggestion. In response, we have revised the manuscript and supplementary materials to include additional primary data wherever possible. The bar graphs have been updated to include individual data points to show variability and replicate information. Uncropped western blot images are now provided in Supplementary Figure S2. We hope these additions provide greater transparency and value to the manuscript. 

      Reviewer #2 (Recommendations for the authors): 

      I have a number of suggestions with regard to inclusion of details and controls: 

      (2-1) The authors need to provide more details on in vitro differentiation, especially culture times. 

      Thank you for your comment. The culture conditions for in vitro differentiation of Ebf1<sup>−/−</sup> hematopoietic progenitor cells are described in the Methods section (lines 648‒649) under “Culture of lineage-negative (Lin‒) cells from the fetal liver.” As stated, cells were cultured for more than 7 days under the specified conditions.

      (2-2) In Figure 1E, the authors need to provide information on statistics (FDR or similar). 

      We thank the reviewer for the suggestion. In the microarray analysis shown in Figure 1D (original Figure 1E), only two biological replicates were available for each condition (n = 2 per group). Due to this limited sample size, we did not perform statistical testing, as the power would be insufficient to produce reliable p-values or FDR estimates. Instead, we focused on genes with consistent and biologically meaningful changes in expression, and presented representative examples based on fold change values.

      (2-3) For in vivo experiments (Figure 2) the authors should comment on their use of two different recipient mouse strains despite very low n numbers. As described above, classical mixed BM chimeras would be much more informative. In these experiments, the authors should also show the formation of other lymphoid lineages. This would answer the question of whether miR-195 redirects cells to the B lineage. Most importantly, absolute numbers need to be provided, especially in conjunction with Ebf1 rescue as described above. 

      We thank the reviewer for the thoughtful and detailed suggestions regarding our in vivo experiments. Regarding the use of different recipient mouse strains, our initial intention was to perform the transplantations in BRG mice; however, due to facility restrictions and animal husbandry considerations, we had to switch to NOG mice. All in vivo experiments were performed with n = 3 per group, in accordance with ethical guidelines and efforts to minimize animal use while still ensuring reproducibility. With respect to the suggestion of mixed bone marrow chimeras, we agree that this approach can provide valuable information on lineage competitiveness. However, in our system, miR-195 confers only a very limited B cell developmental potential in Ebf1<sup>−/−</sup> progenitors. In such a setting, the inclusion of wild-type competitor cells would overwhelmingly dominate the B cell compartment, likely masking any measurable effect of miR-195. Therefore, we opted to assess the gain-of-function potential of miR-195 in a noncompetitive setting. Regarding the assessment of other lymphoid lineages, we focused our analysis on the emergence of B-lineage cells, as the frequency of CD19⁺ cells induced by miR-195 is quite low. Given this low efficiency, we consider it unlikely that miR-195 significantly alters the development of non-B lineages, and thus did not observe substantial lineage diversion effects. Our aim was not to demonstrate lineage redirection, but rather to show that miR-195 can confer partial B cell potential in the absence of EBF1.

      Finally, we acknowledge the importance of presenting absolute cell numbers. However, the number of cells recovered from the mice was too low to yield reliable counts, and we have noted this in the manuscript (lines 498‒501).

      (2-4) The statistics in Figure 3 are inadequate. No S.D. is provided for WT. How then was normalization performed? Student's T-test cannot be applied to ratios. 

      We thank the reviewer for highlighting the need for more appropriate statistical analysis. Due to considerable inter-batch variability in absolute measurements, we normalized the KO values to their paired WT counterparts from the same experimental batch. Specifically, for each replicate, we calculated the KO/WT ratio to control for batch-specific variation. We then applied a one-sample t-test (against a null hypothesis of ratio = 1) to determine statistical significance. We have now revised the figure to show individual ratio values for each replicate and updated the legend and Methods to clearly explain the statistical approach. We hope this addresses the concern and improves the clarity and rigor of the analysis.
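
      The normalization and test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: the KO/WT ratios are hypothetical placeholder values, the helper names are invented for this example, and the p-value is obtained by numerically integrating the Student t density so that only the Python standard library is needed. A common alternative, not shown here, is to test log-transformed ratios against a null of 0.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value, using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def one_sample_t(values, mu0):
    """One-sample t-test of the mean of `values` against the null value mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, t_two_sided_p(t, n - 1)


# Hypothetical per-batch KO/WT ratios (one value per paired replicate).
ratios = [0.62, 0.71, 0.55, 0.80]

# Null hypothesis: the mean KO/WT ratio equals 1 (no difference from WT).
t_stat, p_value = one_sample_t(ratios, 1.0)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```

      Because each ratio is formed within a batch, the batch-specific scale cancels before the test is applied, which is the point of pairing KO with WT from the same experiment.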

      (2-5) In Figure 4A, the authors should comment on the strong repression of the Akt3UTR. 

      We appreciate the reviewer's observation regarding the strong repression observed with the Akt3 3'UTR construct. Indeed, we also noted that luciferase activity was markedly reduced in the presence of the Akt3 3'UTR, even in cells transduced with a control vector. We hypothesize that the Akt3 3'UTR contains strong post-transcriptional regulatory elements (such as AU-rich elements or binding sites for endogenous miRNAs or RNA-binding proteins), which may suppress mRNA stability or translation independent of miR-195. Alternatively, the secondary structure or length of the UTR may inherently reduce luciferase expression. We have added this limitation to the Discussion section (lines 561‒569).

      (2-6) The Western blot in Figure 4C is of insufficient quality. The authors need to provide unspliced versions of the bands including markers. 

      We thank the reviewer for this important comment. In response, we have included the unprocessed, full-length Western blot images corresponding to Figure 4C as Fig. S2. This provides a transparent view of the original data and addresses the concern about image cropping.

      (2-7) The ATACseq experiment in Figure 5 is difficult to comprehend. A simpler design including Ebf1 rescue controls would clearly improve this part. 

      We thank the reviewer for this valuable feedback. We agree that the original presentation of the ATAC-seq data may have been difficult to interpret. To address this, we have included a clear interpretation of the overlapping regions in the revised figure legend (lines 1018-1022). We hope this improves the clarity of the data and facilitates understanding of the chromatin changes mediated by EBF1 and miR-195.

      (2-8) The miR-195 KO mouse lacks validation (RT-PCR, genomic PCR) as well as a clear description of the deleted region and whether miR-497 is affected. In addition, the genetic background and number of backcrosses for the removal of potential off-target effects need to be mentioned. 

      We thank the reviewer for this important comment. The miR-195 knockout mouse was generated via CRISPR/Cas9, and Sanger sequencing confirmed a 628 bp deletion on chromosome 11 (GRCm38/mm10 chr11:70,234,425‒70,235,103). This deletion includes the entire miR-497 locus and part of the miR-195 precursor sequence. Although we do not show PCR gel images, the deletion was validated by sequencing, and the results are now clearly described in the revised Methods section (lines 607‒619). All transgenic mice in this study were backcrossed to the C57BL/6 background for at least eight generations.

      (2-9) The manuscript requires extensive editing for language. 

      We appreciate the reviewer's comment. The manuscript has now been revised and professionally edited for language by a native English-speaking editor. We believe clarity and readability have been significantly improved.

      Reviewer #3 (Recommendations for the authors): 

      (3-1) What is the expression level of miR-195 after viral overexpression? In Figure 4B, the authors show a 2.5-fold increase, but this appears very low for the experimental system (expression through the MDH1 retroviral construct) and the observed repressive effects (e.g. Figure 4A and B). 

      We thank the reviewer for this insightful comment. We agree that the apparent ~2.5-fold increase in miR-195 levels (Figure 4B) may seem modest in the context of retroviral overexpression and the associated functional effects. However, due to the high sequence similarity within the miR-15/16/195/497 family, it is technically challenging to measure mature miR-195 levels with complete specificity. The baseline signal observed in control samples likely reflects cross-reactivity with endogenous miRNAs such as miR-497 or miR-16, which share similar seed sequences. Therefore, the reported fold-change may underestimate the true level of ectopic miR-195 expression. Despite this, we observed robust repression of validated targets (e.g., Mapk3, Akt3) in both qPCR and luciferase assays, indicating that functionally effective levels of miR-195 were achieved. We have now clarified this limitation and interpretation in the revised Results section (lines 332‒335).

      (3-2) In alignment with the transparency of the data, I would encourage the authors to display the individual data points for all bar graphs. 

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have updated bar graphs to include individual data points to increase transparency and allow better visualization of data variability. For the ddPCR experiments, we provide the raw data in Fig. S1 for full transparency. For Fig. 1A, we re-examined the miR-195 expression profiles using the deposited data that the reviewer suggested, but miR-195 expression was much lower than we expected. We also performed scRNA-seq on hematopoietic lineage cells from 8-week-old C57BL/6 mice, but we could not reproduce the miR-195 expression profiles. We therefore concluded that the original result was an artifact caused by the miR-195 probe used for qPCR, and we deleted Fig. 1A.

      (3-3) The references appear to be compromised. For example, the authors state that "The Ebf1−/+ mouse was originally generated by R. Grosschedl (39)" (line 297), but this is not the respective paper. Likewise, the knockout mouse was generated "based on the CRISPR/Cas9 system established by C. Gurumurthy (40)" (line 299), but he/she is not involved in the referenced study. 

      We thank the reviewer for pointing out the discrepancies in the reference citations. Upon revising the Methods section to integrate it with the main text, the reference numbering became misaligned. We have corrected the references in the revised manuscript, and we thank the reviewer for bringing this to our attention.

      (3-4) Given that the miRNA Taqman assays the authors used here have difficulties to discriminate closely related miRNAs such as e.g. miR-16 (highly expressed in the hematopoietic system) and miR-195, I would suggest that the authors test their qPCR in an appropriate setup, e.g. in their knockout mouse model. In this context, did the authors use another small RNA as a reference for the qPCR analysis? In the methods, only GAPDH is mentioned, but in my opinion, another RNA that uses the same stemloop-based cDNA synthesis protocol would be better suited.

      We thank the reviewer for this valuable and technically insightful comment.

      As correctly pointed out, TaqMan-based qPCR assays for miRNAs such as miR-195 can show cross-reactivity with closely related family members, particularly miR-16, which is abundantly expressed in hematopoietic cells. Indeed, due to this limitation, we do not treat the qPCR results shown in the original Figures 1A and 4B as definitive quantification of miR-195 expression. Rather, these data are used to provide a rough estimate of overexpression efficiency, while our core functional analyses rely on phenotypic and molecular outcomes such as target gene repression and lineage emergence. With this in mind, although we acknowledge that a small RNA reference based on the same stem-loop cDNA synthesis would offer a more compatible normalization in principle, the inherent variability and lack of absolute specificity in such assays also limit their interpretive value. Therefore, we used GAPDH as a normalization control for consistency with other qPCR analyses in the manuscript. We have now clarified this rationale and limitation in the revised Methods section (lines 712‒716), and we thank the reviewer again for highlighting this important technical consideration.

      (3-5) The Western blot data used to support the hypothesis that FOXO1 phosphorylation is reduced upon overexpression of miR-195 are not convincing. The authors should not crop everything but the band. 

      We thank the reviewer for the helpful comment. In response, we have now provided the full-length, uncropped Western blot images corresponding to Figure 4C, including both total FOXO1 and phospho-FOXO1 blots. These images are included in Fig. S2.

    1. eLife Assessment

      In reporting on a valuable "learning proteome" for a C. elegans gustatory associative learning paradigm, this work identifies a new set of genes to be tested for roles in learning and memory, describes molecular pathways involving these genes and relevant for learning and memory in C. elegans, and delivers a new set of tools for prodding worm behavior. The methods and results convincingly support the findings, which will be of interest to neuroscientists and developmental biologists seeking to understand the self-assembly and operation of neural circuits for learning and memory.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the pan-neuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling.

    3. Reviewer #2 (Public review):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of over two dozen of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Strengths:

      - The authors have thoughtfully and transparently designed and reported the results of their study. Controls are carefully thought-out, and hits are ranked as strong and weak. By combining their proteomics with behavioral analysis, the authors also highlight the biological significance of their proteomics findings, and support that even weak hits are meaningful.

      - The authors display a high degree of statistical rigor, incorporating normality tests into their behavioral data which is beyond the field standard.

      - The authors include pathway analysis that generates interesting hypotheses about processes involved in learning and memory.

      - The authors generally provide thoughtful interpretations for all of their results, both positive and negative, as well as any unexpected outcomes.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans: when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare the proteins present in neurons between high-salt-fed and saltless-fed animals. The authors identified "learning-specific proteins", which are observed only after saltless feeding. They categorized these proteins by GO, pathway, and expression-site analyses, and went on to test mutants of selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including the acc-1/3 and lgc-46 acetylcholine receptors, the F46H5.3 putative arginine kinase, and kin-2, a cAMP pathway gene. These mutants were not previously reported to show abnormalities in this learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) The authors found five mutants with abnormal salt learning. These genes have not previously been described as having such a phenotype, which provides novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (the neurons involved) has not been tested in this manuscript, the work opens an avenue to further determine how acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

      [Editors' note: this version has been assessed without input from the reviewers.]

    5. Author response:

      The following is the authors’ response to the original reviews

      Comment from the editors at eLife:

      You could consider further strengthening the manuscript with the incorporation of new relevant public datasets for network modeling, but that is entirely your choice.

      We thank the editors and reviewers for their thoughtful and positive feedback on our article. We are particularly appreciative of the eLife assessment describing our work as valuable with a convincing methodology.

      As suggested, we have expanded our neuron class analysis by incorporating transcriptomic data from young adult animals (Kaletsky et al., 2016 Nature; Ghaddar et al., 2023 Science Advances; St Ange et al., 2024 Cell Genomics) to complement our existing analysis of larval stage 4 (L4) animals.

      In addition, we have updated Table S1 to include the outcross status of all strains used in this study, providing clearer information on the genotypes tested. We have also corrected the typographical errors noted by the reviewers. Please note that page and line numbers below refer to the MS Word Document with tracked changes set to ‘simple markup’.

      We greatly appreciate the reviewers’ input and hope these revisions further enhance the value and clarity of our study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the pan-neuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling. The additional tested hits provide a comprehensive analysis of the main molecular pathways that could have affected learning. However, the revised manuscript includes more information and analysis, raising additional concerns.

      Major comments:

      As reviewer 4 noted, and as also shown to be relevant for C30G12.6 presented in Figure 6, the backcrossing of the mutants is important, as background mutations may lead to the observed effects. Could the authors add to Table 1, sheet 1, the outcrossing status of the tested mutants?

      We appreciate this important point. A column has now been added to Table S1 to indicate the outcross status of all strains used in this study. Additionally, we have updated the table legend on page 77 to clarify how to interpret the information provided in this column.

      It is important to validate that the results of the positive hits (where learning was affected), such as acc-1, acc-3, and lgc-46, do not stem from background mutations.

      While we agree that confirming the absence of background mutations is important, we have taken alternative steps to address this concern:

      - The outcross status of each strain is now clearly indicated in Table S1.

      - Observed phenotypes were consistent across multiple biological replicates over extended periods (months, sometimes years), reducing the likelihood that results stem from background mutations.

      We believe these measures provide confidence in the validity of our findings.

      The fold change in the number of hits for different neurons in the CeNGEN-based rank analysis requires a statistical test (discussed on pages 17-19 and summarized in Table S7), as do the other gene enrichment analyses presented in the manuscript. Since the authors extensively elaborate on the results from this analysis, I think a statistical analysis is especially important for its interpretation. For example, considering the IL1 neurons, which ranked highest, and assuming random groups of genes, each having the same size as those of the ranked neurons (209 genes in total for IL1 in Table S7), how common would it be to get the calculated fold change of 1.38 or higher? Such bootstrapping analysis is common for enrichment analysis. Perhaps the authors could consult with an institutional expert (Dr. Pawel Skuza, Flinders University) for the statistical aspects of this analysis.

      We appreciate the suggestion and agree that statistical testing can be valuable for enrichment analyses. However, implementing additional tests such as bootstrapping is beyond the scope of this study. Our aim was to provide a descriptive overview rather than inferential statistics. To ensure transparency and interpretability, we have:

      - Clearly reported fold changes and rankings in Table S7.

      - Discussed the limitations of this approach in the manuscript text (page 18, lines 17–20).

      - Clearly outlined the methods used to perform this analysis (pages 53–54).

      We believe this descriptive analysis provides sufficient context for interpreting these results.
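
      The random-draw test that Reviewer #1 describes above can be sketched in a few lines of Python. This is only an illustrative sketch, not an analysis performed in the manuscript: the gene universe, hit list, and neuron gene set below are synthetic placeholders, the function names are hypothetical, and the fold change is assumed here to be the observed overlap with the hit list divided by the overlap expected by chance, which may differ from the definition used for Table S7. Strictly speaking, drawing gene sets without replacement is a Monte Carlo (permutation-style) test rather than a bootstrap.

```python
import random


def fold_change(gene_set, hits, universe_size):
    """Observed overlap with the hit list divided by the overlap expected by chance.

    Assumed definition for illustration only; not taken from the manuscript.
    """
    expected = len(gene_set) * len(hits) / universe_size
    return len(gene_set & hits) / expected


def resampling_p_value(neuron_genes, hits, universe, n_draws=2000, seed=0):
    """Estimate how often a random gene set of the same size reaches the observed fold change."""
    rng = random.Random(seed)
    universe = sorted(universe)  # fixed order so the seed gives reproducible draws
    observed = fold_change(neuron_genes, hits, len(universe))
    size = len(neuron_genes)
    at_least_as_extreme = 0
    for _ in range(n_draws):
        draw = set(rng.sample(universe, size))  # sampling without replacement
        if fold_change(draw, hits, len(universe)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_draws + 1)


# Synthetic toy data (hypothetical): 2000 background genes, 200 proteome hits,
# and a 209-gene neuron set that contains 40 of the hits.
universe = [f"gene_{i}" for i in range(2000)]
hits = set(universe[:200])
neuron_genes = set(universe[:40]) | set(universe[200:369])

observed_fc, p_value = resampling_p_value(neuron_genes, hits, universe)
print(f"fold change = {observed_fc:.2f}, empirical p = {p_value:.4f}")
```

      With real data, `universe` would be the background gene list and `neuron_genes` the genes assigned to one neuron class; testing many neuron classes in this way would additionally call for a multiple-comparison correction.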

      The learning phenotypes from Figure S8, concerning acc-1, acc-3, and lgc-46 mutants, are summarized in a scheme in Figure 4; however, the chemotaxis results are found in the supplemental Figure S8. Perhaps I missed the reasoning, but for transparency, I think the relevant Figure S8 results should be shown together with their summary scheme in Figure 4.

      Thank you for this suggestion to improve clarity. We have now moved the panels corresponding to cholinergic signalling components from Figure S8 into Figure 4 on page 21, so that the summary scheme and underlying data are presented together. The figure legends and main text have been updated accordingly to reflect the correct figure numbers.

      Reviewer #2 (Public review):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of over two dozen of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Strengths:

      The authors have thoughtfully and transparently designed and reported the results of their study. Controls are carefully thought-out, and hits are ranked as strong and weak. By combining their proteomics with behavioral analysis, the authors also highlight the biological significance of their proteomics findings, and support that even weak hits are meaningful.

      The authors display a high degree of statistical rigor, incorporating normality tests into their behavioral data which is beyond the field standard.

      The authors include pathway analysis that generates interesting hypotheses about processes involved in learning and memory.

      The authors generally provide thoughtful interpretations for all of their results, both positive and negative, as well as any unexpected outcomes.

      Weaknesses:

      - The authors use the CeNGEN single-cell transcriptomic atlas to predict where the proteins in the "learning proteome" are likely to be expressed, and use these data to identify neurons that are likely significant to learning and to build a hypothetical circuit. This is an excellent idea; however, the CeNGEN dataset only contains transcriptomic data from juvenile L4 animals, while the authors performed their proteome experiments in Day 1 adult animals. It is well documented that the C. elegans nervous system transcriptome is significantly different between these two stages (Kaletsky et al., 2016, St. Ange et al., 2024), so the authors might be missing important expression data, resulting in inaccurate or incomplete networks. The adult neuronal single-cell atlas data (https://cestaan.princeton.edu/) would be better suited to incorporate into neuronal expression analysis.

      Thank you for highlighting this important point. We have now incorporated transcriptomic data from young adult animals to complement the L4-based CeNGEN dataset. Specifically, we integrated data from CeSTAAN (https://cestaan.princeton.edu/, including St. Ange et al., 2024) and WormSeq (https://wormseq.org/, including Ghaddar et al., 2023), as outlined below. Importantly, CeSTAAN and WormSeq provide data for 79 and 104 neuron classes, respectively (compared to 128 from CeNGEN); for this reason, the main analysis focuses on CeNGEN due to its broader coverage, with additional datasets noted in brackets for completeness. This is stated on page 18, lines 15–17 to ensure transparency regarding our rationale.

      The main text has been updated to describe these datasets and their integration into our analysis (pages 18–20), and further details on how these resources were used have been added to the Experimental Procedures (pages 53–54).

      We also incorporated data from Kaletsky et al. (2016) and St. Ange et al. (2024) into our neuron identity checks for all assigned and unassigned hits (page 16, lines 8–19). This analysis shows that the nervous system is highly represented in our proteome data: 75–87% of assigned hits and 75–83% of all hits correspond to neuron-enriched genes identified by St. Ange et al. and Kaletsky et al.

      In addition, we used several transcriptomic databases to confirm that learning regulators identified in this study through TurboID and validation experiments are expressed in the same neuron classes as suggested by CeNGEN (page 36).

      - The authors offer many interpretations for why mutants in "learning proteome" hits have no detectable phenotype, which is commendable. They are, however, overlooking another important interpretation: it is possible that these changes to the proteome are important for memory, which is dependent upon translation and protein level changes and is molecularly distinct from learning. It is well established in the field that mutating or knocking down memory regulators in other paradigms will often have no detectable effect on learning. Incorporating this interpretation into the discussion and highlighting it as an area for future exploration would strengthen the manuscript.

      Thank you for this suggestion. We have incorporated this interpretation into the Results section (page 31, lines 17–23), specifying the potential role of these proteomic changes in memory encoding and retention, which are molecularly distinct from learning.

      - A minor weakness - In the discussion, the authors state that the Lakhina, et al 2015 used RNA-seq to assess memory transcriptome changes. This study used microarray analysis.

      This has been corrected on page 38, line 5.

      Significance:

      The approach used in this study is interesting and has the potential to further our knowledge about the molecular mechanisms of associative behaviors. There have been multiple transcriptomic studies in the worm looking at gene expression changes in the context of behavioral training. This study complements and extends those studies by examining how the proteome changes in a different training paradigm. This approach could be employed for multiple different training paradigms, presenting a new technical advance for the field. This paper would be of interest to the broader field of behavioral and molecular neuroscience. Though it uses an invertebrate system, many findings in the worm regarding learning and memory translate to higher organisms, making this paper of interest and significant to the broader field of behavioral neuroscience.

      Reviewer #4 (Public review):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans: when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. The authors identified "learning-specific proteins", which are observed only after saltless feeding. They categorized these proteins by GO analyses, pathway analyses and expression site analyses, and went on to test mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, F46H5.3 putative arginine kinase, and kin-2, a cAMP pathway gene. These mutants were not previously reported to have abnormality in the learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Thank you, we appreciate this positive feedback.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) Authors found five mutants that have abnormality in the salt learning. These genes have not previously been described to have this abnormality, providing novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (neurons involved) has not been tested in this manuscript, it will open an avenue to further determine the way in which acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors stated in their response to reviewers that "referring to a phenotype as both a trend and non-significant may confuse readers, which was originally stated in the manuscript in two locations," and that such sentences were removed. Unfortunately, in the new text (page 28, lines 18-19), the authors write: "uev-3 mutants showed a lower average CI after training compared with wild-type, but this did not reach statistical significance." As stated before, I find such sentences confusing and not interpretable. If the changes are not significant, then the lower average CI is not informative.

      Thank you for pointing this out. This has been corrected to improve clarity – we say instead that “trained phenotypes between wild-type and uev-3 mutants were not statistically significant” (page 29, lines 21–22).

      In response to reviewers' comments, the authors added more information about the biotinylation efficiency of the experiment, which is also described in the text:

      Page 8, line 27: "we found that biotin exposure increased the signal 1.3-fold for non-Tg and 1.7-fold for TurboID C. elegans."

      Page 10, line 4: "Quantification of the signal within entire lanes showed a 1.1-fold increase in the 'TurboID, control' lane compared with the 'non-Tg, control' lane, and a 1.9-fold increase in the 'TurboID, trained' lane compared with the 'non-Tg, trained' lane."

      Is it common in this field not to show the actual raw quantified numbers? I was expecting either a bar graph or instead that the measured values would appear in the text alongside the fold-change information.

      Table S2 (and its table legend on page 77) have been edited to include raw area values.

      Figure 5: Typo? - "pan neuronal expression of ..." The allele number is written as 139, but I believe it should be 179, as in the rest of the paper.

      The typo has been corrected on page 25.

      The results describing the absence of a learning phenotype in backcrossed C30G12.6 are presented in the main figure. If the authors believe this is an important result, I understand keeping it in the main figure; however, I find this uncommon.

      Thank you for your comment. We consider the absence of a learning phenotype in backcrossed C30G12.6 to be an important control for interpreting the original findings, which is why we have retained it in the main figure.

      Reviewer #4 (Recommendations for the authors):

      I noted a few typos.

      (1) In Fig 5B, the transgene is depicted kin-2(ce139) but it is probably kin-2(ce179).

      The typo has been corrected on page 25.

      (2) In text, R97C and ce179 are used interchangeably, but in fact there is no description that they are identical.

      We now state the following in the manuscript: “We tested worms with the ce179 mutant allele in kin-2, in which a conserved residue in the inhibitory domain (which normally functions to keep PKA turned off in the absence of cAMP) is mutated to cause an R92C amino acid change – this results in increased PKA activity (Schade et al., 2005).” (page 25, lines 1–3).

      (3) p31 line 7, Figure S7 -> Fig S9 C-E

      We apologise for this typographical error. This figure number is meant to correspond to salt associative learning assay data (Fig. S8), not salt aversive learning (Fig. S9). Since the data from Fig. S8 was moved to Fig. 4, the figure citation has been changed from Fig. S7 (which was incorrect) to Fig. 4 (page 32, line 17).

      (4) p45 line 11, Fig S9 -> Fig S6

      The typo has been corrected (page 47, line 12).

    1. eLife Assessment

      This valuable work demonstrates that M. tuberculosis protein PPE2 perturbs adipose tissue biology by modulating adipogenesis, lipolysis, and inflammatory remodeling, thereby contributing to fat loss and insulin resistance during TB. Using M. smegmatis overexpression strains, PPE2-deficient Mtb mutants, and mouse models, the study links PPE2 to downregulation of PPAR-γ, C/EBP-α, adiponectin, and broader transcriptional changes in host fatty acid metabolism. These findings convincingly highlight, for the first time, a direct role for a bacterial virulence factor in TB-associated wasting. However, despite strong associative evidence, the mechanistic basis of PPE2-mediated regulation remains unresolved.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al. investigate the role of PPE2, a Mycobacterium tuberculosis (Mtb) secreted virulence factor, in adipose tissue physiology during tuberculosis (TB) infection. Previous work by this group established the significance of PPE proteins in Mtb virulence and their role in modulating the innate immune response. Here, the authors present compelling evidence that PPE2 regulates host cell adipogenesis and lipolysis, thereby establishing a link to the development of insulin resistance during TB infection. These fundamental findings demonstrate, for the first time, that a bacterial virulence factor is directly involved in the profound body fat loss, or "wasting," which is a long-established clinical symptom of active TB.

      Key Strengths:

      The confidence in the major findings of this study is significantly strengthened by the authors' comprehensive approach. They judiciously employ multiple experimental systems, including:

      (1) Purified PPE2 protein.

      (2) A non-pathogenic Mycobacterium strain engineered to express PPE2.

      (3) A pathogenic clinical Mtb strain (CDC1551) utilizing a targeted PPE2 deletion mutant.

      (4) While the presence of Mtb in adipose tissues in human and animal models is well-documented, this study is groundbreaking in demonstrating that an Mtb virulence-associated factor actively modulates host fatty acid metabolism within the adipose tissue.

      Key Weakness:

      Although the manuscript provides solid evidence associating the presence of PPE2 with transcriptional changes in host fatty acid machinery within the adipose tissue, the underlying mechanistic details remain elusive. A focused, deep mechanistic follow-up study will be essential to fully appreciate the complex biological implications of the findings reported here.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis" the authors identify PPE2, a secretory protein of Mycobacterium tuberculosis, as a modulator of adipose function. They show that PPE2 treatment in mice causes fat loss, immune cell infiltration into adipose, reduced gene expression of PPAR-γ, C/EBP-α, and adiponectin, and glucose intolerance. Overall, the authors link PPE2 with adipose tissue perturbation and insulin resistance following infection with M. tuberculosis.

      Strengths:

      While it is known that M. tuberculosis persists in adipose, the mycobacterial factors contributing to adipose dysfunction are unknown. The study uses multiple mechanisms, including recombinant purified protein, non-pathogenic mycobacterium expressing PPE2, and a clinical strain of M. tuberculosis depleted of PPE2, to show that PPE2 may play an important role in causing fat loss, lipolysis, and insulin resistance following infection. The authors show that PPE2, through unknown mechanisms, decreases gene expression of proteins involved in adipogenesis. Although the mechanisms are unclear, this study advances the field as it is the first to identify a secreted factor (PPE2) from M. tuberculosis to play a role in disrupting adipose tissue.

      Weaknesses:

      There is a lack of completeness amongst the figures that greatly diminishes the claims and impact of the manuscript. For example, in Figures 2 and 5, the authors measure adipocyte area in H&E-stained adipose tissue to show adipose hypertrophy. However, this was not completed in Figures 3 and 4 despite the authors claiming that treatment with rPPE2 induces adipose hypertrophy. It is unclear why the adipocyte area was not measured in these figures, and having this included would support the authors' claim and strengthen the manuscript. The same is true for immune cell infiltration, where the authors say there is increased immune cell infiltration following PPE2 treatment. This is based on H&E staining, but the data supporting this are limited. Although the authors measure CD3+ T cell infiltration in adipose tissue from mice infected with the clinical strain where PPE2 was depleted, staining was performed in only this experiment. Completing these experiments by showing data to support that PPE2 induces immune cell infiltration would greatly strengthen the manuscript.

      The authors state that a Student's t-test was performed to calculate the significance between two samples. However, there is no discussion of what statistical method was used when there were more than 2 groups, which occurs throughout the manuscript, such as in Figure 5, where 4 groups are analyzed. Having the appropriate statistical analysis is important for the impact of the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", Bisht et al. describe the PPE2 protein from Mtb as a key modulator of adipose tissue physiology that contributes to the development of insulin resistance. The authors have used 3T3-L1 preadipocyte cell lines, an M. smegmatis overexpression strain, mouse models, and genetically modified Mtb deletion strains to demonstrate that PPE2 promotes persistence in adipose tissue and regulates glucose homeostasis. Using qPCR and RNA-seq experiments, the authors demonstrate that PPE2 regulates the expression of key genes involved in adipogenesis.

      Strengths:

      Using purified protein, the authors show that PPE2 regulates adipose tissue physiology, and this effect was neutralised in the presence of anti-PPE2. The expression of several adipogenic markers was also reduced in 3T3-L1 adipocytes treated with rPPE2 and in mice infected with M. smegmatis strains overexpressing PPE2. Using a mouse model of infection, the authors show that PPE2 contributes to enhanced mycobacterial survival within fat tissues. The authors also show infiltration of immune cells in the fat tissues of mice infected with wild-type and ppe2-complemented strains compared to the ppe2 KO strain. In order to gain a better mechanistic understanding of how PPE2 regulates adipogenesis, the authors employed an RNA-seq approach and identified 191 genes that were significantly differentially expressed in the fat tissues of mice infected with wild-type and ppe2 KO Mtb strains. The differentially expressed genes included transcripts encoding proteins involved in chemokine/cytokine signalling and the ER stress response. The expression of a few of these markers was also validated by qPCR and western blot analysis. Finally, the authors also show that PPE2 promotes lipolysis by reducing phosphodiesterase levels and activating PKA-HSL signalling. The experimental design is overall reasonable, and the methods used are reliable. Overall, the current study did provide some new information on the contribution of PPE2 in regulating adipose tissue physiology.

      Weaknesses:

      (1) The authors have used several methodologies to show that PPE2 regulates adipose tissue physiology and glucose homeostasis. But the exact mechanism is still not clear.

      (2) Mtb encodes several PE/PPE proteins. The authors have used PPE2 for their study. Will secretory PPE2 homologs also regulate similar cellular processes?

      (3) How do the authors rule out that the differences observed in the fat tissues of mice infected with wild-type and mutant strains are associated with reduced bacterial burdens? Is it possible to include another attenuated Mtb strain as a control in mouse experiments for a few critical experiments?

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al. investigate the role of PPE2, a Mycobacterium tuberculosis (Mtb) secreted virulence factor, in adipose tissue physiology during tuberculosis (TB) infection. Previous work by this group established the significance of PPE proteins in Mtb virulence and their role in modulating the innate immune response. Here, the authors present compelling evidence that PPE2 regulates host cell adipogenesis and lipolysis, thereby establishing a link to the development of insulin resistance during TB infection. These fundamental findings demonstrate, for the first time, that a bacterial virulence factor is directly involved in the profound body fat loss, or "wasting," which is a long-established clinical symptom of active TB.

      Key Strengths:

      The confidence in the major findings of this study is significantly strengthened by the authors' comprehensive approach. They judiciously employ multiple experimental systems, including:

      (1) Purified PPE2 protein.

      (2) A non-pathogenic Mycobacterium strain engineered to express PPE2.

      (3) A pathogenic clinical Mtb strain (CDC1551) utilizing a targeted PPE2 deletion mutant.

      (4) While the presence of Mtb in adipose tissues in human and animal models is well-documented, this study is groundbreaking in demonstrating that an Mtb virulence-associated factor actively modulates host fatty acid metabolism within the adipose tissue.

      We thank the reviewer for appreciating that this work demonstrates, for the first time, that an Mtb virulence factor is directly linked to TB-associated wasting.

      Weakness:

      Although the manuscript provides solid evidence associating the presence of PPE2 with transcriptional changes in host fatty acid machinery within the adipose tissue, the underlying mechanistic details remain elusive. A focused, deep mechanistic follow-up study will be essential to fully appreciate the complex biological implications of the findings reported here.

      We agree with the reviewer that a focused, deep mechanistic follow-up study is necessary to further elucidate the complex biological implications of PPE2 actions. However, we believe that we have uncovered at least one of the possible mechanisms by which PPE2 increases lipolysis and circulating free fatty acids during infection, namely by targeting the cAMP-PKA-HSL pathway (Figure 7). In future studies we will aim to dissect the mechanisms by which PPE2 triggers hyperglycaemia and insulin resistance.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis" the authors identify PPE2, a secretory protein of Mycobacterium tuberculosis, as a modulator of adipose function. They show that PPE2 treatment in mice causes fat loss, immune cell infiltration into adipose, reduced gene expression of PPAR-γ, C/EBP-α, and adiponectin, and glucose intolerance. Overall, the authors link PPE2 with adipose tissue perturbation and insulin resistance following infection with M. tuberculosis.

      Strengths:

      While it is known that M. tuberculosis persists in adipose, the mycobacterial factors contributing to adipose dysfunction are unknown. The study uses multiple mechanisms, including recombinant purified protein, non-pathogenic mycobacterium expressing PPE2, and a clinical strain of M. tuberculosis depleted of PPE2, to show that PPE2 may play an important role in causing fat loss, lipolysis, and insulin resistance following infection. The authors show that PPE2, through unknown mechanisms, decreases gene expression of proteins involved in adipogenesis. Although the mechanisms are unclear, this study advances the field as it is the first to identify a secreted factor (PPE2) from M. tuberculosis to play a role in disrupting adipose tissue.

      We thank the reviewer for appreciating the findings presented in the manuscript.

      Weaknesses:

      (1) There is a lack of completeness amongst the figures that greatly diminishes the claims and impact of the manuscript. For example, in Figures 2 and 5, the authors measure adipocyte area in H&E-stained adipose tissue to show adipose hypertrophy. However, this was not completed in Figures 3 and 4 despite the authors claiming that treatment with rPPE2 induces adipose hypertrophy. It is unclear why the adipocyte area was not measured in these figures, and having this included would support the authors' claim and strengthen the manuscript. The same is true for immune cell infiltration, where the authors say there is increased immune cell infiltration following PPE2 treatment. This is based on H&E staining, but the data supporting this are limited. Although the authors measure CD3+ T cell infiltration in adipose tissue from mice infected with the clinical strain where PPE2 was depleted, staining was performed in only this experiment. Completing these experiments by showing data to support that PPE2 induces immune cell infiltration would greatly strengthen the manuscript.

      As per the suggestion of the esteemed reviewer, in the revised manuscript we will attempt to analyse adipocyte area in both Figures 3 and 4. In the original manuscript, immune cell infiltration analyses (H&E staining and CD3+ staining) were restricted to the M. tuberculosis mouse infection model, which best reflects human tuberculosis pathology. In the other experiments, involving infection with M. smegmatis expressing PPE2, immune cell infiltration studies will be carried out.

      (2) The authors state that a Student's t-test was performed to calculate the significance between two samples. However, there is no discussion of what statistical method was used when there were more than 2 groups, which occurs throughout the manuscript, such as in Figure 5, where 4 groups are analyzed. Having the appropriate statistical analysis is important for the impact of the manuscript.

      We agree with the reviewer that we omitted ANOVA from the statistical analyses. We will include one-way ANOVA where more than two groups are present and mention the statistical methods in the figure legends as well as in the text of the revised manuscript.
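
      As a minimal illustration of the one-way ANOVA mentioned above, the F statistic for a comparison of four groups can be computed as follows. This is a sketch under stated assumptions rather than the authors' actual analysis: the measurements are made-up toy values, the function name is hypothetical, and only the F statistic and its degrees of freedom are computed. A statistics package (for example SciPy's `scipy.stats.f_oneway`) would also report the corresponding p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values for four hypothetical groups (not data from the manuscript).
toy_groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
    [2.0, 3.0, 4.0],
]

f_stat, df_between, df_within = one_way_anova_f(toy_groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

      A single test across all groups avoids the inflation of false positives that comes from running a separate t-test on every pair of groups.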

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", Bisht et al. describe the PPE2 protein from Mtb as a key modulator of adipose tissue physiology that contributes to the development of insulin resistance. The authors have used 3T3-L1 preadipocyte cell lines, an M. smegmatis overexpression strain, mouse models, and genetically modified Mtb deletion strains to demonstrate that PPE2 promotes persistence in adipose tissue and regulates glucose homeostasis. Using qPCR and RNA-seq experiments, the authors demonstrate that PPE2 regulates the expression of key genes involved in adipogenesis.

      Strengths:

      Using purified protein, the authors show that PPE2 regulates adipose tissue physiology, and this effect was neutralised in the presence of anti-PPE2. The expression of several adipogenic markers was also reduced in 3T3-L1 adipocytes treated with rPPE2 and in mice infected with M. smegmatis strains overexpressing PPE2. Using a mouse model of infection, the authors show that PPE2 contributes to enhanced mycobacterial survival within fat tissues. The authors also show infiltration of immune cells in the fat tissues of mice infected with wild-type and ppe2-complemented strains compared to the ppe2 KO strain. In order to gain a better mechanistic understanding of how PPE2 regulates adipogenesis, the authors employed an RNA-seq approach and identified 191 genes that were significantly differentially expressed in the fat tissues of mice infected with wild-type and ppe2 KO Mtb strains. The differentially expressed genes included transcripts encoding proteins involved in chemokine/cytokine signalling and the ER stress response. The expression of a few of these markers was also validated by qPCR and western blot analysis. Finally, the authors also show that PPE2 promotes lipolysis by reducing phosphodiesterase levels and activating PKA-HSL signalling. The experimental design is overall reasonable, and the methods used are reliable. Overall, the current study did provide some new information on the contribution of PPE2 in regulating adipose tissue physiology.

      We thank the reviewer for encouraging comments about the manuscript.

      Weaknesses:

      (1) The authors have used several methodologies to show that PPE2 regulates adipose tissue physiology and glucose homeostasis. But the exact mechanism is still not clear.

      We have clearly demonstrated that PPE2 inhibits PPAR-γ and C/EBP-α expression to block adipogenic differentiation. Further, we demonstrated a possible mechanism by which PPE2 triggers lipolysis via activation of ER stress and the cAMP/PKA/HSL pathway, which is responsible for increasing free fatty acids in circulation (Figure 7). This is supported by our observation that mice infected with PPE2KO (ppe2 knock-out) Mtb had lower NEFA levels than those infected with wild-type Mtb (Figure 7F). Crucially, we showed that this mechanism is clinically relevant: NEFA levels in the sera of TB patients were higher than in healthy controls (Figure 7G), confirming the presence of dyslipidemia in TB patients, which is an established risk factor for insulin resistance (Karpe et al., 2011; Bhattacharya et al., 2007). As increased free fatty acids have been linked to the development of insulin resistance in several studies, this mechanism links PPE2 with the regulation of glucose homeostasis.

      (2) Mtb encodes several PE/PPE proteins. The authors have used PPE2 for their study. Will secretory PPE2 homologs also regulate similar cellular processes?

      It is known that Mtb encodes several PE/PPE family proteins, and some of these have been implicated in host–pathogen interactions (Mukhopadhyay and Balaji, 2011; Dahiya et al., 2025). However, so far only PPE2 has been shown to be present in the circulation (Bisht et al., 2023), which is the main reason we chose it for this study. Whether PPE2 homologues are present in the circulation is not yet known.

      (3) How do the authors rule out that the differences observed in the fat tissues of mice infected with wild-type and mutant strains are associated with reduced bacterial burdens? Is it possible to include another attenuated Mtb strain as a control in the mouse experiments for a few critical experiments?

      We agree with the reviewer that differences in bacterial burden can influence host tissue responses. Precisely for this reason, we did not rely on a single infection model. We used a multi-pronged approach to decouple the effects of PPE2 from the effects of bacterial load:

      (1) In vitro model using recombinantly purified PPE2 protein (rPPE2) (Figure 1): In cultured 3T3-L1 adipocytes, purified rPPE2 protein directly inhibited adipogenesis by downregulating important factors such as PPAR-γ, C/EBP-α and fatty acid synthase (which play a critical role in triglyceride metabolism), demonstrating a direct effect of PPE2 in the complete absence of infection.

      (2) Recombinant protein injection (Figure 3): By injecting recombinantly purified PPE2 protein (rPPE2) into mice, we observed similar metabolic perturbations (fat loss, impaired glucose tolerance) in the complete absence of any bacteria, demonstrating that PPE2 can drive these phenotypes independently of bacterial burden. Furthermore, the rescue of these effects in rPPE2-immunized mice strongly confirms the specific role of PPE2 in establishing hyperglycaemia and insulin resistance (Figure 4).

      While the Mtb aerosol model can be questioned for bacterial load effects, it provides crucial in vivo validation that PPE2 function is relevant in the context of mycobacterial infection.

      References

      Bhattacharya S, Dey D, Roy SS. Molecular mechanism of insulin resistance. J Biosci. 2007 Mar;32(2):405-13. doi: 10.1007/s12038-007-0038-8. PMID: 17435330.

      Bisht MK, Pal R, Dahiya P, Naz S, Sanyal P, Nandicoori VK, Ghosh S, Mukhopadhyay S. The PPE2 protein of Mycobacterium tuberculosis is secreted during infection and facilitates mycobacterial survival inside the host. Tuberculosis (Edinb). 2023 Dec;143:102421. doi: 10.1016/j.tube.2023.102421. Epub 2023 Oct 12. PMID: 37879126.

      Dahiya P, Bisht MK, Mukhopadhyay S. Role of PE family of proteins in mycobacterial virulence: Potential on anti-TB vaccine and drug design. Int Rev Immunol. 2025; 44(4):213-228. doi: 10.1080/08830185.2025.2455161. Epub 2025 Jan 31. PMID: 39889764.

      Karpe F, Dickmann JR, Frayn KN. Fatty acids, obesity, and insulin resistance: time for a reevaluation. Diabetes. 2011 Oct;60(10):2441-9. doi: 10.2337/db11-0425. PMID: 21948998; PMCID: PMC3178283.

      Mukhopadhyay S, Balaji KN. The PE and PPE proteins of Mycobacterium tuberculosis. Tuberculosis (Edinb). 2011 Sep;91(5):441-7. doi: 10.1016/j.tube.2011.04.004. Epub 2011 May 6. PMID: 21527209.

    1. eLife Assessment

      Combining connectomics, optogenetics, behavioral analysis and modeling, this study delivers important findings on the role of inhibitory neurons in the generation of leg grooming movements in Drosophila. The results include convincing evidence that the identified neuronal populations are key in the generation of rhythmic leg movements, structured in distinct polysynaptic pathways articulating inhibition and disinhibition of antagonistic sets of motor neurons, as mapped from an electron microscopy volume of the ventral nerve cord, which orchestrate an alternation of flexion and extension. By analyzing limb kinematics upon experimentally silencing specific populations of premotor inhibitory neurons, together with computational modelling, the potential role of these neurons in rhythmic leg movement is shown. This work will be of interest to neuroscientists working in motor control and limbed locomotion.

    2. Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings of leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, called "generalists", while the others project to only a few motoneurons, called "specialists". Optogenetic activation and silencing of both subsets strongly affects leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including on frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming of the body using legs. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of grooming behavior thereby exemplifying their relevance. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.
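
      As a minimal sketch of how two of these motifs could be read off a connectivity map, the following example searches a small set of inhibitory edges for reciprocal inhibition and for disinhibition chains. The neuron names, the edge list and the simplified motif definitions are hypothetical assumptions made for illustration; they are not taken from the authors' connectome data or analysis code.

```python
# Hypothetical inhibitory connections as (presynaptic, postsynaptic) pairs.
# Names are illustrative only and do not correspond to identified neurons.
INHIBITS = {
    ("13B_x", "13A_p"),
    ("13A_p", "MN_flexor"),
    ("13A_p", "13A_q"),
    ("13A_q", "13A_p"),
    ("13A_q", "MN_extensor"),
}


def reciprocal_pairs(edges):
    """Unordered pairs (a, b) in which a inhibits b and b inhibits a."""
    return {tuple(sorted((a, b))) for a, b in edges if a != b and (b, a) in edges}


def disinhibition_chains(edges):
    """Chains (a, b, c): a inhibits b, b inhibits c, and a does not inhibit c directly."""
    return {
        (a, b, c)
        for a, b in edges
        for b2, c in edges
        if b == b2 and c != a and (a, c) not in edges
    }


print(sorted(reciprocal_pairs(INHIBITS)))
print(sorted(disinhibition_chains(INHIBITS)))
```

      In this toy graph, activating 13B_x would suppress 13A_p and thereby release MN_flexor from inhibition, which is the logic of the disinhibition motif described above.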

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were examined in the so-called "closed-loop" condition. This does not allow direct and secondary effects of the experimental modification in neural activity to be differentiated, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be needed. Given that many members of the two populations of interneurons show not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Comments on revisions:

      The authors have carefully revised the manuscript. I have no further suggestions or criticisms.

    3. Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Comments on revisions:

      I appreciate that the authors have updated the GitHub repository to include the model and analysis code. Two things are still lacking: an explicit separation of empirical findings from modelling inferences in the text, and a supplemental table that makes clear which cell types are included. I should also point out that the code lacks the annotations necessary for the results to be reproduced and the model to be reused.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings of leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, called "generalists", while the others project to only a few motoneurons, called "specialists". Optogenetic activation and silencing of both subsets strongly affects leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including on frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of the motor behavior thereby exemplifying their important role for generating grooming. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.

      We thank the reviewer for their thoughtful and constructive evaluation of our work.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were examined in the so-called closed-loop condition. This does not allow direct and secondary effects of the experimental modification in neural activity to be differentiated, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be important. Given that many members of the two populations of interneurons show not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Our optogenetic experiments show a role for 13A/B neurons in grooming leg movements – in an intact sensorimotor system - but we cannot yet differentiate between central and reafferent contributions. Activation of 13As or 13Bs disinhibits motor neurons and that is sufficient to induce walking/grooming. Therefore, we can show a role for the disinhibition motif.

      Proprioceptive feedback from leg movements could certainly affect the function of these reciprocal inhibition circuits. Given the synapses we observe between leg proprioceptors and 13A neurons, we think this is likely.

      Our previous work (Ravbar et al 2021) showed that grooming rhythms in dusted flies persist when sensory feedback is reduced, indicating that central control is possible. In those experiments, we used dust to stimulate grooming and optogenetic manipulation to broadly silence sensory feedback. We cannot do the same here because we do not yet have reagents to separately activate sparse subsets of inhibitory neurons while silencing specific proprioceptive neurons. More importantly, globally silencing proprioceptors would produce pleiotropic effects and severely impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input. Therefore, the reviewer is correct – we do not know whether the effects we observe are feedforward (central), feedback sensory, or both. We have included this in the revised results and discussion section to describe these possibilities and the limits of our current findings.

      Additionally, we have used a computational model to test the role of each motif separately and we show that in the results.  

      Comments on revisions:

      The careful revision of the manuscript improved the clarity of presentation substantially.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      Thank you for the positive assessment of our work.

      Weaknesses:

      (1) In Figure 4-figure supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (2) Regarding Fig 5: The 70 ms on/off stimulation with a slow opsin seems problematic. CsChrimson off-kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors are suggesting they get. Regardless, it is amazing the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also intrigued by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We appreciate the reviewer’s point that CsChrimson’s slow off-kinetics limit precise temporal control. To address this, we repeated our frequency analysis using a range of pulse durations (10/10, 50/50, 70/70, 110/110, and 120/120 ms on/off) and compared the mean frequency of proximal joint extension/flexion cycles across conditions. We found no significant difference in frequency (LLMS, p > 0.05), suggesting that the observed grooming rhythm is not dictated by pulse period but instead reflects an intrinsic property of the premotor circuit once activated. We now include these results in ‘Figure 5—figure supplement 1’ and clarify in the text that we interpret pulsed activation as triggering, rather than precisely pacing, the endogenous grooming rhythm. We continue to note in the manuscript that CsChrimson’s slow off-kinetics may limit temporal precision. We will try ChrimsonR in future experiments.
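
      The reasoning behind this comparison can be made concrete with a short calculation of the stimulation frequency implied by each on/off protocol listed above. This is an illustrative sketch only: the function name is hypothetical, and the calculation covers the stimulus timing, not the measured grooming frequencies.

```python
# On/off pulse durations in milliseconds, as listed in the response above.
PROTOCOLS_MS = [(10, 10), (50, 50), (70, 70), (110, 110), (120, 120)]


def pulse_frequency_hz(on_ms, off_ms):
    """Frequency of a periodic stimulus with the given on and off durations."""
    period_ms = on_ms + off_ms
    return 1000.0 / period_ms


for on_ms, off_ms in PROTOCOLS_MS:
    freq = pulse_frequency_hz(on_ms, off_ms)
    print(f"{on_ms}/{off_ms} ms on/off -> {freq:.2f} Hz stimulation")
```

      The stimulation frequency spans roughly 4.2 Hz to 50 Hz across these protocols, a twelve-fold range. If leg movements were paced by the light pulses, the joint cycle frequency would be expected to vary accordingly, so an unchanged frequency across protocols is consistent with the interpretation that pulses trigger rather than pace the rhythm.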

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I still have the following specific suggestions and questions, which need the attention of the authors:

      P5, 2nd para, li 1: shouldn't "(Figures 1E and 1E')" be (Figures 1G and 1H)?

      P7, last para, li 3: shouldn't "(Figures 2C and 2D)" be (Figures 2A and 2B)?

      P19, para 2, last 2li: "...we observe that optogenetic activation......triggers grooming movements." I could not find the place in the text or a figure, where this was reported or shown. Please specify

      P19, last para: "... shows that 13A neurons can generate rhythmic movements....." Given that the experiments were conducted in closed-loop, i.e. including the loop through the leg and its movements, the following formulation appears more justified: "....shows that 13A neurons significantly contribute to the generation of rhythmic movements,....."

      P28, para 1, li 3 from bottom: "...themselves, rather than solely between antagonistic motor neurons." While the authors are correct that in the stick insect and locust alternating inhibitory synaptic drive to flexor and extensor motoneurons has been shown to underlie alternating activity of these two antagonistic motoneuron pools, the previous studies have not shown or claimed that these synaptic inputs arise from direct interactions between these motoneuron pools. Based on this, this text should be moved to the part "feed-forward inhibition" on page 27.

      P28: "redundant inhibition": this motif has been shown to be instrumental in the locust flight CPG, e.g. Robertson & Pearson, 1985, Fig. 16.

      P28: "reciprocal inhibition": The reviewer agrees with the authors that this motif has been shown for the mouse spinal cord, but it has also been shown for other CPGs in vertebrates and invertebrates, e.g. Clione, leech, Xenopus - see the initial comment "(3) Intro and Discussion".

      Thank you, we have incorporated the suggested corrections and clarifications into the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      I'm satisfied with the revised version

      Reviewer #3 (Recommendations for the authors):

      The authors have made a substantial effort to address my original points. They corrected the title, expanded Discussion and Methods sections, reran statistical tests using mixed models, added modelling clarifications and constraints, and fixed or removed confusing figure panels. Those changes have improved clarity and reduced some of the claims that I thought were exaggerated.

      That said, some of my concerns remain only partially addressed, which could be fixed with relatively small tweaks. The authors should:

      (1) Explicitly separate empirical findings from modelling inferences throughout the manuscript, including the Abstract, Results and Discussion (i.e., label claims of "intrinsic rhythmogenesis" as model-based inferences, not direct experimental demonstrations)

      (2) Provide supplemental information on modelling to quantify the role of the black-box input (e.g., quantitative coordination/phase/frequency metrics for full model vs constant-input vs no black box), show pre- vs post-fine-tuning weight changes and the exact tuning constraints/optimization details (I could not find these details)

      (3) To ensure results are reproducible, provide a supplemental table mapping each split line to EM-identified neuron(s) with NBLAST/morphological scores for each match;

      (4) Fully document the statistical models (exact LMM/GLMM formulas, software/packages, etc);

      (5) Deposit model code, trained weights and analysis scripts in a public repository.

      We have updated the GitHub repository with the full statistical analysis documentation and model code, including trained weights and scripts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) As such amount of work has been put into developing this community tool, it would be worth thinking about how it could serve other multiplex-immunofluorescence methods (such as immunoSABER, 4i, etc). Adding an extra tab where the particular method that uses those reagents is mentioned. This would also help as IBEX itself and related methods evolve in the future.

      We agree and currently support six other methods beyond the original “IBEX2D Manual”, with the most generic being “Multiplexed 2D Imaging”: a standard, single-cycle (non-iterative) imaging method applied to thin, 2D (5-30 micron) tissue sections. We plan to expand the resource to include multiplex IF methods such as Immuno-SABER, 4i, Cell DIVE, etc. The current structure of the reagent resources table can support other immunofluorescence methods without modifications. The table contains information for IBEX and related methods. The particular method for which a reagent validation was evaluated is specified in the column titled “Method”. Descriptions of supported methods are given in the reagent glossary.

      (2) It has a rather minimal description of the software. In particular, there is software that has not been developed for IBEX specifically but that could be used for IBEX datasets (ASHLAR, WSIReg, VALIS, WARPY, and QuPath, etc). It would be nice if there was mention of those.

      ASHLAR, WSIReg, VALIS, and Warpy have been added to the Knowledge-Base. These software components are specifically relevant for iterative imaging protocols which require image alignment. With respect to QuPath, Fiji, Napari and other general microscopy image analysis frameworks, these are not listed. Such frameworks provide a wide range of operations relevant for many microscopy image analysis tasks and are likely already familiar to researchers who are interested in the information contained in the Knowledge-Base.

      (3) There is a concern about how the negative data information will be added, as no publication or peer-review process can back it up. Perhaps the particular conditions of the experiment should be very well described to allow future users to assess the validity.

      We agree with this observation and have added the following language to the contribute page:

      “When reporting information that has not appeared in a peer-reviewed publication, both negative and positive results, include more details with respect to experimental conditions and provide sample images as part of the supporting material files. In all cases, peer reviewed or not, we encourage providing additional details in the supporting material that you deem important and are not part of the csv file structure. These include, but are not limited to, lot numbers, versioned protocols used in the work, and any other information which will facilitate validation reproducibility.”

      (4) The proposed scheme where a reagent can be validated or recommended against by up to 4 different labs should be good. It may be good to make sure that researchers who validate belong to different labs and are not only different ORCID that belong to the same group. Similar to making a case of recommendations against a reagent.

      We generally support this recommendation. Based on our experience, even members within the same laboratory encounter challenges when attempting to validate reagents contributed by current or former colleagues. Additionally, research labs often experience significant personnel turnover, with minimal overlap over a five year span.

      To address these concerns, we have updated the instructions on the contribute page as follows: “We only accept up to 5 ORCID additions in the Agree or Disagree columns. This means that the original contributor’s work was replicated by up to 4 individuals or refuted by up to 5 people. Priority is given to contributions from individuals in laboratories distinct from the original source.”
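
      The arithmetic of this rule can be sketched as a small check. The sketch assumes that the original contributor's ORCID occupies one of the five Agree slots, which would explain why at most 4 replications but up to 5 refutations are possible. The function name, the list-based representation of the columns and the placeholder identifiers are hypothetical and do not describe the actual Knowledge-Base tooling or csv layout.

```python
MAX_ORCIDS = 5  # limit per column, as stated in the contribute-page text


def endorsement_summary(agree, disagree, max_orcids=MAX_ORCIDS):
    """Count replications and refutations for one reagent entry.

    agree, disagree: lists of ORCID strings. Assumption: when non-empty,
    agree includes the original contributor, who is not counted as a replication.
    """
    for name, column in (("Agree", agree), ("Disagree", disagree)):
        if len(column) > max_orcids:
            raise ValueError(f"{name} column holds {len(column)} ORCIDs, limit is {max_orcids}")
        if len(set(column)) != len(column):
            raise ValueError(f"{name} column contains a duplicate ORCID")
    return {
        "replications": max(len(agree) - 1, 0),
        "refutations": len(disagree),
    }


# Placeholder identifiers for illustration only (not real ORCIDs).
agree = [f"0000-0000-0000-000{i}" for i in range(1, 6)]
print(endorsement_summary(agree, disagree=[]))
```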

      (5) It is very interesting to keep track of the protocol versions used. Perhaps users should be able to validate independent versions and it will be important to know how information is kept.

      Thank you for your suggestion. We encourage members of the community to cite the latest version of the Knowledge-Base in the “Citing the Knowledge-Base” section.

      (6) The final point I would make is that the need to form a GitHub repository may deter some people from submitting data. For sporadic contributions, authors could think that users could either reach out to main developers and/or provide a submission form that can help less experienced users of command-line and GitHub programming, but still promote the contribution from the community.

      We have given this significant thought and now support a secondary path for contributing that does not require familiarity with git or GitHub. This path involves downloading a zip file, modifying the contents of the csv files and providing supporting material text files and images. Once the work is completed, the contributor contacts the Knowledge-Base maintainers and we complete the submission together, with the maintainers dealing with the usage of git and GitHub. This information has been added to the notes which are listed at the top of the Contribute page. We have recently completed the first contribution that followed this new workflow.

      We still encourage researchers to familiarize themselves with git and the GitHub repository hosting service. These tools have been shown to be useful for collaborative and reproducible laboratory research.

      Reviewer #2:

      (1) The potential impact of IBEX KB is very clear. However, the paper would benefit by also discussing more on KB maintenance and outreach, and how higher participation could be incentivized.

      We have added the following details to the discussion:

      The KB is actively maintained by its chairs, who meet bi-weekly to ensure its continued development and maintenance. In addition to these regular meetings, we engage with both current and prospective community members to gather feedback, encourage contributions, and expand the collective knowledge supporting the KB. To broaden outreach and foster sustained engagement, the IBEX community will collaborate with synergistic initiatives such as the HuBMAP Affinity Reagents Working Group, the European Society for Spatial Biology (ESSB), and the Global Alliance for Spatial Technologies (GESTALT).

      As a further incentive for participation, we intend to launch an annual “Reagent Validation Week”, a community-driven event inspired by software hackathons. During this dedicated week, researchers would focus on validating or reproducing validation for selected reagents and contribute their findings to the KB. We have also discussed hosting an “Around the World” symposium, featuring presentations from both junior and senior scientists across the community, to showcase diverse perspectives and foster global collaboration.

      (2) Use of resources like GitHub may limit engagement from non-coding members of the scientific community. Will there be alternative options like a user-friendly web interface to contribute more easily?

      We agree with this observation and have addressed it. Please see detailed response to point 6 from Reviewer 1.

      Reviewer #3:

      (1) IBEX is a specific immunofluorescence method. However, the utility of the Knowledge base is not limited to the specific IBEX method. Therefore, I suggest removing the unnecessary branding of the term IBEX from the KB and citing potentially other similar cyclic immunofluorescence methods in the manuscript (e.g. CycIF Lin et al 2018). This would also emphasize the wider impact and applicability of the KB to the wider imaging community.

      For now, we have decided to keep the original reference to the IBEX method in the resource name and re-brand it in the next development phase. In that phase we intend to solicit reagent validations for methods unrelated to IBEX. We have added the reference to the CycIF publication. The manuscript text now reads: “We are optimistic that future versions will include extension of the IBEX method to other tissues and species and we intend to solicit contributions of reagent validations for other multiplexed imaging techniques such as CycIF Lin et al. (2015). At that point in time we expect to re-brand the KB as the IBEX++ Knowledge-Base...”

      (2) I believe reporting negative results with reagents is highly valuable. However, the way to report antibodies must include more details. To ensure data quality, every report should be linked to a specific protocol + images (or doc with the standard document variations), and sample information. This should be a mandatory requirement.

      We agree that this information is desirable, but we do not agree that it should be mandatory. In the contribution instructions we now explicitly list lot numbers and versioned protocols as examples of details that we encourage contributors to include in their supporting material files. We believe that requiring this information for a contribution sets the bar too high and will deter many from contributing information that can benefit others.

      (3) While cross-validation among researchers is beneficial, even if five individuals fail to reproduce results with a given antibody, their findings may be influenced by technique-specific factors. It is also important to consider whether these researchers come from the same group, institution, or geographical region, as this could impact reproducibility. Additionally, entries that have not been reproduced at least five times using the same protocol should still be considered valuable information. To address this, an “insufficient validation data” flag could be implemented, ensuring that incomplete but useful findings remain accessible.

      The contribution instructions now state that “Priority is given to contributions from individuals in laboratories distinct from the original source”.

      While our goal is to support reproducing reagent validations, we do not expect these types of contributions to be the rule, as the only incentive we can provide to encourage this behavior is co-authorship on the authoritative dataset. As a result, it is likely that many of the validations will have a single endorser, the original contributor. These results are valuable information and we do not think they should be singled out (insufficient validation label). We leave it up to the users of the KB to decide whether they trust recommendations with multiple endorsers or if endorsement by a single highly trusted contributor is sufficient for them. In all cases, issues with contributions can be raised and discussed on the KB discussion forum.

      The rationale for limiting the number of reproduction studies to five was that this is a minimal, yet sufficiently large, number that provides confidence in the results. Placing an upper limit ensures that researchers do not provide reproduction results for widely used and well established reagents just because these results are readily available to them.

      (4) This system could flag reagents with inconsistent reports, highlight potential technique-specific issues, and suggest alternative reagents with stronger validation records. Furthermore, a validation confidence ranking could be introduced, taking into account the number of independent confirmations, protocol consistency, and reproducibility data. These measures would help refine the reporting process while maintaining transparency and scientific rigor.

      We agree that the functionality described here is desirable, but this is not part of the KB. At its core the KB is a dataset and we do not envision developing dedicated tools to perform these tasks. Instead, we foresee using the KB as context for interacting with AI agents. Providing the KB as context to an AI, one can currently use it to answer domain-specific questions and perform related tasks such as designing imaging panels (under subject matter expert supervision). This was added to the sample use cases in the manuscript, and a transcript of an interaction with an AI model that used the website as context is provided as supplemental material.

      (5) Regarding image formats for results reporting, while JPG files are convenient due to their small size, TIFF files offer significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis. In this regard, I suggest making it possible to include a link to the original TIFF data.

      The goal of the supporting material image is similar to that of an image used in a manuscript and it should not be used for data analysis purposes. This is the reason we chose the JPG format. Sharing these images is not intended to be a substitute for publicly sharing the original images and their associated metadata. This is now noted in the contributing instructions.

      (6) Homepage:

      Include a brief summary of the knowledge base’s purpose and tabs to provide clarity for new users. The current homepage is a bit misleading for newcomers.

      The homepage has been modified to include information about the Knowledge-Base, contents and how to use it including as context for interaction with AI agents.

      (7) Reagent Resources Section: Enable users to search for a target name directly, rather than filtering through dropdown options.

      The dropdown menu explicitly shows all available targets and also allows for direct search of a target name. To use it for direct search, select the dropdown and start typing the name of the target; the focus will jump to it. Thus, if looking for “Zrf1” there is no need to scroll through all targets in the dropdown. This also makes it easy to clear a filter: select the dropdown, start typing the word “clear”, then press enter when it is highlighted. This information has been added to the page.

      Provide an option to download the dataset as a CSV file. This feature will be highly valued by non-computational researchers.

      Links to download the reagent resources csv file and the whole Knowledge-Base have been added.

      Add the same column documentation here as in the contributor instructions. For example, you need to make clear the distinctions between “Recommend,” “Agree,” and “Disagree” ratings, as they may be misleading to those who have not visited the rules to contribute.

      A link to the column documentation in the contributor instructions has been added here. Information on the website is displayed in one location and linked as needed. Duplicated display of information creates uncertainty for users and results in more complex instructions when referring to the information.

      Include additional details in the dataset, such as lot numbers, or the date of the contribution, that could be relevant in different settings.

      Please see response to point 2.

      (8) Data & Software Section:

      Add filtering options in the table based on organism and tissue availability

      This data is not encoded in the available information in an independent manner so we do not directly enable filtering. It is usually included in the “Details” free form text. This text is duplicated from the original dataset descriptions. One can still search this page using the browser’s search functionality to achieve behavior similar to filtering. While the “Details” text may not be visible due to the usage of the accordion user interface, it is still searchable and will automatically expand when the search text is found under the collapsed accordion button.

      (9) Contributor Section:

      Incorporate figures from the manuscript to make it more visual and improve understanding of rules and standards.

      Figure 4 from the manuscript was added to this page.

      I believe reporting negative results with reagents is highly valuable. However, to ensure data quality, every report should be linked to a specific protocol and sample information. This should be a mandatory requirement. To streamline the process, warnings for certain reagents could be implemented, but a reagent should not be outright labeled as ineffective without proper validation.

      Please see response to point 2.

      Cross-validation among researchers is beneficial, but even if five individuals fail to reproduce results with a given antibody, it may still be due to technique-specific factors, particularly for non-routine antibodies.

      We agree with this observation and have modified the contribution instructions accordingly:

      When overturning previously reported results, the number of ORCIDs in the Disagree column becomes greater than those in the Agree column, we will open the contribution for public discussion on the Knowledge-Base forum before accepting it.

      The intent is to increase the community’s confidence in the results, particularly when dealing with non-routine antibodies. This allows the original contributor and other members of the community to engage with the researchers who were unable to replicate a specific validation, possibly helping them to replicate the original results by adding missing details to the KB, or explicitly identifying and documenting issues with the original work.
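
      The two acceptance rules quoted in this response (at most 5 ORCIDs in the Agree or Disagree column, and public discussion when Disagree outnumbers Agree) could in principle be checked mechanically. The following is a minimal Python sketch under stated assumptions, not the Knowledge-Base's actual tooling: the function name `review_status`, the list-of-strings input, and the returned labels are hypothetical and chosen only for illustration.

```python
# Limit quoted in the contribution instructions: up to 5 ORCIDs per column.
MAX_ORCIDS_PER_COLUMN = 5


def review_status(agree, disagree, max_per_column=MAX_ORCIDS_PER_COLUMN):
    """Classify a proposed reagent-validation row.

    agree, disagree: lists of ORCID strings. This in-memory form is an
    assumption; how ORCIDs are encoded in the CSV files is not covered here.
    """
    # Drop repeated ORCIDs while preserving order.
    agree = list(dict.fromkeys(agree))
    disagree = list(dict.fromkeys(disagree))

    # Rule 1: neither column may hold more than the allowed number of ORCIDs.
    if len(agree) > max_per_column or len(disagree) > max_per_column:
        return "rejected: column limit exceeded"

    # Rule 2: a contribution that overturns earlier results (more Disagree
    # than Agree) goes to public discussion before it is accepted.
    if len(disagree) > len(agree):
        return "needs public discussion"

    return "acceptable"
```

      A row with one endorser and no dissent would be classified as acceptable, while a row with one endorser and two dissenting ORCIDs would be routed to discussion first.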

      Regarding image formats, JPG files are convenient due to their small size, but TIFF offers significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis.

      Please see response to point 5.

    2. eLife Assessment

      The IBEX Knowledge-Base is a fundamental tool that will enhance scientific collaboration by providing a centralized, community-driven resource for immunofluorescence imaging and reagent validation. Its detailed use cases, open-source design, and transparent reporting offer exceptional evidence of its broad utility and impact in the life sciences. It is now up to the community to contribute to its growth. Overall, the resource sets a high standard as a blueprint for future community initiatives in reproducibility and standardization.

    3. Reviewer #1 (Public review):

      IBEX Knowledge Database

      Here, Yanid Z. and colleagues present the IBEX knowledge base, a community tool developed to centralize knowledge and help its adoption by more users. The authors have done a fantastic job, and there is careful consideration of the many aspects of data management and the FAIR principles. The manuscript needs no further work, as it is very well written and has detailed descriptions of data contribution as well as of the KB itself. Overall, it is a great initiative, especially the aim to inform about negative data and non-recommended reagents, which will positively affect the user community and scientific reproducibility.

      This initiative will serve as a groundwork to include technical details of other multiple immunofluorescence methods (such as immunoSABER, 4i, etc). Including other methods would help the knowledge base itself and related methods to evolve and assist their communities in the future.

      Significant care has been taken to allow the report of negative data. While there might be limitations as to how this information is included, transparency and community usage will ensure the knowledge base offers a fair representation.

      There are two ways to contribute to the knowledge base. While authors have contributed significantly to its creation, it will be the role of the maintainers to assist potential users and contributors. It is especially appreciated that a path to contribute is possible with no coding skills. I am keen to see how the KB evolves and helps disseminate the use of this and other great techniques.

    4. Reviewer #2 (Public review):

      Summary:

      The paper introduces the IBEX Knowledge-Base (KB), a shared online resource designed to help scientists working with immunofluorescence imaging. It acts as a central hub where researchers can find and share information about reagents, protocols, and imaging methods. The KB is not static like traditional publications; instead, it evolves as researchers contribute new findings and refinements. A key highlight is that it includes results of both successful and unsuccessful experiments, helping scientists avoid repeating failed experiments and saving time and resources. The platform is built on open-access tools ensuring that the information remains available to everyone. Overall, the KB aims to collaboratively accelerate research, improve reproducibility, and reduce wasted effort in imaging experiments.

      Strengths:

      (1) The IBEX KB is built entirely on open-source tools, ensuring accessibility and long-term sustainability. This approach aligns with FAIR data principles and ensures that the KB remains adaptable to future advancements.

      (2) The KB also follows strict data organization standards, ensuring that all information about reagents and protocols is clearly documented and easy to find with little ambiguity.

      (3) The KB allows scientists to report both positive and negative results, reducing duplication of effort and speeding up the research process.

      (4) The KB is helpful for all researchers, but even more so for scientists in resource-limited settings. It provides guidance on finding affordable alternatives to expensive or discontinued reagents, making it easier for researchers with fewer resources to perform high-quality experiments.

      (5) The KB includes a community discussion forum where scientists can ask for advice, share troubleshooting tips, and collaborate with others facing similar challenges.

      (6) The authors discuss plans for active maintenance of the database and also to incentivize higher participation from the community.

      (7) Even those unfamiliar with GitHub may contribute with the help of the database maintenance team.

      Note: The authors have addressed my comments on the previous version of the article and the current version has been strengthened as a result.

    5. Reviewer #3 (Public review):

      Summary:

      The authors have developed an interactive knowledge-base that uses crowdsourced information on antibodies and reagents for immunofluorescence imaging.

      Strengths:

      The authors provide an extremely relevant and needed interface for collaboration through a well-built platform. All the links in their website work, and the information provided (reagents, datasets, videos and protocols) is very informative. The instructions for community researchers to contribute are clear, and the authors provide detailed instructions on how to proceed technically. Additionally, the interface has been refined to enable contribution regardless of the computational expertise of the researcher.

      Weaknesses:

      The Knowledge-Base relies on community contributions without mandatory, standardized metadata and validation criteria. Whilst this enhances the contributions, it limits the reliability of the database.

    1. eLife Assessment

      This manuscript by Kaur et al. identifies differential gene expression in distinct cell populations, specifically myeloid and lymphoid cells, following short-term exposure to e-cigarette aerosols with various flavors. Their findings are useful because they provide a single-cell sequencing data resource for assessing which genes and cellular pathways could be affected by e-cig aerosols and their components. However, the evidence is incomplete due to the limited number of biological replicates per condition, as well as the lack of in vivo validation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single-cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities, and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      This study had only N=1 biological replicate for the single-cell sequencing data per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNAseq analysis. An important control group (PG:VG) had extremely low cell numbers and therefore could not be used to derive meaningful conclusions. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations.

      (1) The only new validation experiment for this revision is the immunofluorescent staining of neutrophils in Figure 4. The images are very low resolution and low quality and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It's hard to distinguish positive cells from autofluorescence in both Ly6g and S100a8 channels. No statistical analysis is presented for the quantified data from this experiment.

      (2) The relevance of Fig. 3A and B is unclear since these numbers only reflect the number of cells captured in the scRNAseq experiment and the biological meaning of this data is not explained. Flow cytometry quantification is presented as cell counts but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.

    3. Reviewer #3 (Public review):

      This work aims to establish cell-type-specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      Not many studies have been performed on cell-type-specific differential gene expression following exposure to e-cig aerosols.

      The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      The discussion addresses the limitations of this study.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. There is no gold standard in the field.

      Most findings are based on scRNA-seq alone, so interpretations should be made with care as some conclusions are observational.

      This paper provides a good foundation for future follow-up studies that will examine the effects of e-cig exposure on innate immunity.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary, and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      scRNAseq studies may have low replicate numbers due to the high cost of studies but at least 2 or 3 biological replicates for each experimental group is required to ensure rigor of the interpretation. This study had only N=1 per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNA seq analysis. An important control group (PG:VG) had extremely low cell numbers and was basically not useful. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations, but no solid conclusions can be made from the data presented.

      The only new validation experiment is the immunofluorescent staining of neutrophils in Figure 4. The images are very low resolution and low quality and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It's hard to distinguish positive cells from autofluorescence in both Ly6g and S100a8 channels. No statistical analysis in the quantification.

      We thank the reviewer for identifying the strengths of this study and pointing out the gaps in knowledge. Overall, our purpose in presenting these data is to provide the scRNA seq results as a resource to a wider community. We have used techniques like flow cytometry, multianalyte cytokine array and immunofluorescence to validate some of the results. We agree with the reviewer that we did not adequately convey the significance of our findings with the immunofluorescent stain in the previous version. We have revised the manuscript and included the quantification for both Ly6G+ and S100A8+ cells in e-cig aerosol exposed and control lung tissues. Briefly, we identified a marked decrease in the staining for S100A8 (marker for neutrophil activation) in tobacco-flavored e-cig exposed mouse lungs as compared to controls. Considering the evidence from scRNA seq and flow cytometry of increased neutrophil percentages in the experimental group, together with the lowered immunofluorescence staining for active neutrophils, we speculate that exposure to e-cig (tobacco) aerosols may alter the neutrophil dynamics within the lungs. Also, co-immunofluorescence identified a more prominent co-localization of the two markers in control samples as compared to the treatment group, which points towards some changes in the innate immune milieu within the lungs upon exposure. Future work is required to validate these speculations.

      We have now discussed all the above-mentioned points in the Discussion section of the revised manuscript and toned down our conclusions regarding sex-dependent changes from scRNA seq data.

      It is unclear what the meaning of Fig. 3A and B is, since these numbers only reflect the number of cells captured in the scRNAseq experiment and are not biologically meaningful. Flow cytometry quantification is presented as cell counts, but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.

      We thank the reviewer for this question. However, we would like to highlight that scRNA seq and flow cytometry may show similar trends but cannot be identical, as one relies on cell surface markers (protein) to identify cell types, while the other depends on transcriptomic signatures. In our data, for the myeloid cells (alveolar macrophages and neutrophils), the scRNA and flow cytometry data match in trend. However, the trends do not match for the lymphoid cells being studied (CD4 and CD8 T cells). Possible explanations for this finding include high gene dropout rates in scRNA seq, the different analytical resolution of the two techniques, and the pooling of samples in our single cell workflow. We recognize these shortcomings in our analyses and mention them clearly in the discussion as a limitation of our work. It is also important to note that cell frequencies identified in scRNA seq provide only broad, imprecise indications that need to be further validated, which we tried to accomplish in our work to some degree. Our flow-based results clearly highlight the sex-specific variations in the immune cell percentages (something we could not have anticipated earlier). In future studies, we will include more replicates to tease out sex-based variations upon acute and chronic exposure to e-cig aerosols.

      We have now replotted the graphs in Fig 3A and B and plotted the flow quantification as the percentage of total CD45+ cells. The gating strategy for the flow plots is also included as Figure S6 in the revised manuscript.
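
      As a rough illustration of the re-expression described here, raw gated event counts can be converted to percentages of the CD45+ parent gate. This is a sketch in Python, not the authors' analysis code; the function name `percent_of_parent` and the population labels and numbers in the usage note are hypothetical and are not data from the study.

```python
def percent_of_parent(counts, parent_total):
    """Convert gated event counts to percentages of a parent gate.

    counts: mapping of population name -> number of events in that gate.
    parent_total: number of events in the parent gate (here, CD45+ cells).
    """
    if parent_total <= 0:
        raise ValueError("parent gate must contain at least one event")
    # Each population is reported relative to the same parent gate, so
    # samples with different total cell yields become comparable.
    return {name: 100.0 * n / parent_total for name, n in counts.items()}
```

      For instance, with made-up counts, `percent_of_parent({"neutrophils": 250, "CD4 T": 100}, 1000)` returns 25.0 for neutrophils and 10.0 for CD4 T cells.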

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavour e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low replicate number and a lack of effective validation methods, meaning findings may not be repeated. This is a revised article but several weaknesses remain related to the analysis and interpretation of the data.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives some preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      Although some text weaknesses have been addressed since resubmission, other specific weaknesses remain: The major weakness is the n-number and analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and not always supporting the findings (e.g. figure 3D does not match 3B/4A). Other examples include:

      There aren't enough cells to justify analysis - only 300-1500 myeloid cells per group with not many of these being neutrophils or the apparent 'Ly6G- neutrophils'.

      We thank the reviewer for the comment, but we disagree with the reviewer in terms of the justification of analyses. All the flavored e-cig aerosol groups were compared with air controls to deduce the outcomes in the current study. We already acknowledge the low sample quality of the PGVG group and have only included the comparisons with PGVG at the reviewer’s request; these are open to interpretation by the reader.

      By that measure, each treatment group (except PGVG group) has over 1000 cells with 24777 genes being analyzed for each cell type, which by the standards of single cell is sufficient. We understand that this strategy should not be used for detection of rare cell populations, which was neither the purpose of this manuscript nor was attempted. We conduct comparisons of broader cell types and mention more samples need to be added in the Discussion section of the revised manuscript.

      As for the Ly6G- neutrophil category, we do not base our results on scRNA analyses alone but also perform co-immunofluorescence and multi-analyte analyses and use evidence from previous literature to back our outcome. To avoid over-stating our results we have revised the whole manuscript and toned down our results regarding the presence of Ly6G- neutrophils. We do understand that more work is required in the future, but our work clearly shows the shift in neutrophil dynamics upon exposure which should be reported, in our opinion.

      The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comments, but in general the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells. The data in the entire paper is not strong enough to base any solid conclusion - it is not just the RNA-sequencing data.

      We acknowledge this to be a valid point and have revamped the manuscript and toned down our conclusions. However, such limitations exist with any scRNA seq dataset and so must be interpreted accordingly by the readers. We do understand that due to the low cell counts and the limitations with scRNA seq we should not perform DESeq2 analyses for Ly6G+ versus Ly6G- neutrophil categories, which was never attempted in the first place. However, our results with co-immunofluorescence, multianalyte assay and scRNA expression analyses in the myeloid cluster do point towards a shift in neutrophil activation which needs to be further investigated. Furthermore, Ly6G deficiency has been linked to immature neutrophils in many previous studies and is not an unlikely outcome that needs to be treated with immense skepticism.

      We wish to make this dataset available as a resource to influence future research. We are aware of its limitations and have been transparent with regard to our experimental design, capture strategy, the quality of obtained results, and possible caveats to make it open for discussion by the readers.
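      The detection concern raised above can be illustrated with a minimal sketch. It assumes that the number of captured transcripts of a gene in one cell follows a Poisson distribution and that cells are independent; this is a common simplification chosen for illustration, not a model taken from the manuscript, and the function names and input values are hypothetical.

```python
import math

def p_zero_in_cell(mean_captured: float) -> float:
    """P(no counts for a gene in one cell), assuming Poisson-distributed captured counts."""
    return math.exp(-mean_captured)

def p_undetected_in_all(mean_captured: float, n_cells: int) -> float:
    """P(gene has zero counts in every one of n_cells independent cells)."""
    return p_zero_in_cell(mean_captured) ** n_cells

# Hypothetical values: mean captured molecules per cell and number of cells in a cluster.
for mean_captured in (0.01, 0.1, 1.0):
    for n_cells in (20, 200):
        p = p_undetected_in_all(mean_captured, n_cells)
        print(f"mean={mean_captured:<5} cells={n_cells:<4} P(all zero)={p:.3g}")
```

      Under these assumptions, a weakly captured gene in a small cluster has a substantial chance of showing no counts at all even when it is expressed, which is why absence of signal in a small cell population is weak evidence of absence of expression.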

      There is no data supporting the presence of Ly6G negative neutrophils. In the flow cytometry only Ly6G+ cells are shown with no evidence of Ly6G negative neutrophils (assuming equal CD11b expression). There is no new data to support this claim since resubmission and the New figures 4C and D actually show there are no Ly6G negative cells - the cells that the authors deem Ly6G negative are actually positive - but the red overlay of S100A8 is so strong it blocks out the green signal - looking to the Ly6G single stains (green only) you can see that the reported S100A8+Ly6G- cells all have Ly6G (with different staining intensities).

      We thank the reviewer for this query and do understand the skepticism. We have now quantified the data to provide more clarity for interpretation. As we were using paraffin-embedded tissues, some autofluorescence is expected, which could explain some of the reviewer’s concerns. However, we expect that the inclusion of better-quality images and quantification should address some of the concerns raised by the reviewer.

      Eosinophils are heavily involved in lung macrophage biology, but are missing from the analysis - it is highly likely the RNA-sequence picked out eosinophils as Ly6G- neutrophils rather than 'digestion issues' the authors claim

      We thank the reviewer for raising a valid concern. However, the Ly6G- cluster cannot be eosinophils in our case. Literature suggests SiglecF as an important biomarker of eosinophils, which was absent in the Ly6G- cluster in our scRNA seq analyses as shown in File S18 and Figure 6B of the revised manuscript. We have now provided a detailed explanation (Lines 476-488; 503-506) of the observed results pertaining to the eosinophil population in the revised manuscript to further address some of the concerns raised by this reviewer.

      After author comments, it appears the schematic in Figure 1A is misleading and there are not n=2/group/sex but actually only n=1/group/sex (as shown in Figure 6A). Meaning the n number is even lower than the previous assumption.

      We concur with the reviewer’s valid concern and so are willing to provide this data as a resource for a wider audience to assist future work. Pooling of samples has been practiced by many groups previously to save resources and expense. We did it for the very same reason. It may not be the preferred approach, but it still has its merit considering the vast amount of cell-specific data generated using this strategy. To avoid overstating our results we have maintained transparency in our reporting and acknowledged all the limitations of this study.

      We do not believe that the strength of scRNA seq lies in drawing conclusive results, but rather in teasing out possible targets and directions that need to be validated with more work. In that respect, our study does identify the target cell types and biological processes which could be of importance for future studies.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      Single cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models. Clinical relevance of this short exposure remains unclear.

      We thank the reviewer for this query. However, we would like to emphasize that chronic exposure was never the intention of this study. We wished to design a study for acute nose-only exposure, owing to which the study duration was kept short. Shorter durations limit the stress and discomfort to the animal. In vivo work using nose-only exposure is still developing, with multiple exposure regimens being used by different groups. To our knowledge there is no gold standard of e-cig aerosol exposure which is widely accepted other than the CORESTA recommendations, which we followed. Also, we show in our study how the daily exposure to leached metals varies in a flavor-dependent manner, which supports the view that the exposure regimen does need more attention in terms of equal dosing, particle distribution and composition, something we have started addressing in our subsequent studies. We have included all the explanations in the revised manuscript (Lines 82-85, 425-435, 648-654).

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We agree with the reviewer’s comment and have taken this into consideration. We have now revamped the whole manuscript and toned down most of the sex-based conclusions stated in this work. Having said that, it is important to note that most work relying solely on scRNA seq, as is the case for this study, is observational in nature and needs to be assessed bearing this in mind.

      Overall, the paper and its discussion are relatively surface-level and do not delve into the significance of the findings or how they fit into the bigger picture of the field. It is not clear whether this paper is intended to be used as a resource for other researchers or as an original research article.

      We have now reworked the Discussion and tried to incorporate more in-depth discussion of the results, providing our insights regarding the observations, discrepancies and the possible explanations. We have also made it clear that this paper is intended to be used as a resource by other researchers (Lines 577-579).

      The manuscript has some validation of findings but not very comprehensive.

      We have now revamped the manuscript. We have included quantification for immunofluorescence data with better representation of the GO analyses. We have worked on the Results and Discussion sections to make this a useful resource for the scientific community.

      This paper provides a strong foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for pointing out the strength of this paper. We refrained from elaborating on the differential gene expression within and between various cell types because of the low sample number and sequencing depth of this study. However, the raw data will be provided with the final publication and should be freely accessible to the public, who can re-analyze the data set as they deem fit.

      Comments on revisions:

      The authors have addressed major concerns with better validation of data and improved organization of the paper. However, we still have some concerns and suggestions pertaining to the statistical analyses and justifications for experimental design.

      We appreciate the nuance of this experimental design, and the authors have adequately commented on why they chose nose-only exposure over whole body exposure. However, the justification for the duration of the exposure, and the clinical relevance of a short exposure, have not been addressed in the revised manuscript.

      We thank the editor for this query. We have now addressed this query briefly in Lines 82-85, 425-435, 648-654 of the revised manuscript. We would like to add, however, that we intended to design a study for acute nose-only exposure for this project. Shorter durations limit the stress and discomfort to the animal, owing to which a duration of 1 hour per day was chosen. In vivo work using nose-only exposure is still developing, with multiple exposure regimens being used by different groups. Ours is one such study in that direction, intended only to identify cell-specific changes upon exposure. Considering our results in Figure 1B showing variations in the level of metals leached in each flavor per day, the appropriate exposure regimen to design a controlled, reproducible experiment needs to be discussed. There could be room for improvement in our strategy, but this was the best regimen that we found to be appropriate per the literature and our prior knowledge in the field.

      The presentation of cell counts should be represented by a percentage/proportion rather than a raw number of cells. Without normalization to the total number of cells, comparisons cannot be made across groups/conditions. This comment applies to several figures.

      We thank the editor for this comment and have now made the requested change in the revised manuscript.
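      The normalization requested above can be sketched as follows. The counts are invented purely for illustration and are not data from the study; the function name is hypothetical.

```python
from typing import Dict

def to_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-cell-type counts for one sample into fractions of that sample's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no cells")
    return {cell_type: n / total for cell_type, n in counts.items()}

# Hypothetical counts: the exposed sample captured half as many cells overall.
air = {"neutrophil": 60, "macrophage": 840, "T cell": 300}
exposed = {"neutrophil": 30, "macrophage": 270, "T cell": 300}

air_frac = to_proportions(air)
exposed_frac = to_proportions(exposed)

for cell_type in air:
    print(f"{cell_type}: air {air_frac[cell_type]:.1%}, exposed {exposed_frac[cell_type]:.1%}")
```

      In this made-up case the raw neutrophil count halves between samples while the neutrophil fraction is unchanged, which is the kind of capture-depth artifact that reporting proportions is meant to remove.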

      We appreciate that the authors have taken the reviewers' advice to validate their findings. However, we have concerns regarding the immunofluorescent staining shown in Figure 4. If the red channel is showing a pan-neutrophil marker (S100A8) and the green channel is showing only a subset of neutrophils (LY6G+), then the green channel should have far less signal than the red channel. This expected pattern is not what is shown in the figure, with the Ly6G marker apparently showing more expression than S100A8. Additionally, the FACS data states that only 4-5% of cells are neutrophils, but the red channel co-localizes with far more than 4-5% of the DAPI stain, meaning this population is overrepresented, potentially due to background fluorescence (noise). In addition, some of the shapes in the staining pattern do not look like true neutrophils, although it is difficult to tell because there remains a lot of background staining. The authors need to verify that their S100A8 and Ly6G antibodies work and are specific to the populations they intend to target. It is possible that only the brightest spots are truly S100A8+ or Ly6G+.

      We thank the editor for this comment and acknowledge that we may have made broad generalizations in our interpretation of our data previously. We have now revisited the data and quantified the two fluorescence signals for better interpretation of our results. We have also reassessed our conclusions from this data and reworded the manuscript accordingly. Briefly, we believe that Ly6G deficiency could be an indication of the presence of immature neutrophils in the lungs. This is a common process of neutrophil maturation. An active neutrophil population has Ly6G and should also express S100A8, indicating a normal neutrophilic response against stressors. However, our results, despite some autofluorescence which is common with lung tissues, show a marked decline in the S100A8+ cells in the lungs of tobacco-flavored e-cig aerosol exposed mice as compared to air controls. We also do not see prominent co-localization of the two markers in the exposed group, pointing to a shift in neutrophil dynamics which requires further investigation. We would also like to mention here that S100A8 is predominantly expressed in neutrophils, but is also expressed by monocytes and macrophages, so that could explain the over-representation of these cells in our immunofluorescence results. We have now included this in the Discussion section (Lines 489-538) of the revised manuscript.

      Paraffin sections do not always yield the best immunostaining results and the images themselves are low magnification and low resolution.

      We agree with the editor that paraffin sections may not yield the best results. We have worked on the final figure to improve the quality of the displayed results and zoomed in on some parts of the merged image to show the differences in the co-localization patterns for the two markers in our treated and control groups for easier interpretation.

      Please change the scale bars to white so they are more visible in each channel.

      The merged image in Figure 6C now has a white scale bar.

      We appreciate that this is a preliminary test used as a resource for the community, but there is interesting biology regarding immune cells that warrants DEG analysis by the authors. This computational analysis can be easily added with no additional experiments required.

      We thank the editor for this comment and agree that interesting biology regarding immune cells could be explored upon performing the DEG analyses on individual immune populations. However, due to the small sample size, low sequencing depth and pooling of same-sex animals in each treatment group, we refrained from performing those analyses, fearing over-interpretation of our results. We will be providing the link to the raw data with this publication, which will be freely accessible to the public on the NIH GEO resource to allow further analyses of this dataset at the judgement of the investigator who utilizes it as a resource.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (Minor) The pathway analyses in Fig. 6-8 have different fonts than what's used in all other figures.

      We have now made the requested change in the revised manuscript.

    1. eLife Assessment

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality. After the revision, the authors have addressed most of the concerns and the manuscript has been significantly improved. Both reviewers have agreed on the significance of the work. The work will be of interest to neuroscientists working on glial cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, has emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. This study seeks to specifically address the question of which molecular mechanisms contribute to the regulation of the extent of glial ensheathment, focusing on the interaction of wrapping glia with axons.

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi mediated knockdown, acute Crispr-Cas9 knock-outs and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community.

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase of wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein which contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context.

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments will need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable to interact with specific axons.

      Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling the visualization of Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      However, the points raised above remain at present technically difficult to address because of the lack of appropriate genetic reagents. Also more detailed electron microscopy analyses of early developmental stages and comparisons of effects on cell bodies compared to branches will be very labor-intensive, and indeed may represent a new study.

      In summary, in light of the importance of correct ensheathment of axons by glia for neuronal function, the proposed model for the interactions between Htl, Uif and N to control the correct extent of neuron and glial contacts will be of general interest to the glial biology community.

      Comments on revisions:

      The authors have addressed all my comments. However, the sgRNAs in the Star method table are still all for cleavage just before the transmembrane domain, while the Supplemental figure suggests different locations.

    3. Reviewer #2 (Public review):

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors performed a large-scale screen of over 2,600 RNAi lines to identify factors regulating the downstream signaling of this process. They identified the transmembrane protein Uninflatable (Uif) as essential for the formation of plasma membrane domains. Furthermore, they found that Notch, a regulatory target of Uif, is required for glial wrapping. Interestingly, additional evidence implies that Notch reciprocally regulates uif and htl, suggesting a feedback loop. Consequently, the authors propose that Uif functions as a 'switch' to regulate the balance between glial growth and axonal wrapping.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif provides essential insight into this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The electron microscopy studies, in particular, are of outstanding quality and help mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this important study provides convincing evidence of a new player coordinating the glial wrapping of axons.

      Comments on revisions:

      Overall, the authors have done an excellent job of responding to my substantive concerns in this significantly improved manuscript. In particular, the authors have provided important additional details about the design, prioritization, and outcomes of their screen, and relayed changes that strengthen and extend the impact of their study. I have revised my assessment accordingly, and I expect this study to be of high interest to a variety of researchers in the field.

    4. Author response:

      The following is the authors’ response to the current reviews.

      We would like to proceed with this paper as a Version of Record but we will correct the mistake that we made in the Key resources table. As the reviewer noted we had added the wrong guide RNA sequence here. We are super thankful to the reviewer and apologize for the mistake.


      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality.

      We are thankful for this kind and very positive judgment.

      However, the quantification of the wrapping index, the role of Htl/Uif/Notch signaling in differentiation vs growth/wrapping, and the mechanism of how Uif "stabilizes" a specific membrane domain capable of interacting with specific axons might require further clarification or discussion.

      This is now addressed.

      Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. Using this model, this study seeks to specifically address the question, as to which molecular mechanisms contribute to the regulation of the extent of glial ensheathment focusing on the interaction of wrapping glia with axons. 

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi-mediated knockdown, acute Crispr-Cas9 knock-outs, and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community. 

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third-instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase in wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein that contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by the over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain-containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context. 

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments would need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons.

      We absolutely agree with the reviewer that it would be fantastic to understand whether and how Uif could stabilize specific membrane domains that are capable of interacting with axons. To address this we need to be able to label such membrane domains and unfortunately we still cannot do so. We analyzed the distribution of PIP2/PIP3 but failed to detect any differences. Thus we still lack wrapping glial membrane markers that are able to label specific compartments.

      Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling to visualize Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      In the revised version of the manuscript we have now included the use of Notch and RTK-signaling reporters.

      (1) reporters for N and Htl signaling in larvae,

      We had already employed the classic reporter generated by the Bray lab: Gbe-Su(H)-lacZ. This unfortunately failed to detect any activity in larval wrapping glia nuclei but was able to detect Notch activity in the adult wrapping glia (Figure S5C,F).

      We performed, as requested, the analysis of an RTK signaling reporter. The activity of sty-lacZ, which we had previously characterized in the lab (Sieglitz et al., 2013), increases by 22% when Notch is silenced. Given the normal distribution of the data points, this shows a trend which, however, does not reach statistical significance. We have not included this in the paper, but would be happy to do so, if requested.

      Author response image 1.
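
      The response does not state which statistical test was applied to the reporter data. As a minimal sketch of how a 22% difference in mean reporter activity can fail to reach significance, the example below uses a two-sided permutation test, chosen here instead of a t-test only to keep the example free of external dependencies. The intensities are invented so that the relative change equals 22%; they are not the authors' measurements, and the function name is hypothetical.

```python
import random
from statistics import mean
from typing import Sequence

def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_perm: int = 5000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled values into two groups of the original sizes.
        diff = abs(mean(pooled[len(a):]) - mean(pooled[:len(a)]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical reporter intensities (arbitrary units), invented for illustration only.
control = [0.8, 1.3, 0.9, 1.2, 0.7, 1.1]
knockdown = [1.0, 1.6, 0.9, 1.5, 1.0, 1.32]

change = mean(knockdown) / mean(control) - 1
p_value = permutation_p_value(control, knockdown)
print(f"relative change: {change:+.0%}, permutation p = {p_value:.3f}")
```

      Whether a shift of this size is detectable depends on the spread within each group and on the number of animals scored, so a non-significant trend of this magnitude is compatible with both a real modest effect and no effect.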

       

      (2) monitoring of different stages at a time point when branch extension begins,

      The reviewer raises an important question; however, this is extremely difficult to tackle experimentally. It would require a detailed electron microscopic analysis of early larval stages which cannot be done in a reasonable amount of time. We have however added additional information on wrapping glia growth summarizing recently published work from the lab (Kautzmann et al., 2025).

      (3) a reagent enabling to visualize Uif expression could be important next tools/approaches.

      The final comment of the reviewer also addresses an extremely relevant and important issue. We employed antibodies generated by the lab of R. Ward, but they did not allow detection of the protein in larval nerves. We also attempted to generate anti-Uif peptide antibodies but these antibodies unfortunately do not work in tissue. We are still trying to generate suitable reagents but for the current revision cannot offer any solution.

      Lastly, we agree with the reviewer that it would be worthwhile to explore how Uif controls membrane formation at the subcellular level. This, however, is a completely new project and will require the identification of the binding partners of Uif in wrapping glia to start working on a link between Uif and membrane extension. The reduced branching phenotype might well be a direct consequence of excessive membrane formation, as it likely blocks resources needed for efficient growth of glial processes.

      Finally, in light of the importance of correct ensheathment of axons by glia for neuronal function, this study will be of general interest to the glial biology community. 

      We are very grateful for this very positive comment.

      Reviewer #2 (Public review): 

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors perform a large-scale screen of over 2600 RNAi lines to find factors that control the downstream signaling in this process. They identify a transmembrane protein Uninflatable to be necessary for the formation of plasma membrane domains. They further find that a Uif regulatory target, Notch, is necessary for glial wrapping. Interestingly, additional evidence suggests Notch itself regulates uif and htl, suggesting a feedback system. Together, they propose that Uif functions as a "switch" to regulate the balance between glial growth and wrapping of axons.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif is a promising link to shed light on this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The EM studies in particular are of outstanding quality and really help to mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this valuable study provides convincing evidence of a new player coordinating the interactions controlling the glial wrapping of axons.

      Reviewer #1 (Recommendations for the authors): 

      (1) To be reproducible and understandable, it would be important to provide detailed information about crosses and genotypes, as reagents are currently listed individually and genotypes are provided in rather simplified versions. 

      We have added the requested information to the text.

      (2) Neurons are inherently resistant to RNAi-mediated knockdown and it thus may be necessary to introduce the over-expression of UAS-dcr2 when assessing neuronal requirements and to specifically exclude Delta or Serrate as ligands. 

      We agree with the reviewer and have repeated the knockdown experiments using UAS-dcr2, obtaining the same results. As an RNAi-independent approach, we also employed sgRNA expression in the presence of Cas9. The neuron-specific gene knockout also showed no glial wrapping phenotype. These results are now added to the manuscript.

      (3) Throughout the manuscript, the authors use the terms "growth" and "differentiation" referring to the extent of branch formation versus axon wrapping. However glial differentiation and growth could have different meanings (for instance, growth could implicate changes in cell size or numbers, while differentiation could refer to a change from an immature precursor-like state to a mature cell identity). It may thus be useful to replace these general terms with more specific ones. 

      This is a very good point. When we use the term “growth” we refer only to glial cell growth, that is, the increase in cell mass. Proliferation is excluded, and this is now explicitly stated in the manuscript. The term “differentiation” is indeed difficult, and we have therefore replaced it either with a direct description of the morphology or with “axon wrapping”.

      (4) Page 4. "remake" fibers should be Remak fibers. 

      We have corrected this typo.

      (5) Page 5. "Heartless controls glial growth but does promote axonal wrapping", this sentence is not clear in its message because of the "but".

      We have corrected this sentence.

      (6) Generally, many gene names are used as abbreviations without introductions (e.g. Sos, Rl, Msk on page 7). These would require an introduction.

      All genetic elements are now introduced.

      (7) Page 8. When Cas9 is expressed ubiquitously ... It would be helpful to add how this is done (nsyb-Gal4, nrv2-Gal4, or another Gal4 driver are used to express UAS-Cas9, as the listed Gal4 drivers seem to be specific to neurons or glia?).

      This is now added. We used the following genotype for ubiquitous knockout using the four different uif-specific sgRNAs (UAS-uif<sup>sgRNA X</sup>): [w; UAS-Cas9/ Df(2L)ED438; da-Gal4 /UAS-uif<sup>sgRNA X</sup>]. We used the following genotype for a knockout in wrapping glia: [+/+; UAS-Cas9/+; nrv2-Gal4,UAS-CD8::mCherry/UAS-uif<sup>sgRNA X</sup>].

      We had previously shown that nrv2-Gal4 is a wrapping glia specific driver in the larval PNS (Kottmeier et al., 2020).

      Moreover, the authors mention that "This indicates that a putatively secreted version of Uif is not functional". This conclusion would need to be explained in detail.

      First, because it requires quite some detective work to understand the panels in Figure 1 on which this statement is based; second, since the acutely induced double-stranded breaks in the DNA and subsequent repair may cause variable defects, it may indeed not be certain what changes have been induced in each cell; and third, considering that there is a putative cleavage site, would it not be expected that the protein is not functional when it is not cleaved and there is no secreted extracellular part (unless the cleavage site is not required)? The latter could probably only be addressed by rescue experiments with UAS transgenes with identified changes.

      We agree with the reviewer. The rescue experiments are unfortunately difficult, since even expression of a full-length uif construct does not fully rescue the uif mutant phenotype (Loubéry et al., 2014). We have therefore better explained the conclusion drawn from the different sgRNA knockout experiments and removed the statement that secreted Uif forms are non-functional.

      In the Star Method reagent table, it is not clear, why all 8 oligonucleotides are for "uif cleavage just before transmembrane domain" despite targeting different locations. 

      We are very sorry for this mistake and have now corrected it. Thank you very much for spotting this.

      (8) Page 13. However, we expressed activated Notch,... the word "when" seems to be missing, and it would be helpful to specify how this was done (over-expression of N[ICD]).

      We have now corrected this accordingly.

      (9) To strengthen the point about the similarity of phenotypes caused by Htl pathway over-activation and Uif over-expression, it would be helpful to also show an electron micrograph of the former.

      We have now added an extensive description of the phenotype caused by activated Heartless. This is shown as new Figure 2.

      (10) Figure 4C, the larval nerve seems to be younger, as many extracellular spaces between axons are detected.

      This perception is a misunderstanding and we are sorry for not explaining this better. The third instar larvae are all age-matched. The particular specimen in Figure 4C shows some fixation artifacts that result in the loss of material. Importantly, however, membranes are not affected. Similar loss of material is also seen in Figure 6C. For further examples, please see the study on nerve anatomy by Kautzmann et al. (2025).

      (11) The model could be presented as a figure panel in the manuscript. To connect the recommendation section with the above public review, a step forward could be to adjust the model and the wording in the Result section and to move some of the less explored points and thoughts to the discussion.

      We are thankful for this advice and have moved an updated model figure to the end of the main text (now Figure 7).

      Reviewer #2 (Recommendations for the authors):

      (1) Screen and the interest in Uif: Out of the ~62 genes that came out of the RNAi screen, why did the authors prioritize and focus on Uif? What were the other genes that came out of the screen, and did any of those impinge on Notch signaling? 

      We have now described the results of the screen more thoroughly. We selected Uif as it was the only transmembrane/adhesion protein identified. Given the finding that Uif decorates apical membrane domains in epithelial cells, we hoped to identify a protein specific for a similar membrane domain in wrapping glia.

      Notch as well as its downstream transcription factors were not included in the initial screen and were only analyzed once we had seen the contribution of Notch. Interestingly, there is a single hit in our screen linked to Notch signaling: Gp150. Here, however, we tested additional dsRNA-expressing lines and were not able to reproduce the phenotype. This information is added to the discussion.

      The authors performed a large-scale screen of 2600 RNAi lines, it seems more details about what came out of the screen and why the focus on Uif would benefit the manuscript. 

      See above comment.

      Relatedly, there should be a discussion of the limitations of the screen, and that it was really a screen looking to modify a gain-of-function phenotype from the activated Htl allele; it seems a screen of this design may lead to artifacts that may not reflect endogenous signaling.

      We have now added a short paragraph on suppressor screens employing gain-of-function alleles to the introduction.

      “In Drosophila, such suppressor screens have been used successfully many times (Macagno et al., 2014; Rebay et al., 2000; Therrien et al., 2000). Possibly, such screens also uncover genes that are not directly linked to the signaling pathway under study but this can be tested in further experiments. Our screen led to the unexpected identification of the large transmembrane protein Uninflatable, which in epithelial cells localizes to the apical plasma membrane. Loss of uninflatable suppresses the phenotype caused by activated RTK signaling. In addition, we find that uif knockdown and uif knockout larvae show impaired glial growth while an excess of Uninflatable leads to the formation of ectopic wrapping membrane processes that, however, fail to interact with axons. uninflatable is also known to inhibit Notch.”

      (2) In general this study relies on RNAi knockdown, and is generally well controlled in using multiple RNAi lines giving the same phenotype, and also controlled for by tissue-specific gene knockout. However, there is little in the way of antibody staining to directly confirm the target of interest is lost/reduced, which would obviously strengthen the study. 

      Lacking the tools or ability to assess RNAi efficiency (qPCR, antibody staining), some conclusions need to be tempered. For example, in the experiments in Figure S6 regarding canonical Notch signaling, the authors do not find a phenotype by Delta or Serrate knockdown, but there are no experiments that show Delta or Serrate are lost. Thus, if the authors cannot directly test for RNAi efficiency, these conclusions should be tempered throughout the manuscript. 

      We agree with the reviewer. We now provide information on the use of Dicer in our RNAi experiments and have conducted new sgRNA/Cas9 experiments. In addition, we tempered our wording and now state that Dl and/or Ser remain possible ligands.

      (3) More description is needed regarding how the authors are measuring and calculating the "wrapping index". In principle, the approach seems sound. However, are there cases where axons are "partially" wrapped of various magnitudes, and how are these cases treated in the analysis? Are there additional controls of previously characterized mutants to illustrate the dynamic range of the wrapping index in various conditions?

      This is now explained.

      Further, can the authors quantify the phenotypes in the axonal "bulges" in Figures 1, 3, and 5?

      This is a difficult question. Although we can easily quantify the number of bulges, we cannot quantify the severity of the phenotype, as this would require EM analysis. Sectioning nerves at a specific distance from the ventral nerve cord already requires very careful adjustments. Sectioning at the level of a bulge is considerably more difficult, and it is not possible to obtain the number of sections needed to quantify the bulge phenotype.

      All wrapping glial cells develop swellings (bulges) at the position of the nucleus. As there are generally three wrapping glial cells per segmental nerve, the number of bulges is three.

      (4) It seems difficult to clearly untangle the functions of Htl/Uif/Notch in differentiation itself vs subsequent steps in growth/wrapping. For example, if the differentiation steps are not properly coordinated, couldn't this give rise to some observed differences in growth or wrapping at later stages? I'm not sure of any obvious experiments to pursue here, but at least a brief discussion of these issues in the manuscript would be of use.

      We now discuss this more carefully in the Discussion, in order to discriminate whether the three genes function in differentiation itself or in a stepwise mode of growth and differentiation.

      When comparing the different loss-of-function phenotypes, they all appear the same, which would argue that all three genes act in a common process.

      However, when we look at gain-of-function phenotypes, Htl and Uif behave differently from Notch. This would favor two distinct processes.

      We have now added activity markers for RTK signaling to directly show that Notch silences RTK activity. Unfortunately, we were not able to perform the reciprocal experiment.

      Minor:

      (1) The Introduction is too long, and would benefit from revisions to make it shorter and more concise.

      We have shortened the introduction and hopefully made it more concise.

      (2) A schematic illustrating the model the authors propose about Htl, Uif, and Notch in glial differentiation, growth, and wrapping would benefit the clarity of this work. 

      We had previously provided a graphical abstract, which we have now updated and included as a figure in the main text.

      References

      Kautzmann, S., Rey, S., Krebs, A., and Klämbt, C. (2025). Cholinergic and glutamatergic axons differentially require glial support in the Drosophila PNS. Glia. 10.1002/glia.70011.

      Kottmeier, R., Bittern, J., Schoofs, A., Scheiwe, F., Matzat, T., Pankratz, M., and Klämbt, C. (2020). Wrapping glia regulates neuronal signaling speed and precision in the peripheral nervous system of Drosophila. Nature communications 11, 4491-4417. 10.1038/s41467-020-18291-1.

      Loubéry, S., Seum, C., Moraleda, A., Daeden, A., Fürthauer, M., and González-Gaitán, M. (2014). Uninflatable and Notch control the targeting of Sara endosomes during asymmetric division. Current biology : CB 24, 2142-2148. 10.1016/j.cub.2014.07.054.

      Macagno, J.P., Diaz Vera, J., Yu, Y., MacPherson, I., Sandilands, E., Palmer, R., Norman, J.C., Frame, M., and Vidal, M. (2014). FAK acts as a suppressor of RTK-MAP kinase signalling in Drosophila melanogaster epithelia and human cancer cells. PLoS Genet 10, e1004262. 10.1371/journal.pgen.1004262.

      Rebay, I., Chen, F., Hsiao, F., Kolodziej, P.A., Kuang, B.H., Laverty, T., Suh, C., Voas, M., Williams, A., and Rubin, G.M. (2000). A genetic screen for novel components of the Ras/Mitogen-activated protein kinase signaling pathway that interact with the yan gene of Drosophila identifies split ends, a new RNA recognition motif-containing protein. Genetics 154, 695-712. 10.1093/genetics/154.2.695.

      Sieglitz, F., Matzat, T., Yuva-Adyemir, Y., Neuert, H., Altenhein, B., and Klämbt, C. (2013). Antagonistic Feedback Loops Involving Rau and Sprouty in the Drosophila Eye Control Neuronal and Glial Differentiation. Science signaling 6, ra96. 10.1126/scisignal.2004651.

      Therrien, M., Morrison, D.K., Wong, A.M., and Rubin, G.M. (2000). A genetic screen for modifiers of a kinase suppressor of Ras-dependent rough eye phenotype in Drosophila. Genetics 156, 1231-1242.

    1. eLife Assessment

      This important study investigates why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. The authors perform deep transcriptomic and epigenetic comparisons between the mouse and the 13-lined ground squirrel (13LGS) to provide convincing evidence that identifies mechanisms that drive rod vs cone-rich retina development. Overall, this key question is investigated using an impressive collection of new data, cross-species analysis, and subsequent in vivo experiments.